I’m an avid XyPerl user, and recently I encountered a rather nasty problem. It involves passing XML attribute values to XyPerl. Normally this is no big deal, but problems lurk on the horizon when an attribute value contains a $, an @, or a single or double quote. You should always be prepared for this possibility, because in many cases, you won’t know beforehand what’s inside an attribute value in the XML files you receive from a customer, and you don’t have any control over those files.

To demonstrate the problem, I created a small test division in XML mode, containing this simple paragraph:

<Para attr="someword">Just testing...</Para>

The Para start tag invokes the following:

</Pb;;xp_test.pl;Test::attval("${attr}")>

My XyPerl script (xp_test.pl) contains a simple subroutine, which just takes the argument passed to it and outputs it, followed by a space:

#!/usr/bin/perl -w

use strict;
package Test;

sub attval {
    my $value = shift;
    print "$value ";
}

1;

Simple enough. When I compose, this XyPerl routine generates the output “someword” followed by a space, as expected. So far, so good. And you would be inclined to think that this simple routine would work for whatever attribute value you could think of, right?

Unfortunately, this turns out not to be the case. The reality is more complex. Let’s change the test division to the following:

<Para attr="$9.99">Just testing...</Para>

This is where things start to go wrong. This time, the output is not “$9.99” as you might expect, but just “.99”! What happened? Well, XyPerl interpreted the dollar sign as the beginning of the name of a Perl variable called $9 (Perl’s built-in variable for the ninth regex capture group), which is of course undefined at this time. An undefined variable interpolates as an empty string, so only the remaining characters are output: “.99”.

The first thing that came to my mind was to parse the argument value at the start of my subroutine and try to escape special characters like $, but this was the wrong idea: by the time my subroutine starts, there is nothing more I can do. The harm has already been done.

When processing the /Pb macro, Compose doesn’t pass the literal “${attr}” to XyPerl, but the attribute value from the XML source. In other words, before anything gets to XyPerl, Compose first replaces “${attr}” with the actual attribute value, which is “$9.99”, so what really gets executed by Compose is this:

</Pb;;xp_test.pl;Test::attval("$9.99")>

This command invokes XyPerl, and XyPerl executes the following code. It’s a call to my subroutine:

Test::attval("$9.99")

Because $9 is undefined at the time my attval subroutine is called, the only thing that is passed to my subroutine is “.99”. So nothing I do inside my subroutine can set things right. It’s too late. I’ll never get my nine dollars back. They’re gone!

If the attribute value contains an @ character, things get even worse. Consider the following example, in which the attribute value contains my e-mail address:

<Para attr="geert@flin.be">Just testing...</Para>

In this case, XyPerl throws an error and doesn’t output anything at all. Check the Compose log to see the error message (the number after “eval” may vary):

eval_error(p): Global symbol "@flin" requires explicit package name at (eval 3) line 1.

Something similar happened here: XyPerl now tries to use an array called “@flin”, which of course it won’t find.
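Both failures can be reproduced outside XPP with a minimal sketch in plain Perl. It assumes that Compose's behavior can be modeled as a textual substitution followed by a string eval; simulate_call is a hypothetical helper of my own and not part of XyPerl, and attval returns its argument instead of printing it so the result is easy to inspect.

```perl
use strict;
use warnings;

# Stand-in for the subroutine in xp_test.pl; returns the value it receives.
sub attval { my $value = shift; return $value; }

# Hypothetical model of the two steps described above:
# 1. replace ${attr} in the macro argument with the raw attribute value,
# 2. let Perl parse and run the resulting text.
sub simulate_call {
    my ($macro_arg, $attr) = @_;
    my $code = $macro_arg;
    $code =~ s/\$\{attr\}/$attr/g;   # plain textual substitution
    no warnings;                     # keep the demo output free of warnings
    my $result = eval $code;         # Perl interpolates inside "..." here
    return defined $result ? $result : "ERROR: $@";
}

print simulate_call('attval("${attr}")', '$9.99'), "\n";          # .99
print simulate_call('attval("${attr}")', 'geert@flin.be'), "\n";  # an ERROR: line about @flin
```

The point of the sketch is that attval never sees the original text: the damage happens while Perl parses the substituted string, before the subroutine is entered.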

Now that we’ve seen some examples of things that can go wrong, let’s try to find a solution.

XyPerl obviously misinterprets the $ as the beginning of the name of a scalar variable, and the @ as the beginning of the name of an array variable. This happens because I’ve used double quotes around “${attr}” in the /Pb macro.

As every Perl programmer should know, double-quoted strings are subject to variable interpolation of scalar and list values. This means that $ and @ characters will be interpreted as the beginning of a variable name. In the above examples, Perl will try to find the scalar $9 and the array @flin.

I can prevent this from happening by replacing the double quotes with single quotes. This will prevent the Perl interpreter from performing variable interpolation. In other words, Perl won’t look for the variables $9 and @flin anymore. It will just treat the $ and @ as any other character.
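The difference between the two quoting styles is easy to see in a few lines of standalone Perl. The variable names below ($price, @items) are made up for the illustration and have nothing to do with XPP.

```perl
use strict;
use warnings;

my $price = '9.99';
my @items = ('a', 'b');

# Double quotes: Perl replaces $price and @items with their values.
my $double = "price: $price, items: @items";

# Single quotes: the $ and @ are ordinary characters.
my $single = 'price: $price, items: @items';

print "$double\n";   # price: 9.99, items: a b
print "$single\n";   # price: $price, items: @items
```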

So let’s rewrite the /Pb macro, now using single instead of double quotes:

</Pb;;xp_test.pl;Test::attval('${attr}')>

A word of caution is needed here. The single quotes that you should use are the ASCII quotes (ASCII value 39). The ASCII single quote or apostrophe is not the single quote that you find on XPP’s Standard keyboard. If you use the wrong quote, you’ll see an error message in the Compose log.

To enter the ASCII single quote, you can enter the character’s XCS value as four hexadecimal digits, using Shift-F2. The XCS value of the single quote is the same as its ASCII value: 39 (decimal) or 27 (hexadecimal). So to get the quote in the /Pb macro, type Shift-F2 (XPP will respond with the prompt “Enter four hex codes”), followed by 0027.

A bit cumbersome, I know. But we are rewarded when we try to compose our little test division. This time it works! XyPerl now happily outputs the correct price ($9.99) and my e-mail address (geert@flin.be).

However, we’re not finished yet. Unfortunately, we’re now faced with another problem. The use of single quotes in the /Pb macro now prevents me from using single quotes in attribute values and will result in failure in cases like the one below:

<Para attr="Fermat's last theorem">Just testing...</Para>

When you hit the Compose button, nothing happens. A quick look at the Compose log reveals why:

eval_error(p): Substitution pattern not terminated at (eval 5) line 1.
!
error processing xycode (1) name = </Pb;;xp_test.pl;Test::attval('Fermat's last theorem')>

Indeed: before Compose invokes XyPerl, it again replaces ${attr} with the actual attribute value, which now contains a single quote. In combination with the single quotes which we’ve put around the argument, this gets us in serious trouble.

So what have we learned?

  • That something as seemingly trivial as passing an attribute value as an argument to XyPerl will fail in some cases.
  • That using double quotes in the /Pb macro will result in unwanted variable interpolation when the attribute value contains one or more $ or @ characters.
  • That using single quotes in the /Pb macro will result in a “Substitution pattern not terminated” error when the attribute value contains single quotes. Likewise, using double quotes in the /Pb macro will result in a similar error when the attribute value contains double quotes.
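These rules can be turned into a small check that reports which characters in an attribute value would cause trouble for a given quoting style. This is only a sketch covering the characters discussed here; risky_chars and the %risky table are hypothetical names of my own.

```perl
use strict;
use warnings;

# Characters that cause trouble for each quoting style used in the /Pb macro.
my %risky = (
    '"' => qr/[\$\@"]/,   # double quotes: interpolation plus the delimiter itself
    "'" => qr/'/,         # single quotes: only the delimiter itself
);

# Returns the distinct risky characters in $value, in order of first appearance.
sub risky_chars {
    my ($value, $macro_quote) = @_;
    my $pattern = $risky{$macro_quote}
        or die "unknown quote style: $macro_quote";
    my %seen;
    return grep { !$seen{$_}++ } $value =~ /($pattern)/g;
}

print join(' ', risky_chars('$9.99', '"')), "\n";                   # $
print join(' ', risky_chars("Fermat's last theorem", "'")), "\n";   # '
```

A check like this could be run over incoming files to see how often the problem occurs before deciding how to handle it.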

So it seems that whether we use single or double quotes, we always risk running into problems. Help! What should we do?

After giving it some thought, I came up with the following solution: I’ll simply use an import transformation table (or an import transformation script) to convert the characters which are causing problems to numeric character references or character entity references. (Remember: a numeric character reference refers to a character by means of its numeric Unicode codepoint, e.g. &#39; or &#x27; – a character entity reference refers to a character by means of a predefined name, e.g. &apos;).

If you prefer to use double quotes in your /Pb macros, this means converting the characters $ and @ to &#36; and &#64;, respectively. Or in hexadecimal notation: &#x24; and &#x40;.

When dealing with the quote problem, we have to consider this: in XML, there are two ways to use attributes. One way is to use double quotes around the attribute values, another way is to use single quotes. Both ways are valid, and can even be mixed in the same XML file. When the XML source uses double quotes around an attribute value, this means that the attribute value can contain single quotes. When the XML source uses single quotes around an attribute value, this means that the attribute can contain double quotes!

The use of double or single quotes in the XML source has nothing to do with the use of double or single quotes in the /Pb macro. XPP will recognize attributes no matter what kind of quotes – single or double – are used in the XML source. But you have to be careful: if you use double quotes in your /Pb macro, you should make certain that the attribute value doesn’t contain double quotes, and if you use single quotes in your /Pb macro, you should make certain that the attribute value doesn’t contain single quotes. Otherwise you might be facing a “Substitution pattern not terminated” error.

So you’ll have to use an import transformation table or script to convert both single and double quotes to their corresponding character or entity references when these characters appear inside attribute values. But beware: don’t convert all quotes, just the ones inside attribute values. You certainly don’t want to convert the quotes that are used to delimit the attribute values!

Convert double quotes inside attribute values to &#34; (or &#x22; or &quot;). Convert single quotes inside attribute values to &#39; (or &#x27; or &apos;).
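As an illustration, here is a minimal sketch of what such a conversion could look like as a standalone Perl filter. It is not an XPP import transformation table (those have their own syntax, which I don't show here), and it uses regular expressions rather than a real XML parser, so treat it as a starting point only. The names escape_value, escape_tag and escape_attributes are hypothetical. Existing references such as &amp; are left alone, because the & character is never touched.

```perl
use strict;
use warnings;

# Numeric character references for the four troublesome characters.
my %ref = ('$' => '&#36;', '@' => '&#64;', '"' => '&#34;', "'" => '&#39;');

# One attribute (name="value" or name='value') and one start tag.
my $attr_re = qr{\s+[^\s=<>/]+\s*=\s*(?:"[^"<]*"|'[^'<]*')};
my $tag_re  = qr{<[A-Za-z_:][^\s<>/]*(?:$attr_re)*\s*/?>};

# Replace the troublesome characters inside a single attribute value.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([\$\@"'])/$ref{$1}/g;
    return $value;
}

# Rewrite every attribute value in one start tag, keeping its delimiters.
sub escape_tag {
    my ($tag) = @_;
    $tag =~ s{(=\s*)(["'])(.*?)\2}{
        my ($eq, $quote, $value) = ($1, $2, $3);
        $eq . $quote . escape_value($value) . $quote
    }ges;
    return $tag;
}

# Apply escape_tag to every start tag; element content is left untouched.
sub escape_attributes {
    my ($xml) = @_;
    $xml =~ s{($tag_re)}{ escape_tag($1) }ge;
    return $xml;
}

print escape_attributes(q{<Para attr="Fermat's last theorem">Just testing...</Para>}), "\n";
# <Para attr="Fermat&#39;s last theorem">Just testing...</Para>
```

Note that the quotes delimiting the attribute value are preserved, and that quotes, $ and @ in the element content are not converted.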

When a quote inside an attribute value is represented by a character or entity reference, it no longer poses a threat. Using our last example, Compose then executes the following code:

</Pb;;xp_test.pl;Test::attval('Fermat&apos;s last theorem')>

And XyPerl happily complies.

Conclusion:

  • Using an import transformation table or script, replace the characters $ and @ with a character or entity reference.
  • Do the same for single and double quotes which appear inside attribute values (but not the ones surrounding the attribute values).

If you do this, passing any attribute value as an argument to a XyPerl subroutine won’t give you any headaches anymore.