I’m an avid XyPerl user, and recently I encountered a rather nasty problem. It involves passing XML attribute values to XyPerl. Normally this is no big deal, but problems lurk on the horizon when an attribute value contains a $, an @, or a single or double quote. You should always be prepared for this possibility, because in many cases, you won’t know beforehand what’s inside an attribute value in the XML files you receive from a customer, and you don’t have any control over those files.
To demonstrate the problem, I created a small test division in XML mode, containing this simple paragraph:
<Para attr="someword">Just testing...</Para>
The Para start tag invokes the following:
</Pb;;xp_test.pl;Test::attval("${attr}")>
My XyPerl script (xp_test.pl) contains a simple subroutine, which just takes the argument passed to it and outputs it, followed by a space:
#!/usr/bin/perl -w
use strict;
package Test;

# Output the argument passed in from the /Pb macro, followed by a space.
sub attval {
    my $value = shift;
    print "$value ";
}

1;
Simple enough. When I compose, this XyPerl routine generates the output “someword” followed by a space, as expected. So far, so good. And you would be inclined to think that this simple routine would work for whatever attribute value you could think of, right?
Unfortunately, this turns out not to be the case. The reality is more complex. Let’s change the test division to the following:
<Para attr="$9.99">Just testing...</Para>
This is where things start to go wrong. This time, the output is not “$9.99” as you might expect, but just “.99”! What happened? Well, XyPerl interpreted the dollar sign as the beginning of the name of a Perl variable called $9, which is of course undefined at this time. As a result, only the remaining characters are output: “.99”.
The first thing that came to my mind was to parse the argument value at the start of my subroutine and try to escape special characters like $, but this was the wrong idea: by the time my subroutine starts, there is nothing more I can do. The harm has already been done.
When processing the /Pb macro, Compose doesn’t pass the literal “${attr}” to XyPerl, but the attribute value from the XML source. In other words, before anything gets to XyPerl, Compose first replaces “${attr}” with the actual attribute value, which is “$9.99”, so what really gets executed by Compose is this:
</Pb;;xp_test.pl;Test::attval("$9.99")>
This command invokes XyPerl, and XyPerl executes the following code. It’s a call to my subroutine:
Test::attval("$9.99")
Because $9 is undefined at the time my attval subroutine is called, the only thing that is passed to my subroutine is “.99”. So whatever I try in my subroutine to set things right won’t work. It’s too late. I’ll never get my nine dollars back. They’re gone!
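The effect can be reproduced outside XPP with a few lines of plain Perl. This is only a sketch: it assumes that XyPerl evaluates the call text with something like a string eval (the “(eval 3)” in the Compose log suggests as much), and the attval below is a stand-in that returns its argument instead of printing it.

```perl
use strict;
use warnings;

# Stand-in for the XyPerl subroutine: return the argument instead of printing it.
sub attval { return $_[0] }

# Hypothetical attribute value, as Compose would read it from the XML source.
my $attr = '$9.99';

# Compose pastes the value into the call as plain text, before Perl ever sees it.
my $call = 'attval("' . $attr . '")';

# Perl then compiles the pasted text; inside double quotes, $9 is interpolated
# as a variable, which is undefined here and therefore contributes nothing.
my $result = do {
    no warnings 'uninitialized';    # silence the "uninitialized value" warning
    eval $call;
};

print "call:   $call\n";
print "result: $result\n";
```

The subroutine only ever receives what is left after interpolation, which is why no repair inside the subroutine can bring the dollar amount back.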
If the attribute value contains an @ character, things get even worse. Consider the following example, in which the attribute value contains my e-mail address:
<Para attr="geert@flin.be">Just testing...</Para>
In this case, XyPerl throws an error and doesn’t output anything at all. Check the Compose log to see the error message (the number after “eval” may vary):
eval_error(p): Global symbol "@flin" requires explicit package name at (eval 3) line 1.
Something similar happened here: XyPerl now tries to use an array called “@flin”, which of course it won’t find.
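The same kind of stand-alone sketch shows the @ case. It again assumes that XyPerl compiles the call text roughly like a string eval running under use strict; attval is a stand-in that simply returns its argument.

```perl
use strict;
use warnings;

# Stand-in for the XyPerl subroutine: return the argument instead of printing it.
sub attval { return $_[0] }

# Hypothetical attribute value containing an @ character.
my $attr = 'geert@flin.be';

# The call text after Compose has substituted the attribute value.
my $call = 'attval("' . $attr . '")';

# Under "use strict", the undeclared array @flin is a compile-time error,
# so the eval fails and the subroutine is never called.
my $result = eval $call;
my $error  = $@;

print defined $result ? "result: $result\n" : "eval failed: $error";
```

Because the failure happens while the call is being compiled, there is no output at all, which matches what Compose reports.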
Now that we’ve seen some examples of things that can go wrong, let’s try to find a solution.
XyPerl obviously misinterprets the $ as the beginning of the name of a scalar variable, and the @ as the beginning of the name of an array variable. This happens because I’ve used double quotes around “${attr}” in the /Pb macro.
As every Perl programmer should know, double-quoted strings are subject to variable interpolation of scalar and list values. This means that $ and @ characters will be interpreted as the beginning of a variable name. In the above examples, Perl will try to find the scalar $9 and the array @flin.
I can prevent this from happening by replacing the double quotes with single quotes. This will prevent the Perl interpreter from performing variable interpolation. In other words, Perl won’t look for the variables $9 and @flin anymore. It will just treat the $ and @ as any other character.
So let’s rewrite the /Pb macro, now using single instead of double quotes:
</Pb;;xp_test.pl;Test::attval('${attr}')>
A word of caution is needed here. The single quotes that you should use are the ASCII quotes (ASCII value 39). The ASCII single quote or apostrophe is not the single quote that you find on XPP’s Standard keyboard. If you use the wrong quote, you’ll see an error message in the Compose log.
To enter the ASCII single quote, you can enter the character’s XCS value as four hexadecimal digits, using Shift-F2. The XCS value of the single quote is the same as its ASCII value: 39 (decimal) or 27 (hexadecimal). So to get the quote in the /Pb macro, type Shift-F2 (XPP will respond with the prompt “Enter four hex codes”), followed by 0027.
A bit cumbersome, I know. But we are rewarded when we try to compose our little test division. This time it works! XyPerl now happily outputs the correct price ($9.99) and my e-mail address (geert@flin.be).
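For comparison, here is a small plain-Perl sketch of the single-quoted call. As before, the string eval is only an assumed approximation of what XyPerl does with the call text, and attval is a stand-in that returns its argument.

```perl
use strict;
use warnings;

# Stand-in for the XyPerl subroutine: return the argument instead of printing it.
sub attval { return $_[0] }

# Hypothetical attribute values that broke the double-quoted macro.
my @values = ('$9.99', 'geert@flin.be');

my @results;
for my $attr (@values) {
    # The call text after substitution, now with ASCII single quotes.
    my $call = "attval('" . $attr . "')";

    # Single-quoted strings are not interpolated, so $ and @ stay literal.
    my $result = eval $call;
    push @results, $result;
    print "$call => $result\n";
}
```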
However, we’re not finished yet. Unfortunately, we’re now faced with another problem. The use of single quotes in the /Pb macro now prevents me from using single quotes in attribute values and will result in failure in cases like the one below:
<Para attr="Fermat's last theorem">Just testing...</Para>
When you hit the Compose button, nothing happens. A quick look at the Compose log reveals why:
eval_error(p): Substitution pattern not terminated at (eval 5) line 1.
!
error processing xycode (1) name = </Pb;;xp_test.pl;Test::attval('Fermat's last theorem')>
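Indeed: before Compose invokes XyPerl, it again replaces ${attr} with the actual attribute value, which now contains a single quote. That quote closes the string early: Perl reads 'Fermat' as the complete argument and then tries to parse the leftover s last theorem' as a substitution operator (s///) that is never terminated. In combination with the single quotes which we’ve put around the argument, this gets us in serious trouble.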
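The error can be reproduced in plain Perl as well. This sketch assumes, as above, that a string eval is a fair approximation of how XyPerl compiles the call; attval is a stand-in that returns its argument.

```perl
use strict;
use warnings;

# Stand-in for the XyPerl subroutine: return the argument instead of printing it.
sub attval { return $_[0] }

# Hypothetical attribute value containing an ASCII single quote.
my $attr = q{Fermat's last theorem};

# The call text after substitution: the apostrophe ends the string too early.
my $call = "attval('" . $attr . "')";

# Perl sees 'Fermat' followed by "s last theorem')" and tries to read the
# latter as a substitution (s///), which has no closing delimiter.
my $result = eval $call;
my $error  = $@;

print "call:  $call\n";
print "error: $error";
```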
So what have we learned?
- That something as seemingly trivial as passing an attribute value as an argument to XyPerl will fail in some cases.
- That using double quotes in the /Pb macro will result in unwanted variable interpolation when the attribute value contains one or more $ or @ characters.
- That using single quotes in the /Pb macro will result in a substitution pattern not terminated error when the attribute value contains single quotes. Accordingly, using double quotes in the /Pb macro will result in a similar error when the attribute value contains double quotes.
So it seems that whether we use single or double quotes, we always risk running into problems. Help! What should we do?
After giving it some thought, I came up with the following solution: I’ll simply use an import transformation table (or an import transformation script) to convert the characters which are causing problems to numeric character references or character entity references. (Remember: a numeric character reference refers to a character by means of its numeric Unicode codepoint, e.g. &#39; or &#x27;; a character entity reference refers to a character by means of a predefined name, e.g. &apos;.)
If you prefer to use double quotes in your /Pb macros, this means converting the characters $ and @ to &#36; and &#64;, respectively. Or in hexadecimal notation: &#x24; and &#x40;.
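If you don’t want to look up the codepoints by hand, a few lines of Perl can generate both notations. The helper name char_refs is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the decimal and hexadecimal numeric character
# references for a single character.
sub char_refs {
    my ($char) = @_;
    my $code = ord $char;
    return (sprintf('&#%d;', $code), sprintf('&#x%X;', $code));
}

# The four characters that cause trouble in the /Pb macro.
for my $char ('$', '@', "'", '"') {
    my ($dec, $hex) = char_refs($char);
    print "$char  $dec  $hex\n";
}
```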
When dealing with the quote problem, we have to consider this: in XML, there are two ways to use attributes. One way is to use double quotes around the attribute values, another way is to use single quotes. Both ways are valid, and can even be mixed in the same XML file. When the XML source uses double quotes around an attribute value, this means that the attribute value can contain single quotes. When the XML source uses single quotes around an attribute value, this means that the attribute can contain double quotes!
The use of double or single quotes in the XML source has nothing to do with the use of double or single quotes in the /Pb macro. XPP will recognize attributes no matter what kind of quotes – single or double – are used in the XML source. But you have to be careful: if you use double quotes in your /Pb macro, you should make certain that the attribute value doesn’t contain double quotes, and if you use single quotes in your /Pb macro, you should make certain that the attribute value doesn’t contain single quotes. Otherwise you might be facing a Substitution pattern not terminated error.
So you’ll have to use an import transformation table or script to convert both single and double quotes to their corresponding character or entity references when these characters appear inside attribute values. But beware: don’t convert all quotes, just the ones inside attribute values. You certainly don’t want to convert the quotes that are used to delimit the attribute values!
Convert double quotes inside attribute values to &quot; (or &#34; or &#x22;). Convert single quotes inside attribute values to &apos; (or &#39; or &#x27;).
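As a rough illustration of such an import transformation script, here is a regex-based sketch in Perl. It is not how XPP’s transformation tables work, and it is not a real XML parser: the subroutine names are made up, and it assumes that start tags contain no < or > characters inside attribute values. It only shows the principle of converting the four characters inside attribute values while leaving the delimiting quotes and the element content alone.

```perl
use strict;
use warnings;

# Replacement references for the characters that upset the /Pb macro.
my %ref_for = (
    '$' => '&#36;',
    '@' => '&#64;',
    "'" => '&apos;',
    '"' => '&quot;',
);

# Convert the problem characters in one attribute value.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([\$\@'"])/$ref_for{$1}/g;
    return $value;
}

# Rebuild one attribute: keep the "=" and the delimiting quotes as they are.
sub rewrite_attr {
    my ($equals, $quote, $value) = @_;
    return $equals . $quote . escape_value($value) . $quote;
}

# Process every quoted attribute value inside a single start tag.
sub escape_tag {
    my ($tag) = @_;
    $tag =~ s{(=\s*)(["'])(.*?)\2}{ rewrite_attr($1, $2, $3) }ges;
    return $tag;
}

# Process every start tag in the document; text content is left untouched.
sub escape_attributes {
    my ($xml) = @_;
    $xml =~ s{(<[A-Za-z_][^<>]*>)}{ escape_tag($1) }ge;
    return $xml;
}

print escape_attributes(q{<Para attr="Fermat's last theorem">It's $5</Para>}), "\n";
```

Note that the apostrophe and the dollar sign in the element content are deliberately not converted; only the attribute value is.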
When a quote inside an attribute value is represented by a character or entity reference, it no longer poses a threat. Using our last example, Compose then executes the following code:
</Pb;;xp_test.pl;Test::attval('Fermat&apos;s last theorem')>
And XyPerl happily complies.
Conclusion:
- Using an import transformation table or script, replace the characters $ and @ with a character or entity reference.
- Do the same for single and double quotes which appear inside attribute values (but not the ones surrounding the attribute values).
If you do this, passing any attribute value as an argument to a XyPerl subroutine won’t give you any headaches anymore.

4 users commented on "Passing XML attribute values to XyPerl"
Geert,
I remembered we had a discussion on this item on the listserv.
(May 14 2008 in case somebody wants to search the archives)
The solution we came up with was to use the following form of quoting:
</Pb;;xp_test.pl;Test::attval(q/Fermat's last theorem/)>
The q/…/ form is the more generic form of the single quote form of quoting. There is a generic form because quotes inside perl are not literal characters but are more or less operators.
If I remember well using this form of quoting, we could easily get around any possible problem character that might sit in the content that one wants to pass on.
It is also good to know that one can choose any other character in place of the /, so q#…# or q!…! work just as well. This allows you to adapt the delimiter to the content you are expecting.
Maybe you could give this a try and report back on your findings…
Bart,
I guess no matter what quote character you choose, the problem remains. If you decide to quote your strings with q/…/, this would fail if your string (i.e. the attribute value) contains a slash character. You would then have to convert all slashes (inside attribute values) to character references during import.
If you know for sure that attribute values don’t contain slashes, that’s fine. But if you don’t and your customer sends you an XML file with a slash in an attribute value, you would run into the same problems as I did.
If you don’t know what to expect and attribute values can in theory contain any possible character, there is no safe string delimiter. The solution is to first make sure that the character you intend to use as a quote delimiter doesn’t appear inside attribute values.
So if I want to use quotes as delimiters, the import transformation should first get rid of all quotes by converting them to character references or character entity references. If you want to use slashes, the import transformation should first get rid of all slashes inside attribute values, and convert them to character references.
I don’t think using the number sign (#) as a delimiter is a good idea, because that symbol is used in character references and will surely get you into trouble if the attribute value contains a character reference.
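A quick plain-Perl check illustrates both sides of this discussion. It is only a sketch: the string eval is an assumed approximation of how XyPerl compiles the call, attval is a stand-in that returns its argument, and the second attribute value is invented for the test.

```perl
use strict;
use warnings;

# Stand-in for the XyPerl subroutine: return the argument instead of printing it.
sub attval { return $_[0] }

# Call text with q/.../ quoting and an apostrophe in the value: no conflict.
my $ok_call = q{attval(q/Fermat's last theorem/)};

# Call text with q/.../ quoting and a slash in the (invented) value:
# the slash ends the string early and the rest no longer compiles.
my $bad_call = q{attval(q/either/or/)};

my $ok        = eval $ok_call;
my $bad       = eval $bad_call;
my $bad_error = $@;

print "ok:  $ok\n";
print "bad: ", (defined $bad ? $bad : "failed: $bad_error");
```

Whatever delimiter is chosen, the import transformation still has to guarantee that this one character never occurs unescaped inside an attribute value.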
Geert,
An alternative method that I have used was to get the values of the attributes once you are inside your XyPerl program rather than passing the attribute value via the /Pb XyPerl call.
That method suited me as I needed to look at many attributes which were only optionally going to be present.
So within my XyPerl program I used this to get access to the XML including any attributes
my $mlh = $X->get_mlh();
That seemed to give me exactly what I expected to see with regard to attribute contents.
This would avoid any import transformations which might be important if you are roundtripping your XML back out of XPP.
Note:
get_mlh() was initially escaping special characters, such as $ and @, in the string that it passed to you but this issue was addressed and solved – I believe in patch set 4 for XPP 8.1C.1
Paul,
Thanks for mentioning this. I must say I’m not familiar with the get_mlh() function, as we’re still using XPP 7.3. If I’m right, get_mlh() was introduced in XPP 8.1. I suppose it’s high time we migrate to the newest version. This certainly is another good reason to do so.