[doc] perlpacktut
Where
https://perldoc.perl.org/perlpacktut
Description
Issue 1:
The example at the end of "The Basic Principle" packs "byte contents from a string of hexadecimal digits".
The code is pack( 'H2' x 10, 30..39 ). It is not obvious that the number 30 is meant to be read as a pair of hexadecimal digits.
Why make it unnecessarily confusing?
The following would be easier for beginners and would avoid the kind of misunderstanding this tutorial is meant to prevent.
my $s = pack( 'H2' x 10, '30'..'39');
print "$s\n";
Issue 2:
Since there are Unicode strings and byte strings, it is not clear which of them can be unpacked. Unpacking a Unicode string seems to give unexpected results.
#!/usr/bin/perl -w
use v5.34;
use utf8;
use strict;
use warnings;
use Encode qw(encode decode);
my $s = "0123456789😀";
my $b = encode "UTF8", $s;
say "Unpack unicode string 1: ", unpack( '(H2)*', $s);
say "Unpack unicode string 2: ", unpack( 'H*', $s);
say "Unpack bytes: ", unpack( 'H*', $b);
{
use bytes;
say "Unpack unicode string 3: ", unpack( 'H*', $s);
}
The output is:
Character in 'H' format wrapped in unpack at .\t.pl line 11.
Unpack unicode string 1: 3031323334353637383900
Character in 'H' format wrapped in unpack at .\t.pl line 12.
Unpack unicode string 2: 3031323334353637383900
Unpack bytes: 30313233343536373839f09f9880
Unpack unicode string 3: 30313233343536373839f09f9880
Thank you. I agree with your first point, though it may be made even clearer by using strings containing hex digits A-F in the example.
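As a rough sketch of what such a revised example could look like (the variable names here are made up for illustration and are not from the tutorial): the numbers 30..39 are stringified before 'H2' sees them, so both forms build the same bytes, and hex digits A-F make it plain that each argument is read as two hex digits.

```perl
use strict;
use warnings;

# 'H2' reads each argument as a string of two hex digits, high nybble first.
# The numbers 30..39 stringify to "30".."39", so both calls below build
# the same ten bytes 0x30..0x39.
my $from_numbers = pack('H2' x 10, 30 .. 39);
my $from_strings = pack('H2' x 10, '30' .. '39');

# Hex digits A-F make the interpretation explicit: bytes 0x4A 0x4B 0x4C.
my $letters = pack('H2' x 3, '4A', '4B', '4C');

print "$from_numbers\n";   # 0123456789
print "$from_strings\n";   # 0123456789
print "$letters\n";        # JKL
```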
For point 2, the Unicode section probably needs to be rewritten as it's overly abstraction dependent, similar to your "use bytes" example which breaks the Perl string abstraction. I'm not sure exactly what you're suggesting is the problem there otherwise.
For point 2, I would like to see some clarifications in the tutorial. I agree that some sections may need to be rewritten. When I read the tutorial, I had these questions.
Q1: Can a Unicode string be unpacked? If it is not recommended, the tutorial could say so plainly: "do not unpack Unicode strings".
Q2: The example in the tutorial seems to suggest that it is fine to unpack a Unicode string into "strings". If a Unicode string can be unpacked in some cases, when does it work?
while (<>) {
my ($date, $desc, $income, $expend) =
unpack("A10xA27xA7xA*", $_);
$tot_income += $income;
$tot_expend += $expend;
}
It's a bit complex. The Perl string abstraction is simply a sequence of codepoints - not Unicode, nor bytes, until something interprets it as such. The 'a' and 'A' patterns for example will pass through a codepoint whether or not it fits in a byte, but other patterns like 'C' which are defined to operate on bytes have less obvious behavior (and unfortunately don't warn that you're doing something strange).
And your example has an additional complication. Unless you pass -CSD or add a decoding layer to STDIN or the files you are reading from, <> will return encoded bytes, not Unicode strings. So in that example unpack is likely receiving a byte string.
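To illustrate the difference a decoding layer makes, here is a minimal sketch that reads the same data twice from an in-memory filehandle. The input string and variable names are assumptions chosen for the demonstration; a real script would read from STDIN or a file instead.

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 encoding of "cafe" with an acute accent on
# the e, followed by a newline. The accented letter is the two bytes C3 A9.
my $encoded = "caf\xC3\xA9\n";

# No layer: the line comes back as encoded bytes.
open my $raw_fh, '<', \$encoded or die "open: $!";
my $raw_line = <$raw_fh>;
close $raw_fh;

# With a decoding layer: the line comes back as decoded characters.
open my $dec_fh, '<:encoding(UTF-8)', \$encoded or die "open: $!";
my $dec_line = <$dec_fh>;
close $dec_fh;

printf "raw length: %d\n",     length $raw_line;   # 6
printf "decoded length: %d\n", length $dec_line;   # 5
printf "4th decoded char: U+%04X\n", ord substr($dec_line, 3, 1);   # U+00E9
```

Fixed-width templates such as "A10xA27xA7xA*" count whatever elements the string holds, so the two lines above would be split at different positions once the input contains non-ASCII text.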
Thanks for the explanation. To summarize, a string may have a codepoint that consists of more than one byte. The 'a' or 'A' pattern works with those codepoints, while some other patterns work with bytes only.
It's more accurate to say it may have a codepoint which cannot represent a byte because it is higher than 255. What it's represented by internally is immaterial (unless using "use bytes", which is why that is problematic).
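A small sketch of that distinction, using variable names invented for the example: 'a*' hands back a codepoint above 255 unchanged, while a byte-oriented dump with 'H*' only makes sense after the string has been explicitly encoded into bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One codepoint above 255 (U+1F600); it cannot stand for a single byte.
my $chars = "\x{1F600}";

# 'a*' passes the codepoint through unchanged.
my ($copy) = unpack('a*', $chars);
printf "length %d, codepoint U+%X\n", length $copy, ord $copy;

# To see bytes, encode first; every element of $bytes is then <= 255.
my $bytes = encode('UTF-8', $chars);
my $hex   = unpack('H*', $bytes);
print "$hex\n";   # f09f9880
```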
I've never fully understood pack and unpack, and now I don't think it's just me.
Looking at @zhijieshi's first example, I would think that if it were changed to
my $s = pack( 'H2' x 26, '41'..'5A' );
things would be clear. But instead this comes out
ABCDEFGHIPQRSTUVWXY`abcdef
And if we make the first value in the range into a number containing a hex-only digit, we get
my $s = pack( 'H2' x 6, '4A'..'4F' );
Argument "4A" isn't numeric in range (or flop)
So, the numbers 30..39 are interpreted as hex, but not all hex numbers can be used here.
And this is near the beginning of a tutorial, where it is covering beginner-level material.
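The range operator knows nothing about hexadecimal; the 'H2' interpretation happens only inside pack. The output above is consistent with '41'..'5A' falling back to string increment ('41', '42', ... '49', '50', ...), of which pack then consumes the first 26. One way to get a real hex sequence, sketched here with illustrative variable names, is to build the range numerically and format each value with sprintf:

```perl
use strict;
use warnings;

# Build the range from numbers (hex literals), then format each value as a
# two-digit hex string for 'H2' to consume.
my @hex = map { sprintf '%02X', $_ } 0x41 .. 0x5A;

my $s = pack('H2' x scalar(@hex), @hex);

print "@hex[0, 9, 25]\n";   # 41 4A 5A
print "$s\n";               # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```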
PP pack/unpack's behavior regarding bit vectors, that is, the logic for handling fewer than 8 bits in a PP TUI or PP wire binary "byte", is very poorly documented; I spent 2 hours figuring out how it works. The POD examples also often create mixed-endian base-2 TUI PP strings. The bit order inside a single byte runs in the opposite direction to how pack/unpack take in and output the bytes of a string, which is almost always left to right, low index to high index.
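A minimal sketch of the two bit-string formats may help here; the variable names are made up for the demonstration. 'B' fills each byte starting from its high bit and 'b' starting from its low bit, while in both cases the bytes themselves are produced left to right. With fewer than 8 bits, the rest of the byte is zero-filled.

```perl
use strict;
use warnings;

# Full bytes: the same bit string lands at opposite ends of the byte.
my $B_full = unpack('H2', pack('B8', '00000001'));   # 01 (last bit is the low bit)
my $b_full = unpack('H2', pack('b8', '00000001'));   # 80 (last bit is the high bit)

# Fewer than 8 bits: the remaining bits of the byte are zero.
my $B_part = unpack('H2', pack('B4', '1111'));       # f0
my $b_part = unpack('H2', pack('b4', '1111'));       # 0f

# Two bytes: bit order flips inside each byte, byte order does not.
my $B_two = unpack('H4', pack('B16', '0000000111111111'));   # 01ff
my $b_two = unpack('H4', pack('b16', '1000000011111111'));   # 01ff

print "$B_full $b_full $B_part $b_part $B_two $b_two\n";
```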