text-vcard
text-vcard copied to clipboard
Hardcoded "\x0D\x0A" fails for parsing vCard on Windows
I was attempt to help a user on SO: http://stackoverflow.com/q/22790520/1733163
However, I was unable to install this module and it's dependencies without errors on Strawberry Perl 5.18.2. In the end I forced install in order to debug the module.
I traced one issue to Line 315 of Text::vCard::Addressbook my @lines = split "\x0D\x0A", $text;
The above will file to split a file on Windows because \x0D is stripped by perlio in :crlf mode. The only way to bypass this issue is to open import the file in binmode and pass it to he constructor as a string. The following straights:
use strict;
use warnings;
use utf8;
use vCard;
# create the object
my $vcard = vCard->new;
binmode(\*DATA);
my $string = do { local $/; <DATA> };
$vcard->load_string($string);
print $vcard->as_string();
__DATA__
BEGIN:VCARD
VERSION:3.0
N;LANGUAGE=de:name;first
FN:first name
ORG:company
TEL;TYPE=work:+49 (0000) 123456
ADR;TYPE=work:;;Strasse 1;Ortschaft arbeit;;56789;Deutschland
ADR;TYPE=home;PREF:;;Strasse 2;Ortschaft 4;;12345;Deutschland
EMAIL:[email protected]
URL:www.a.de
END:VCARD
Without binmode, the output is:
BEGIN VCARD without matching END at C:/strawberry/perl/site/lib/Text/vFile/asData.pm line 137,
With binmode, the output is:
BEGIN:VCARD
VERSION:4.0
END:VCARD
Obviously there are still more issues to resolve, but I at least wanted to submit this. If I have time to follow up with this issue, or even create a full patch, I will submit those later.
Hi @tmhall! Thanks for reporting this. It makes me aware of a few issues.
The vcard RFC requires every line to end with a \r\n. We should have a more helpful error if the string is not properly formatted. Do you still have a problem if you use $vcard->load_file()
?
Also load_string()
does not do what the documentation says it does. Here is code that should behave the way you want:
my $new_vcard = $vcard->load_string($string);
$new_vcard->as_string();
This is not intuitive behavior and someone should look into fixing that.
Ok, I submitted 2 pull requests which should address a few of the issues here. I still need to create a better error message for strings that are missing the \r\n pattern. But I'm heading off to sleep now.
@kablamo boom: http://www.cpantesters.org/cpan/report/6e1d7369-6c10-1014-b8e4-5db800d6f3f7
@kablamo less boom, but still boom http://www.cpantesters.org/cpan/report/1d3fc6ca-6bfa-1014-9538-937384992293
@kablamo different boom! http://www.cpantesters.org/cpan/report/d9bfc4e4-7489-1014-b7e9-e9c63567e6ce
@kablamo Getting there! - but now UTF8 issue for Windoz: http://www.cpantesters.org/cpan/report/33e1ea25-6bfa-1014-afed-754eb9e98ec4
The hardcoded CRLF problem is not limited to Windows. I just encountered it with vcards downloaded from an official phone directory.
I would suggest to follow Jon Postel's "Robustness Principle" from RFC 761:
be conservative in what you do, be liberal in what you accept
and accept both CRLF and LF line endings.
My quick fix was this, in case someone finds it useful:
$ diff -u /usr/share/perl5/Text/vFile/asData.pm.orig /usr/share/perl5/Text/vFile/asData.pm
--- /usr/share/perl5/Text/vFile/asData.pm.orig 2013-01-28 13:32:13.000000000 +0100
+++ /usr/share/perl5/Text/vFile/asData.pm 2020-09-30 16:15:51.850221688 +0200
@@ -27,9 +27,9 @@
sub _unwrap_lines {
my $self = shift;
my @lines;
- for (@_) {
+ for (map {split /\R/} @_) {
my $line = $_; # $_ may be readonly
- $line =~ s{[\r\n]+$}{}; # lines SHOULD end CRLF
+ $line =~ s{\R+$}{}; # lines SHOULD end CRLF (but let's not be so strict)
if ($line =~ /^[ \t](.*)/) { # Continuation line (RFC Sect. 4.1)
die "Continuation line, but no preceding line" unless @lines;
$lines[-1] .= $1;