Text-CSV
FR: Reading multiple CSVs from a single file
Occasionally I have to process files that contain multiple CSVs. Each of these CSVs is stored in the file as a heading, followed by the data lines, followed by an empty line (or eof).
It would be nice if Text::CSV had an option that basically says: stop reading after an empty line. This would make it possible to write something similar to:
csv (in => $fh, out => \@aoh1, stop_at_empty => 1);
csv (in => $fh, out => \@aoh2, stop_at_empty => 1);
csv (in => $fh, out => \@aoh3, stop_at_empty => 1);
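Without such an option, a workaround along the following lines might do. This is only a minimal sketch: `read_csv_block` is a hypothetical helper, not part of Text::CSV, and the sample data is made up. It assumes the blocks are separated by truly empty lines with "\n" line endings, and that no quoted field contains an empty line. It relies on Perl's paragraph mode (`$/ = ""`) to read one block at a time and hands each block to `csv` as an in-memory string.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical helper (not part of Text::CSV): read the next
# blank-line-delimited block from $fh and parse it as CSV with a header line.
# Returns a reference to an array of hashes, or nothing at EOF.
sub read_csv_block {
    my ($fh) = @_;
    local $/ = "";                          # paragraph mode: record ends at empty line(s)
    defined (my $block = <$fh>) or return;  # no more blocks
    chomp $block;                           # in paragraph mode this drops all trailing newlines
    return csv (in => \$block, headers => "auto");
}

# Made-up sample data, held in memory so the sketch is self-contained
my $data = "id,name\n1,foo\n2,bar\n\ncode,label,qty\nA,apple,3\n";
open my $fh, "<", \$data or die "Cannot open in-memory file: $!";

my @tables;
while (my $aoh = read_csv_block ($fh)) {
    push @tables, $aoh;
}
close $fh;

printf "Read %d tables\n", scalar @tables;
```

Each element of `@tables` is then the array of hashes for one block, which is roughly what the three `csv` calls above would produce with the proposed option.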
eval { csv (in => $fh, out => \@aoh, bom => 1, strict => 1) } might be a workable alternative. As for the name of this new feature, empty_row_is_eof sounds more logical. Alternatively, I can imagine new values for the existing skip_empty_rows, where 1 is skip and 2 is stop (EOF).
One could even go further and implement a callback for this attribute:
empty_row => undef, # identical to skip_empty_rows = 0: default
empty_row => "skip", # identical to skip_empty_rows = 1
empty_row => "eof", # stop parsing, no error
empty_row => "fail", # stop parsing, FAIL
empty_row => \&foo, # call foo on empty_rows (see on_in)
Allow a callback for in that returns the next line from the input file, or undef on eof?
A callback for in already exists, but it is defined (like \@foo) to return an arrayref. I just noticed this is not completely documented, but it is used in this example code: https://github.com/Tux/Text-CSV_XS/blob/master/doc/CSV_XS.md#dumping-database-tables-to-csv
# using the csv function, streaming with callbacks
my $sth = $dbh->prepare ($sql); $sth->execute;
csv (out => "foo.csv", in => sub { $sth->fetch });
csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
skip_empty_rows => 0 or undef, # identical to skip_empty_rows = 0: default
skip_empty_rows => 1 or "skip", # identical to skip_empty_rows = 1
skip_empty_rows => "eof", # stop parsing, no error
skip_empty_rows => "fail", # stop parsing, FAIL
skip_empty_rows => \&foo, # call foo on empty_rows (see on_in)