pandoc-csv2table
CSV file containing empty line isn't parsed
$ cat a.md
```{.table caption="capt" source="b.csv"}
```
$ cat b.csv
foo,bar
,
foo,bar
$ pandoc --filter pandoc-csv2table a.md
<p>+-------+-------+ | foo | bar | +=======+=======+ +-------+-------+ | foo | bar | +-------+-------+</p>
<p>Table: capt</p>
Your CSV is invalid. It should be like this:
foo,bar
foo,bar
Well, CSV isn't a particularly well-defined format. But every spreadsheet program I know of would parse my CSV file as one containing an empty row (in fact, it was generated by Google Sheets). So I would expect:
| foo | bar |
|------|-----|
| | |
| foo | bar |
Actually, the problem isn't even in Text.CSV. In ghci:
Prelude Text.CSV> parseCSVFromFile "b.csv"
Right [["foo","bar"],["",""],["foo","bar"],[""]]
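The parse result above already keeps the blank line as `["",""]`, a record with two empty fields. The only oddity is the final `[""]`, which presumably comes from the trailing newline. A minimal sketch of how a caller could discard that artefact without losing genuine blank rows, assuming the rows are plain lists of strings as in the ghci output (`dropTrailingEmpty` is a hypothetical helper, not part of Text.CSV or the filter):

```haskell
import Data.List (dropWhileEnd)

-- Hypothetical helper, not part of Text.CSV or pandoc-csv2table.
-- A record is a list of fields, matching the shape ghci printed above.
type Record = [String]

-- Drop records made of a single empty field, but only at the end of the
-- file, where a trailing newline would produce them. A record such as
-- ["", ""] has two (empty) fields and is kept as a real blank row.
dropTrailingEmpty :: [Record] -> [Record]
dropTrailingEmpty = dropWhileEnd (== [""])
```

Applied to the value shown in the ghci session, this leaves `[["foo","bar"],["",""],["foo","bar"]]`, so the empty row in the middle survives.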
I wonder why the filter prints the rendered table as Markdown wrapped in a paragraph, of all things. You wrote earlier: "as an intermediate step it pipes the CSV contents through Pandoc's Markdown Reader." I still don't understand that design decision: why not convert the list of lists we got from Text.CSV directly to a Text.Pandoc.Definition.Table?
Then this seems like a CSV parser issue. The filter uses an external CSV parser which implements CSV parsing as defined in RFC 4180.
See my updated message above. Also, I was curious, so I checked the RFC's ABNF grammar. It defines a record as one or more comma-separated fields, and a field as escaped or non-escaped, where non-escaped is zero or more TEXTDATA. So the file is valid.
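To make the grammar argument concrete, here is a rough sketch of the non-escaped part of that rule only: a record is split on commas, and a field may be empty. This is not the parser the filter uses; `splitRecord` is a hypothetical name, quoted (escaped) fields are ignored, and `lines` splits on LF rather than the CRLF the RFC specifies.

```haskell
-- Hypothetical illustration of "record = field *(COMMA field)" with
-- non-escaped fields only. Quoted fields are deliberately not handled.
splitRecord :: String -> [String]
splitRecord s = case break (== ',') s of
  (field, [])       -> [field]                     -- last field of the record
  (field, _ : rest) -> field : splitRecord rest    -- skip the comma, continue

-- Split a whole file into records, one per line (LF line endings assumed).
parseSimple :: String -> [[String]]
parseSimple = map splitRecord . lines
```

Under these assumptions, `splitRecord ","` yields `["",""]`, and `parseSimple "foo,bar\n,\nfoo,bar\n"` yields `[["foo","bar"],["",""],["foo","bar"]]`: the line containing only a comma is a record with two empty fields.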
> I still don't understand that design decision: why not convert the list of lists we got from Text.CSV directly to a Text.Pandoc.Definition.Table?
Because pandoc tables allow markdown inside their cells.
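Since cells may contain Markdown, one way to keep the "generate Markdown, then let pandoc's reader parse it" approach is to emit the rows as table text. The sketch below is only an illustration of that idea and not the filter's actual code: it renders a pipe table (the form used in the expected output earlier in this thread) instead of the grid table seen in the filter's output, and `pipeRow` and `toPipeTable` are hypothetical names. It does not escape `|` inside cells, and whether pandoc's reader accepts the all-empty row in every version has not been checked here.

```haskell
import Data.List (intercalate)

-- Hypothetical helpers, not taken from pandoc-csv2table.

-- Render one row as a pipe-table line, e.g. ["foo","bar"] -> "| foo | bar |".
-- Cell text is passed through untouched so inline Markdown stays intact;
-- a literal '|' inside a cell is NOT escaped in this sketch.
pipeRow :: [String] -> String
pipeRow cells = "| " ++ intercalate " | " cells ++ " |"

-- Render the first row as the header and the remaining rows as the body.
toPipeTable :: [[String]] -> String
toPipeTable []              = ""
toPipeTable (header : body) =
  unlines (pipeRow header : separator : map pipeRow body)
  where
    -- One "---" per header column.
    separator = "|" ++ intercalate "|" (map (const "---") header) ++ "|"
```

An empty record such as `["",""]` becomes the line `|  |  |`, so the blank row is still present in the text handed to the Markdown reader.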
I'll have to see where the filter is going wrong but don't hold your breath in the meantime. I am finalizing my dissertation and it might be a while before I look into it.
Also, pull requests are welcome and appreciated.
okay :)