jackson-dataformat-csv

Doesn't handle whitespace outside of quoted values correctly

Open tomdz opened this issue 11 years ago • 6 comments

When parsing a CSV file like:

"foo", "bar", "baz"
"baz", "foo", "bar"

the CSV parser will get confused and give me back exactly two values:

foo

and

 bar, baz
baz, foo, bar

(note the leading space here).

According to RFC 4180, these spaces should be considered part of the value, i.e. it should return 'foo', ' bar', ' baz' and 'baz', ' foo', ' bar'. Alternatively, maybe via a feature, it could trim the whitespace outside of quoted strings, i.e. 'foo', 'bar', 'baz' and 'baz', 'foo', 'bar'.
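To make the two readings concrete, here is a minimal sketch of a hand-rolled line splitter that supports both. It is an illustration only and not how Jackson's CSV parser is implemented; the class name `QuotedCsvLine`, the method `split`, and the `trimOutsideQuotes` flag are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; this is NOT Jackson's parser.
final class QuotedCsvLine {

    /**
     * Splits a single CSV line on commas, honoring double-quoted values.
     * When trimOutsideQuotes is false, spaces outside quotes are kept as
     * part of the value. When it is true, leading and trailing spaces
     * outside quotes are dropped; spaces inside quotes are always kept.
     */
    static List<String> split(String line, boolean trimOutsideQuotes) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int pendingSpaces = 0;      // unquoted spaces seen but not yet emitted
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');    // doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;       // closing quote
                }
            } else if (c == ',') {
                // End of value: any pending spaces were trailing, so drop them.
                values.add(current.toString());
                current.setLength(0);
                pendingSpaces = 0;
            } else if (c == ' ' && trimOutsideQuotes) {
                pendingSpaces++;            // decide later: interior or edge?
            } else {
                // More content follows: pending spaces are interior only if
                // the value already has content; otherwise they were leading.
                if (current.length() > 0) {
                    for (int k = 0; k < pendingSpaces; k++) {
                        current.append(' ');
                    }
                }
                pendingSpaces = 0;
                if (c == '"') {
                    inQuotes = true;        // opening quote
                } else {
                    current.append(c);
                }
            }
        }
        values.add(current.toString());     // last value on the line
        return values;
    }
}
```

For the first line of the example above, `QuotedCsvLine.split(line, false)` yields `foo`, ` bar`, ` baz` (leading spaces kept), while `QuotedCsvLine.split(line, true)` yields `foo`, `bar`, `baz`.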

tomdz, Jun 04 '13 00:06

Quick note: trimming is already supported with CsvParser.Feature.TRIM_SPACES, see: http://fasterxml.github.io/jackson-dataformat-csv/javadoc/2.2.0/com/fasterxml/jackson/dataformat/csv/CsvParser.Feature.html#TRIM_SPACES

But I'll see what's up with eating of spaces...

cowtowncoder, Aug 10 '13 20:08

Hmmh. I am guessing that some spaces are missing from the example, due to Markdown? If so, could you add an example that uses, say, underscores to denote where the spaces are? I need to write a unit test to verify what gives; it should be an easy thing to solve.

cowtowncoder, Aug 10 '13 20:08

Actually it looks like I can reproduce this on my own.

cowtowncoder, Aug 10 '13 20:08

Hmmh. Reading through RFC 4180, I do not see a definition of whether spaces would be allowed in the way described, outside quotes. But I think it would make sense to handle them in an intuitive way.

FWIW, enabling TRIM_SPACES should solve your specific problem, I think, until I fix the issue for the un-trimmed case.

I assume that spaces outside of quotes should be trimmed anyway; it does not make sense to leave them.

cowtowncoder, Aug 10 '13 20:08

Any update on this issue?

I have the same issue. This specific case (where the delimiter is a comma) is solved by using CsvParser.Feature.TRIM_SPACES as stated above, but that setting messes things up when the input delimiter is a space. I can use two different mappers for different delimiters, but then the indexes of fields change if the delimiter changes. So it would be nice to see these spaces handled by the Jackson CSV parser.

qrlodhi, Mar 13 '15 21:03

Unfortunately no update yet. I realize this is an important feature, and hope to address it. Interesting note on spaces, thank you for mentioning this; I hadn't thought this would be commonly done.

cowtowncoder, Mar 13 '15 22:03