document that {read,write}dlm follow RFC 4180

Open zarakay opened this issue 9 years ago • 8 comments

From my understanding, the writedlm function is used to write arrays to a file. When writing an array of strings that contain an escaped double quotation mark, the output is unexpected. This can be reproduced in the REPL:

Command To Reproduce

writedlm(STDOUT, ["\"Hello World\""])

Output

"""Hello World"""

Expected Output

""Hello World""

I could not find anything in the documentation that explains why this happens. Single quotation marks and every other escaped character I tried are written out unchanged.

Version Info

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

zarakay avatar Dec 13 '16 08:12 zarakay

DLM isn't a real format, and CSV isn't a single standard, but... this seems to follow the common rules for CSV, which mean that " characters inside of double-quoted fields are represented by "", which is what's happening here. And, of course, writedlm delenda est.

StefanKarpinski avatar Dec 13 '16 23:12 StefanKarpinski
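
For readers who want to see the quoting rule described above in isolation, here is a minimal sketch in Julia (1.0 or later). It is not the DelimitedFiles implementation: `quote_field` and `unquote_field` are hypothetical helper names made up for this illustration, and the set of characters that trigger quoting (double quote, delimiter, CR, LF) is an assumption based on RFC 4180.

```julia
# Hypothetical helpers illustrating RFC 4180-style quoting.
# These are NOT part of DelimitedFiles; names and details are assumptions.

# Wrap a field in double quotes when it contains a quote, the delimiter,
# or a line break, doubling every embedded double quote.
function quote_field(field::AbstractString; delim::Char=',')
    needs_quoting = any(c -> c == '"' || c == delim || c == '\n' || c == '\r', field)
    needs_quoting || return String(field)
    return "\"" * replace(field, "\"" => "\"\"") * "\""
end

# Inverse operation: strip the enclosing quotes and collapse "" back to ".
function unquote_field(field::AbstractString)
    if length(field) >= 2 && startswith(field, "\"") && endswith(field, "\"")
        inner = chop(field, head=1, tail=1)  # drop first and last character
        return replace(inner, "\"\"" => "\"")
    end
    return String(field)
end

println(quote_field("\"Hello World\""))   # prints """Hello World"""
println(unquote_field(quote_field("\"Hello World\"")))
```

Under these assumptions, the three leading quotes in the reported output are one enclosing quote followed by the doubled form of the embedded quote, and the same holds in reverse at the end of the field.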

Hiya, thanks for the reply

Firstly, seeing you write out DLM in all caps, the name suddenly makes more sense. I had been thinking for a while that the function had a weird name...

Secondly, I had no idea that the CSV "standard" did that to double quotation marks. I had just assumed it escaped them.

Seeing that this is not a bug, just a misunderstanding of how double quotation marks are represented in CSV, I am going to go ahead and close the issue.

Thanks for your help

zarakay avatar Dec 13 '16 23:12 zarakay

No worries – CSV is a confusing and generally terrible format, but it's better than all the other ones we have for text-based tabular data.

StefanKarpinski avatar Dec 13 '16 23:12 StefanKarpinski

It could be useful to mention in the docstring that we take RFC 4180 as the reference for the format.

nalimilan avatar Dec 14 '16 09:12 nalimilan

Updated as a doc issue. (readdlm delenda est.)

StefanKarpinski avatar Dec 14 '16 19:12 StefanKarpinski

And as a doc issue, it should be reopened, I guess...

martinholters avatar Dec 14 '16 20:12 martinholters

Given that CSV.jl is now a separate package, do we still need to mention RFC 4180 in readdlm?

ViralBShah avatar Mar 13 '22 14:03 ViralBShah

As long as the function exists I guess it's worth documenting what it does.

nalimilan avatar Mar 13 '22 16:03 nalimilan

For some reason, I can't transfer this issue to DelimitedFiles.jl

ViralBShah avatar Apr 08 '22 14:04 ViralBShah

I was able to

https://github.com/JuliaLang/DelimitedFiles.jl/issues/17

DilumAluthge avatar Apr 08 '22 14:04 DilumAluthge

@ViralBShah Something has gone horribly wrong. There are now like 5+ copies of this issue on DelimitedFiles.jl.

DilumAluthge avatar Apr 08 '22 14:04 DilumAluthge

Yeah I kept thinking I did it and it wouldn't happen. And it still exists here.

ViralBShah avatar Apr 08 '22 14:04 ViralBShah

Deleted the others. Let's see. It still says transfer in progress.

ViralBShah avatar Apr 08 '22 14:04 ViralBShah