nextclade run --output-csv is semicolon separated, not comma

Open tseemann opened this issue 3 years ago • 9 comments

--output-csv seems to be producing ;-separated output, not ,-separated. The --output-tsv output is fine.

% cd dataset-dir

% nextclade run --input-fasta sequences.fasta --dataset-dir . --output-csv out.csv

% head out.csv
seqName;clade;Nextclade_pango;qc.overallScore;qc.overallStatus;totalSubstitutions;totalDeletions;totalInsertions;totalFrameShifts;totalAminoacidSubstitutions;totalAminoacidDeletions;totalAmi <snip>

tseemann avatar Apr 12 '22 22:04 tseemann

Thanks for opening this issue.

This is on purpose: https://github.com/nextstrain/nextclade/blob/4f236b58277bdaef72e2f84d26207dbbbc0b8502/packages/web/src/state/algorithm/algorithmExport.sagas.ts#L38

I don't know what the reasoning was, but there is one. @ivan-aksamentov can probably tell.

We should probably document this better, although it's easy to notice by inspection.

corneliusroemer avatar Apr 13 '22 15:04 corneliusroemer

@tseemann This is by design. Have you encountered any particular problems with that?

ivan-aksamentov avatar Apr 13 '22 17:04 ivan-aksamentov

The problem is that it is not documented in the --help and the option is called --output-csv :-)

A note in the --help text next to the option would resolve the confusion.

tseemann avatar Apr 15 '22 02:04 tseemann

I think we did this because many fields contain lists, which are themselves comma separated. Quoting ("...") of course solves this, but using a different delimiter seemed more robust. It should definitely be documented though.
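To illustrate the trade-off described here, the following is a minimal sketch using Python's standard `csv` module. The row values and the `write_row` helper are made up for illustration and are not taken from Nextclade's code. It shows that a field holding a comma-separated list must be quoted when the delimiter is a comma, but can be written as-is when the delimiter is a semicolon.

```python
import csv
import io

# Hypothetical row: the second field is itself a comma-separated list.
row = ["seq1", "C241T,A23403G", "good"]


def write_row(fields, delimiter):
    """Serialize one row with the given delimiter and minimal quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


comma_line = write_row(row, ",")
semicolon_line = write_row(row, ";")

print(comma_line, end="")      # seq1,"C241T,A23403G",good
print(semicolon_line, end="")  # seq1;C241T,A23403G;good
```

Both forms parse back to the same three fields with a standards-aware reader. The semicolon form simply avoids relying on the consumer to handle quotes correctly.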

rneher avatar May 05 '22 15:05 rneher

Has this behaviour been changed in Nextclade 2.x?

tseemann avatar Jul 12 '22 00:07 tseemann

@tseemann No, at least not intentionally. What have you found?

P.S. Added a note in https://github.com/nextstrain/nextclade/pull/933

ivan-aksamentov avatar Jul 12 '22 10:07 ivan-aksamentov

It is still ; separated, despite being called --output-csv :-)

Best to leave it for backward compat but maybe put a note in the --help for it?

( I will just keep using --output-tsv and converting it with csvtk tab2csv )

tseemann avatar Jul 14 '22 23:07 tseemann

When you convert from TSV to CSV, you need to surround columns that contain , with quotes; otherwise the file becomes unparseable.

We could have an option to output a comma-separated CSV, but that would require the quoting mentioned above.
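As a rough sketch of what such a conversion involves, the Python snippet below converts TSV text to comma-separated CSV and quotes only the fields that need it. The `tsv_to_csv` function and the sample data are hypothetical, and the sketch assumes the TSV fields contain no tabs or line breaks.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to comma-separated text with minimal quoting."""
    # QUOTE_NONE on the reader: treat any quote characters in the TSV literally.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # QUOTE_MINIMAL on the writer: quote only fields containing commas or quotes.
    writer = csv.writer(out, delimiter=",", quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical two-line table; the second column holds a comma-separated list.
sample_tsv = "seqName\tsubstitutions\nseq1\tC241T,A23403G\n"
converted = tsv_to_csv(sample_tsv)
print(converted, end="")
```

Reading `converted` back with `csv.reader` yields the original two columns, which is the property a naive tab-to-comma substitution would break.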

Can you elucidate the use case in which ; is unacceptable as separator? Is it to match expectations? Or is there a technical reason?

Any decent software that opens CSV should let you specify the separator. After all, German CSVs use ; as the separator anyway. We could basically say we produce German CSV ;)

corneliusroemer avatar Jul 15 '22 09:07 corneliusroemer

@tseemann --output-csv output has used ; for a long time now, since the very early days of Nextclade 0.x, when someone requested it. In Nextstrain we traditionally use mostly TSV tables everywhere, but someone felt they needed a CSV and requested the feature, so we added it. At the time, due to defects in the early Nextclade implementation, it was considered difficult to implement comma-separated rows, because commas were already used in the values, so we went with the simple solution of using semicolons. The person who requested it was fine with this. And so modern Nextclade had to inherit all that.

Most spreadsheet software (e.g. MS Office, Open Office) and libraries (e.g. Pandas) should automatically recognize the semicolon delimiter, or at least offer an option to switch to it, and, in my experience, it is quite widespread in files on the internet. Everywhere I have seen it, the format is still called CSV, even when the delimiter is not a comma. No one calls it SSV or anything other than CSV.
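For readers who parse the file programmatically, here is a small sketch of setting the delimiter explicitly with Python's standard `csv` module. The sample content is invented for illustration and only mimics the shape of the header shown at the top of this issue.

```python
import csv
import io

# Hypothetical excerpt of a semicolon-separated file; values are made up.
sample = "seqName;clade;substitutions\nseq1;21K;C241T,A23403G\n"

# Pass the delimiter explicitly instead of relying on auto-detection.
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))

print(rows[0]["clade"])
# Commas inside a field survive intact and can be split afterwards.
print(rows[0]["substitutions"].split(","))
```

With a real file, the same `delimiter=";"` argument applies when the reader is given a file opened with `newline=""`.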

So we consider this normal, and right now there is no intention to change it - this would be an unnecessary breaking change.

I mentioned above that in PR https://github.com/nextstrain/nextclade/pull/933 I added a note to the help message text. It was released in 2.3.0.

Let us know if you encounter any additional problems or have concrete improvement ideas in mind.

ivan-aksamentov avatar Jul 15 '22 10:07 ivan-aksamentov