SandDance Feature request: handling of comments

Hi,

I tried SandDance today, but I noticed it doesn't support .csv files with comments. It would be great to have it work seamlessly! :)

Here is the error message displayed when opening such a .csv file (the first 3 lines are comment lines): [Screenshot, 2020-06-22 at 15:04:29]

Jun 22 '20 19:06 sebastienwood

Hello, can you give an example of the comment line? I'd like to know what is the comment character.

Jun 22 '20 19:06 danmarshall

I use #; as I understand it, it is the de facto comment character for .csv files.

Jun 22 '20 22:06 sebastienwood

According to https://stackoverflow.com/questions/1961006/can-a-csv-file-have-a-comment, I'm not sure this is completely standardized. Can you give any examples where this is documented? Or is it standardized in a certain user community?

Jun 22 '20 22:06 danmarshall

Indeed! I found the same link, which led me to use #. As stated in the second-highest-voted answer there, # seems to be prevalent in the data engineering community.

I found at least a couple of Stack Overflow threads that lead to the same answer. Notably, the answers in this one point to a W3C recommendation that suggests using # for comments in .csv files.

Jun 22 '20 23:06 sebastienwood

Thanks for the link, @sebastienwood. I might push this issue down the stack, since we use Vega for parsing the CSV. I will create an issue there and link to it from here.

Jun 22 '20 23:06 danmarshall

https://github.com/vega/vega/issues/2729

Jun 22 '20 23:06 danmarshall

Hi @sebastienwood, the Vega team has decided not to add this feature to their package. However, we might still be able to add it here if it makes sense. Can you describe the workflow in which you get data with comments? I'm asking because it may make sense to add it to a specific integration, for example the VS Code extension or the Jupyter widget.

Jun 25 '20 20:06 danmarshall

For sure! My workflow is roughly as follows: I have a database of experiments that have been logged in a separate process. The goal is to filter experiments by hyperparameters, then generate plot-ready .csv files of those experiments.

As the .csv files will have a life of their own after being generated, a preamble-like comment is prepended that sums up the filters used to generate each file: this helps with bookkeeping and traceability. After being generated, the .csv is displayed either in SandDance or in any plot generation utility for reporting/analysis purposes.

All this happens in VSCode, but the same could apply to a Jupyter context. The current way I bypass the issue is to generate a truncated .csv which removes the preamble to ensure compatibility.
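To illustrate the workaround, here is a minimal Python sketch of splitting the leading `#` preamble from the CSV body before handing the body to a viewer. The function name `split_preamble` and the sample data are hypothetical, made up for this example; this is not how SandDance or Vega parse files.

```python
import csv
import io


def split_preamble(text, prefix="#"):
    """Separate the leading run of comment lines from the CSV body.

    Only lines at the very top of the file are treated as comments, so a
    data row further down that happens to start with `prefix` is kept.
    """
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].startswith(prefix):
        i += 1
    # Drop the prefix and surrounding whitespace from each comment line.
    preamble = [line[len(prefix):].strip() for line in lines[:i]]
    return preamble, "".join(lines[i:])


# Hypothetical file content: three comment lines, then regular CSV.
sample = (
    "# filter: lr=0.01\n"
    "# filter: batch_size=32\n"
    "# generated by an export script\n"
    "epoch,loss\n"
    "1,0.9\n"
    "2,0.7\n"
)

preamble, body = split_preamble(sample)
rows = list(csv.DictReader(io.StringIO(body)))
```

Here `body` is the truncated .csv that loads without the preamble, while `preamble` keeps the filter description available for bookkeeping.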

Let me know if I have omitted anything of interest.

Jun 26 '20 16:06 sebastienwood

Thanks, @sebastienwood!

Jun 26 '20 17:06 danmarshall

Hi, are you still working on this, or is this feature planned for the future? I have a very similar workflow where I want to skip the first X lines of my .csv file, as they describe my hyperparameters. It would be no problem to add a # in front of these lines, but having some kind of feature like that would be really nice!
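For reference, skipping a fixed number of lines before parsing can be sketched in Python as below. The helper `read_csv_skipping` and the sample content are hypothetical and only show the idea; nothing here is an existing SandDance or Vega option.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(stream, skip=0):
    """Parse CSV rows from `stream` after discarding the first `skip` lines."""
    # islice drops the first `skip` physical lines; the next line
    # is then read by DictReader as the header row.
    return list(csv.DictReader(islice(stream, skip, None)))


# Hypothetical file: two hyperparameter lines, then the actual table.
sample = io.StringIO(
    "learning_rate=0.01\n"
    "batch_size=32\n"
    "epoch,accuracy\n"
    "1,0.81\n"
    "2,0.86\n"
)

rows = read_csv_skipping(sample, skip=2)
```

A fixed count is simpler than a comment prefix but breaks if the number of header lines changes, which is one argument for the # convention.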

Aug 29 '23 11:08 RisingPhoelix

I'm comfortable with the # approach, since it appears to be common practice; see https://www.w3.org/TR/tabular-data-model/#embedded-metadata

I'm happy to take a PR :)

Aug 30 '23 17:08 danmarshall
