SandDance
Feature request: handling of comments
Hi,
I gave SandDance a try today, but I noticed it doesn't support .csv files with comments. It would be great to have it work seamlessly! :)
Here is the error message that is displayed when opening such a .csv file (its first 3 lines are comment lines):
Hello, can you give an example of the comment line? I'd like to know what the comment character is.
I use #; as I understand it, it is the de facto comment character for .csv files.
Judging from https://stackoverflow.com/questions/1961006/can-a-csv-file-have-a-comment I'm not sure this is completely standardized. Can you give any examples where this is documented? Or is it standardized in a certain user community?
Indeed! I found the same link, which led me to use #. As stated in the second top-voted answer in the link, it seems # is prevalent in the data engineering community.
I found at least a couple of Stack Overflow threads that lead to the same answer. Notably, this one leads to a W3C recommendation in the answers that suggests using # for comments in .csv files.
Thanks for the link @sebastienwood. I might push this issue down the stack, since we use Vega for parsing the CSV. I will create an issue there and link to it from here.
https://github.com/vega/vega/issues/2729
Hi @sebastienwood, the Vega team has decided not to add this feature to their package. However, we might still be able to add it here if it makes sense. Can you describe the workflow in which you get data with comments? I'm asking because it may make sense to add it to a specific integration, for example the VS Code extension or the Jupyter widget.
For sure! My workflow is roughly as follows: I have a database of experiments that have been logged in a separate process. The goal is to filter experiments by hyperparameters, then generate plot-ready .csv files of said experiments.
As the .csv files will have a life of their own after being generated, a preamble-like comment is prepended that sums up the filters that generated each .csv: this helps with bookkeeping and traceability. After being generated, the .csv is displayed either in SandDance or in any plot generation utility for reporting/analysis purposes.
All this happens in VSCode, but the same could apply to a Jupyter context. My current workaround is to generate a truncated .csv with the preamble removed, to ensure compatibility.
Let me know if I have omitted anything of interest.
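For anyone needing the same workaround in the meantime, here is a minimal sketch of stripping the preamble before handing the data to a CSV parser. It assumes the comment character is # and that comments always occupy whole lines; the function name `strip_comment_lines` and the sample data are hypothetical, not part of SandDance or Vega.

```python
import csv
import io


def strip_comment_lines(text, comment_char="#"):
    """Return `text` without the lines whose first non-blank character is `comment_char`."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith(comment_char)
    ]
    return "".join(kept)


# Hypothetical file content: a two-line preamble followed by regular CSV.
raw = (
    "# filter: lr >= 0.01\n"
    "# filter: optimizer == adam\n"
    "run,lr,accuracy\n"
    "a,0.01,0.91\n"
    "b,0.10,0.88\n"
)

cleaned = strip_comment_lines(raw)
# The cleaned text parses as ordinary CSV, with the first kept line as the header.
rows = list(csv.DictReader(io.StringIO(cleaned)))
```

Note that this filters line by line, so a quoted multi-line field with a continuation line starting with # would be damaged; it is only safe when such fields cannot occur in the data.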
Thanks @sebastienwood!
Hi, are you still working on this, or is this feature planned for the future? I have a very similar workflow where I want to skip the first X lines of my csv file, as they describe my hyperparameters. It would be no problem to add a # in front of these lines, but having some version of this feature would be really nice!
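The "skip the first X lines" variant can be sketched in a similar way outside SandDance. This is only an illustration under the assumption that the number of preamble lines is known in advance; `read_csv_skipping` and the sample data are hypothetical names.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(lines, skip):
    """Parse CSV rows from an iterable of lines, ignoring the first `skip` lines."""
    # islice drops the preamble lazily; the next line becomes the header row.
    return list(csv.DictReader(islice(lines, skip, None)))


# Hypothetical file content: two hyperparameter lines, then the actual table.
sample = io.StringIO(
    "learning_rate=0.01\n"
    "batch_size=32\n"
    "epoch,loss\n"
    "1,0.70\n"
    "2,0.55\n"
)

records = read_csv_skipping(sample, skip=2)
```

Unlike the # convention, this approach needs no marker on the preamble lines, but it breaks silently if the preamble length changes.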
I'm comfortable with the # approach since it appears to be common practice, following https://www.w3.org/TR/tabular-data-model/#embedded-metadata
I'm happy to take a PR :)