dwc-qa
Pipe separators are interfering with Markdown-based feedback loops
An observation.
We're starting to work with aggregated reports on data submitted to GBIF.
- TaxonWorks uses pipes (`|`) to delimit multiple values, as exemplified in many examples in the term standard.
- Reports are coming to us that also use pipes (e.g. copy-paste of SQL dumps).
- GitHub permits tables in Markdown ... that use pipes
If we want to clean up reporting "formatting", to better round-trip feedback, then Markdown might be useful as an intermediate format for exchanging issues. However, when we want to include data values in those reports, and those values contain pipes, then we have rendering issues. Obviously we can escape pipes, but this requires another layer of handling.
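For illustration, the extra handling layer could be as small as the following Python sketch. It assumes GitHub-flavored Markdown, where a pipe inside a table cell is written as `\|`. The function names `escape_cell` and `markdown_row` are hypothetical, and other characters that can disturb a table (newlines, backslashes) are deliberately not handled.

```python
def escape_cell(value: str) -> str:
    """Escape pipes so a value can sit inside one Markdown table cell.

    Assumption: the renderer follows GitHub-flavored Markdown, which
    treats a backslash-escaped pipe as a literal pipe within a cell.
    """
    return value.replace("|", r"\|")


def markdown_row(values: list[str]) -> str:
    """Build one table row from raw data values (hypothetical helper)."""
    return "| " + " | ".join(escape_cell(v) for v in values) + " |"


# A pipe-delimited multi-value field stays in a single cell.
print(markdown_row(["recordedBy", "Smith | Jones"]))
```

The reverse step (turning `\|` back into `|` when reading a report) would be needed for a true round trip, which is the additional handling mentioned above.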
I'm wondering 2 things:
- Should we move away from suggesting pipes as delimiters?
- Why doesn't TDWG simply require a specific (non-pipe) delimiter when defining multiple values per term? Surely this character-based standard would greatly increase data interoperability.
@mjy I cross-posted this issue in the TAG repo for the next meeting. https://github.com/tdwg/tag/issues/47
> 2. Why doesn't TDWG simply require a specific (non-pipe) delimiter when defining multiple values per term? Surely this character-based standard would _greatly_ increase data interoperability.
Struggling with pipe characters too. (2) would be my preferred solution.
@mjy @tucotuco @timrobertson100 Tim or John please correct me if I'm wrong. It is my understanding that Option 2 was the original directive. Many delimiters can be exceedingly problematic, commas especially. If you break down all of the possible common delimiters, pipes are arguably the least commonly used characters in string values. Hence, the current suggestion.
> pipes are arguably the least commonly used characters in string values. Hence, the current suggestion.
That is exactly right. A change in that recommendation would have immense repercussions that I would be loath to face without a proven better alternative.
I think Markdown is an inappropriate format for sharing data, so I suggest escaping the characters or using HTML (`<td>value | value</td>`), which is also valid Markdown, though you'll then need to escape `<` and `&`.
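A minimal sketch of the HTML route, assuming Python and its standard-library `html.escape`. The helper names `html_cell` and `html_row` are hypothetical. Pipes pass through untouched, while `&`, `<`, and `>` are escaped.

```python
from html import escape


def html_cell(value: str) -> str:
    """Wrap a raw data value in a <td>, escaping &, < and >.

    quote=False leaves quotation marks alone, since the value is
    placed in element content rather than inside an attribute.
    """
    return f"<td>{escape(value, quote=False)}</td>"


def html_row(values: list[str]) -> str:
    """Build one <tr> from raw data values (hypothetical helper)."""
    return "<tr>" + "".join(html_cell(v) for v in values) + "</tr>"


# The pipe needs no escaping here; only HTML-significant characters do.
print(html_row(["recordedBy", "Smith | Jones & <Co>"]))
```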