
Pipe separators are interfering with Markdown-based feedback loops

Open mjy opened this issue 9 months ago • 5 comments

An observation.

We're starting to work with aggregated reports on data submitted to GBIF.

  • TaxonWorks uses pipes (|) to delimit multiple values, as shown in many examples in the term standard.
  • Reports are coming to us that also use pipes (e.g. copy-paste of SQL dumps).
  • GitHub permits tables in Markdown ... that use pipes.

If we want to clean up reporting "formatting", to better round-trip feedback, then Markdown might be useful as an intermediate format for exchanging issues. However, when we want to include data values in those reports, and those values contain pipes, then we have rendering issues. Obviously we can escape pipes, but this requires another layer of handling.
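To make the "another layer of handling" concrete, here is a minimal sketch of escaping pipes before a data value goes into a GitHub-flavored Markdown table. The helper names `escape_cell` and `markdown_table` and the sample value are hypothetical, not from TaxonWorks or GBIF. The sketch handles only the pipe character; values containing backslashes or newlines would need more care.

```python
def escape_cell(value: str) -> str:
    """Backslash-escape pipes so a value stays inside one table cell."""
    return value.replace("|", "\\|")


def markdown_table(headers, rows):
    """Build a Markdown table, escaping every header and cell."""
    lines = ["| " + " | ".join(escape_cell(h) for h in headers) + " |"]
    lines.append("| " + " | ".join("---" for _ in headers) + " |")
    for row in rows:
        lines.append("| " + " | ".join(escape_cell(str(c)) for c in row) + " |")
    return "\n".join(lines)


# Hypothetical multi-value field using the pipe delimiter.
print(markdown_table(["term", "value"], [["recordedBy", "A. Smith | B. Jones"]]))
```

Anyone reading the report back into a pipeline then has to reverse the escaping before splitting on the delimiter, which is the round-trip cost mentioned above.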

I'm wondering 2 things:

  1. Should we move away from suggesting pipes as delimiters?
  2. Why doesn't TDWG simply require a specific (non-pipe) delimiter when defining multiple values per term? Surely this character-based standard would greatly increase data interoperability.

mjy avatar May 10 '24 18:05 mjy

@mjy I cross-posted this issue in the TAG repo for the next meeting. https://github.com/tdwg/tag/issues/47

ben-norton avatar May 10 '24 19:05 ben-norton

2. Why doesn't TDWG simply require a specific (non-pipe) delimiter when defining multiple values per term?  Surely this character-based standard would _greatly_ increase data interoperability.

Struggling with pipe characters too. (2) would be my preferred solution.

cboelling avatar May 13 '24 07:05 cboelling

@mjy @tucotuco @timrobertson100 Tim or John please correct me if I'm wrong. It is my understanding that Option 2 was the original directive. Many delimiters can be exceedingly problematic, commas especially. If you break down all of the possible common delimiters, pipes are arguably the least commonly used characters in string values. Hence, the current suggestion.

ben-norton avatar May 13 '24 14:05 ben-norton

pipes are arguably the least commonly used characters in string values. Hence, the current suggestion.

That is exactly right. A change in that recommendation would have immense repercussions that I would be loath to face without a proven better alternative.

tucotuco avatar May 13 '24 14:05 tucotuco

I think Markdown is an inappropriate format for sharing data, so I suggest escaping the characters or using HTML (<td>value | value</td>) which is also valid Markdown — though you'll then need to escape < and &.
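A rough sketch of the HTML route, assuming the report is generated with Python: the standard library's `html.escape` takes care of `<` and `&` (and also `>` and quotes), while pipes pass through untouched. The function name `html_table` and the sample value are hypothetical.

```python
from html import escape


def html_table(headers, rows):
    """Render rows as a single-line HTML table; pipes need no escaping."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# The pipe survives as-is; only HTML-significant characters are escaped.
print(html_table(["value"], [["a | b & <c>"]]))
```

Keeping the output on one line avoids blank lines inside the HTML block, which some Markdown renderers treat as the end of the block.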

MattBlissett avatar May 13 '24 15:05 MattBlissett