Grammar is required for CSV syntax highlighting.
Language name
IETF RFC 4180 Comma-Separated Values
URL of example repository
The immediate contents of datasets/un-locode/tree/94ccba00ee41a6bb5c76d71edca246a55778c507/data are an example of CSV file content. However, although they render with an interactive table, that's not applicable to this issue.
Instead, the problem I experience is that no syntax highlighting exists for CommonMark fenced code blocks (rendered as <pre><code>) opened with ~~~CSV, which causes github/markup/issues/1851. ^1
URL of syntax highlighting grammar
- BdR76/CSVLint/blob/cee5ff0f6dfaedd05c379a41e842e2eb33b7e25c/readme.md
- invent.kde.org/frameworks/syntax-highlighting/-/merge_requests/387
- gist.github.com/jakergrossman/cf026270733d68d8abf5c683187e5e8a
Most popular extensions
.CSV
Detected language
Currently, text/plain.
@Trizna228, you're obviously an LLM, so I've reported you at support.github.com/ticket/personal/0/3827590.
Linguist already supports CSV… it just doesn't have a grammar, hence no syntax highlighting:
https://github.com/github-linguist/linguist/blob/e82dba118ba3d3ee715f95d5aada4f14a6f784ac/lib/linguist/languages.yml#L982-L989
A better title for this PR is something like "Grammar required for CSV for syntax highlighting".
If you know of a TextMate compatible grammar for CSV, you can add the grammar yourself by following step 2 of "adding a new language" in the CONTRIBUTING.md file and submitting a PR.
a TextMate compatible grammar for CSV
@lildude, what microsoft/vscode/issues/68647#issuecomment-702506660 cites ("Rainbow CSV") might be relevant. Otherwise, I fear I might not be of much use. I certainly doubt I can submit a useful PR. Apologies.
@lildude, what microsoft/vscode/issues/68647 cites might be relevant.
No. I don't think it's relevant. If you want syntax highlighting of CSV files in the files themselves (when viewing as code) and codeblocks on GitHub, all you need to do is add a grammar to Linguist and both locations will pick it up.
I think the biggest challenge is going to be finding such a grammar.
I think the biggest challenge is going to be finding such a grammar.
I'll go ahead and write one, as I've been meaning to, anyway. The Generic Database grammar that's used to highlight .tsv files is too, uhm, generic.
vscode_rainbow_csv has syntaxes for CSV: https://github.com/mechatroner/vscode_rainbow_csv/tree/master/syntaxes
Though you're going to run into the exact same issues that they/GitHub have: how to detect the delimiter correctly? and detect strings correctly?
"double quoted"
'single quoted'
"double "" escapes"
"backslash \" escapes"
currently GitHub doesn't support sep=
@RedCMD, RFC 4180 appears to permit only U+002C as the field separator, and makes no distinction between object types. Additionally, it does not permit backslash escapes, only doubled double-quote escapes.
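To make the quoting rules above concrete, here is a minimal sketch using Python's standard csv module, configured with a comma delimiter and doubled-quote escapes. It only illustrates the RFC 4180 behaviour being described; it is not how Linguist or any TextMate grammar tokenizes CSV, and the helper name parse_rfc4180 and the sample data are made up for this example.

```python
import csv
import io


def parse_rfc4180(text):
    """Parse CSV text using a comma delimiter and doubled-quote escapes.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",     # U+002C is the only field separator
        quotechar='"',     # only double quotes delimit quoted fields
        doublequote=True,  # "" inside a quoted field means a literal "
    )
    return list(reader)


sample = (
    "id,remark\r\n"
    '1,"double "" escapes"\r\n'
    "2,'single quoted'\r\n"
)
rows = parse_rfc4180(sample)

for row in rows:
    print(row)
```

The doubled quote collapses to a single literal quote, while the single-quoted field keeps its apostrophes as ordinary characters, because single quotes have no special meaning under these settings.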
Within the last 24 hours, I've created the language-dsv repository, which contains grammars for both CSV and TSV files, and I will be following up shortly with a pull request.
UPDATE: Okay, I'm having some trouble hacking together a workaround for \A anchors that aren't matched inside embedded Markdown code blocks; matching them is necessary for highlighting the column-header row only. Stand by…
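As a rough analogy for the \A problem, the sketch below uses Python's re module, where \A matches only at the very start of the input. This is an assumption-laden illustration: TextMate grammars run on a different regex engine, and the variable names and sample text here are invented. It only shows why a header rule anchored with \A stops finding the header once the CSV no longer begins at the start of the text being matched.

```python
import re

# Hypothetical "header row" pattern: the first line of the input.
HEADER = re.compile(r"\A[^\r\n]+")

standalone = "name,age\nAda,36\n"
embedded = "~~~CSV\nname,age\nAda,36\n~~~\n"

# In a standalone file the start of input is the header row.
standalone_match = HEADER.search(standalone).group(0)

# Inside a Markdown document the start of input is the fence line,
# so the same anchored pattern never reaches the CSV header.
embedded_match = HEADER.search(embedded).group(0)

print(standalone_match)
print(embedded_match)
```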
how to detect the delimiter correctly? and detect strings correctly?
@RedCMD Simply put, you can't. Ergo, it's less problematic to assume the most common use-case than to attempt to support less conventional scenarios (such as header-less data files, or single-quoted CSV fields). This is something that could be addressed dynamically from within an editor that uses these grammars (by leveraging editor-specific APIs), but given the static, unconfigurable nature of GitHub's syntax highlighting, this is a moot point.
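The ambiguity can be shown with a small sketch, again using Python's standard csv module purely as an illustration (split_with and the sample line are hypothetical). The same line is well-formed under either delimiter, so nothing in the text itself tells a static grammar which reading is intended.

```python
import csv
import io


def split_with(line, delimiter):
    """Split a single line into fields using the given delimiter."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


line = "a;b,c;d"

as_comma = split_with(line, ",")      # two fields
as_semicolon = split_with(line, ";")  # three fields

print(as_comma)
print(as_semicolon)
```

Both results are valid parses, which is why assuming the comma as the one delimiter is the simpler choice for a grammar that cannot be configured.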