
Grammar is required for CSV syntax highlighting.

Open RokeJulianLockhart opened this issue 2 months ago • 8 comments

Language name

IETF RFC 4180 Comma-Separated Values

URL of example repository

The immediate contents of datasets/un-locode/tree/94ccba00ee41a6bb5c76d71edca246a55778c507/data are an example of CSV file content. However, although they render with an interactive table, that's not applicable to this issue.

Instead, the problem that I experience is that no syntax highlighting exists for CommonMark <pre><code>s defined with ~~~CSV, which causes github/markup/issues/1851.

URL of syntax highlighting grammar

  1. BdR76/CSVLint/blob/cee5ff0f6dfaedd05c379a41e842e2eb33b7e25c/readme.md

  2. invent.kde.org/frameworks/syntax-highlighting/-/merge_requests/387

  3. gist.github.com/jakergrossman/cf026270733d68d8abf5c683187e5e8a

Most popular extensions

.CSV

Detected language

Currently, text/plain.

RokeJulianLockhart, Oct 10 '25 14:10

#issuecomment-3390479052

@Trizna228, you're obviously an LLM, so I've reported you at support.github.com/ticket/personal/0/3827590.

RokeJulianLockhart, Oct 10 '25 14:10

Linguist already supports CSV… it just doesn't have a grammar, hence no syntax highlighting:

https://github.com/github-linguist/linguist/blob/e82dba118ba3d3ee715f95d5aada4f14a6f784ac/lib/linguist/languages.yml#L982-L989

A better title for this issue is something like "Grammar required for CSV for syntax highlighting".

If you know of a TextMate compatible grammar for CSV, you can add the grammar yourself by following step 2 of "adding a new language" in the CONTRIBUTING.md file and submitting a PR.

lildude, Oct 10 '25 14:10

a TextMate compatible grammar for CSV

@lildude, what microsoft/vscode/issues/68647#issuecomment-702506660 cites ("Rainbow CSV") might be relevant. Otherwise, I fear I might not be of much use. I certainly doubt I can submit a useful PR. Apologies.

RokeJulianLockhart, Oct 10 '25 14:10

@lildude, what microsoft/vscode/issues/68647 cites might be relevant.

No. I don't think it's relevant. If you want syntax highlighting of CSV files in the files themselves (when viewing as code) and codeblocks on GitHub, all you need to do is add a grammar to Linguist and both locations will pick it up.

I think the biggest challenge is going to be finding such a grammar.

lildude, Oct 10 '25 15:10

I think the biggest challenge is going to be finding such a grammar.

I'll go ahead and write one, as I've been meaning to, anyway. The Generic Database grammar that's used to highlight .tsv files is too, uhm, generic.

Alhadis, Oct 10 '25 16:10

vscode_rainbow_csv has syntaxes for CSV: https://github.com/mechatroner/vscode_rainbow_csv/tree/master/syntaxes

tho you're gonna run into the exact same issues that they/GitHub have: how to detect the delimiter correctly? and detect strings correctly?

"double quoted"
'single quoted'
"double "" escapes"
"backslash \" escapes"

currently GitHub doesn't support sep=

RedCMD, Oct 11 '25 21:10

@RedCMD, RFC 4180 appears to permit solely U+002C as the field separator (records are separated by line breaks), and makes no distinction between object types. Additionally, it does not permit backslash escapes; solely double-quotation escapes.
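
To make the RFC 4180 reading above concrete, here is a minimal sketch using Python's standard csv module, configured for comma-separated fields and doubled-quote escapes only. The sample data and the helper name parse_rfc4180 are illustrative assumptions, not anything from Linguist or the grammars under discussion.

```python
import csv
import io

# Illustrative sample: comma-separated fields, CRLF line breaks,
# and a doubled quote ("") standing for one literal quote character.
SAMPLE = (
    'style,value\r\n'
    '"double quoted","a,b"\r\n'
    '"double "" escapes","x"\r\n'
)


def parse_rfc4180(text):
    """Parse text using comma separators and doubled-quote escapes only.

    Hypothetical helper for illustration; it is not part of any grammar.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",      # U+002C is the only field separator considered
        quotechar='"',
        doublequote=True,   # "" inside a quoted field means one literal "
        escapechar=None,    # backslash is treated as an ordinary character
        strict=True,        # raise csv.Error on malformed quoting
    )
    return list(reader)


rows = parse_rfc4180(SAMPLE)
for row in rows:
    print(row)
```

A comma inside a quoted field stays part of that field, and the doubled quote collapses to a single one. A static grammar has to recognise the same two rules, but without the benefit of an actual parser.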

RokeJulianLockhart, Oct 11 '25 22:10

I've created the language-dsv repository within the last 24 hours that contains grammars for both CSV and TSV files, and will be following up shortly with a pull-request.

UPDATE: Okay, having some trouble hacking together a workaround for \A escapes that aren't matched inside embedded markdown code-blocks, which is necessary for highlighting the column-header row only. Stand by…
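
As a rough analogy only, the following Python sketch shows why an anchor like \A is awkward for embedded code blocks: it is tied to the start of the whole input, not to the start of the block. It uses Python's re module rather than the regex engine that TextMate grammars run on, and the names HEADER and header_row are hypothetical.

```python
import re

# \A anchors to the very start of the input, so this pattern
# captures the first line of whatever text it is given.
HEADER = re.compile(r"\A[^\r\n]+")

# A standalone CSV file: the first line is the column-header row.
csv_text = "name,city\nAda,London\n"

# The same CSV embedded in a Markdown document as a fenced code block.
markdown_text = "Some prose.\n\n~~~CSV\nname,city\nAda,London\n~~~\n"


def header_row(text):
    """Return the line found at the start of the input, or None if it is empty."""
    match = HEADER.search(text)
    return match.group(0) if match else None


print(header_row(csv_text))       # the CSV header row
print(header_row(markdown_text))  # the first line of the Markdown document instead
```

In the standalone file the anchor lands on the header row; in the Markdown document it lands on the surrounding prose, so the header inside the fence is never reached by this pattern.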

how to detect the delimiter correctly? and detect strings correctly?

@RedCMD: Simply put, you can't. Ergo, it's less problematic to assume the most common use-case than to attempt to support less conventional scenarios (such as header-less data files, or single-quoted CSV fields). This is something that could be addressed dynamically from within an editor that uses these grammars (by leveraging editor-specific APIs), but given the static, unconfigurable nature of GitHub's syntax highlighting, this is a moot point.
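
To illustrate the kind of dynamic detection an editor could attempt, and why it stays a guess, here is a deliberately naive sketch in Python. The function guess_delimiter and its candidate list are assumptions made for this example; they do not come from any editor API or from the grammars mentioned in this thread.

```python
# Candidate delimiters considered by this sketch (an assumption, not a standard).
CANDIDATES = (",", ";", "\t", "|")


def guess_delimiter(text, candidates=CANDIDATES):
    """Pick the candidate that occurs the same nonzero number of times on every line.

    Returns None when no candidate is consistent. The heuristic ignores
    quoting entirely, which is one reason it can only ever be a guess.
    """
    lines = [line for line in text.splitlines() if line]
    best = None
    best_count = 0
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1:
            (count,) = counts
            if count > best_count:
                best, best_count = delim, count
    return best


print(repr(guess_delimiter("a;b;c\n1;2;3\n")))
print(repr(guess_delimiter('name;note\n"a;b";x\n')))
```

The first input yields a consistent semicolon count on every line. In the second, a semicolon inside a quoted field breaks the count, so the heuristic gives up. A static grammar cannot even run a check like this, which is why assuming the most common case is the pragmatic choice.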

Alhadis, Oct 21 '25 11:10