pygments

Add lexer for CSV

Open dmctavish opened this issue 3 years ago • 6 comments

similar to text/plain

dmctavish avatar Nov 12 '21 15:11 dmctavish

@birkenfeld, can you grant me access to the repo so that I can push this change? My plan is to add the lexer to the special lexers module (special.py), modeled on TextLexer:

_mapping.py

'CSVLexer': ('pygments.lexers.special', 'CSV', ('csv',), ('*.csv',), ('text/csv',)),

special.py

__all__ = ['CSVLexer', 'TextLexer', 'OutputLexer', 'RawTokenLexer']

class CSVLexer(Lexer):
    """
    "Null" lexer, doesn't highlight anything.
    """
    name = 'CSV'
    aliases = ['csv']
    filenames = ['*.csv']
    mimetypes = ['text/csv']
    priority = 0.01

    def get_tokens_unprocessed(self, text):
        yield 0, Text, text

    def analyse_text(text):
        return CSVLexer.priority

dmctavish avatar Nov 12 '21 16:11 dmctavish

You're welcome to submit a PR.

However, the change you're suggesting here won't be accepted, since it's just a copy of the TextLexer. At the very least, a CSV lexer should make an effort to separate fields and commas, and handle quoting (although that's already tricky, since there are different quoting conventions). All in all, I'm skeptical that a CSV lexer is worth it.
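
For illustration, here is a minimal sketch of a lexer along those lines. It assumes only one quoting convention (double-quoted fields, with `""` as the escaped quote) and does not attempt to cover the others. The class name `SketchCsvLexer` and its alias are hypothetical; nothing here is part of Pygments.

```python
from pygments.lexer import RegexLexer
from pygments.token import Punctuation, String, Text, Whitespace


class SketchCsvLexer(RegexLexer):
    """Hypothetical CSV lexer sketch: separates fields, commas, and quoted fields."""
    name = 'CSV (sketch)'
    aliases = ['csv-sketch']

    tokens = {
        'root': [
            # Quoted field; a doubled quote ("") inside it is an escaped quote.
            (r'"(?:[^"]|"")*"', String.Double),
            # Field separator.
            (r',', Punctuation),
            # Record separator.
            (r'\r?\n', Whitespace),
            # Unquoted field: everything up to the next comma, quote, or newline.
            (r'[^,"\r\n]+', Text),
            # Stray quote in malformed input: pass it through as plain text.
            (r'"', Text),
        ],
    }


# Usage: a quoted field containing a comma stays a single token.
for ttype, value in SketchCsvLexer().get_tokens('a,"b,""c"""\n'):
    print(ttype, repr(value))
```

Even this small version commits to one dialect, which is the difficulty with quoting mentioned above.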

birkenfeld avatar Nov 12 '21 16:11 birkenfeld

@birkenfeld there is a lexer implemented at https://github.com/fish2000/pygments-csv-lexer that highlights each CSV column, which is very useful for visualizing data. Is it possible to incorporate it as a builtin for pygments?

staticdev avatar Jun 22 '22 07:06 staticdev

> Is it possible to incorporate it as a builtin for pygments?

Feel free to submit a PR :-)

A possible alternative would be a filter that automatically adds tabs or such so that the columns end up aligned.

jeanas avatar Jun 22 '22 08:06 jeanas

@Jean-Abou-Samra not sure I understand this filter. Could you give more details on how to accomplish that?

staticdev avatar Jun 22 '22 12:06 staticdev

See https://pygments.org/docs/filterdevelopment/. Basically, you could write

  • a CSV lexer that just recognizes common value formats such as numbers, and produces tokens such as Number, Text, and Punctuation, the latter used for the commas,
  • a 'CSV tabulator' filter that is fed with lexer output and inserts a TAB character after each comma (or pads with spaces?) so that the columns end up aligned.
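
The two pieces could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `SimpleCsvLexer` and `CsvTabulatorFilter` are hypothetical, quoted fields are not handled, and the filter pads with spaces rather than inserting a TAB. Column widths are only known once every row has been seen, so the filter buffers the whole token stream before emitting anything.

```python
from pygments.filter import Filter
from pygments.lexer import RegexLexer
from pygments.token import Number, Punctuation, Text, Whitespace


class SimpleCsvLexer(RegexLexer):
    """Hypothetical lexer: numbers, commas, and everything else as text."""
    name = 'CSV (simple sketch)'
    aliases = ['csv-simple-sketch']

    tokens = {
        'root': [
            # A field that is entirely an integer or decimal number.
            (r'-?\d+(?:\.\d+)?(?=,|$)', Number),
            (r',', Punctuation),
            (r'\n', Whitespace),
            # Any other unquoted field (quoting is not handled here).
            (r'[^,\n]+', Text),
        ],
    }


class CsvTabulatorFilter(Filter):
    """Hypothetical filter: pads with spaces after each comma to align columns."""

    def filter(self, lexer, stream):
        rows = [[[]]]  # rows -> cells -> (ttype, value) tokens
        for ttype, value in stream:
            if ttype is Punctuation and value == ',':
                rows[-1].append([])      # start the next cell
            elif value == '\n':
                rows.append([[]])        # start the next row
            else:
                rows[-1][-1].append((ttype, value))
        if rows[-1] == [[]]:
            rows.pop()                   # drop the empty row after the last newline

        def width(cell):
            return sum(len(value) for _, value in cell)

        # Widest cell per column; the last cell of a row needs no padding.
        widths = {}
        for row in rows:
            for i, cell in enumerate(row[:-1]):
                widths[i] = max(widths.get(i, 0), width(cell))

        for row in rows:
            for i, cell in enumerate(row):
                yield from cell
                if i < len(row) - 1:
                    yield Punctuation, ','
                    yield Whitespace, ' ' * (widths[i] - width(cell) + 1)
            yield Whitespace, '\n'


# Usage: attach the filter to the lexer, then read the filtered token stream.
lexer = SimpleCsvLexer()
lexer.add_filter(CsvTabulatorFilter())
aligned = ''.join(value for _, value in lexer.get_tokens('name,qty\nwidget,12\n'))
print(aligned)
```

Because the padding is emitted as separate Whitespace tokens, any formatter can render the aligned output without knowing about CSV.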

jeanas avatar Jun 22 '22 13:06 jeanas