
Adds pygments library for html formatting of the plain text result preview

Open dubrousky opened this issue 10 years ago • 3 comments

Pygments is a Python source code highlighter (http://pygments.org/docs/). The idea is to use it when previewing plain-text search results. The patch guesses the source language from the previewed contents and, if a match is found, renders the preview as an HTML page with formatting and line numbers. If Pygments is not installed on the system, the code falls back to plain text.
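A minimal sketch of the approach described above, not the actual patch: the function name render_preview and the formatter options are assumptions chosen for illustration. It guesses a lexer from the preview text and falls back to escaped plain text when Pygments is missing or no lexer matches.

```python
import html

try:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import guess_lexer
    from pygments.util import ClassNotFound
    HAVE_PYGMENTS = True
except ImportError:
    # Pygments is optional: without it the preview stays plain text.
    HAVE_PYGMENTS = False


def render_preview(text):
    """Return an HTML fragment for the preview text (hypothetical helper)."""
    if HAVE_PYGMENTS:
        try:
            lexer = guess_lexer(text)
            # linenos=True adds line numbers; noclasses=True inlines the
            # styles so no separate stylesheet is needed.
            formatter = HtmlFormatter(linenos=True, noclasses=True)
            return highlight(text, lexer, formatter)
        except ClassNotFound:
            pass  # no lexer matched, use the plain-text fallback below
    return "<pre>" + html.escape(text) + "</pre>"
```

Calling render_preview(text) on the extracted document text returns markup in both cases, so the caller does not need to know whether Pygments is available.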

dubrousky avatar Dec 29 '13 18:12 dubrousky

Hello. I have to say: a cracking idea! I briefly tested the patch (had to change an else if to elif in one place for it to work) and source code files, manpages, etc do look awesome.

There is a slight problem however: guess_lexer() seems pretty liberal when it comes to deciding how to handle stuff. It results in spurious formatting of emails and other plain text. To give an example: words like 'for' or 'else' get highlighted and font colours get confused when apostrophes get taken for quotation marks. Addition of line numbers for emails and such is pretty questionable too.

I had a quick look at pygments' docs and I'd propose using get_lexer_for_filename() instead of guess_lexer() to solve this problem but perhaps there is a more robust way around. Any ideas?
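As a rough illustration of that proposal (the helper name pick_lexer is made up for this sketch), get_lexer_for_filename() raises ClassNotFound when no lexer is registered for the file name, so unrecognised files can simply be left unhighlighted:

```python
def pick_lexer(filename):
    """Return a Pygments lexer chosen from the file name, or None.

    Hypothetical helper: returns None when Pygments is not installed or
    when no lexer is registered for this file name.
    """
    try:
        from pygments.lexers import get_lexer_for_filename
        from pygments.util import ClassNotFound
    except ImportError:
        return None
    try:
        return get_lexer_for_filename(filename)
    except ClassNotFound:
        return None
```

A caller would highlight only when pick_lexer() returns a lexer and show plain text otherwise. Unlike guess_lexer(), this never inspects the contents, so an email body containing 'for' or 'else' is not mistaken for source code.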

koniu avatar Dec 29 '13 21:12 koniu

Hi, as for recognizing the source file's language: if we knew the full path to the file, we could use get_lexer_for_filename() as you mentioned. I have not looked much into the recoll API. It is also possible to configure the pygments output (such as line numbers) depending on the mime type, source language, or user preferences. I just did a quick fix to get what I was missing in this tool. I would also add some options to webui-standalone.py to set the hostname and port from the command line, and I would suggest providing an option to limit the bottle web server to serving requests only from localhost, for security reasons.
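One possible shape for those command-line options, sketched with argparse. The option names (-a/--addr, -p/--port) and the defaults are assumptions for illustration, not what webui-standalone.py actually provides. Defaulting the bind address to 127.0.0.1 means the server only accepts connections from the local machine unless the user asks otherwise.

```python
import argparse


def parse_server_args(argv=None):
    """Parse bind address and port (hypothetical option names)."""
    parser = argparse.ArgumentParser(description="Standalone web UI server")
    parser.add_argument(
        "-a", "--addr", default="127.0.0.1",
        help="address to bind to; the default serves localhost only",
    )
    parser.add_argument(
        "-p", "--port", type=int, default=8080,
        help="TCP port to listen on",
    )
    return parser.parse_args(argv)


# The parsed values would then be handed to bottle, for example:
#   args = parse_server_args()
#   bottle.run(host=args.addr, port=args.port)
```

Binding to 0.0.0.0 would remain possible, but it would be an explicit choice by the user instead of the default.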

dubrousky avatar Dec 30 '13 10:12 dubrousky

You can get the full path from doc.url (the original doc, not the tdoc which holds the extracted text). You could also conceivably make use of doc.mimetype. This can have 2 origins:

  • For files with extensions in /usr/share/recoll/examples/mimemap, the mime type comes from there.
  • For other files, the mime type comes from "file -i"

So there is a slight risk of instability and system/version dependencies with the mime type, and you may be better off using the extension if there is one.
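The preference described above (extension first, mime type as a fallback) could be sketched as follows. The helper names are hypothetical, and the sketch assumes doc.url is a file:// URL whose path may be percent-encoded.

```python
import posixpath
from urllib.parse import unquote, urlparse


def lexer_hint(url, mimetype):
    """Decide which piece of information to look a lexer up by.

    Returns ("filename", name) when the URL path ends in a file name with
    an extension, otherwise ("mimetype", mimetype).
    """
    name = posixpath.basename(unquote(urlparse(url).path))
    if posixpath.splitext(name)[1]:
        return ("filename", name)
    return ("mimetype", mimetype)


def lexer_for_doc(url, mimetype):
    """Return a Pygments lexer for the document, or None if none applies."""
    try:
        from pygments.lexers import (get_lexer_for_filename,
                                     get_lexer_for_mimetype)
        from pygments.util import ClassNotFound
    except ImportError:
        return None  # Pygments not installed: caller shows plain text
    kind, value = lexer_hint(url, mimetype)
    try:
        if kind == "filename":
            return get_lexer_for_filename(value)
        return get_lexer_for_mimetype(value)
    except ClassNotFound:
        return None
```

With this, lexer_for_doc(doc.url, doc.mimetype) only consults the mime type for files without an extension, which limits exposure to differences between "file -i" versions.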

ghost avatar Dec 30 '13 16:12 ghost