
Syntax Highlighting

Open daknob opened this issue 9 years ago • 8 comments

Any paste service can benefit from syntax highlighting. A quick search shows that many syntax highlighters require JavaScript to work. Since we want to avoid JS, we have to find a backend-based syntax highlighter, which for us means one written in Python. Let's try to find one and see how well it works with the existing code.

daknob avatar Sep 26 '16 08:09 daknob

One thing to note about syntax highlighting: if users can select the language to highlight their code as, this causes issues with the Advanced Paste Deduplication Mechanism™. If two users paste the same code, and one marks it as C while the other marks it as C++, currently only the last choice will take effect.

To address this issue, TorPaste can either automatically detect the language used or, when multiple languages have been set, let the viewer pick the language to highlight with from a dropdown.
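
A minimal sketch of the second option, assuming paste IDs are derived from a hash of the content (the thread does not spell out how the deduplication mechanism computes IDs) and that each paste keeps a set of selected languages instead of a single value. The names `paste_id_for`, `PasteIndex` and `languages` are hypothetical, and the storage is just an in-memory dictionary.

```python
import hashlib
from collections import defaultdict


def paste_id_for(content: str) -> str:
    """Derive a deterministic ID from the paste content (assumed scheme)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class PasteIndex:
    """In-memory stand-in for paste metadata storage."""

    def __init__(self) -> None:
        # paste ID -> every language any uploader selected for that content
        self.languages = defaultdict(set)

    def add(self, content: str, language: str) -> str:
        paste_id = paste_id_for(content)
        # Adding to a set keeps earlier choices instead of overwriting them.
        self.languages[paste_id].add(language)
        return paste_id

    def choices(self, paste_id: str) -> list:
        """Languages to offer in the viewer's dropdown, in a stable order."""
        return sorted(self.languages[paste_id])


index = PasteIndex()
first = index.add("int main(void) { return 0; }", "c")
second = index.add("int main(void) { return 0; }", "cpp")
print(first == second)       # same content, same ID
print(index.choices(first))  # both language choices survive
```

With this shape, a second upload of the same content never overwrites the first uploader's choice; it only widens the list offered to the viewer.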

daknob avatar Sep 28 '16 09:09 daknob

Of course, there's also the option to change the Advanced Paste Deduplication Mechanism™ to use random IDs.

daknob avatar Sep 28 '16 09:09 daknob

Another solution is to request syntax highlighting when viewing the paste, with a route like `/view/<paste_id>/`.

The /new logic would be:

  1. Create a paste and indicate a default language, say Python;
  2. This creates the paste as /pastes/${paste_id} with the non-highlighted content, and redirects the user to /view/${paste_id}/python.

The /view/<paste_id>/<syntax> logic would be:

  1. check the existence of /pastes/${paste_id}.${syntax};
  2. if it does not exist, create it by using the syntax highlighter
  3. display the content of /pastes/${paste_id}.${syntax}

As you can see in this example, the Python-highlighted file is created immediately, because the paste's author is redirected to it. The viewing screen would still allow another language to be requested, which would just create a new highlighted version of the same paste (the Python file would remain and, for instance, a new C++ one would be created). With this logic, creating another paste with the same content but another language, say PHP, would just create /pastes/${paste_id}.php without deleting the Python-highlighted version.

The /view/<paste_id> with no syntax specified would keep the current behavior (no syntax highlighting).

This would look like:

@app.route("/view/<pasteid>/<syntax>")
@app.route("/view/<pasteid>")
def view_paste(pasteid, syntax=None):
    # [...]
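
The lookup-then-create steps for /view/<paste_id>/<syntax> can also be sketched without any web framework. This is only an illustration under stated assumptions: `PASTE_DIR`, `render_highlighted` and `load_view` are hypothetical names, the directory is a temporary one, and `render_highlighted` is a placeholder that merely escapes the text where a real highlighter would run.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical storage location; the thread refers to it as /pastes.
PASTE_DIR = Path(tempfile.mkdtemp())


def render_highlighted(text: str, syntax: str) -> str:
    """Placeholder for a real highlighter: escapes the text and tags it."""
    return '<pre class="syntax-{}">{}</pre>'.format(syntax, html.escape(text))


def load_view(paste_id: str, syntax=None) -> str:
    raw = (PASTE_DIR / paste_id).read_text(encoding="utf-8")
    if syntax is None:
        # No syntax requested: keep the current, non-highlighted behaviour.
        return raw
    cached = PASTE_DIR / "{}.{}".format(paste_id, syntax)
    if not cached.exists():
        # First request for this language: create the highlighted copy.
        cached.write_text(render_highlighted(raw, syntax), encoding="utf-8")
    return cached.read_text(encoding="utf-8")


(PASTE_DIR / "abc123").write_text("print('<hi>')", encoding="utf-8")
print(load_view("abc123"))            # raw paste
print(load_view("abc123", "python"))  # creates abc123.python, then serves it
print(sorted(p.name for p in PASTE_DIR.iterdir()))
```

Because `syntax` ends up in a file name here, a real route would need to check it against a list of known languages before touching the disk.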

j11e avatar Oct 10 '16 21:10 j11e

This seems like a good idea. We can add a /view/<pasteid>/<syntax> route to the torpaste.py app and then show the paste content in the requested syntax. If the uploader uses Python and I want to view the content as C, I should be able to do that. However, maybe we should not store the file any differently: just store the paste normally, and also store the user-selected syntax as metadata (or maybe not at all). As long as the user distributes the link with /python, everyone who opens it sees the Python syntax highlighting. If another user makes the same paste and chooses C, they will get a /c link, but they will not be able to cause any issues for the /python link! Also, when viewing the paste, we can add a dropdown with all the languages supported so far and a button that lets the user view it in that format.

And just to make our lives easier, we can use /view/<pasteid>?syntax=python so that the <form> there can be:

<form action="/view/<pasteid>" method="get">
    <select name="syntax">
        <option value="c">C</option>
    </select>
    <button>Submit</button>
</form>
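
On the server side, the `syntax` query parameter submitted by this form still has to be read and checked. A rough sketch using only the standard library follows; `SUPPORTED_SYNTAXES` and `requested_syntax` are hypothetical names, and the allowlist contents are an assumption meant to mirror the `<option>` values.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allowlist; it would mirror the <option> values in the form.
SUPPORTED_SYNTAXES = {"c", "cpp", "python"}


def requested_syntax(url: str):
    """Return the validated ?syntax= value, or None for a plain view."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; take the first one if any.
    value = query.get("syntax", [None])[0]
    if value is None:
        return None
    value = value.lower()
    return value if value in SUPPORTED_SYNTAXES else None


print(requested_syntax("/view/abc123?syntax=python"))  # python
print(requested_syntax("/view/abc123?syntax=../etc"))  # None
print(requested_syntax("/view/abc123"))                # None
```

Falling back to `None` for unknown values keeps the plain, non-highlighted view as the default.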

daknob avatar Oct 12 '16 07:10 daknob

(However, the above proposal means that we need to do the syntax highlighting dynamically every time a paste is requested, and I do not know whether this will be time- or CPU-intensive.)

daknob avatar Oct 12 '16 07:10 daknob

I agree with what you said, but I do have a question: what do we prefer between:

  1. syntax highlighting recomputed at each view, as you propose
  2. syntax highlighting computed the first time a language is requested, then remembered, as I proposed before

1 takes more CPU, 2 takes more storage. I personally prefer 2, as it costs less in terms of performance, and storage is cheap anyway (after all, this is just text).

j11e avatar Oct 12 '16 15:10 j11e

I don't think it's easy to make this decision, because it depends on our syntax highlighter. How long does it take to calculate the result for a given input size? How much CPU does it need? If it needs 5 seconds for a 1 kB file, then we have to store the result. If it needs 2 ms and takes 0.1% of the CPU, then I think we can safely calculate it on the fly.
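
One way to answer the timing question is to measure a candidate highlighter on a 1 kB input. The sketch below uses `timeit` from the standard library; `render_highlighted` is only a placeholder that escapes the text (so its numbers say nothing about a real highlighter), and `seconds_per_call` is a hypothetical helper name.

```python
import html
import timeit


def render_highlighted(text: str, syntax: str) -> str:
    """Placeholder highlighter; swap in the real candidate to measure it."""
    return '<pre class="syntax-{}">{}</pre>'.format(syntax, html.escape(text))


def seconds_per_call(func, text: str, syntax: str, runs: int = 200) -> float:
    """Average wall-clock seconds for one highlighting call."""
    total = timeit.timeit(lambda: func(text, syntax), number=runs)
    return total / runs


sample = ("def f(x):\n    return x + 1\n" * 40)[:1024]  # 1 kB of input
cost = seconds_per_call(render_highlighted, sample, "python")
print("{:.3f} ms per 1 kB paste".format(cost * 1000))
```

Running the same measurement with the real highlighter and a few input sizes would show which side of the 2 ms versus 5 s range it falls on.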

Here's an attack on the storage option: a bot views every listed paste in every available language, which creates up to m*n additional stored files, where m is the number of pastes and n is the number of available languages.
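
To get a feel for the size of that bound, the m*n count can be multiplied by an average highlighted file size. The figures below are illustrative assumptions only, not measurements from TorPaste, and `worst_case_extra_bytes` is a hypothetical helper.

```python
def worst_case_extra_bytes(pastes: int, languages: int, avg_highlighted_size: int) -> int:
    """Upper bound on cache growth if every paste is viewed in every language."""
    return pastes * languages * avg_highlighted_size


# Illustrative numbers only: 10,000 pastes, 50 languages, 4 KiB per highlighted copy.
extra = worst_case_extra_bytes(pastes=10_000, languages=50, avg_highlighted_size=4_096)
print(extra)              # 2048000000 bytes
print(extra / 1024 ** 3)  # a little over 1.9 GiB
```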

daknob avatar Oct 12 '16 17:10 daknob

A third option is simply to have a configuration variable that switches between the two modes. After all, the difference is pretty small (save the highlighted paste or recompute it each time).

j11e avatar Oct 16 '16 18:10 j11e