sequenceserver HTML code in fasta description

Open tomas-pluskal opened this issue 5 years ago • 9 comments

Hi,

I noticed that the beta versions of 1.1.0 changed the way HTML tags are rendered in FASTA file description fields.

In version 1.0.9, HTML tags in the description were interpreted as HTML code, which allowed us to place things like links (<a href=...) or formatted annotations into the FASTA files that are loaded into sequenceserver. However, in version 1.1.0 all HTML tags are shown as plain text instead (I assume the < and > characters are translated to their corresponding HTML entities &lt; and &gt;).
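To make the assumed behaviour concrete, here is a minimal Python sketch of what such escaping does to a description. It is not sequenceserver's code; the description string and URL are made up for illustration.

```python
import html

# Hypothetical FASTA description containing a link (not from a real database).
description = 'kinase <a href="https://example.org/gene/1">details</a>'

# Escaping replaces the markup characters with HTML entities, so a browser
# displays the tag as literal text instead of rendering a link.
escaped = html.escape(description)
print(escaped)
```

Unescaping the result gives back the original string, so no information is lost; the tag is just no longer interpreted by the browser.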

Is this a desired change? In my view, it actually limits the scope of sequenceserver - I thought being able to add HTML formatting to the FASTA descriptions was quite useful.

Oct 06 '18 15:10 tomas-pluskal

To minimize security risks, not interpreting any unknown HTML is the right thing to do by default. Any HTML snippet not already defined in the software's source code is unknown HTML, i.e., any user input. So the new behavior is indeed the preferred one.

Why not use the link generation feature to create custom links instead? http://www.sequenceserver.com/doc/#plugin

Oct 09 '18 14:10 yeban

The link generator is nice, but having some basic formatting capabilities would be useful, too (at least translating \n to <br>). Perhaps the descriptions could also support something like markdown syntax?

Oct 09 '18 23:10 tomas-pluskal

Hi, I would like to return to this issue. I think having an option to add simple formatting to the sequence descriptions would be nice and useful. With a markdown parser like kramdown (https://kramdown.gettalong.org/), this would be very easy to implement. What do you think?

Feb 17 '19 18:02 tomas-pluskal

I see the utility of it. Currently, adding custom links requires a bit of Ruby. But with embedded markdown, users can add them to the FASTA files using Perl, Python, bash, etc. Maybe embedded markdown can become the standard for adding custom links, while the link generator remains for automatic linking to public databases based on ID/title pattern. I think this feature should be opt-in (that is, disabled by default).
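One way to picture the opt-in idea is the Python sketch below. It is only an illustration, not sequenceserver's implementation and not kramdown: the function name `render_description`, the `enable_markup` flag, and the tiny supported subset (markdown-style links plus a literal `\n` sequence) are all assumptions. Everything is escaped first, and markup is rendered only when the flag is on.

```python
import html
import re

# Matches [label](url) in already-escaped text; only http(s) URLs are linked.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def render_description(text, enable_markup=False):
    """Return HTML-safe text; optionally render a small markup subset."""
    # Escape first, so raw HTML in the description is never interpreted.
    safe = html.escape(text, quote=True)
    if not enable_markup:
        return safe  # default: plain escaped text
    # Label and URL are already escaped, so they cannot break out of the tag.
    safe = LINK_RE.sub(r'<a href="\2">\1</a>', safe)
    # Assumption: a literal backslash-n in the description means a line break.
    return safe.replace("\\n", "<br>")


print(render_description("see [UniProt](https://www.uniprot.org/)\\nsecond line",
                         enable_markup=True))
```

Escaping before rendering keeps the default as safe as the current behaviour, and restricting links to http(s) avoids `javascript:` URLs.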

Mar 03 '19 14:03 yeban

I agree with the opt-in. I think it is useful not only for links, but also for highlighting stuff etc.

I can try to code this and make a pull request.

Mar 03 '19 17:03 tomas-pluskal

@photocyte any thoughts on this?

Mar 03 '19 17:03 tomas-pluskal

I think it is a good idea. Markdown formatting would support encoding of links and newlines, and the other formatting would be useful too.

Mar 03 '19 19:03 photocyte

I like these ideas as they should make it easier to customize outlinks & add complementary information.

However, it can be considered bad practice to modify a FASTA file just to add metadata, because this makes it difficult to verify its integrity in comparison with reference databases/original downloads.

So I suggest a slightly different approach:

alongside mygenome.fasta (which has been formatted into mygenome.fasta.nin and all the other BLAST database files), optionally have a file called mygenome.fasta.links. This could be a 2-column file, where the left-most column is the sequence id and the other one contains the HTML or markdown.
when we display results, we

A major reason against this approach is that it doesn't piggy-back off BLAST's indexing. It is unclear to me how much of a burden (on server or on client-side) the additional RAM/time/download overhead of parsing the links files would be.
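As a rough illustration of the sidecar-file idea, the Python sketch below reads a two-column file into a dictionary and looks up only the ids found among the hits. The tab separator, the function name `load_links`, and the example ids and URLs are assumptions; the thread does not define the file format beyond "2-column".

```python
import io


def load_links(handle):
    """Read a 2-column, tab-separated file: sequence id, then markup."""
    links = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        seq_id, markup = line.split("\t", 1)
        links[seq_id] = markup
    return links


# Stand-in for the contents of a hypothetical mygenome.fasta.links file.
example = io.StringIO(
    "seq1\t[Gene page](https://example.org/gene/seq1)\n"
    "seq2\t[Gene page](https://example.org/gene/seq2)\n"
)
links = load_links(example)

# Only the ids present among the BLAST hits need to be looked up.
hit_ids = ["seq2", "seq9"]
annotations = {sid: links[sid] for sid in hit_ids if sid in links}
print(annotations)
```

This version holds the whole file in memory, which is exactly the overhead in question; for large databases an on-disk index would probably be needed instead.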

Oct 27 '21 10:10 yannickwurm

Hello, thanks for the feedback. Regarding verifying a FASTA file's integrity against original downloads: internally I've come up with a seqkit-based FASTA checksum that pays attention to different levels of the sorted sequence content (e.g. all uppercase, to ignore whether soft-masking was performed). In brief, it looks at a FASTA file with 4 different levels of scrutiny, with a standard md5sum checksum being the highest level, and makes a 4-piece checksum (so parts 1, 2, 3 and 4 all matching means something different from just part 4 matching). In my opinion, a checksum of just the file content breaks too easily with minor modifications of the FASTA file (e.g. shortening the FASTA record names). I would have thought the bioinformatics field had already come up with such a FASTA-specific checksum, but I haven't come across one... If there is interest, I could try to polish the documentation and release the checksum publicly.
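The general idea of a layered checksum can be sketched in plain Python as below. This is not the seqkit-based tool described above: the thread does not spell out the four levels, so the ones chosen here, their ordering (strictest first), and the names `read_fasta` and `layered_checksum` are all guesses made for illustration.

```python
import hashlib
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def _md5_sorted(items):
    """MD5 over the sorted items, so record order does not matter."""
    digest = hashlib.md5()
    for item in sorted(items):
        digest.update(item.encode() + b"\n")
    return digest.hexdigest()


def layered_checksum(text):
    """Return four digests, ordered from strictest to most tolerant."""
    records = list(read_fasta(io.StringIO(text)))
    ids = [(h.split() or [""])[0] for h, _ in records]
    return (
        hashlib.md5(text.encode()).hexdigest(),  # 1: exact file content
        _md5_sorted(f"{i}\t{s}" for i, (_, s) in zip(ids, records)),  # 2: ids + sequences
        _md5_sorted(s for _, s in records),  # 3: sequences only, case-sensitive
        _md5_sorted(s.upper() for _, s in records),  # 4: sequences only, ignoring soft-masking
    )
```

With these assumed levels, two files that agree only on part 4 contain the same sequences but differ in soft-masking, while renaming or reordering records would leave parts 3 and 4 matching and change parts 1 and 2.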

Regarding this case here: I still conceptually like the idea of encoding metadata that sequenceserver could display in the FASTA header, because as a general rule I like metadata being explicitly linked to files (it is too easy for it to get lost in a separate file). Actually, to my recollection pure Markdown doesn't encode newlines, so it may not be as suitable as escaped HTML. But I think @tomas-pluskal came up with a different approach for making metadata links with sequenceserver for https://github.com/transXpress/transXpress, though I am not immediately familiar with how it was done.

Oct 27 '21 16:10 photocyte
