Links in chat logs seem to be double-HTML-encoded

Open Two4 opened this issue 2 years ago • 4 comments

Text in chat messages seems to be parsed for links and enriched with <a> elements, with HTML encoding also applied to certain characters (e.g., &gt; for >). However, when the logs are rendered on https://javabot.evanchooly.com, that HTML-enriched and encoded text seems to be HTML-encoded again, which nullifies the original HTML encoding and shows the markup as literal text (HTML highlighting applied for clarity):

<a href="https://www.baeldung.com/maven-dependencymanagement-vs-dependencies-tags" target="_blank">https://www.baeldung.com/maven-dependencymanagement-vs-dependencies-tags</a>
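To illustrate the suspected effect, here is a minimal Java sketch. The `escapeHtml` helper and the class name are hypothetical stand-ins, not javabot's actual code; the sketch only assumes that an escaper of this kind runs once when the message is stored and again when the page is rendered.

```java
// Hypothetical demo class; javabot's real encoding code may look different.
class DoubleEncodingDemo {

    // Minimal HTML escaper (assumption: both ingress and egress do something similar).
    // '&' must be replaced first so the entities produced below are not re-escaped
    // within a single pass.
    public static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String raw = "a > b";

        // First pass (e.g. bot -> db): the browser would show "a > b".
        String once = escapeHtml(raw);

        // Second pass (e.g. db -> website): the '&' of "&gt;" is escaped again,
        // so the browser shows the literal text "a &gt; b".
        String twice = escapeHtml(once);

        System.out.println(once);   // a &gt; b
        System.out.println(twice);  // a &amp;gt; b
    }
}
```

The same thing would happen to an `<a>` element added before storage: after a second escaping pass its angle brackets and quotes reach the browser as entities, so the tag is displayed as text instead of being rendered as a link.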

Page is viewed with Firefox 117.0.1 (64-bit) on Windows 10

Two4, Sep 15 '23 14:09

Example: https://javabot.evanchooly.com/logs/%23java/2023-09-15#6503d84dbc95240fc24b568b

Two4, Sep 15 '23 14:09

I think we have competing requirements. The logs should not expose any active links, so the escaping is the proper thing, and the code that tries to HTML-ify the URLs should be removed.

evanchooly, Sep 15 '23 17:09

Fair, I agree, it shouldn't try to enrich URIs with anchor tags, but there's still going to be double encoding. Either the encoding on data ingress (bot -> db) or the encoding on egress (db -> website) must be removed. Personally, I'd remove the encoding on ingress, so the logs can be used in their raw form for more than website purposes. I assume there's SQL sanitisation already.

Two4, Sep 18 '23 07:09

I agree. The logs should be whatever comes in. We should escape any HTML on rendering though, so we don't run into any XSS issues.

evanchooly, Sep 18 '23 21:09
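The approach agreed on above (store the raw message, escape exactly once when rendering) could be sketched roughly as follows. `LogRenderer`, `escapeHtml`, `renderMessage`, and the `message` CSS class are all hypothetical names chosen for illustration; the actual javabot rendering code and templates are not shown in this thread and may handle escaping differently, for example through the template engine.

```java
// Hypothetical sketch: raw text is stored as-is and escaped only at render time.
class LogRenderer {

    // Escapes the characters that are significant in HTML text and attribute values.
    // '&' is handled first so the generated entities are not escaped again.
    public static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Takes the raw message exactly as it came in and returns markup that is
    // safe to place in the page. No link enrichment is attempted.
    public static String renderMessage(String rawMessage) {
        return "<span class=\"message\">" + escapeHtml(rawMessage) + "</span>";
    }

    public static void main(String[] args) {
        // A URL stays plain, inactive text.
        System.out.println(renderMessage("see https://www.baeldung.com/ for details"));

        // Markup in a message is shown literally rather than executed.
        System.out.println(renderMessage("<script>alert(1)</script>"));
    }
}
```

With a single escaping pass on egress, the stored logs stay usable in raw form for non-website purposes, and nothing a user types can be interpreted as markup by the browser.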