Links in chat logs seem to be double-HTML-encoded
Text in chat messages seems to be parsed for links and enriched with <a> elements, and certain text literals are HTML-encoded (e.g., &gt; for >). However, when the logs are rendered on https://javabot.evanchooly.com, that HTML-enriched and encoded text seems to be HTML-encoded again, nullifying the original HTML encoding (HTML highlighting applied for clarity):
<a href="https://www.baeldung.com/maven-dependencymanagement-vs-dependencies-tags" target="_blank">https://www.baeldung.com/maven-dependencymanagement-vs-dependencies-tags</a>
Page is viewed with Firefox 117.0.1 (64-bit) on Windows 10
Example: https://javabot.evanchooly.com/logs/%23java/2023-09-15#6503d84dbc95240fc24b568b
I think we have competing requirements. The logs should not expose any active links, so the escaping is the proper thing, and the code that tries to htmlify the URLs should be removed.
Fair, I agree, it shouldn't try to enrich URIs with anchor tags. But removing that still leaves the double encoding: either the encoding on data ingress (bot -> db) or the encoding on egress (db -> website) must be removed. Personally, I'd remove the encoding on ingress, so the logs can be used in their raw form for more than website purposes. I assume there's SQL sanitisation (sanitization) already.
I agree. The logs should be whatever comes in. We should escape any HTML on rendering though, so we don't run into any XSS issues.
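To make the "store raw, escape once on rendering" idea concrete, here is a minimal sketch in plain Java. The class name `LogHtml` and the method `escapeHtml` are hypothetical and not taken from the javabot code base; a real fix would more likely rely on whatever escaping the site's template engine already provides. The second call in `main` shows how escaping already-escaped text produces the symptom reported above.

```java
// Hypothetical helper, not part of javabot: escapes text once, at render time.
final class LogHtml {
    private LogHtml() {}

    /** Escapes the five characters that are significant in HTML text and attribute values. */
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the database holds the message exactly as it arrived from IRC.
        String stored = "a > b, see https://example.com/?x=1&y=2 <script>alert(1)</script>";

        // Escaping once on egress (db -> website) keeps the page safe and readable.
        System.out.println(escapeHtml(stored));

        // Escaping on ingress AND egress turns ">" into "&amp;gt;", which the
        // browser displays as the literal text "&gt;".
        System.out.println(escapeHtml(escapeHtml(">")));
    }
}
```

With this split, the stored logs stay usable for non-web consumers, and any URL in a message renders as inert text rather than an active link.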