strictdoc Epic: Correct escaping in HTML output

[x] https://github.com/strictdoc-project/strictdoc/pull/1929
[x] https://github.com/strictdoc-project/strictdoc/pull/1921
[x] https://github.com/strictdoc-project/strictdoc/pull/1936
[x] HTML escaping: Project tree. https://github.com/strictdoc-project/strictdoc/pull/1943
[x] HTML escaping: add itest: Traceability Matrix screen. https://github.com/strictdoc-project/strictdoc/pull/1943
[ ] HTML escaping: add itest: Search screen. https://github.com/strictdoc-project/strictdoc/pull/1954
[ ] HTML escaping: add itest: Statistics screen. https://github.com/strictdoc-project/strictdoc/pull/1954
[ ] HTML escaping: add itest: Diff/Changelog screens.
[ ] HTML escaping: add itest: HTML2PDF screen.
[ ] render_url() and friends should internally do URL escaping with something like urllib.parse.urlencode, then mark as safe.

To reproduce, generate an html export of the following document

[DOCUMENT]
TITLE: Written <b>bold</b>

[SECTION]
TITLE: Some text is <invisible>

[REQUIREMENT]
UID: REQ-1
TITLE: <script>alert(1)</script>
STATEMENT: >>>
We can also execute scripts.
<<<

[/SECTION]

I assume this is a bug, not a feature...

html.escape is already used in some places. I'll have a look and try to catch more cases.

Jul 16 '24 21:07 haxtibal

Thanks for reporting this. It is a lack of feature 😄

The html.escape was implemented for multiline fields but not for single-line. Let's implement it for single-line as well.

Jul 17 '24 06:07 stanislaw

The html.escape was implemented for multiline fields

IIUC multiline was implemented for the UI forms in #1018. Multiline in static views (like components/requirement/index.jinja) is escaped too, but for a different reason: We pipe it through docutils, which does escaping under the hood.

I thought about when and where is the best place to do the escaping:

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component
call html.escape in Jinja2 templates where needed (lots of places)
pipe through |e in Jinja2 templates (same lot of places, but keeps code cleaner)
use Jinja2 automatic escaping and opt out where it would hurt (e.g., we must not escape pre-rendered rst markup)
introduce wrapper objects that can be assigned to sdoc_entity, so that getters return escaped values

I'd probably start by trying automatic escaping and see how it works out. Do you have an opinion?

Jul 17 '24 13:07 haxtibal

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component

Sorry for the delays with my responses, I am running low on my free time.

I think I would prefer the enumerate_all_fields_escaped option for the time being to have it central like you said. Adding the HTML-specific stuff to a core component should be ok because currently there is no intermediate layer between SDoc classes and Jinja. I started introducing the view object classes to be that kind of interface to Jinja but in this specific example with escaping it will probably not help. We could revisit this later if a stronger need in an intermediate layer would emerge.

If we used enumerate_all_fields_escaped everywhere, could the change be contained in only a small number of places?

Jul 18 '24 12:07 stanislaw

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component

Sorry for the delays with my responses, I am running low on my free time.

No hurry. Same here.

I think I would prefer the enumerate_all_fields_escaped option

I did some trials with Jinja2 autoescaping already yesterday and wanted to present it first before discarding the idea. Therefore #1921. If you don't agree with the reasoning just close that MR and let's continue discussion here.

Jul 18 '24 19:07 haxtibal

@haxtibal

I have added a few TODOs here but let's consider them a very low priority. You/I don't have to rush into completing them all as long as your use case is working.

I would just like to have more test coverage to make sure we didn't miss any screens just in case. StrictDoc has many such low-priority tickets which all would love attention, so let's move according to what stands out most.

Jul 29 '24 06:07 stanislaw

It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains("TBD"). We may be missing an end-to-end test to catch this.

Oct 17 '24 13:10 stanislaw

It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains("TBD"). We may be missing an end-to-end test to catch this.

Will have a look, hopefully tomorrow.

Oct 17 '24 16:10 haxtibal

Hm, can't reproduce... When typing node.contains("TBD") into the search form, flask receives

INFO:     127.0.0.1:46104 - "GET /search?q=node.contains%28%22TBD%22%29 HTTP/1.1" 200 OK

This looks OK and works. How did you test it?

Oct 17 '24 19:10 haxtibal

Good to know that the search screen itself is not affected.

My testing was around the links that are on the statistics screen: http://127.0.0.1:5111/project_statistics.html. The legend has escaped/broken hrefs.

Oct 18 '24 15:10 stanislaw

Bildschirmfoto 2024-11-09 um 23 31 46

Noticed another regression just now. I think it is really the last one 😄

Nov 09 '24 22:11 stanislaw

I think, now we can close this one 😄 Thanks @haxtibal for supporting this.

Nov 10 '24 22:11 stanislaw

strictdoc strictdoc copied to clipboard

Epic: Correct escaping in HTML output

strictdoc
strictdoc copied to clipboard