strictdoc
strictdoc copied to clipboard
Epic: Correct escaping in HTML output
- [x] https://github.com/strictdoc-project/strictdoc/pull/1929
- [x] https://github.com/strictdoc-project/strictdoc/pull/1921
- [x] https://github.com/strictdoc-project/strictdoc/pull/1936
- [x] HTML escaping: Project tree. https://github.com/strictdoc-project/strictdoc/pull/1943
- [x] HTML escaping: add itest: Traceability Matrix screen. https://github.com/strictdoc-project/strictdoc/pull/1943
- [ ] HTML escaping: add itest: Search screen. https://github.com/strictdoc-project/strictdoc/pull/1954
- [ ] HTML escaping: add itest: Statistics screen. https://github.com/strictdoc-project/strictdoc/pull/1954
- [ ] HTML escaping: add itest: Diff/Changelog screens.
- [ ] HTML escaping: add itest: HTML2PDF screen.
- [ ] render_url() and friends should internally do URL escaping with something like urllib.parse.urlencode, then mark as safe.
To reproduce, generate an html export of the following document
[DOCUMENT]
TITLE: Written <b>bold</b>
[SECTION]
TITLE: Some text is <invisible>
[REQUIREMENT]
UID: REQ-1
TITLE: <script>alert(1)</script>
STATEMENT: >>>
We can also execute scripts.
<<<
[/SECTION]
I assume this is a bug, not a feature...
html.escape is already used in some places. I'll have a look and try to catch more cases.
Thanks for reporting this. It is a lack of feature 😄
The html.escape was implemented for multiline fields but not for single-line. Let's implement it for single-line as well.
The html.escape was implemented for multiline fields
IIUC multiline was implemented for the UI forms in #1018. Multiline in static views (like components/requirement/index.jinja) is escaped too, but for a different reason: We pipe it through docutils, which does escaping under the hood.
I thought about when and where is the best place to do the escaping:
backend.sdoc(SDocNodeField.enumerate_all_fields_escapedexists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component- call
html.escapein Jinja2 templates where needed (lots of places) - pipe through |e in Jinja2 templates (same lot of places, but keeps code cleaner)
- use Jinja2 automatic escaping and opt out where it would hurt (e.g., we must not escape pre-rendered rst markup)
- introduce wrapper objects that can be assigned to sdoc_entity, so that getters return escaped values
I'd probably start by trying automatic escaping and see how it works out. Do you have an opinion?
backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component
Sorry for the delays with my responses, I am running low on my free time.
I think I would prefer the enumerate_all_fields_escaped option for the time being to have it central like you said. Adding the HTML-specific stuff to a core component should be ok because currently there is no intermediate layer between SDoc classes and Jinja. I started introducing the view object classes to be that kind of interface to Jinja but in this specific example with escaping it will probably not help. We could revisit this later if a stronger need in an intermediate layer would emerge.
If we used enumerate_all_fields_escaped everywhere, could the change be contained in only a small number of places?
backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component
Sorry for the delays with my responses, I am running low on my free time.
No hurry. Same here.
I think I would prefer the
enumerate_all_fields_escapedoption
I did some trials with Jinja2 autoescaping already yesterday and wanted to present it first before discarding the idea. Therefore #1921. If you don't agree with the reasoning just close that MR and let's continue discussion here.
@haxtibal
I have added a few TODOs here but let's consider them a very low priority. You/I don't have to rush into completing them all as long as your use case is working.
I would just like to have more test coverage to make sure we didn't miss any screens just in case. StrictDoc has many such low-priority tickets which all would love attention, so let's move according to what stands out most.
It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains("TBD"). We may be missing an end-to-end test to catch this.
It turns out that the search feature has regressed due to the autoescaping:
http://127.0.0.1:5111/search?q=node.contains("TBD"). We may be missing an end-to-end test to catch this.
Will have a look, hopefully tomorrow.
Hm, can't reproduce... When typing node.contains("TBD") into the search form, flask receives
INFO: 127.0.0.1:46104 - "GET /search?q=node.contains%28%22TBD%22%29 HTTP/1.1" 200 OK
This looks OK and works. How did you test it?
Good to know that the search screen itself is not affected.
My testing was around the links that are on the statistics screen: http://127.0.0.1:5111/project_statistics.html. The legend has escaped/broken hrefs.
Noticed another regression just now. I think it is really the last one 😄
I think, now we can close this one 😄 Thanks @haxtibal for supporting this.