Content from tags inside pre is missing whitespace characters
When a parsed document is emitted, whitespace characters inside `pre` tags are normally preserved. The following test checks this:
https://github.com/fsharp/FSharp.Data/blob/33e6e825bc2978eb9ce6dd880f31d9e60d452699/tests/FSharp.Data.Tests/HtmlParser.fs#L763-L772
However, that is not the case when the `pre` tag contains child tags. As soon as the parser encounters another tag, the whitespace characters that follow it are removed. For example, when the document parsed from the following snippet is emitted, the whitespace characters after the `span` tag are missing, while the whitespace before the `span` tag is preserved.
`<pre>\r\n This <span>code</span> should be indented and\r\n have line feeds in it</pre>`
The code responsible for "normalizing" whitespace characters is this:
https://github.com/fsharp/FSharp.Data/blob/33e6e825bc2978eb9ce6dd880f31d9e60d452699/src/Html/HtmlParser.fs#L373-L382
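The linked code is not reproduced here. Judging from the reproduction further down, where line breaks come out as single spaces, the effect of "normalizing" is roughly what the sketch below shows. It works under that assumption. `normalizeWhitespace` is a hypothetical name, not an FSharp.Data function.

```fsharp
open System.Text.RegularExpressions

/// Hypothetical helper: collapse every run of whitespace characters
/// (spaces, tabs, \r, \n) into a single space. This models the effect
/// visible in the report; it is not FSharp.Data's actual implementation.
let normalizeWhitespace (text: string) : string =
    Regex.Replace(text, @"\s+", " ")

// Usage: the text after </span> in the snippet above loses its line break
let sample = "should be indented and\r\n have line feeds in it"
printfn "%s" (normalizeWhitespace sample)
// prints: should be indented and have line feeds in it
```

Applied to text inside `pre`, this is exactly the transformation that must be skipped.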
`x.InsertionMode`, in turn, is computed as follows:
https://github.com/fsharp/FSharp.Data/blob/49a3bfb22a8955463d7536af1d2df86449e335c6/src/Html/HtmlParser.fs#L353-L356
`x.IsFormattedTag` is only true if the most recently parsed tag is `pre` or `code`. Shouldn't it instead check whether the parser is currently inside a formatted tag?
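One way to do that check is to count how many formatted elements are currently open, instead of looking only at the most recent tag. The sketch below models this with a hypothetical `Token` type and `textContexts` function. Neither is part of FSharp.Data, whose tokenizer is internal. It only illustrates the depth-counting idea and is not a patch to `HtmlParser.fs`.

```fsharp
/// Hypothetical token type used only for this illustration.
type Token =
    | StartTag of string
    | EndTag of string
    | Text of string

/// Elements whose content should keep its whitespace (pre and code, as in the report).
let formattedTags = set [ "pre"; "code" ]

let isFormatted (name: string) =
    formattedTags.Contains(name.ToLowerInvariant())

/// Pairs every Text token with a flag telling whether it sits inside at least
/// one open formatted element. The state is a depth counter, so a child tag
/// such as <span> inside <pre> does not reset the formatted context.
let textContexts (tokens: Token list) : (string * bool) list =
    let step (depth, acc) token =
        match token with
        | StartTag name when isFormatted name -> (depth + 1, acc)
        // Ignore stray end tags so the depth never goes negative
        | EndTag name when isFormatted name -> (max 0 (depth - 1), acc)
        | StartTag _
        | EndTag _ -> (depth, acc)
        | Text t -> (depth, (t, depth > 0) :: acc)
    tokens |> List.fold step (0, []) |> snd |> List.rev

// Usage: tokens for <pre>\r\n This <span>code</span> should be indented and\r\n have line feeds in it</pre>
let tokens =
    [ StartTag "pre"
      Text "\r\n This "
      StartTag "span"
      Text "code"
      EndTag "span"
      Text " should be indented and\r\n have line feeds in it"
      EndTag "pre" ]

for (text, inFormatted) in textContexts tokens do
    printfn "%b %s" inFormatted (text.Replace("\r\n", "\\r\\n"))
```

Every text run, including the one after `</span>`, is reported as inside `pre`, so none of it would be normalized. A check that looks only at the last tag would see `span` at that point and treat the text as unformatted.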
A related behavior, with minimal reproduction steps:
```fsharp
open FSharp.Data

[<EntryPoint>]
let main argv =
    // Parse a single <pre> element whose text contains line breaks
    let n = List.exactlyOne (HtmlNode.Parse("""<pre>
%module graphics
%{
#include <GL/gl.h>
#include <GL/glu.h>
%}
// Put the rest of the declarations here
...
</pre>"""))
    printfn "%s" (n.InnerText())
    0 // exit code
```
This prints:
```
%module graphics
%{
#include <GL/gl.h> #include <GL/glu.h> %} // Put the rest of the declarations here ...
```
instead of the expected:
```
%module graphics
%{
#include <GL/gl.h>
#include <GL/glu.h>
%}
// Put the rest of the declarations here
...
```