jsoup
jsoup 1.15.2 appears to insert new spaces
Hi all,
When upgrading from 1.15.1 to 1.15.2 I appear to have encountered unexpected insertion of spaces - is this a bug or desired behaviour?
In this test with a multiline string:
<h1>This is my comment</h1>
<p>Lorem ipsum</p>
<span>Thanks</span>
1.15.1 output of val parsed = Jsoup.parse(textWithHtml):
<h1>This is my comment</h1>
<p>Lorem ipsum</p><span>Thanks</span>
1.15.2 output of val parsed = Jsoup.parse(textWithHtml):
<h1>This is my comment</h1>
<p>Lorem ipsum</p> <span>Thanks</span>
(space inserted between </p> and <span>)
EDITED: To remove references to using .text() on the output of Jsoup.parse
Thanks for sharing the examples. I don't have Kotlin installed on my machine, so I'm trying to reproduce this with jsoup 1.15.2 and Java 17.0.4. The following code snippet also produces a space between ipsum</p> and <span> when I use the Jsoup.parse() method without the text() method.
Document jsoupHtml() throws IOException {
    String multiLineHtml = """
        <h1>This is my comment</h1>
        <p>Lorem ipsum</p>
        <span>Thanks</span> """;
    Document resultingHtml = Jsoup.parse(multiLineHtml);
    return resultingHtml;
}
The above code produces this HTML for me:
<html>
<head></head>
<body>
<h1>This is my comment</h1>
<p>Lorem ipsum</p> <span>Thanks</span>
</body>
</html>
When I try using the text() method like you do in your example, I don't see an extra space in the final String result.
String jsoupHtml() throws IOException {
    String multiLineHtml = """
        <h1>This is my comment</h1>
        <p>Lorem ipsum</p>
        <span>Thanks</span> """;
    Document resultingHtml = Jsoup.parse(multiLineHtml);
    String textOfHtml = resultingHtml.text();
    return textOfHtml;
}
The above code snippet produces the following String for me, without extra spaces, when using text() and System.out.println() to print the result to an Ubuntu Linux terminal:
This is my comment Lorem ipsum Thanks
Thanks for sharing your example, but maybe I'm missing something when trying to reproduce this issue?
Hi, I checked as well and got the same result as @jeffthomasweb. My Java version is: IBM Semeru Runtime Open Edition 17.0.2.0 (build 17.0.2+8)
Thanks so much for your time both, I've edited my post to remove references to using .text() - it didn't line up with the HTML outputs I pasted... I'm not sure what I was smoking there.
I've created a tiny Scala repro case located here, also attached as two jars in a zip (each one built with a different jsoup version):
https://github.com/henricook/jsoup-1802-repro
jsoup-bug-1802-jsoup-1.15.x.zip
I'm on Ubuntu / Java 11.
I know it looks like it, but jsoup is not inserting a space here. It is actually collapsing a newline into a single space - and would collapse multiples of those if present.
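To illustrate the collapsing described above, here is a minimal sketch in plain Java. It does not use jsoup and is not jsoup's implementation; the class `WhitespaceCollapseSketch` and the method `collapseWhitespace` are hypothetical names made up for this example. It only shows how a newline (or a longer run of whitespace) between two tags ends up as a single space.

```java
public class WhitespaceCollapseSketch {

    // Hypothetical helper, not part of jsoup: replaces every run of
    // whitespace (spaces, tabs, newlines) with exactly one space.
    public static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // One newline between </p> and <span>, as in the report.
        String single = "<p>Lorem ipsum</p>\n<span>Thanks</span>";
        // Several newlines and spaces between the same two tags.
        String multiple = "<p>Lorem ipsum</p>\n\n   \n<span>Thanks</span>";

        System.out.println(collapseWhitespace(single));
        System.out.println(collapseWhitespace(multiple));
        // Both lines print: <p>Lorem ipsum</p> <span>Thanks</span>
    }
}
```

In both cases the whitespace was already present in the input; nothing is added, and a longer run is reduced to the same single space.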
I have improved the pretty-printer to now also collapse this space, similar to the earlier behavior.
Thanks for the report!
Thanks @jhy !