Bump org.jsoup:jsoup from 1.18.1 to 1.21.2
Bumps org.jsoup:jsoup from 1.18.1 to 1.21.2.
Release notes
Sourced from org.jsoup:jsoup's releases.
jsoup 1.21.2
jsoup 1.21.2 is out now, adding support for a custom SSLContext in HTTP/2 connections and improving consistency in how user data is handled in attributes. It also brings performance gains in DOM manipulation and fragment parsing, and fixes several edge cases in stream parsing, traversal, cloning, and concurrent reads.
jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Changes
- Deprecated internal (yet visible) methods Normalizer#normalize(String, boolean) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
- Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #2370
Improvements
- When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
- Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
- Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
- Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpUrlConnection implementations. #2370
- Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (element.child(0).remove()), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373
Bug Fixes
- When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
- In NodeTraversor, if the last child element was removed during the head() call, the parent would be visited twice. #2355
- Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
- In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
- Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. #2377
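For projects that currently call sslSocketFactory, a minimal migration sketch follows, assuming jsoup 1.21.2 is on the classpath. SslContextExample, defaultContext and connectWith are hypothetical names, and SSLContext.getDefault() is only a stand-in for a real custom context (for example one built from a private trust store). The request is prepared but never sent, so nothing here touches the network.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class SslContextExample {
    /** Stand-in for a real custom context (assumption: the JVM default is enough for this demo). */
    static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Prepares, but does not execute, a request that will use the supplied TLS context. */
    static Connection connectWith(String url, SSLContext context) {
        return Jsoup.connect(url)
                .sslContext(context) // 1.21.2+: unlike sslSocketFactory, does not force HttpUrlConnection
                .timeout(10_000);
    }

    public static void main(String[] args) {
        Connection conn = connectWith("https://example.com/", defaultContext());
        System.out.println(conn.request().url());
        // conn.get() would perform the request over the supplied context.
    }
}
```

Swapping sslSocketFactory(...) for sslContext(...) should be the only change needed at the call site; the rest of the Connection chain is untouched.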
jsoup 1.21.1
jsoup 1.21.1 is out now, featuring powerful new node selection capabilities that let you target specific DOM nodes like comments and text nodes using CSS selectors, dynamic tag customization through the new TagSet callback system, and improved defense against mutation XSS attacks with simplified attribute escaping. This release also brings HTTP/2 support by default, numerous API improvements for better developer experience, and fixes for several edge-case parsing issues.
jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Changes
- Removed previously deprecated methods. #2317
- Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class<T> type) method instead. #2343
- Deprecated Connection.Response#bufferUp() in favor of Connection.Response#readFully(), which can throw a checked IOException.
- Deprecated internal methods Validate#ensureNotNull(Object) (replaced by the typed Validate#expectNotNull(T)) and the protected HTML appenders in Attribute and Node.
- If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.
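As a rough illustration of the node selectors that replace :matchText, the sketch below runs the comment example from these notes and a ::text query through Element#selectNodes(String, Class). It assumes jsoup 1.21.1 or later; the class, method names and sample markup are hypothetical, and only the selector strings come from the release notes.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class NodeSelectorExample {
    // Hypothetical sample markup: a "prices" comment followed by two paragraphs.
    static final String HTML = "<div><!-- prices: --><p>Ten dollars</p><p>Unrelated</p></div>";

    /** Text of the p elements that immediately follow a comment containing "prices". */
    static List<String> pricedParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element p : doc.select("::comment:contains(prices) + p")) {
            out.add(p.text());
        }
        return out;
    }

    /** Text of every TextNode under body, selected directly instead of via :matchText. */
    static List<String> textNodes(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (TextNode node : doc.body().selectNodes("::text", TextNode.class)) {
            out.add(node.text());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pricedParagraphs(HTML));
        System.out.println(textNodes(HTML));
    }
}
```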
Improvements
- Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class<T> nodeType) for direct node selection. #2324
- Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace). #2330
- Made TokenQueue and CharacterReader AutoCloseable, to ensure that they will release their buffers back to the buffer pool for later reuse.
- Added Selector#evaluatorOf(String css) as a clearer way to obtain an Evaluator from a CSS query. It is an alias of QueryParser.parse(String css).
- Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
- Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
- Updated the default user-agent string to improve compatibility. #2341
- The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #2326
- Added Connection.Response#readFully() as a replacement for Connection.Response#bufferUp() with an explicit IOException. Similarly, added Connection.Response#readBody() over Connection.Response#body(). Deprecated Connection.Response#bufferUp(). #2327
- When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
- Changed Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340
Bug Fixes
- The contents of a script in an svg foreign context are now parsed as script data, not text. #2320
... (truncated)
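A small sketch of the attribute-escaping change (#2337), assuming jsoup 1.21.1 or later. The class name and the untrusted title string are made up for illustration; on earlier versions the same call may emit the < and > characters inside the attribute value unescaped.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class AttributeEscapeExample {
    /** Sets a title attribute from untrusted text and returns the element's serialized HTML. */
    static String serialize(String untrustedTitle) {
        Element p = Jsoup.parse("<p>x</p>").selectFirst("p");
        p.attr("title", untrustedTitle);
        return p.outerHtml();
    }

    public static void main(String[] args) {
        // With 1.21.1+, < and > inside the attribute value are written as &lt; and &gt;.
        System.out.println(serialize("</p><img src=x onerror=alert(1)>"));
    }
}
```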
Changelog
Sourced from org.jsoup:jsoup's changelog.
1.21.2 (2025-Aug-25)
Changes
- Deprecated internal (yet visible) methods Normalizer#normalize(String, boolean) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
- Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #2370
Improvements
- When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
- Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
- Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
- Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpUrlConnection implementations. #2370
- Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (element.child(0).remove()), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373
Bug Fixes
- When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
- In NodeTraversor, if the last child element was removed during the head() call, the parent would be visited twice. #2355
- Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
- In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
- Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. #2377
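To illustrate the concurrency guarantee behind #2366, here is a sketch in which several threads call Element#children() on one shared, already-parsed element without any locking. It assumes jsoup 1.21.2; the thread count, list size and class name are arbitrary choices for the example. The notes only cover concurrent reads, so the sketch does not modify the DOM while the tasks run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ConcurrentChildrenExample {
    /** Parses a list once, then lets several threads call children() on the same element. */
    static List<Integer> childCounts(int threads) {
        StringBuilder html = new StringBuilder("<ul>");
        for (int i = 0; i < 1000; i++) {
            html.append("<li>").append(i).append("</li>");
        }
        html.append("</ul>");
        Element list = Jsoup.parse(html.toString()).selectFirst("ul");

        // Read-only task with no external locking. The first calls build the element's
        // internal child-element cache, the step that #2366 made atomic.
        Callable<Integer> task = () -> list.children().size();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) {
                counts.add(f.get());
            }
            return counts;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(childCounts(8));
    }
}
```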
1.21.1 (2025-Jun-23)
Changes
- Removed previously deprecated methods. #2317
- Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class<T> type) method instead. #2343
- Deprecated Connection.Response#bufferUp() in favor of Connection.Response#readFully(), which can throw a checked IOException.
- Deprecated internal methods Validate#ensureNotNull(Object) (replaced by the typed Validate#expectNotNull(T)) and the protected HTML appenders in Attribute and Node.
- If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.
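A minimal sketch of moving from bufferUp() to readFully() and readBody(), assuming jsoup 1.21.1 or later. To keep it self-contained it serves a fixed page from the JDK's built-in com.sun.net.httpserver server on a random local port; that server, the page content and all class and method names are assumptions for the demo, not part of jsoup.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ReadFullyExample {
    /** Fetches a page, buffering it with readFully() rather than the deprecated bufferUp(). */
    static String fetchTitle(String url) throws IOException {
        Connection.Response res = Jsoup.connect(url).execute();
        res.readFully();              // buffers the body now; read errors surface as IOException
        String body = res.readBody(); // checked counterpart to body()
        return Jsoup.parse(body, url).title();
    }

    /** Serves fixed HTML from a throwaway local server (demo assumption) and fetches its title. */
    static String demo() {
        byte[] page = "<html><head><title>Hello</title></head><body>hi</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
                exchange.sendResponseHeaders(200, page.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(page);
                }
            });
            server.start();
            try {
                return fetchTitle("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The practical difference from bufferUp() is that a failed or truncated read is reported as a checked IOException at the call site rather than later, so callers are forced to handle it.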
Improvements
- Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class<T> nodeType) for direct node selection. #2324
- Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
- Made TokenQueue and CharacterReader AutoCloseable, to ensure that they will release their buffers back to the buffer pool for later reuse.
- Added Selector#evaluatorOf(String css) as a clearer way to obtain an Evaluator from a CSS query. It is an alias of QueryParser.parse(String css).
- Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
- Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
- Updated the default user-agent string to improve compatibility. #2341
- The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #2326.
- Added Connection.Response#readFully() as a replacement for Connection.Response#bufferUp() with an explicit IOException. Similarly, added Connection.Response#readBody() over Connection.Response#body(). Deprecated Connection.Response#bufferUp(). #2327
- When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
- Changed Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340
Bug Fixes
- The contents of a script in an svg foreign context are now parsed as script data, not text. #2320
- Tag#isFormSubmittable() was incorrectly updating the Tag's options. #2323
- The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. #2325
- Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. #2332
- When cloning an Element, the clone would retain the source's cached child Element list (if any), which could lead to incorrect results when modifying the clone's child elements. #2334
- When parsing HTML with svg:script elements in SVG elements, don't enter the Text insertion mode, but continue to parse as foreign content. Otherwise, misnested HTML could then cause an IndexOutOfBoundsException. #2374
... (truncated)
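Two of the smaller 1.21.1 additions, Selector#evaluatorOf(String) and NodeVisitor#traverse(Node), combine naturally; the sketch below collects the tag names of elements matching a CSS query in document order. It assumes jsoup 1.21.1 or later, and TraverseExample and matchingTags are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Evaluator;
import org.jsoup.select.NodeVisitor;
import org.jsoup.select.Selector;

public class TraverseExample {
    /** Tag names, in document order, of body elements that match the CSS query. */
    static List<String> matchingTags(String html, String css) {
        Element body = Jsoup.parse(html).body();
        Evaluator evaluator = Selector.evaluatorOf(css); // alias of QueryParser.parse(css)
        List<String> tags = new ArrayList<>();

        // head() runs when a node is first reached; tail() keeps its default no-op.
        NodeVisitor visitor = (Node node, int depth) -> {
            if (node instanceof Element && ((Element) node).is(evaluator)) {
                tags.add(((Element) node).tagName());
            }
        };
        visitor.traverse(body); // 1.21.1+: replaces NodeTraversor.traverse(visitor, body)
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(matchingTags("<div><p class=a>1</p><span class=a>2</span><p>3</p></div>", ".a"));
    }
}
```

Compiling the query once with evaluatorOf and reusing the Evaluator avoids re-parsing the CSS for every node visited.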
Commits
- b02837b [maven-release-plugin] prepare release jsoup-1.21.2
- 1f0c207 v1.21.2 release date
- b093463 Use central-publishing-maven-plugin
- 615b959 Updating sonatype deploy URLs
- 6961720 Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.11.2 to 3.11.3 (#2386)
- 82864b2 Bump jetty.version from 9.4.57.v20241219 to 9.4.58.v20250814 (#2385)
- 71f963e Fix for HTML that breaks the select scope
- 6b20f6e Removed effective recursion closing </select>
- eb2957a Bump actions/checkout from 4 to 5 (#2382)
- 3a9a6c7 Fix ProxyTest in CI
- Additional commits viewable in compare view
You can trigger a rebase of this PR by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.
@dependabot rebase