rdflib

RFE: move away from deprecated `html5lib`

Open kloczek opened this issue 11 months ago • 2 comments

Is your feature request related to a problem? Please describe. It would be nice to cut the tail of some legacy module dependencies. One of those modules is html5lib.

Describe the solution you'd like It would be good to remove the use of the deprecated html5lib, as was done in pip ~2 years ago. https://github.com/pypa/pip/pull/11259

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context html5lib depends on six, which has been on the list of deprecated modules even longer. Implementing this RFE would make it easier to kill two birds with one stone 😋

kloczek avatar Mar 20 '24 16:03 kloczek

I support this idea of moving away from html5lib. @kloczek please do make a PR for this!

nicholascar avatar May 18 '24 22:05 nicholascar

I've looked into this. It looks like html5lib is used by Literals with datatype rdf:HTML: html5lib parses the literal's lexical form, checks that it is valid, and normalizes it (I'm not sure what the normalization does). html5lib is also used for serializing rdf:HTML literals back to text when required.

Two candidates for replacing it are BeautifulSoup (I've used this before, but it's quite different from html5lib) and Python's built-in html.parser module (that is what pip used when moving away from html5lib).

ashleysommer avatar Jul 25 '24 01:07 ashleysommer
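
As a rough illustration of the standard-library route mentioned in the comment above, here is a minimal sketch that re-serializes an HTML fragment using only `html.parser`. The names `FragmentNormalizer` and `normalize_fragment` are hypothetical, and this is not how rdflib or html5lib actually normalize rdf:HTML literals. `HTMLParser` is a lenient tokenizer rather than a conforming HTML5 parser, so the sketch does not validate its input or repair unbalanced tags.

```python
from html import escape
from html.parser import HTMLParser


class FragmentNormalizer(HTMLParser):
    """Hypothetical helper: re-serializes an HTML fragment token by token."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so the output is safe to parse again.
        self.parts.append(escape(data, quote=False))


def normalize_fragment(text):
    """Return a re-serialized copy of an HTML fragment (assumption: comments
    and declarations can be dropped, since no handlers are defined for them)."""
    parser = FragmentNormalizer()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(normalize_fragment('<P CLASS="a">Hi &amp; bye</P>'))
```

Unlike html5lib, this does not build a tree, so anything that depends on HTML5 error recovery would need a different approach.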

I hope you guys can solve this! An html5lib issue is still hindering the use of the RDF-based HTML vocabulary (despite your efforts to get it fixed), together with another RDFLib issue. It would be so great if this could be solved!

floresbakker avatar Sep 23 '24 20:09 floresbakker

@kloczek @floresbakker I have a PR #2911 that will replace html5lib with html5lib-modern, that does not use six.

Note, however, that this is not the only remaining source of the six sub-dependency.

The isodate module used in RDFLib also depends on six, and it also hasn't been updated in over 3 years (though it is not marked as deprecated like html5lib).

The path to fix, upgrade or replace isodate is not so clear.

ashleysommer avatar Sep 25 '24 06:09 ashleysommer
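
To see which packages in a given environment still pull in six, one option is to scan the installed distributions' declared requirements with `importlib.metadata`. This is a minimal sketch: the helper names `requirement_name` and `dists_requiring` are hypothetical, the name parsing is a simplification of the full requirement syntax, and the result depends entirely on what is installed locally.

```python
import re
from importlib import metadata


def requirement_name(req):
    """Extract the normalized project name from a requirement string.

    Simplified parsing (an assumption): take the leading name and ignore
    version specifiers, extras, and environment markers.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if not match:
        return ""
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def dists_requiring(target):
    """List installed distributions that declare `target` as a requirement."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken metadata
        # dist.requires is None when no requirements are declared.
        for req in dist.requires or []:
            if requirement_name(req) == target:
                found.append(name)
                break
    return sorted(found)


print(dists_requiring("six"))
```

Requirements guarded by environment markers or extras are reported too, so a hit only means the package can depend on six, not that it does on the current platform.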

> @kloczek @floresbakker I have a PR https://github.com/RDFLib/rdflib/pull/2911 that will replace html5lib with html5lib-modern, that does not use six.

I cannot find html5lib-modern on PyPI. Why not move to html5lib? 🤔

kloczek avatar Sep 26 '24 07:09 kloczek

@kloczek It is published on PyPI here: https://pypi.org/project/html5lib-modern/

> Why not move to html5lib?

html5lib is the abandoned and deprecated dependency we are moving away from (you are the one who raised the issue).

ashleysommer avatar Sep 26 '24 07:09 ashleysommer

> > Why not move to html5lib?
>
> html5lib is the abandoned and deprecated dependency we are moving away from (you are the one who raised the issue).

My mistake. Sorry.

kloczek avatar Sep 26 '24 07:09 kloczek