Alroy

7 comments by Alroy

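For this one, I fixed it by calling extract with these options: web_content = extract(original_html, include_formatting=True, include_tables=True, include_comments=False, include_links=False, output_format="xml", favor_recall=True) or "" (extract returns None when nothing can be extracted, so the or "" fallback replaces the original "".join(...) wrapper and its # type: ignore; joining a string just returns the same string anyway).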

Should this fix be added?

This site has the same problem: it doesn't scrape the entire page content. https://www.temu.com/privacy-and-cookie-policy.html
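To put a rough number on "doesn't scrape the entire page content", one option is to compare the visible words in the raw HTML with the words in the extracted output. This is only a standard-library sketch: visible_words and coverage_ratio are hypothetical helpers for illustration, not part of trafilatura, and word overlap is a crude heuristic (menus and footers count as visible text too, so a ratio below 1.0 is expected even for a good extraction).

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects whitespace-separated words from text nodes, skipping script/style/noscript."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_words(markup: str) -> list[str]:
    """Return the words of all text nodes, ignoring tags (HTML or XML output both work)."""
    parser = _VisibleText()
    parser.feed(markup)
    parser.close()
    return parser.words


def coverage_ratio(original_html: str, extracted: str) -> float:
    """Rough share of the page's visible words that also occur in the extraction output."""
    page_words = visible_words(original_html)
    if not page_words:
        return 1.0
    extracted_words = set(visible_words(extracted))
    hits = sum(1 for word in page_words if word in extracted_words)
    return hits / len(page_words)
```

Something like coverage_ratio(original_html, web_content) after the extract call would give a number to compare between option sets (for example with and without favor_recall=True) on the same page.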

Do you feel this could be related to #318?

This site uses bullet points inside tables a lot. Just leaving the example here in case it's useful: https://www.spotify.com/in-en/legal/privacy-policy/

Try it on Spotify; the lists inside the tables aren't being captured yet: https://www.spotify.com/in-en/legal/privacy-policy/ @adbar @mikhainin
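For the lists-in-tables case, a small check like the one below could list which li items inside a table in the original HTML are missing from the extracted output. It's a standard-library sketch: table_list_items and missing_table_list_items are hypothetical helper names, and it assumes reasonably well-formed markup with explicit closing li tags (HTMLParser does not auto-close omitted tags). Each li records only its own direct text, so a nested list item is reported separately from its parent.

```python
from html.parser import HTMLParser


class _TableListItems(HTMLParser):
    """Collects the text of <li> elements that sit inside a <table>."""

    def __init__(self) -> None:
        super().__init__()
        self._table_depth = 0
        self._stack: list[list[str]] = []  # one chunk list per open <li> inside a table
        self.items: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_depth += 1
        elif tag == "li" and self._table_depth:
            self._stack.append([])

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth:
            self._table_depth -= 1
        elif tag == "li" and self._stack:
            # Normalize whitespace in the finished item before recording it.
            text = " ".join(" ".join(self._stack.pop()).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        # Text goes to the innermost open <li> only.
        if self._stack:
            self._stack[-1].append(data)


def table_list_items(html: str) -> list[str]:
    parser = _TableListItems()
    parser.feed(html)
    parser.close()
    return parser.items


def missing_table_list_items(original_html: str, extracted: str) -> list[str]:
    """Return table list items whose text does not appear in the extraction output."""
    normalized = " ".join(extracted.split())
    return [item for item in table_list_items(original_html) if item not in normalized]
```

Calling missing_table_list_items(original_html, web_content) on the Spotify page would then show which bullet points from the tables were dropped.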