scrapely
`safehtml` omits important (in fact, all) attributes of tags
Let's say someone (like me) wants to keep an `img` tag, so the `src` attribute of that tag is important to them. But the `safehtml()` function drops all attributes of the relevant tag.
I think it would be better to keep the attributes of `allowed_tags`, or to add another parameter named `allowed_attributes` to specify which attributes to keep.
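As a rough illustration of the proposed `allowed_attributes` idea, here is a minimal standalone sketch built on Python's standard `html.parser`. It is not scrapely's `safehtml()` and does not use scrapely internals; the names `keep_allowed_html`, `AllowListSanitizer`, `PURGE_TAGS` and the per-tag dict shape of `allowed_attributes` are all hypothetical. It keeps only allow-listed tags and, on those, only allow-listed attributes, and drops the contents of `script` and `style`. It does not validate URL schemes (e.g. `javascript:` in `src`), so treat it as a sketch, not a complete sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical: tags whose *contents* are dropped entirely, not just the markup.
PURGE_TAGS = {"script", "style"}


class AllowListSanitizer(HTMLParser):
    """Keep allow-listed tags and, on them, only allow-listed attributes."""

    def __init__(self, allowed_tags, allowed_attributes):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        # Hypothetical shape: tag name -> set of attribute names to keep.
        self.allowed_attributes = allowed_attributes
        self.parts = []
        self._purge_depth = 0  # > 0 while inside <script>/<style>

    def _emit_start(self, tag, attrs, self_closing):
        keep = self.allowed_attributes.get(tag, set())
        rendered = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in keep and value is not None
        )
        self.parts.append(f"<{tag}{rendered}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        if tag in PURGE_TAGS:
            self._purge_depth += 1
        elif not self._purge_depth and tag in self.allowed_tags:
            self._emit_start(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        # Explicitly self-closed tags such as <img ... />.
        if not self._purge_depth and tag in self.allowed_tags:
            self._emit_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag in PURGE_TAGS:
            self._purge_depth = max(0, self._purge_depth - 1)
        elif not self._purge_depth and tag in self.allowed_tags:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside dropped (non-purged) tags is kept; only their markup goes.
        if not self._purge_depth:
            self.parts.append(escape(data, quote=False))


def keep_allowed_html(html, allowed_tags, allowed_attributes):
    """Hypothetical helper: return `html` reduced to the allow-lists."""
    parser = AllowListSanitizer(allowed_tags, allowed_attributes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(keep_allowed_html(
        '<p class="x">Hi <img src="a.jpg" width="10" onerror="alert(1)"></p>',
        allowed_tags={"p", "img"},
        allowed_attributes={"img": {"src", "width", "height"}},
    ))
    # prints: <p>Hi <img src="a.jpg" width="10"></p>
```

Keying `allowed_attributes` by tag, rather than using one flat set, lets `src`, `width` and `height` survive on `img` without also being allowed on every other kept tag, while attributes like `class` or `onerror` are still stripped.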
Hi @SirbitoX. I was discussing this last week, and we were thinking about adding a new, less strict version of `safe html`. The new type would sit somewhere between `raw html` and `safe html`, keeping `img` tags and possibly other tags too.
Other than `img` tags, what other tags would you want to keep? Would you mind explaining your specific use case? Are you extracting articles, products or leads?
Hi @ruairif,
I'm extracting articles, and I keep all the images in the description of each scraped article, so I need the `src` attribute, and possibly the `height` and `width` attributes, of the `img` tag.
I'm also planning to keep embedded videos in the description. But that wouldn't be an issue if we supported something like `allowed_attributes`.