trafilatura

Add include_video parameter (iframe elements are missing)

Open • fraseInc opened this issue 3 years ago • 9 comments

I've noticed that iframe elements, such as those containing YouTube videos, are missing. I don't see any arguments to control iframes. Is this by design?

fraseInc, Feb 18 '22 22:02

Hi @fraseInc, yes, I tend to discard iframes by design, as embedded content is usually less relevant text-wise. Do you have examples of elements that should be included?

adbar, Feb 21 '22 12:02

I think things like YouTube, and videos in general, are important, at least as an optional argument, similar to how you handle tables or images. Could we add an include_videos filter?

The typical YouTube embed looks something like this:

<iframe src="//www.youtube.com/embed/IkgLUo82eWg?rel=0" allowfullscreen="" loading="lazy" decoding="async"></iframe>
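Since iframes are dropped by design, one possible workaround is to collect video embeds from the raw HTML separately. The following is a minimal sketch, assuming YouTube embeds can be recognized by the host of the iframe's `src` attribute. `YouTubeIframeCollector` and `extract_youtube_embeds` are hypothetical helpers built on Python's standard `html.parser`; they are not part of trafilatura's API.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse

# Hosts treated as YouTube embeds (an assumption for this sketch).
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "www.youtube-nocookie.com"}


class YouTubeIframeCollector(HTMLParser):
    """Hypothetical helper (not trafilatura API): gathers YouTube iframe URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.video_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase and attrs as (name, value) pairs.
        if tag != "iframe":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Protocol-relative URLs like "//www.youtube.com/embed/..." have no scheme;
        # assume https so the collected URL is absolute.
        if src.startswith("//"):
            src = "https:" + src
        if urlparse(src).netloc.lower() in YOUTUBE_HOSTS:
            self.video_urls.append(src)


def extract_youtube_embeds(html: str) -> List[str]:
    """Return the src URL of every YouTube iframe found in the HTML string."""
    parser = YouTubeIframeCollector()
    parser.feed(html)
    parser.close()
    return parser.video_urls


if __name__ == "__main__":
    sample = ('<iframe src="//www.youtube.com/embed/IkgLUo82eWg?rel=0" '
              'allowfullscreen="" loading="lazy" decoding="async"></iframe>')
    print(extract_youtube_embeds(sample))
```

Running it on the embed above prints a one-element list containing `https://www.youtube.com/embed/IkgLUo82eWg?rel=0`. Matching on the host rather than on a substring of the URL avoids false positives from unrelated iframes whose query strings merely mention YouTube.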

fraseInc, Feb 21 '22 15:02

An additional argument is tricky for maintenance reasons, but thanks, I'll think about it.

adbar, Feb 21 '22 17:02