
Content being ignored by Crawlee

Open · crptopool opened this issue 1 year ago · 0 comments

While crawling a page, the crawler ignores the text inside certain tags. For example, everything between <aside></aside> in the snippet below is dropped completely.

<aside class="content tip astro-duqfclob" aria-label="Tip">
	<p class="title astro-duqfclob" aria-hidden="true">
		<span class="icon astro-duqfclob">
			<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 18 18" width="16" height="16" class="astro-duqfclob">
				<path fill-rule="evenodd" d="M14 0a8.8 8.8 0 0 0-6 2.6l-.5.4-.9 1H3.3a1.8 1.8 0 0 0-1.5.8L.1 7.6a.8.8 0 0 0 .4 1.1l3.1 1 .2.1 2.4 2.4.1.2 1 3a.8.8 0 0 0 1 .5l2.9-1.7a1.8 1.8 0 0 0 .8-1.5V9.5l1-1 .4-.4A8.8 8.8 0 0 0 16 2v-.1A1.8 1.8 0 0 0 14.2 0h-.1zm-3.5 10.6-.3.2L8 12.3l.5 1.8 2-1.2a.3.3 0 0 0 .1-.2v-2zM3.7 8.1l1.5-2.3.2-.3h-2a.3.3 0 0 0-.3.1l-1.2 2 1.8.5zm5.2-4.5a7.3 7.3 0 0 1 5.2-2.1h.1a.3.3 0 0 1 .3.3v.1a7.3 7.3 0 0 1-2.1 5.2l-.5.4a15.2 15.2 0 0 1-2.5 2L7.1 11 5 9l1.5-2.3a15.3 15.3 0 0 1 2-2.5l.4-.5zM12 5a1 1 0 1 1-2 0 1 1 0 0 1 2 0zm-8.4 9.6a1.5 1.5 0 1 0-2.2-2.2 7 7 0 0 0-1.1 3 .2.2 0 0 0 .3.3c.6 0 2.2-.4 3-1.1z" class="astro-duqfclob"></path>
			</svg>
		</span>
		Tip
	</p>
	<section class="astro-duqfclob">
		<p>A common pattern in Astro is to import global CSS inside a <a href="/en/core-concepts/layouts/">Layout component</a>. Be sure to import the Layout component before other imports so that it has the lowest precedence.</p>
	</section>
</aside>

The snippet above produces the output shown in the screenshot below, and the behavior can also be seen in action at this link:

[Screenshot]

All text inside <aside></aside> is ignored. Please advise.

crptopool, Oct 17 '23 08:10
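One plausible explanation is that the text extractor strips a list of "page chrome" tags (navigation, headers, footers, sidebars) before collecting text, and aside is on that list. This is only an assumption, since the webwhiz extraction code is not quoted in this issue. The dependency-free TypeScript sketch below illustrates that mechanism. The names extractText and DEFAULT_EXCLUDED_TAGS are hypothetical and are not webwhiz or Crawlee APIs. The regex-based stripping also simplifies what a real HTML parser would do.

```typescript
// Hypothetical list of tags an extractor might treat as page chrome (assumption).
const DEFAULT_EXCLUDED_TAGS: string[] = ["script", "style", "nav", "header", "footer", "aside"];

// Hypothetical extractor: removes excluded elements entirely, then keeps the
// text of everything else. Regex-based for illustration only; it does not
// handle nested elements that share the same tag name.
function extractText(html: string, excludedTags: string[] = DEFAULT_EXCLUDED_TAGS): string {
  let out = html;
  for (const tag of excludedTags) {
    // Match the opening tag, everything up to the first matching close tag,
    // and the close tag itself, then drop the whole element and its children.
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    out = out.replace(element, " ");
  }
  // Drop all remaining tags but keep the text between them.
  out = out.replace(/<[^>]+>/g, " ");
  // Collapse runs of whitespace into single spaces.
  return out.replace(/\s+/g, " ").trim();
}

// A trimmed-down version of the Astro tip from the issue.
const snippet = `<main><p>Intro text.</p>
<aside class="content tip" aria-label="Tip">
  <p class="title" aria-hidden="true">Tip</p>
  <section><p>Import the Layout component before other imports.</p></section>
</aside></main>`;

// With aside excluded, the tip disappears from the extracted text.
const withAsideExcluded = extractText(snippet);
// With aside removed from the exclusion list, the tip text is kept.
const withAsideKept = extractText(snippet, DEFAULT_EXCLUDED_TAGS.filter((t) => t !== "aside"));

console.log(withAsideExcluded);
console.log(withAsideKept);
```

If this is the cause, a fix could drop aside from the exclusion list. Another option would be to exclude only asides that are clearly navigation, for example by class or role, rather than every aside element. Documentation sites such as Astro's use aside for tips and callouts that carry real content.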