
It is unable to scrape <li>

Open amztc34283 opened this issue 1 year ago • 6 comments


from autoscraper import AutoScraper

# url holds the address of the page being scraped
wanted_list = ["Design, develop, test, refactor and scale backend implementations of new and existing consumer product features"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)

I am able to scrape the element in the wanted_list, but similar elements are not scraped. Are there any tips or tricks that could fix this?

amztc34283 avatar Oct 08 '24 01:10 amztc34283

Please provide your full code, including the URL.

alirezamika avatar Oct 08 '24 07:10 alirezamika

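from autoscraper import AutoScraper

url = "https://careers.chime.com/en/jobs/4225356002/backend-engineer/"
wanted_list = ["Design, develop, test, refactor and scale backend implementations of new and existing consumer product features"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)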

Link: https://careers.chime.com/en/jobs/4225356002/backend-engineer/

amztc34283 avatar Oct 08 '24 14:10 amztc34283

What is your expected output?

alirezamika avatar Oct 09 '24 06:10 alirezamika

My expected output is the content of all the <li> elements under the same <ul>, which is: Design, develop ... Work with ... Collaborate with ... Proactively find ...

amztc34283 avatar Oct 11 '24 15:10 amztc34283

You can try the contain_sibling_leaves parameter.

result = scraper.get_result_similar(url, contain_sibling_leaves=True)

alirezamika avatar Oct 12 '24 09:10 alirezamika
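
To make the "sibling leaves" idea concrete, here is a minimal standard-library sketch of what is being asked for: find the element whose text matches the sample, then return the text of every same-tag sibling under the same parent. This is only an illustration of the concept, not AutoScraper's implementation. SAMPLE_HTML and sibling_leaves are hypothetical names, and the markup is a made-up, well-formed stand-in for the job page rather than its real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the job page markup (not the real page).
SAMPLE_HTML = """
<div>
  <ul>
    <li>Design, develop and test features</li>
    <li>Work with product managers</li>
    <li>Collaborate with other teams</li>
  </ul>
</div>
"""


def sibling_leaves(markup, wanted_text):
    """Return the text of every same-tag sibling of the element matching wanted_text."""
    root = ET.fromstring(markup)
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in root.iter():
        if (element.text or "").strip() == wanted_text and element in parent_of:
            parent = parent_of[element]
            # findall(tag) only looks at direct children, i.e. the siblings.
            return [(sib.text or "").strip() for sib in parent.findall(element.tag)]
    return []


items = sibling_leaves(SAMPLE_HTML, "Design, develop and test features")
print(items)
```

Giving one <li> as the sample yields all three items of the enclosing <ul>, which is the behaviour the expected output above describes.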

I will give it a try, thanks.

In addition, can you point me to the part of the code that decides which elements to scrape based on the wanted_list? Thank you!

amztc34283 avatar Oct 14 '24 14:10 amztc34283

It's basically the whole code 😅

alirezamika avatar Nov 02 '24 09:11 alirezamika
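
For readers with the same question, the general learn-by-example idea can be sketched roughly as follows: locate the element whose text equals a wanted sample, record the path from the root down to it, and replay that path on another page. This is a simplified assumption about how such scrapers work in general, not a description of AutoScraper's actual code. learn_path, apply_path, PAGE_A and PAGE_B are hypothetical names, and the pages are made-up, well-formed snippets.

```python
import xml.etree.ElementTree as ET


def learn_path(root, wanted_text):
    """Return [(tag, index among same-tag siblings), ...] leading to the match, or None."""
    def walk(node, path):
        if (node.text or "").strip() == wanted_text:
            return path
        seen = {}  # how many children of each tag have been visited so far
        for child in node:
            idx = seen.get(child.tag, 0)
            seen[child.tag] = idx + 1
            found = walk(child, path + [(child.tag, idx)])
            if found is not None:
                return found
        return None

    return walk(root, [])


def apply_path(root, path):
    """Follow a learned path on another tree and return the text found there, or None."""
    node = root
    for tag, idx in path:
        same_tag = node.findall(tag)  # direct children with this tag
        if idx >= len(same_tag):
            return None
        node = same_tag[idx]
    return (node.text or "").strip()


# Hypothetical pages sharing one layout.
PAGE_A = "<div><h1>Backend Engineer</h1><ul><li>Design features</li><li>Work with PMs</li></ul></div>"
PAGE_B = "<div><h1>Data Analyst</h1><ul><li>Build dashboards</li><li>Write SQL</li></ul></div>"

rule = learn_path(ET.fromstring(PAGE_A), "Work with PMs")
print(rule)
print(apply_path(ET.fromstring(PAGE_B), rule))
```

Because this sketch stores the sibling index at the last step, replaying the rule returns a single element per list. Dropping the index at that final step would return every same-tag sibling instead, which is the kind of behaviour a sibling-leaves option is meant to provide.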

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Dec 03 '24 02:12 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Dec 18 '24 02:12 github-actions[bot]