
Scraping job data

Open Thaslim42 opened this issue 1 year ago • 5 comments

When I used Firecrawl to scrape data from a job site, it only scraped the initial page. The actual data is on the pages behind each job title link, and I want to extract that too. How can I achieve this? Here is a sample screenshot of the page info.

Thaslim42 avatar Sep 02 '24 06:09 Thaslim42

Hey @Thaslim42, could you try running with the allowBackwardLinks option? It lets the crawler navigate from the starting URL to previously linked pages, or to pages that are not children of the URL where you started the crawl.
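For reference, here is a minimal sketch of what a crawl request with that option might look like. It only builds the request and does not send it. The endpoint URL, the top-level placement of allowBackwardLinks, the limit field, and the helper names are assumptions based on the hosted v1 REST API, so check the API reference for the version you are running.

```python
import json
import urllib.request

# Assumed endpoint for the hosted v1 API; a self-hosted instance will differ.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v1/crawl"


def build_crawl_payload(start_url, limit=50):
    """Return the JSON body for a crawl job (hypothetical helper)."""
    return {
        "url": start_url,
        # Let the crawler follow links that are not children of start_url.
        "allowBackwardLinks": True,
        # Assumed cap on the number of pages, to keep a test crawl small.
        "limit": limit,
    }


def build_crawl_request(start_url, api_key, limit=50):
    """Build, but do not send, the POST request for a crawl job."""
    body = json.dumps(build_crawl_payload(start_url, limit)).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit the job. Keeping the payload in its own function makes it easy to print and compare against the params shown in the playground.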

nickscamara avatar Sep 03 '24 02:09 nickscamara

It still didn't work. Any other options?

Thaslim42 avatar Sep 03 '24 11:09 Thaslim42

@Thaslim42

ccing @tomkosm

nickscamara avatar Sep 04 '24 15:09 nickscamara

@Thaslim42 are you using scrape or crawl? You should use crawl for this. Please share all of the options you are using, the URL, and the result you are getting. Also, are you self-hosting or using the API?

tomkosm avatar Sep 06 '24 11:09 tomkosm

I used the Firecrawl playground to crawl this URL, and it only scraped links to unwanted content like the newsletter, gallery, etc. The content behind the job title links wasn't scraped. Here is a screenshot of the params I provided: Screenshot 2024-09-13 120912

Thaslim42 avatar Sep 13 '24 06:09 Thaslim42

Hey @Thaslim42, it looks like the issue is with the URL you're trying to scrape. The page only exists with "www" in the address, like this: https://www.infopark.in/companies/jobs/thrissur. The site seems to have a DNS issue where the page can't be found without "www". You should be able to resolve this by adding "www" to the URL and using the allowBackwardLinks option.
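If you are submitting URLs from a script, a small helper can add the missing "www" before the crawl starts. This is only a sketch: ensure_www is a hypothetical name, and it assumes the site is known to be served only on the "www" host, which is not true in general.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_www(url):
    """Return url with a "www." prefix on the host if it is missing.

    Hypothetical helper: only use it for sites known to be served
    on the "www" host, such as the one in this issue.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        # No host found (for example, the scheme is missing); leave as-is.
        return url
    # Keep any "user:pass@" prefix separate from the host and port.
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    if hostport.lower().startswith("www."):
        return url
    return urlunsplit(parts._replace(netloc=f"{userinfo}{sep}www.{hostport}"))
```

Calling ensure_www("https://infopark.in/companies/jobs/thrissur") gives the "www" form of the address shown above, and a URL that already starts with "www." is returned unchanged.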

Let me know if this helps! I'm closing this issue for now, but feel free to reopen it if needed.

rafaelsideguide avatar Oct 18 '24 17:10 rafaelsideguide