Caleb Peffer issues

Results: 45 issues of Caleb Peffer

[Feat] Run actions like clicking, scrolling or advanced waiting on page before extraction

I've had a few prospects/customers ask me if we could allow them to run JavaScript and actions on the page before we extract the data, such as scrolling the page...

Customer Request

[Feat] Improve Handling of Sitemap Structure for URLs with Include Paths

A customer reached out about https://www.clinikally.com/blogs/news. They were trying to crawl it with the following parameters: Include only paths: blogs/news/*. The results were inconsistent, sometimes giving 9 links, sometimes giving more...

bug

enhancement
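To make the include-path request above concrete, here is a minimal sketch of deterministic include-path filtering over a list of candidate URLs (for example, URLs read from a sitemap). It is not the crawler's actual implementation: the function names, the use of `fnmatchcase` for glob matching, and the example.com URLs are all assumptions for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def matches_include_paths(url, include_patterns):
    """Return True if the URL's path matches any include pattern.

    Assumption: patterns are globs relative to the site root, such as
    "blogs/news/*", and "*" may span "/" (fnmatch semantics).
    """
    path = urlparse(url).path.lstrip("/")
    return any(fnmatchcase(path, pattern) for pattern in include_patterns)


def filter_urls(urls, include_patterns):
    """Keep matching URLs in their original order, dropping duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen and matches_include_paths(url, include_patterns):
            seen.add(url)
            kept.append(url)
    return kept


# Hypothetical candidate URLs, e.g. collected from a sitemap.
candidates = [
    "https://example.com/blogs/news/first-post",
    "https://example.com/products/serum",
    "https://example.com/blogs/news/first-post",
    "https://example.com/blogs/news/second-post",
]
print(filter_urls(candidates, ["blogs/news/*"]))
```

Because the filter depends only on the candidate list and the patterns, the same input always yields the same set of links, which is the property the customer was missing.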

[BUG] Certain videos in iframes aren't being captured

https://www.liveflow.io/product-guides/how-to-disable-links-to-quickbooks. It's a Loom video; funnily enough, there are other videos in the same format that seem to work on the site. Tried: * adding a timeout of 2000 *...

bug

Customer Request

[Feat] Add automatic retries to failed links on crawl

I've had a few customers who are concerned with the few failures they experience during long crawl jobs. Some have even implemented their own automatic retry functionality. If we handle...
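For reference, the kind of client-side retry some customers have built might look like the sketch below. It is only an illustration under stated assumptions: `fetch_with_retries`, its parameters, and the exponential backoff policy are hypothetical and do not describe any existing crawl API.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff.

    `fetch` is any callable that raises an exception on failure
    (hypothetical; stands in for a scrape request).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


# Usage with a stand-in fetcher that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


delays = []
result = fetch_with_retries(flaky_fetch, "https://example.com/page", sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; production code would leave the default in place.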

[BUG] https://static01.nyt.com/newsgraphics/documenttools/f6ab5c368725101c/43d7c2a0-full.pdf

PDF parsing is not great on this

bug

Broken Link

[BUG] https://www.mccarthy.com/craft/search?jobviteiframe=job%2FoYD3tfwR has missing content for the page, even with the waitFor parameter set to True

When using the /scrape endpoint on https://www.mccarthy.com/craft/search?jobviteiframe=job%2FoYD3tfwR, the job description information doesn't appear on the page. [ ] Notify Will on Crisp once completed

bug

high-priority

Broken Link

[BUG] https://refact.ai/ times out and returns an error on the playground; not sure why, as the site seems fairly simple

Hey, https://refact.ai/ on /scrape is timing out, not sure why. Could this have something to do with fire-engine? @tomkosm CCing customers @nyacg @danny-hunt for visibility

bug

high-priority

Customer Request

[BUG] /Scrape on this link is taking 20s. Why?

https://dlp.dubai.gov.ae/en/Pages/OfficialGazette.aspx @mogery

bug

high-priority

v1: Extract LinksOnPage from HTML that has includedTags / excludedTags parameters already applied

A customer who is using the linksOnPage field noticed that it still includes links from headers and footers, even though they have been removed from the content. Move the URL...
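One way to picture the requested behavior is to collect links only from the parts of the HTML that survive tag exclusion. The sketch below uses Python's standard `html.parser`; the class name, the function name, and the default excluded tags are assumptions for illustration, not the service's actual code.

```python
from html.parser import HTMLParser


class FilteredLinkParser(HTMLParser):
    """Collect href values from <a> tags outside excluded elements.

    Assumption: excluded tags are container elements with closing tags
    (e.g. header, footer), so nesting depth can be tracked.
    """

    def __init__(self, excluded_tags):
        super().__init__()
        self.excluded_tags = set(excluded_tags)
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.excluded_tags:
            self.skip_depth += 1
            return
        if tag == "a" and self.skip_depth == 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.excluded_tags and self.skip_depth > 0:
            self.skip_depth -= 1


def links_outside(html, excluded_tags=("header", "footer")):
    """Return links found outside the excluded tags, in document order."""
    parser = FilteredLinkParser(excluded_tags)
    parser.feed(html)
    parser.close()
    return parser.links


page = (
    '<header><a href="/home">Home</a></header>'
    '<main><a href="/post">Post</a></main>'
    '<footer><a href="/terms">Terms</a></footer>'
)
print(links_outside(page))
```

Here the header and footer links are skipped because the parser is inside an excluded element when it encounters them.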

[BUG] Crawl on single page works on playground but fails via curl

**Describe the Bug** For some odd reason, using the crawl endpoint on this link [https://www.tripadvisor.com/Restaurant_Review-g60763-d4418144-Reviews-Reichenbach_Hall-New_York_City_New_York.html](https://www.tripadvisor.com/Restaurant_Review-g60763-d4418144-Reviews-Reichenbach_Hall-New_York_City_New_York.html) returns a single page on the playground, but returns nothing via a curl request...

bug