Scrapegraph-ai icon indicating copy to clipboard operation
Scrapegraph-ai copied to clipboard

Local HTML cache

Open clemlesne opened this issue 10 months ago • 3 comments

Is your feature request related to a problem? Please describe.

Implement a local HTML caching. To avoid re-process all scraping each time. It's notably long with recursive depths scrapings.

Describe the solution you'd like

Add a local cache near:

https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/aa6a76e5bdf63544f62786b0d17effa205aab3d8/scrapegraphai/nodes/fetch_node.py#L245

Describe alternatives you've considered

  • Implement myself, not feasible as this is bundled in many graph implementations

Related

  • #898

clemlesne avatar Jan 17 '25 14:01 clemlesne

Hi, @clemlesne. I'm Dosu, and I'm helping the Scrapegraph-ai team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • Proposal to add local HTML caching for performance optimization during recursive depth scrapings.
  • Suggested implementation location in the code, noting complexity due to integration with various graph implementations.
  • Linked to related issue #898, indicating broader context or dependency.
  • No comments or activity since the issue was created.

Next Steps:

  • Please let me know if this issue is still relevant to the latest version of the Scrapegraph-ai repository by commenting here.
  • If there is no further activity, the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Apr 20 '25 16:04 dosubot[bot]

Still relevant

clemlesne avatar Apr 24 '25 15:04 clemlesne

@PeriniM, the user has indicated that the proposal for adding local HTML caching for performance optimization is still relevant. Could you please assist them with this issue?

dosubot[bot] avatar Apr 24 '25 15:04 dosubot[bot]