Scrapegraph-ai issues

improve tokenization function

Look at these sources: [link](https://colab.research.google.com/github/mistralai/mistral-common/blob/main/examples/tokenizer.ipynb), blog post [link](https://docs.mistral.ai/guides/tokenization/). For Hugging Face models: `from transformers import AutoTokenizer`, `tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")`, `text = "Write your text here"`, `tokens = tokenizer.tokenize(text)`, `num_tokens...`

VinciGit00

'SmartScraperGraph' object has no attribute 'model_token'

12

**Describe the bug** Hi, I am trying to scrape a webpage using SmartScraperGraph, but I constantly get the following error: 'SmartScraperGraph' object has no attribute 'model_token' **To Reproduce** This is the...

Naman-Bhrgv

bug

Scrapegraph returns relative path URLs instead of absolute path Possible Bug?

8

**Describe the bug** When using gpt4o as the LLM and scraping a webpage to return a list of links, sometimes the paths returned are: - relative paths (OR) -...

sandeepchittilla
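
One possible client-side workaround for the relative-path report above is to resolve every returned link against the page URL with the standard library's `urllib.parse.urljoin`. This is only a sketch: the helper name `absolutize_links` and the example URLs are hypothetical and are not part of Scrapegraph-ai.

```python
from urllib.parse import urljoin


def absolutize_links(base_url: str, links: list[str]) -> list[str]:
    """Resolve each link against base_url (hypothetical helper).

    Links that are already absolute pass through unchanged.
    """
    return [urljoin(base_url, link) for link in links]


# Example URLs are made up for illustration.
links = absolutize_links(
    "https://example.com/blog/index.html",
    ["/about", "post-1.html", "https://other.example/page"],
)
```

Post-processing the output this way does not depend on how the LLM formats links, at the cost of needing to know the URL of the page that was scraped.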

OMP: Error #15: Initializing libiomp5md.dll, but found libomp140.x86_64.dll already initialized

7

I hit this problem when I run PyCharm to train a model, but I don't know how to solve it. I use Windows 11, and it seems that libomp140.x86_64.dll is...

ahworld22

`embedder_model` AttributeError in `/examples/openai/deep_scraper_openai.py`

1

**Describe the bug** When running the OpenAI Deep Scraper example located at `examples/openai/deep_scraper_openai.py`, I get the error: ``` Traceback (most recent call last): File "/Users/ajt/Projects/scrapegraph_playground/openai/deep_scraper_openai.py", line 37, in deep_scraper_graph =...

ajt

Context length exceeded

**Describe the bug** When doing some crawls, I get the following error: `Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted...

Cdingram
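
A common way to stay under a context limit like the one in the error above is to split the input into chunks before sending it to the model. The sketch below is an assumption-laden illustration: it uses whitespace-separated words as a rough stand-in for tokens (real counts depend on the model's tokenizer), and `chunk_by_budget` is a hypothetical helper, not a Scrapegraph-ai function.

```python
def chunk_by_budget(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens words (hypothetical helper).

    Words are only an approximation of model tokens, so leave headroom
    when choosing max_tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


chunks = chunk_by_budget("a b c d e", 2)
```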

Ability to add headers to source ( better integration with Jina AI)

2

**Is your feature request related to a problem? Please describe.** We can assign a url to the `source`. It would be nice if we could also pass in an headers...

angelotc

Support for firecrawl

8

It would be interesting if support were added for firecrawl.ai. They also allow you to [self-host](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md) their service. Firecrawl allows for cleaner crawling; it handles PDFs as well as dynamic websites.

AmosDinh

Intermittent Headless Timeout Error on Non-Local Environments

**Describe the bug** I have this error: "No HTML body content found, please try setting the 'headless' flag to False in the graph configuration. HTML content: Error: Page.goto: Timeout 30000ms...

mamuchastegui

Facing the issue too many requests with GoogleSearch.

5

With concurrent requests to googlesearch, I receive the following: ``` 642 def http_error_default(self, req, fp, code, msg, hdrs): --> 643 raise HTTPError(req.full_url, code, msg, hdrs, fp) HTTPError: HTTP Error 429: Too...

aziz-ullah-khan
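
A generic way to cope with HTTP 429 responses like the one above is to retry with exponential backoff. The sketch below assumes the search call raises `urllib.error.HTTPError`, as the traceback suggests; `retry_on_429` and its parameters are hypothetical names, and this does not describe how Scrapegraph-ai handles rate limits.

```python
import time
from urllib.error import HTTPError


def retry_on_429(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on HTTP 429, wait base_delay * 2**attempt and retry.

    Hypothetical helper. Any other HTTP error, or a 429 on the final
    attempt, is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Backoff only spaces out retries of a single call; with many concurrent requests, limiting the number of simultaneous searches is likely needed as well.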
