
Bug: `tool_calls` is sometimes undefined

Open Ademsk1 opened this issue 9 months ago • 0 comments

When scraping content that might be blocked, I'd like to handle the failure safely. Instead, I hit the following error, which crashes my server:

...ab/src/node_modules/llm-scraper/dist/models.js:41
  const c = completion.choices[0].message.tool_calls[0].function.arguments;
                                                    ^
TypeError: Cannot read properties of undefined (reading '0')
    at generateOpenAICompletions

Inspecting `completion.choices` in the response shows something like this (note that `message` has no `tool_calls` field):

[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The content you provided shows that access to the requested webpage has been blocked due to security measures implemented by Cloudflare, likely triggered by specific actions or commands deemed suspicious. This type of response is commonly served when automated systems (like web scrapers) or aggressive browsing behaviors are detected. There is no job-related content or other typical webpage elements displayed in the provided HTML. Instead, it provides information about why the access was denied, suggesting methods to resolve the issue such as contacting the site owner."
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]
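
For illustration, here is a minimal sketch of how the arguments could be read without assuming `tool_calls` is present. `extractToolArguments` is a hypothetical helper, not part of llm-scraper, and the `{ data, error }` return shape is an assumption made for this example. It relies only on the response shape shown above and on the fact that OpenAI returns tool-call arguments as a JSON string.

```javascript
// Hypothetical helper (not part of llm-scraper): reads tool-call arguments
// from an OpenAI-style chat completion without assuming `tool_calls` exists.
function extractToolArguments(completion) {
  const message = completion?.choices?.[0]?.message;
  const args = message?.tool_calls?.[0]?.function?.arguments;

  if (typeof args !== "string") {
    // The model replied in plain text (as in the blocked-page response above),
    // so return that text as an error instead of throwing a TypeError.
    return { data: null, error: message?.content ?? "No tool call returned" };
  }

  try {
    return { data: JSON.parse(args), error: null };
  } catch (err) {
    // Arguments were present but were not valid JSON.
    return { data: null, error: `Invalid tool arguments: ${err.message}` };
  }
}

// Usage with a response shaped like the one above (content shortened).
const blocked = {
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "Access blocked by Cloudflare." },
      logprobs: null,
      finish_reason: "stop",
    },
  ],
};
console.log(extractToolArguments(blocked));
```

Returning the assistant's text as the error keeps the model's explanation of the block available to the caller, which seems to be what the schema description is asking for.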

My schema description contains this at the end:

If the content is inaccessible, e.g. behind a paywall, or has been blocked, the scraper will describe the error in the error field, and the appropriate status code (e.g. 401: Unauthorized, or 403: Forbidden).

Could my schema be affecting the completion content? Here's the code I use. Wrapping it in a try block doesn't seem to help.

try {
    const openai = initialise();
    const browser = await chromium.launch();
    const scraper = new LLMScraper(browser, openai);
    const pages = await scraper.run(url, {
      model: "gpt-4-turbo",
      schema,
      mode: "html",
      closeOnFinish: true,
    });
    // Collect every page yielded by the scraper
    const stream = [];
    for await (const page of pages) {
      stream.push(page);
    }
    console.log(stream[0].data);
    return stream[0].data;
} catch (err) {
    // Catch body was not shown in the original post; logging is a placeholder
    console.error(err);
}
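
As a rough sketch of consuming the pages defensively: the hypothetical `collectPages` wrapper below drains an async iterable and returns any failure as a value. `fakePages` is a stand-in generator invented for this example, not llm-scraper code. This only helps if the TypeError is raised while iterating. If the library throws from a promise that is never awaited, no surrounding try block will see it, which may be why the try above has no effect.

```javascript
// Hypothetical wrapper: drains an async iterable of pages and reports
// a failure as a value instead of letting the exception escape.
async function collectPages(pages) {
  const results = [];
  try {
    for await (const page of pages) {
      results.push(page);
    }
    return { results, error: null };
  } catch (err) {
    // Keep whatever was scraped before the failure.
    return { results, error: err };
  }
}

// Stand-in for scraper.run(): yields one page, then fails with the same
// kind of TypeError reported in this issue.
async function* fakePages() {
  yield { url: "https://example.com/ok", data: { title: "ok" } };
  throw new TypeError("Cannot read properties of undefined (reading '0')");
}

collectPages(fakePages()).then(({ results, error }) => {
  console.log(results.length, error ? error.message : "no error");
});
```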

Ademsk1 • May 01 '24 14:05