gpt-crawler

Trying to crawl site, nothing working

Open · upup666 opened this issue 5 months ago • 1 comment

Hello there, I'm trying to crawl this site: https://help.puzzlebot.top

Here is my config file:

```ts
import { Config } from "./src/config";

export const defaultConfig: Config = {
  url: "https://help.puzzlebot.top",
  match: "https://help.puzzlebot.top/article**",
  maxPagesToCrawl: 300,
  outputFileName: "output.json",
  maxTokens: 2000000,
};
```

But it only crawls the site's name. What should I do?

Thank you

upup666 · Jan 17 '24 11:01

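This is because the crawler that gpt-crawler runs on (Crawlee's `PlaywrightCrawler`, driving a Playwright browser) by default looks for anchor (`<a>`) tags to discover links to other pages. The site you mention, however, does not use anchor tags to link to its other pages; it navigates with JavaScript event handlers instead.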

In short, this is a limitation of the underlying crawler, not of gpt-crawler itself.
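To make the explanation above concrete, here is a small TypeScript sketch. It is a deliberately simplified, regex-based stand-in for anchor-based link discovery, not gpt-crawler's or Crawlee's actual code, run against two hypothetical pages: one that links with `<a href>` and one that navigates through an `onclick` handler. The markup and the `extractAnchorLinks` helper are made up for illustration and are not taken from help.puzzlebot.top.

```ts
// Hypothetical markup (not from help.puzzlebot.top): the same menu entry,
// once as a real anchor and once as a div that navigates via a click handler.
const anchorPage = `
  <nav>
    <a href="/article/getting-started">Getting started</a>
  </nav>`;

const handlerPage = `
  <nav>
    <div class="menu-item" onclick="location.href='/article/getting-started'">
      Getting started
    </div>
  </nav>`;

// Hypothetical helper: a simplified stand-in for anchor-based link discovery.
// It collects the href of every <a> element; real crawlers inspect the DOM
// instead of using a regex, but they likewise only see links present in <a> tags.
function extractAnchorLinks(html: string): string[] {
  const links: string[] = [];
  // The regex is created per call, so its lastIndex state never leaks between calls.
  const anchorHref = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = anchorHref.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

console.log(extractAnchorLinks(anchorPage)); // [ '/article/getting-started' ]
console.log(extractAnchorLinks(handlerPage)); // []
```

The second page yields no links at all, so a crawler that only follows anchors has nothing to enqueue beyond the start URL, which is consistent with the behaviour described in the original report.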

ashkkr · Feb 07 '24 07:02