Help: why can I only crawl the first page of a GitBook site?

Open · wt195799611 opened this issue 1 year ago · 5 comments

I tried to crawl this site, but only one page was crawled.

wt195799611 · Nov 29 '23 04:11

https://layerzero.gitbook.io/docs/

wt195799611 · Nov 29 '23 04:11

The ** pattern matches all subfolders and files below the specified path, so the config should look like this:

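import { Config } from "./src/config";

export const defaultConfig: Config = {
  url: "https://layerzero.gitbook.io/docs", // page where the crawl starts
  match: "https://layerzero.gitbook.io/docs/**", // follow only links under /docs/, at any depth
  maxPagesToCrawl: 10, // stop after at most 10 pages
  outputFileName: "output.json", // file the crawl results are written to
};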
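To illustrate the difference the ** pattern makes, here is a minimal TypeScript sketch that approximates URL glob matching with a regular expression. The helper globToRegExp and the page path are hypothetical; gpt-crawler passes the match pattern to its crawling library, which does its own glob matching and may differ in edge cases. The sketch only treats ** as any run of characters (including /) and * as anything within a single path segment:

```typescript
// Hypothetical helper for illustration only (not gpt-crawler's API): turns a URL
// glob into an anchored RegExp. Only `**` and `*` are handled.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      source += ".*"; // `**`: any characters, including "/"
      i++; // consume the second "*"
    } else if (ch === "*") {
      source += "[^/]*"; // `*`: stays inside one path segment
    } else {
      // Escape regex metacharacters (e.g. the "." in the hostname).
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Hypothetical nested page under the docs root.
const page = "https://layerzero.gitbook.io/docs/some-section/some-page";

console.log(globToRegExp("https://layerzero.gitbook.io/docs/**").test(page)); // true
console.log(globToRegExp("https://layerzero.gitbook.io/docs/*").test(page)); // false
```

Under this approximation, a pattern ending in /* would skip any link nested more than one level below /docs/, while /** keeps them.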

isarikaya · Dec 07 '23 11:12

So I also tried this:

export const defaultConfig: Config = {
  url: "https://overkillgaming.com",
  match: "https://overkillgaming.com/**",
  maxPagesToCrawl: 10,
  outputFileName: "output.json",
};

The problem is that it crawls the first page and then stops. (It's a WordPress site.)

Is there any fix for this?
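If a crawl stops after the start page, one possible cause (an assumption, not a confirmed diagnosis for this site) is that none of the links on that page match the match pattern, so there is nothing to enqueue. For example, links pointing to www.example.com do not fall under https://example.com/**. The hypothetical checkLinks helper below pulls href values out of a page's HTML with a naive regex, resolves relative links against the page URL, and flags which ones fall under the pattern; for a pattern ending in /**, it uses a startsWith check as a rough stand-in for real glob matching:

```typescript
// Hypothetical diagnostic, not part of gpt-crawler: lists the links on a page and
// flags which ones fall under the `match` prefix (the pattern minus its trailing "**").
function checkLinks(html: string, pageUrl: string, matchPrefix: string) {
  // Naive href extraction; a real HTML parser would be more robust.
  const re = /href\s*=\s*["']([^"']+)["']/gi;
  const hrefs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    hrefs.push(m[1] ?? "");
  }

  return hrefs.map((href) => {
    let absolute: string;
    try {
      absolute = new URL(href, pageUrl).href; // resolve relative links like "/about/"
    } catch {
      absolute = href; // leave malformed hrefs untouched
    }
    return { url: absolute, matches: absolute.startsWith(matchPrefix) };
  });
}

// Sample markup standing in for a fetched start page (hypothetical links).
const sampleHtml = `
  <a href="/about/">About</a>
  <a href="https://www.example.com/blog/">Blog</a>
  <a href="https://example.com/contact">Contact</a>`;

for (const { url, matches } of checkLinks(sampleHtml, "https://example.com/", "https://example.com/")) {
  console.log(matches ? "enqueue" : "skip", url);
}
```

If every link on the real start page comes back as "skip", adjusting match (for example to the host the links actually use) is the first thing to try.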

BTNGaming · Dec 18 '23 20:12

I ran it with your config and got the result in the attached output-1.json. Are you sure you followed all the steps correctly?

isarikaya · Dec 18 '23 20:12

100%. Too bad the output isn't at least 100 KB, though, haha. It's too small to upload to ChatGPT/OpenAI.
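On output size: with maxPagesToCrawl: 10, the crawl stops after at most ten pages, so a small output.json is expected from the configs in this thread. Below is a sketch of the same config with a higher cap, assuming the config.ts file at the root of the gpt-crawler repo; the value 500 is an arbitrary example:

```typescript
import { Config } from "./src/config";

export const defaultConfig: Config = {
  url: "https://overkillgaming.com",
  match: "https://overkillgaming.com/**",
  // Raised from 10; the crawl still ends earlier if fewer matching pages exist.
  maxPagesToCrawl: 500,
  outputFileName: "output.json",
};
```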

BTNGaming · Dec 19 '23 17:12