JSON too large for GPT

Open tristan-mcinnis opened this issue 1 year ago • 7 comments

Hi All!

I realize this repo is largely about the actual 'crawling' of sites, but that part was such a breeze with this tool that I now have a different problem: the crawled text far exceeds the limits of what ChatGPT can handle.

Does anyone have any recommendations on how to split the JSON files so that each one stays within the limits set by ChatGPT? I've tried both in GPT and in Assistants. In both cases, my JSON includes too much text.

tristan-mcinnis avatar Dec 13 '23 06:12 tristan-mcinnis

The README says to use the maxFileSize or maxTokens parameter to control the size of the JSON file. https://github.com/BuilderIO/gpt-crawler?tab=readme-ov-file#create-a-custom-gpt

leicheng42 avatar Dec 18 '23 07:12 leicheng42

Totally missed that part when doing the setup!

"if you get an error about the file being too large, you can try to split it into multiple files and upload them separately using the option maxFileSize in the config.ts file or also use tokenization to reduce the size of the file with the option maxTokens in the config.ts file"

Thanks for mentioning it! I'll have another look at that.

tristan-mcinnis avatar Dec 18 '23 07:12 tristan-mcinnis
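
For readers landing here, a minimal sketch of what the README passage above describes: setting maxFileSize and maxTokens in config.ts. Only those two option names come from this thread. The other fields, the example URLs, the chosen values, the megabyte unit for maxFileSize, and the local `CrawlerConfigSketch` type are assumptions made so the snippet stands alone; check the repo's own config.ts and Config type before copying anything.

```typescript
// Local stand-in for the Config type that gpt-crawler defines; declared here
// only so this sketch type-checks by itself. The real type may differ.
interface CrawlerConfigSketch {
  url: string;
  match: string;
  maxPagesToCrawl: number;
  outputFileName: string;
  maxFileSize?: number; // assumed unit: megabytes
  maxTokens?: number; // cap on tokens in the output file
}

// Hypothetical values. 2,000,000 mirrors the 2M-token per-file cap
// that ChatGPT applies to text uploads.
const defaultConfig: CrawlerConfigSketch = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  maxFileSize: 1,
  maxTokens: 2000000,
};
```

Both limits are optional in this sketch, so either one can be dropped if only the other matters for your upload.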

What is the current limit of ChatGPT (size and number of files)?

I need to do this on about 5,000 files of C# code.

ctrlbrk42 avatar Dec 20 '23 22:12 ctrlbrk42

What is the current limit of ChatGPT (size and number of files)?

I need to do this on about 5,000 files of C# code.

Doesn't seem feasible!

How many files can I upload at once per GPT?

Up to 20 files per GPT for the lifetime of that GPT. Keep in mind there are file size restrictions and usage caps per user/org.

What are those file upload size restrictions?

  • All files uploaded to a GPT or a ChatGPT conversation have a hard limit of 512MB per file.

  • All text and document files uploaded to a GPT or to a ChatGPT conversation are capped at 2M tokens per file. This limitation does not apply to spreadsheets.

  • For images, there's a limit of 20MB per image.

  • Additionally, there are usage caps:

    • Each end-user is capped at 10GB.
    • Each organization is capped at 100GB.
    • Note: An error will be displayed if a user/org cap has been hit.

From: https://help.openai.com/en/articles/8555545-file-uploads-faq

leicheng42 avatar Dec 29 '23 07:12 leicheng42
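
Taken together, the limits quoted above (20 files per GPT, 2M tokens per text file) put a ceiling of roughly 20 × 2M = 40M tokens of text on a single GPT. A rough way to check whether a crawl fits is to pack the crawled pages into chunks under a token budget and count the chunks. The sketch below does that greedily. The `Page` shape, the function names, and the 4-characters-per-token estimate are all assumptions for illustration; a real tokenizer will give different counts.

```typescript
// One crawled page. The field names are an assumption about the output JSON.
interface Page {
  title: string;
  url: string;
  html: string;
}

// Rough heuristic: about 4 characters per token. This is an estimate only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack pages, in order, into chunks whose estimated token count
// stays within tokenBudget. A single page that is larger than the budget
// still ends up alone in its own (oversized) chunk.
function splitByTokens(pages: Page[], tokenBudget: number): Page[][] {
  const chunks: Page[][] = [];
  let current: Page[] = [];
  let used = 0;
  for (const page of pages) {
    const cost = estimateTokens(JSON.stringify(page));
    if (current.length > 0 && used + cost > tokenBudget) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(page);
    used += cost;
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}

// True if the number of chunks fits the per-GPT file count (20 by default).
function fitsFileLimit(chunks: Page[][], maxFiles: number = 20): boolean {
  return chunks.length <= maxFiles;
}
```

Calling `splitByTokens(pages, 2000000)` and then `fitsFileLimit(...)` on the result gives a quick yes/no before uploading. Each chunk can be written out as its own JSON file.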

Hmm, since the token limit doesn't apply to spreadsheets, can we just convert the JSON to Excel? Would it be the same?

antorio avatar Jan 13 '24 00:01 antorio

So what are your thoughts?

julian-passebecq avatar Feb 08 '24 09:02 julian-passebecq

Use RAG and LangChain instead of ChatGPT or Assistants.

tristan-mcinnis avatar Feb 10 '24 01:02 tristan-mcinnis
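
To make the RAG suggestion concrete: instead of uploading everything, the text is cut into chunks, and only the few chunks most relevant to a question are sent to the model. The sketch below shows just that retrieval step. It does not use LangChain, and it swaps in plain word overlap where a real setup would use embedding similarity. All names here are hypothetical.

```typescript
interface Chunk {
  id: number;
  text: string;
}

// Split text into chunks of at most `size` words each.
function chunkText(text: string, size: number): Chunk[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push({ id: chunks.length, text: words.slice(i, i + size).join(" ") });
  }
  return chunks;
}

// Lowercase a string and break it into alphanumeric words.
function toWords(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Count how many distinct query words appear in the chunk.
// This is a stand-in for embedding similarity, not a replacement for it.
function overlapScore(query: string, chunk: Chunk): number {
  const chunkWords = toWords(chunk.text);
  const seen: string[] = [];
  let hits = 0;
  for (const w of toWords(query)) {
    if (seen.indexOf(w) === -1) {
      seen.push(w);
      if (chunkWords.indexOf(w) !== -1) {
        hits++;
      }
    }
  }
  return hits;
}

// Return the k chunks with the highest overlap score.
function retrieve(query: string, chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk: chunk, score: overlapScore(query, chunk) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

Only the retrieved chunks would go into the prompt, so the size of the full crawl stops being the limiting factor.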