gpt-crawler
JSON too large for GPT
Hi All!
I realize this should largely be about the actual 'crawling' of the sites, but given how much of a breeze that was with this tool, I now find myself with the issue that the crawled text far exceeds the limits of what ChatGPT can handle.
Does anyone have a recommendation on how to split the JSON files so they stay within the limits set by ChatGPT? I've tried both in GPTs and in Assistants; in both cases, my JSON includes too much text.
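The manual split I have in mind is something like the sketch below. It assumes output.json is a JSON array of page objects (e.g. { title, url, html }), and the 20 MB target per chunk is just a guess, not a documented limit:

```ts
// split-output.ts — rough sketch for splitting a large crawler output into
// smaller JSON files. Assumes output.json is a JSON array of page objects.
import { readFileSync, writeFileSync } from "node:fs";

const pages: unknown[] = JSON.parse(readFileSync("output.json", "utf8"));
const maxBytesPerFile = 20 * 1024 * 1024; // hypothetical ~20 MB target per chunk

let chunk: unknown[] = [];
let chunkBytes = 0;
let fileIndex = 1;

const flush = () => {
  if (chunk.length === 0) return;
  writeFileSync(`output-${fileIndex}.json`, JSON.stringify(chunk, null, 2));
  fileIndex += 1;
  chunk = [];
  chunkBytes = 0;
};

for (const page of pages) {
  const size = Buffer.byteLength(JSON.stringify(page), "utf8");
  // Start a new file when adding this page would exceed the target size.
  if (chunkBytes + size > maxBytesPerFile) flush();
  chunk.push(page);
  chunkBytes += size;
}
flush();
```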
The README suggests using the maxFileSize or maxTokens parameter to control the size of the JSON file: https://github.com/BuilderIO/gpt-crawler?tab=readme-ov-file#create-a-custom-gpt
Totally missed that part when doing the setup!
"if you get an error about the file being too large, you can try to split it into multiple files and upload them separately using the option maxFileSize in the config.ts file or also use tokenization to reduce the size of the file with the option maxTokens in the config.ts file"
Thanks for mentioning that! I'll have another look at it.
What is the current limit of ChatGPT (size and number of files)?
I need to do this on about 5,000 files of C# code.
Doesn't seem feasible!
How many files can I upload at once per GPT?
Up to 20 files per GPT for the lifetime of that GPT. Keep in mind there are file size restrictions and usage caps per user/org.
What are those file upload size restrictions?
- All files uploaded to a GPT or a ChatGPT conversation have a hard limit of 512MB per file.
- All text and document files uploaded to a GPT or to a ChatGPT conversation are capped at 2M tokens per file. This limitation does not apply to spreadsheets.
- For images, there's a limit of 20MB per image.

Additionally, there are usage caps:
- Each end-user is capped at 10GB.
- Each organization is capped at 100GB.
- Note: An error will be displayed if a user/org cap has been hit.
From: https://help.openai.com/en/articles/8555545-file-uploads-faq
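One way to check the 2M-token cap before uploading is to count tokens locally. A rough sketch using the tiktoken npm package (the encoding name is an assumption, pick whatever matches your model; this only gives an estimate):

```ts
// count-tokens.ts — rough check of how many tokens output.json contains,
// to compare against the ~2M tokens-per-file cap mentioned above.
import { readFileSync } from "node:fs";
import { get_encoding } from "tiktoken";

const text = readFileSync("output.json", "utf8");
const enc = get_encoding("cl100k_base"); // assumption: cl100k_base is close enough for an estimate
console.log(`~${enc.encode(text).length} tokens`);
enc.free();
```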
Hmm, since the limitation doesn't apply to spreadsheets, can we just convert the JSON to Excel? Would it be the same?
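If you want to try that, a quick sketch with the xlsx (SheetJS) package would look something like this, assuming the crawl output is a flat array of objects; no idea whether GPT ingests the result any better:

```ts
// json-to-xlsx.ts — convert the crawler's JSON output into a spreadsheet.
// Assumes output.json is an array of flat objects (e.g. { title, url, html }).
import { readFileSync } from "node:fs";
import * as XLSX from "xlsx";

const rows = JSON.parse(readFileSync("output.json", "utf8"));
const sheet = XLSX.utils.json_to_sheet(rows);
const book = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(book, sheet, "crawl");
XLSX.writeFile(book, "output.xlsx");
```

One caveat: Excel cells are limited to 32,767 characters, so very long html fields may get truncated along the way.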
So, what are your thoughts?
Use RAG and LangChain instead of ChatGPT or Assistants.
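Roughly, that would look like the sketch below with the LangChain JS packages: index the crawled pages in a vector store and retrieve only the relevant chunks instead of uploading the whole JSON. This is a minimal, retrieval-only sketch; import paths vary between LangChain versions, an OPENAI_API_KEY is assumed to be set, and the output.json shape is an assumption:

```ts
// rag-sketch.ts — index crawled pages in an in-memory vector store and
// retrieve the most relevant chunks for a question, instead of uploading
// the whole JSON to a GPT. Import paths may differ between LangChain versions.
import { readFileSync } from "node:fs";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

async function main() {
  // Assumes output.json is an array of { title, url, html } page objects.
  const pages: { title: string; url: string; html: string }[] =
    JSON.parse(readFileSync("output.json", "utf8"));

  // Split page text into overlapping chunks, keeping title/url as metadata.
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
  const docs = await splitter.createDocuments(
    pages.map((p) => p.html),
    pages.map((p) => ({ title: p.title, url: p.url })),
  );

  // Embed the chunks and hold them in memory (swap in a real vector DB for 5,000+ files).
  const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());

  // Retrieve the chunks most relevant to a question; pass these to any chat model.
  const hits = await store.similaritySearch("How do I configure maxFileSize?", 4);
  for (const hit of hits) console.log(hit.metadata.url, "\n", hit.pageContent.slice(0, 200));
}

main().catch(console.error);
```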