
Update ingest.py

Open · mikeyang01 opened this issue 2 years ago · 2 comments

A chunk_size of 500 requires too much memory during ingestion; even 32 GB is not enough. This change lowers it to 200, which works fine on a 16 GB MacBook M1.

mikeyang01 · May 23 '23 06:05
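For reference, the proposed change presumably amounts to lowering the chunk-size constant in ingest.py, along these lines (the exact constant names and surrounding code are assumptions, not taken from the actual diff):

```python
# Hypothetical illustration of the proposed change; the real constants in
# ingest.py may be named or located differently.
chunk_size = 200   # was 500; smaller chunks reduced peak memory on a 16 GB M1
chunk_overlap = 50
```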

A better approach would be to set the chunk size via an argument flag or in the .env file, and fall back to a reasonable default when it is omitted (see the sketch below). Users who have the RAM available should be able to use it rather than being limited to a lower number.

xD0135 · May 27 '23 19:05
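A minimal sketch of that idea, assuming the project loads its configuration with python-dotenv; the INGEST_CHUNK_SIZE and INGEST_CHUNK_OVERLAP variable names are hypothetical, not existing privateGPT settings:

```python
# Sketch: make the chunk size configurable via .env with sensible defaults.
# INGEST_CHUNK_SIZE / INGEST_CHUNK_OVERLAP are assumed names for illustration.
import os
from dotenv import load_dotenv

load_dotenv()

chunk_size = int(os.environ.get("INGEST_CHUNK_SIZE", 500))
chunk_overlap = int(os.environ.get("INGEST_CHUNK_OVERLAP", 50))
```

With this, users on small machines could set INGEST_CHUNK_SIZE=200 in their .env file, while those with more RAM keep the default.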

A chunk_size of 500 requires too much memory during ingestion; even 32 GB is not enough. This change lowers it to 200, which works fine on a 16 GB MacBook M1.

As you ingest more data you will find that halving the chunk size very nearly doubles the RAM footprint during ingestion, and when the db is made persistent the footprint nearly doubles again while it saves. Some data points: ingesting 1,900 books averaging 1 MB each has a running RAM footprint of about 15 GB at a chunk_size/chunk_overlap of 1000/100. At 500/50, the same set of books uses over 30 GB and will choke a 64 GB machine when persisted to disk. Decreasing the chunk size, if it works at all, is a bad idea for more than trivial amounts of data.

johnbrisbin · May 30 '23 20:05
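For context on why the chunk size drives memory so directly: ingestion splits every document into chunks and embeds each one, so halving chunk_size roughly doubles the number of chunks (and their associated metadata) held in memory for the same corpus. A rough sketch of that splitting step, assuming a classic LangChain-style splitter as used in this era of the project; the exact wiring in ingest.py may differ:

```python
# Sketch of where the chunk parameters are applied during ingestion.
# Import path and defaults reflect LangChain 0.0.x; exact details may vary.
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=500, chunk_overlap=50):
    # Smaller chunks produce more chunks (and more embeddings) for the same
    # corpus, which is why halving chunk_size roughly doubles the footprint.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_documents(documents)
```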