kernel-memory
Difficulty with ingesting large files
I'm trying to ingest a 100+GB file of legal data into a Kernel Memory service. The data I need are the "opinions" files from this link (https://com-courtlistener-storage.s3-us-west-2.amazonaws.com/list.html?prefix=bulk-data/). They are bzip2-compressed (.bz2) files.
To ingest the data, I use azcopy to copy a file into a container. A function then triggers when a file lands in this container; it decompresses the .bz2 file and sends the result to Kernel Memory for ingestion as a stream. The compressed file is about 30GB; once decompressed, it grows to 100+GB.
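For anyone trying to picture the decompression step, here is a minimal Python sketch of reading a .bz2 stream in bounded chunks so the 100+GB of decompressed data never has to sit in memory at once. This is not the code from my repo, only an illustration: the function name `iter_decompressed_chunks` and the 4 MiB default chunk size are assumptions, and the hand-off to Kernel Memory is left out because it depends on the client being used.

```python
import bz2
from typing import BinaryIO, Iterator


def iter_decompressed_chunks(
    source: BinaryIO, chunk_size: int = 4 * 1024 * 1024
) -> Iterator[bytes]:
    """Yield decompressed bytes from a .bz2 stream, at most chunk_size at a time.

    `source` is any binary file-like object positioned at the start of the
    compressed data (hypothetical helper, for illustration only).
    """
    # BZ2File decompresses incrementally as it is read, so memory use stays
    # close to chunk_size regardless of the total decompressed size.
    with bz2.BZ2File(source, "rb") as decompressed:
        while True:
            chunk = decompressed.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of stream
                break
            yield chunk
```

Each yielded chunk could then be written to a forwarding stream or grouped into smaller documents before upload; which of those fits depends on what the ingestion endpoint accepts.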
This is the error message I get when I try to ingest the files into Kernel Memory:
The repository to repro the issue is here: https://github.com/Gpadh/KMFileIngestion/tree/master
Please let me know if I can provide any more details to help.