Don't load CAR file into memory during upload
Command: `w3 up --car my_file.car`
When running this command with a large file (for example, a CAR file created from a 4 GiB file of random data), I noticed my memory usage go up by ~4 GiB, which suggests w3 is loading the entire file into memory during the upload.
I'm not sure exactly where in the code this comes from, but it's not ideal and could cause OOM crashes or other issues in some scenarios. Please let me know if a fix is possible.
How are you measuring this, please? I believe Node.js by default allows up to ~4 GB of memory to be allocated even if the memory currently in use is less.
I am measuring this with htop and free. Here is a screenshot as an example:
The screenshot shows multiple threads, but looking at just the top entry, the RES column shows w3 using over 5 GiB of memory. I could also observe this by comparing overall memory usage (in htop or free) from before I started the test to while w3 was running: it went up by about 4-5 GiB.
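If it helps narrow things down, one way to cross-check the htop/free numbers from inside the process is to log process.memoryUsage() periodically. This is plain Node.js, nothing w3-specific, just a rough sketch:

```ts
// Hypothetical instrumentation sketch: periodically log resident set size (RSS)
// and V8 heap usage from inside a Node.js process. RSS is what htop's RES column
// reports; heapUsed is what V8 has actually allocated on its heap.
const interval = setInterval(() => {
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  const mib = (n: number) => (n / 1024 / 1024).toFixed(1) + ' MiB';
  console.error(
    `rss=${mib(rss)} heapUsed=${mib(heapUsed)} external=${mib(external)} arrayBuffers=${mib(arrayBuffers)}`
  );
}, 1000);

// Call clearInterval(interval) once the upload finishes.
```

If rss climbs by roughly the file size while heapUsed stays small, the bytes are most likely sitting in Buffers (counted under external/arrayBuffers), not just address space reserved by the allocator.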
I don't think this is related to Node.js memory allocation settings; w3 doesn't do this with smaller files. To me it looks like some code in the stack is loading the entire file into memory to do the upload, instead of reading it in small chunks (say, 32 KiB) and uploading those. An uploader CLI should never need gigabytes of memory.
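To illustrate what I mean, here's a rough sketch (not the actual w3 code; `sendChunk` is just a hypothetical placeholder for whatever the uploader does with each piece) of the difference between buffering a whole file and streaming it in fixed-size chunks:

```ts
import { createReadStream } from 'node:fs';
import { readFile } from 'node:fs/promises';

// Buffering: the whole CAR ends up in memory at once (what the symptoms suggest).
async function uploadBuffered(path: string, sendChunk: (c: Uint8Array) => Promise<void>) {
  const bytes = await readFile(path); // allocates a Buffer the size of the file
  await sendChunk(bytes);
}

// Streaming: memory stays bounded by highWaterMark regardless of file size.
async function uploadStreaming(path: string, sendChunk: (c: Uint8Array) => Promise<void>) {
  const stream = createReadStream(path, { highWaterMark: 32 * 1024 }); // 32 KiB chunks
  for await (const chunk of stream) {
    await sendChunk(chunk as Buffer);
  }
}
```

With the streaming version, memory use stays roughly constant no matter how large the CAR file is.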
Please let me know if you are not able to reproduce this; it is consistently reproducible on my machine.
Edit: To be clear, the file uploads fine. It works. But the excessive memory use is still a bug.
@alanshaw any update on this?
@makew0rld have you had any luck uploading even larger files? My colleague was running into timeout issues trying to upload a 32 GB CAR file yesterday. After the 8th timeout he tried another CAR file that was 2.9 GB, and that one succeeded on the first try.
After some additional research, I did find this note that gives some details on how a CAR file larger than 4 GB is split up into smaller chunks.
I notice that the command you ran doesn't pass the --shard-size option. @alanshaw does the upload command default to a shard size (this for reference) if the file is greater than 4 GB?
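For what it's worth, my rough understanding of sharding is something like the sketch below. This is purely conceptual and not the actual w3up code (a real implementation has to split on CAR block boundaries and write a valid CAR header per shard, and the `shardStream` helper and its 100 MiB default are just made-up placeholders):

```ts
// Conceptual sketch only: accumulate streamed chunks until a target shard size
// is reached, then hand the shard off for upload and start a new one, so that
// at most one shard needs to be held in memory at a time.
async function* shardStream(
  chunks: AsyncIterable<Uint8Array>,
  targetShardSize = 100 * 1024 * 1024 // hypothetical default, e.g. 100 MiB
): AsyncGenerator<Uint8Array[]> {
  let shard: Uint8Array[] = [];
  let size = 0;
  for await (const chunk of chunks) {
    if (size + chunk.byteLength > targetShardSize && shard.length > 0) {
      yield shard; // upload this shard, then start collecting the next one
      shard = [];
      size = 0;
    }
    shard.push(chunk);
    size += chunk.byteLength;
  }
  if (shard.length > 0) yield shard;
}
```

The point being that only one shard at a time should ever need to be in memory before it's uploaded.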
I haven't tested with files larger than 4 GB.