Example code fails to put a 9GB file
I have a video file that I would like to upload to web3.storage using the web3.storage Node.js library. I am using the example code found at https://docs.web3.storage/how-tos/store/#preparing-files-for-upload
I'm running the script on a VPS with 2048 MB of RAM. I am unable to run the upload script successfully because the process gets killed before it completes.
The only output I see in the console is `Killed`.
My best guess is that the script is consuming too much memory.
Hello @insanity54, are you using the Node.js code example from https://docs.web3.storage/how-tos/store/#node.js? It seems you might be using the browser approach, where everything has to be held in memory. The Node.js approach reads files with a ReadableStream and should not load everything into memory. Can you show me the code you are using?
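For reference, the Node.js pattern looks roughly like this (a minimal sketch in an ES module; getFilesFromPath streams file contents from disk instead of buffering them, so only small chunks are held in memory at a time):

```js
// Minimal sketch of the Node.js upload pattern from the web3.storage docs.
// getFilesFromPath returns file objects whose contents are read as streams,
// so the whole file is never buffered in memory at once.
import { Web3Storage, getFilesFromPath } from 'web3.storage'

const client = new Web3Storage({ token: process.env.WEB3_TOKEN })
const files = await getFilesFromPath('/path/to/large-video.mp4')
const cid = await client.put(files)
console.log('stored with cid:', cid)
```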
Thanks @vasco-santos. Here is the code I'm using
```js
/**
 * web3-upload.mjs
 *
 * Example Usage: node --experimental-modules ./web3-upload.mjs ~/Videos/my-large-video.mp4
 */
import dotenv from 'dotenv'
import minimist from 'minimist'
import { Web3Storage, getFilesFromPath } from 'web3.storage'

dotenv.config();

async function getFiles (path) {
  const files = await getFilesFromPath(path)
  console.log(`read ${files.length} file(s) from ${path}`)
  return files
}

async function upload (opts) {
  const { token, file } = opts;
  if (typeof token === 'undefined') {
    throw new Error('A web3.storage token "token" must be passed in options object, but token was undefined.')
  }
  if (typeof file === 'undefined') {
    throw new Error('file was undefined.')
  }
  const filesObject = await getFiles(file)
  console.log(filesObject)
  await storeWithProgress(filesObject);
}

function getAccessToken () {
  const token = process.env.WEB3_TOKEN;
  if (typeof token === 'undefined') {
    return console.error('A token is needed. (WEB3_TOKEN in env must be defined). You can create one on https://web3.storage.')
  }
  return token
}

function makeStorageClient () {
  return new Web3Storage({ token: getAccessToken() })
}

async function storeWithProgress (files) {
  console.log(`uploading files ${files}`)

  // show the root cid as soon as it's ready
  const onRootCidReady = cid => {
    console.log('uploading files with cid:', cid)
  }

  // when each chunk is stored, update the percentage complete and display
  const totalSize = files.map(f => f.size).reduce((a, b) => a + b, 0)
  let uploaded = 0
  const onChunkStored = size => {
    uploaded += size
    const pct = (uploaded / totalSize) * 100
    console.log(`Uploading... ${pct.toFixed(2)}% complete`)
  }

  // makeStorageClient returns an authorized Web3.Storage client instance
  const client = makeStorageClient()

  // client.put will invoke our callbacks during the upload
  // and return the root cid when the upload completes
  return client.put(files, { onRootCidReady, onChunkStored })
}

function getCliArgs () {
  const args = minimist(process.argv.slice(2))
  if (args._.length < 1) {
    return console.error('Please supply the path to a file or directory')
  }
  return args._;
}

async function main () {
  await upload({
    file: getCliArgs(),
    token: getAccessToken()
  })
}

main()
```
Thanks for the snippet @insanity54
From what I can see, you are using the Node.js util getFilesFromPath, which means the file is packed into a CAR file and chunked with streaming, so memory consumption should be fine. We made a change yesterday that can help with this, https://github.com/web3-storage/ipfs-car/pull/74, where we added backpressure to guarantee we do not end up with a large memory footprint on slow readers.
Meanwhile, I am going to test a ~9GB file today and observe the memory consumption. I would also suggest you retry with updated dependencies (`npm ls ipfs-car` should show [email protected]) and, if possible, get information about the memory consumption on your VPS while running it.
As far as I understand, it does not even call onChunkStored? That suggests the problem happens while transforming the video file into a CAR file and chunking it into smaller pieces for sending.
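If it helps, one quick way to watch memory from inside the script itself (plain Node.js APIs, nothing web3.storage specific) is to drop something like this in near the top of the script:

```js
// Log the process's resident set size and V8 heap every 5 seconds,
// so we can see how memory grows while the file is packed and uploaded.
const toMB = bytes => (bytes / 1024 / 1024).toFixed(1)
setInterval(() => {
  const { rss, heapUsed, external } = process.memoryUsage()
  console.log(`rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB external=${toMB(external)}MB`)
}, 5000).unref() // unref() so this timer does not keep the process alive on its own
```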
Thanks for the update @vasco-santos
I upgraded ipfs-car:

```
└─┬ [email protected]
  └── [email protected]
```
The upload got further along; this time I saw the console.log that shows the CID.
I tracked memory usage every 5 seconds using `pidstat --human -r -T ALL -p ${pid}`. The process did not exit gracefully; it was killed once again. The last report from pidstat was as follows:

```
03:33:44 PM   UID       PID  minflt/s  majflt/s     VSZ      RSS   %MEM  Command
03:33:44 PM     0   2792496      1.46      0.16   11.4G   780.4M  39.3%  node
```
All right, so memory-wise this looks good. The CAR was correctly generated, but then something apparently happens before the first chunks are sent, since your onChunkStored is never called.
I wonder if the CAR file generated from your large video has some particularity that triggers a bug; I could not replicate this with several different large files. It could be an issue with https://github.com/nftstorage/carbites, where we chunk the CAR file.
Did you try that file on a local machine with success? Perhaps we can get a stack trace.
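If you want to isolate the packing step from the upload, something along these lines could be used to pack the CAR to disk with ipfs-car directly (a rough sketch assuming [email protected]; the exact import paths may differ between versions):

```js
// Pack the video into a CAR file on disk, without uploading anything.
// If this step alone exhausts memory or crashes, the problem is in
// packing/chunking rather than in the HTTP upload.
import { packToFs } from 'ipfs-car/pack/fs'
import { FsBlockStore } from 'ipfs-car/blockstore/fs'

const { root } = await packToFs({
  input: process.argv[2],           // path to the large video file
  output: './my-large-video.car',   // where to write the CAR
  blockstore: new FsBlockStore()    // keep intermediate blocks on disk, not in memory
})
console.log('packed CAR with root CID:', root.toString())
```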
I tried on a local machine with success, although onChunkStored is still not called.
I would be happy to provide a stack trace, but I'm not sure how to do that for a Node program that isn't crashing.
Would a stack trace using console.trace() be helpful? If so, I'd need to know where in my code snippet that line would be useful.
Or is this something that can be done with Node's debug tool (inspect)?
I did get a trace using strace while the script was running. I don't know if that's what you meant or if it's of any use, but I'll attach it.
> I tried on a local machine with success, although onChunkStored is still not called.

The entire 9GB file was packed and onChunkStored was not called locally?
My best guess so far is that this is either the same issue as https://github.com/web3-storage/ipfs-car/issues/69 (which I can replicate and am trying to fix), or your disk is getting full. Is that possible?
In theory, the kernel only kills a process in exceptional circumstances, like resource starvation, including memory exhaustion. I think we can get more information about why the process is being killed with something like https://askubuntu.com/a/709366
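It might also be worth making sure a silent rejection isn't being swallowed. Adding an unhandledRejection handler and a .catch on main() (plain Node.js APIs, nothing web3.storage specific) should print a stack trace if the promise chain fails, e.g.:

```js
// Print a stack trace and exit non-zero if anything in the promise chain fails,
// so a failure is visible even when the process does not crash loudly.
process.on('unhandledRejection', err => {
  console.error('unhandled rejection:', err)
  process.exit(1)
})

// replace the bare main() call at the bottom of the script with:
main().catch(err => {
  console.error('upload failed:', err)
  process.exit(1)
})
```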
By the way, the parameter that client.put should receive is onStoredChunk and not onChunkStored, per https://github.com/web3-storage/web3.storage/blob/main/packages/client/src/lib/interface.ts#L98
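For reference, here is a rough sketch of how storeWithProgress from the snippet above would look with the corrected callback name (onRootCidReady and onStoredChunk are the options the client expects; everything else is unchanged):

```js
// Same storeWithProgress as above, but with the callback name the client
// actually expects: onStoredChunk (not onChunkStored).
async function storeWithProgress (files) {
  const client = makeStorageClient()
  const totalSize = files.map(f => f.size).reduce((a, b) => a + b, 0)
  let uploaded = 0

  return client.put(files, {
    // called once the root CID is computed, before the upload finishes
    onRootCidReady: cid => console.log('uploading files with cid:', cid),
    // called with the size (in bytes) of each chunk after it is stored
    onStoredChunk: size => {
      uploaded += size
      console.log(`Uploading... ${((uploaded / totalSize) * 100).toFixed(2)}% complete`)
    }
  })
}
```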
@vasco-santos good catch on the onStoredChunk function name! I opened a PR in the docs repo to fix that. https://github.com/web3-storage/docs/pull/159
Yes, I had several failed upload attempts where the disk became full of files in /tmp. I assumed that was normal behavior, perhaps requiring free disk space equal to the size of the file being uploaded, so I made sure to have >9GB of free disk space when I made subsequent upload attempts.
Here's a dmesg log of the VPS from when the node process gets killed: dmesg.log
I noticed a lot of noise in that dmesg (from Docker, especially), so I spun up a new VPS for just this task. Here's the dmesg of the new VPS, which shows the node process getting killed.
Thanks for the logs @insanity54 🙏🏼
```
Out of memory: Killed process 10250 (node) total-vm:12982680kB, anon-rss:1818844kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:4888kB oom_score_adj:0
```
So, this seems related to https://github.com/web3-storage/ipfs-car/issues/69, which I am also debugging and trying to find the root cause of. It looks like https://github.com/ipfs/js-ipfs-unixfs/tree/master/packages/ipfs-unixfs-importer is leaking memory. I will keep you posted.
Surprising you could get to 9GB; my 4 GB RAM VM on GCP dies at 1 GB or sometimes less.
@Amanse can you let me know what version of the web3.storage client you are using, and more precisely what ipfs-car version it is using? From [email protected] there is a change that can help with memory consumption.
I just updated my node packages and tried again. Still seeing the same `Killed` behavior with [email protected] and [email protected].
@insanity54 I just noticed that you are using getFilesFromPath; can you try using https://github.com/web3-storage/files-from-path#usage instead?
It is also exported from web3.storage as import { Web3Storage, filesFromPath } from 'web3.storage'.
Can you also check that you are using it-glob at version 0.0.14 or above?
I switched over to filesFromPath
```js
import { Web3Storage, filesFromPath } from 'web3.storage'

// ...

async function getFiles (path) {
  let files = [];
  for await (const f of filesFromPath(path)) {
    console.log(`file:${f.name}`);
    files.push(f);
  }
  return files
}

// ...

const filesObject = await getFiles(file)
```
I ran yarn yarn-upgrade-all and checked the version of it-glob:

```
# yarn list --pattern it-glob
yarn list v1.22.17
├─ [email protected]
│  └─ [email protected]
└─ [email protected]
Done in 1.09s.
```

I ran my upload script again and it was `Killed`.
> @Amanse can you let me know what version of the web3.storage client you are using, and more precisely what ipfs-car version it is using? From [email protected] there is a change that can help with memory consumption.
Sorry for the late reply, I am using [email protected] and [email protected].
I changed the RAM on my VM from 4 GB to 8 GB, and now it can upload 2.4 GB instead of 1.4 GB before it's killed.
I have enough storage on the VM; is there a way I can use that directly instead of swap?
> I have enough storage on the VM; is there a way I can use that directly instead of swap?

I think you're asking if there's a way to increase memory by using disk so the web3 process isn't killed? I tried that at one point, but wasn't able to see any difference. I used the following guide: https://www.digitalocean.com/community/tutorials/how-to-configure-virtual-memory-swap-file-on-a-vps
Yeah, it didn't have much effect, but that's because the OS doesn't prioritize swap over RAM, and if we increase swappiness the whole OS starts lagging. I was asking whether web3 can use storage without swap, as in, web3 using disk from the JS side instead of pushing everything into RAM.
I use swap on my personal machine as well; sadly it's not a solution for this problem.
@insanity54 Did you find a work-around? I'm facing the same problem on DigitalOcean.
@gregorym My workaround right now is that I spin up a 16GB VPS specifically for web3 upload.
We can't keep enlarging our host to work around this issue; are there any concrete solutions?
Heads up folks - we're in the middle of revamping our upload flow in a super exciting, very IPFS-y way that will be much more usable (CAR generation will be streaming, so memory constraints will matter far less, among many other benefits). Please stay tuned!
Would love to know if this one has been fixed, or whether there's any prospect of a fix?
any updates?