
Example code fails to put a 9GB file

insanity54 opened this issue 4 years ago · 27 comments

I have a video file that I would like to upload to web3.storage using the web3.storage Node.js library. I am using the example code found at https://docs.web3.storage/how-tos/store/#preparing-files-for-upload

I'm running the script on a VPS with 2048 MB of RAM. I am unable to run the upload script to completion because the process is killed partway through.

The only output I see in the console is Killed.

My best guess is that the script is consuming too much memory.

insanity54 avatar Aug 06 '21 21:08 insanity54

Hello @insanity54. Are you using the Node.js code example, https://docs.web3.storage/how-tos/store/#node.js? It sounds like you may be using the browser approach, where everything is held in memory. The Node.js approach reads files with a ReadableStream and should not load everything into memory. Can you show me the code you are using?

vasco-santos avatar Aug 09 '21 11:08 vasco-santos
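
For reference, the Node.js example from that docs page streams file contents via getFilesFromPath rather than buffering them; condensed, it looks roughly like this (a sketch, assuming the API token lives in a WEB3_TOKEN environment variable):

import { Web3Storage, getFilesFromPath } from 'web3.storage'

// getFilesFromPath stats the file and wraps it in an object whose
// stream() is consumed lazily during upload, so the contents are
// never buffered in memory all at once.
const files = await getFilesFromPath('/path/to/my-large-video.mp4')

const client = new Web3Storage({ token: process.env.WEB3_TOKEN })
const cid = await client.put(files)
console.log('stored with CID:', cid)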

Thanks @vasco-santos. Here is the code I'm using

/**
 * web3-upload.mjs
 *
 * Example Usage: node --experimental-modules ./web3-upload.mjs ~/Videos/my-large-video.mp4
 */
import dotenv from 'dotenv'
import minimist from 'minimist'
import { Web3Storage, getFilesFromPath } from 'web3.storage'


dotenv.config();


async function getFiles(path) {
  const files = await getFilesFromPath(path)
  console.log(`read ${files.length} file(s) from ${path}`)
  return files
}

async function upload (opts) {
  const { token, file } = opts;
  if (typeof token === 'undefined') {
    throw new Error('A web3.storage token "token" must be passed in options object, but token was undefined.')
  }
  if (typeof file === 'undefined') {
    throw new Error('file was undefined.')
  }


  const filesObject = await getFiles(file)
  console.log(filesObject)

  await storeWithProgress(filesObject);
}

function getAccessToken() {
  const token = process.env.WEB3_TOKEN;
  if (typeof token === 'undefined') {
    return console.error('A token is needed. (WEB3_TOKEN in env must be defined). You can create one on https://web3.storage. ')
  }
  return token
}


function makeStorageClient() {
  return new Web3Storage({ token: getAccessToken() })
}

async function storeWithProgress(files) {  

  console.log(`uploading files ${files}`)

  // show the root cid as soon as it's ready
  const onRootCidReady = cid => {
    console.log('uploading files with cid:', cid)
  }

  // when each chunk is stored, update the percentage complete and display
  const totalSize = files.map(f => f.size).reduce((a, b) => a + b, 0)
  let uploaded = 0

  const onChunkStored = size => {
    uploaded += size
    const pct = (uploaded / totalSize) * 100
    console.log(`Uploading... ${pct.toFixed(2)}% complete`)
  }

  // makeStorageClient returns an authorized Web3.Storage client instance
  const client = makeStorageClient()

  // client.put will invoke our callbacks during the upload
  // and return the root cid when the upload completes
  return client.put(files, { onRootCidReady, onChunkStored })
}

 
function getCliArgs () {
  const args = minimist(process.argv.slice(2))

  if (args._.length < 1) {
    return console.error('Please supply the path to a file or directory')
  }
  return args._;
}


async function main () {

  await upload({
    file: getCliArgs(),
    token: getAccessToken()
  })
}

main()

insanity54 avatar Aug 09 '21 17:08 insanity54

Thanks for the snippet @insanity54

As I can see, you are using the Node.js util getFilesFromPath, which means the file is packed into a CAR file and chunked with streaming, so memory consumption should be fine. We made a change yesterday that can help with this, https://github.com/web3-storage/ipfs-car/pull/74, where we added backpressure to guarantee we do not end up with a large memory footprint on slow readers.

Meanwhile, I am going to test a ~9GB file today and observe the memory consumption. I would also suggest you retry this with the dependencies updated (npm ls ipfs-car should show [email protected]) and, if possible, get information about the memory consumption on your VPS while running it.

As far as I understand, onChunkStored is not even called? That means the problem could be in transforming the video file into a CAR file and chunking it into small pieces for sending.

vasco-santos avatar Aug 10 '21 09:08 vasco-santos
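
One way to narrow down whether the failure happens while generating the CAR or while uploading it is to run the packing step on its own. A minimal sketch, assuming the ipfs-car 0.x API (packToFs and FsBlockStore; double-check both names against your installed version):

// pack-only.mjs — pack a file into a CAR on disk, without uploading.
// If this alone gets OOM-killed, the problem is in CAR generation
// rather than in the web3.storage upload path.
import { packToFs } from 'ipfs-car/pack/fs'
import { FsBlockStore } from 'ipfs-car/blockstore/fs'

const { root } = await packToFs({
  input: process.argv[2],          // path to the large video file
  output: './my-large-video.car',  // CAR is written to disk
  blockstore: new FsBlockStore()   // intermediate blocks spill to disk too
})
console.log('packed, root CID:', root.toString())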

Thanks for the update @vasco-santos

I upgraded ipfs-car

└─┬ [email protected]
  └── [email protected]

The upload got further along; this time I saw the console.log which shows the CID.

I tracked memory usage every 5 seconds using pidstat --human -r -T ALL -p ${pid}. The process did not exit gracefully; it was killed once again, with the last report from pidstat as follows.

03:33:44 PM   UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
03:33:44 PM     0   2792496      1.46      0.16   11.4G  780.4M  39.3%  node

insanity54 avatar Aug 10 '21 16:08 insanity54

All right, so memory-wise this looks good. The CAR was correctly generated, but then something apparently happens before the first chunks are sent, since your onChunkStored is not called.

I wonder if the CAR file generated from your large video has some particularity that triggers a bug; I could not replicate this with several different large files. It could be an issue with https://github.com/nftstorage/carbites, where we chunk the CAR file.

Did you try that file on a local machine with success? Perhaps we can get a stack trace.

vasco-santos avatar Aug 11 '21 08:08 vasco-santos

I tried on a local machine with success, although onChunkStored is still not called.

I would be happy to provide a stack trace, but I'm not sure how to do that for a Node program that isn't crashing.

Would a stack trace using console.trace() be helpful? If so, I'd need to know where in my code snippet that line would be useful.

Or is this something that can be done with Node's debug tool (inspect)?

I did get a trace using strace while the script was running. I don't know if that's what you meant or if it's of any use, but I'll attach it.

strace-log-2020-08-11.txt

insanity54 avatar Aug 13 '21 01:08 insanity54

> I tried on a local machine with success, although onChunkStored is still not called.

The entire 9GB file was packed, and onChunkStored was still not called locally?

My best guess so far is that either this is the same issue as https://github.com/web3-storage/ipfs-car/issues/69 (which I can replicate and am trying to fix), or your disk is filling up; is that possible?

In theory, the kernel only kills a process in exceptional circumstances, like resource starvation, including memory exhaustion. I think we can get more information about why the process is being killed with something like https://askubuntu.com/a/709366

vasco-santos avatar Aug 13 '21 13:08 vasco-santos

By the way, the parameter that client.put should receive is onStoredChunk, not onChunkStored, per https://github.com/web3-storage/web3.storage/blob/main/packages/client/src/lib/interface.ts#L98

vasco-santos avatar Aug 13 '21 13:08 vasco-santos
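
With that rename, the relevant part of the snippet above becomes (only the option key changes; the callback body stays the same):

// when each chunk is stored, update the percentage complete and display
const onStoredChunk = size => {
  uploaded += size
  const pct = (uploaded / totalSize) * 100
  console.log(`Uploading... ${pct.toFixed(2)}% complete`)
}

// client.put will invoke our callbacks during the upload
// and return the root cid when the upload completes
return client.put(files, { onRootCidReady, onStoredChunk })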

@vasco-santos good catch on the onStoredChunk parameter name! I opened a PR in the docs repo to fix it: https://github.com/web3-storage/docs/pull/159

Yes, I had several failed upload attempts where the disk became full of files in /tmp. I assumed that was normal behavior, perhaps requiring free disk space equal to the size of the file being uploaded, so I made sure to have >9GB of free disk space on subsequent upload attempts.

Here's a dmesg log from the VPS when the node process gets Killed: dmesg.log

insanity54 avatar Aug 14 '21 00:08 insanity54

I noticed a lot of noise in that dmesg output (from Docker, especially), so I spun up a new VPS for just this task. Here's the dmesg from the new VPS, which shows the node process getting killed:

dmesg-2.log

insanity54 avatar Aug 14 '21 01:08 insanity54

Thanks for the logs @insanity54 🙏🏼

 Out of memory: Killed process 10250 (node) total-vm:12982680kB, anon-rss:1818844kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:4888kB oom_score_adj:0

So, this seems related to https://github.com/web3-storage/ipfs-car/issues/69, which I am also debugging to find the root cause. It seems like https://github.com/ipfs/js-ipfs-unixfs/tree/master/packages/ipfs-unixfs-importer is leaking memory. I will keep you posted.

vasco-santos avatar Aug 17 '21 09:08 vasco-santos

Surprising you could get to 9GB; my 4GB RAM VM on GCP dies at 1GB or sometimes less.

Amanse avatar Dec 04 '21 18:12 Amanse

@Amanse can you let me know what version of the web3.storage client you are using, and more precisely what ipfs-car version it is using? From [email protected] we had a change that can help with memory consumption.

vasco-santos avatar Dec 06 '21 14:12 vasco-santos

I just updated my node packages and tried again. Still seeing the same Killed behavior with [email protected] and [email protected].

insanity54 avatar Dec 06 '21 19:12 insanity54

@insanity54 I just noticed that you are using getFilesFromPath; can you try using https://github.com/web3-storage/files-from-path#usage instead?

It is also exported from web3.storage: import { Web3Storage, filesFromPath } from 'web3.storage'.

Can you also check that you are using it-glob version 0.0.14 or later?

vasco-santos avatar Dec 06 '21 20:12 vasco-santos

I switched over to filesFromPath

import { Web3Storage, filesFromPath } from 'web3.storage'

// ...

async function getFiles(path) {
  let files = [];
  for await (const f of filesFromPath(path)) {
    console.log(`file:${f.name}`);
    files.push(f);
  }
  return files
}

// ...

const filesObject = await getFiles(file)

I ran yarn yarn-upgrade-all and checked the version of it-glob:

# yarn list --pattern it-glob
yarn list v1.22.17
├─ [email protected]
│  └─ [email protected]
└─ [email protected]
Done in 1.09s.

I ran my upload script again and it was Killed

insanity54 avatar Dec 07 '21 07:12 insanity54

> @Amanse can you let me know what version of the web3.storage client you are using, and more precisely what ipfs-car version it is using? From [email protected] we had a change that can help with memory consumption.

Sorry for the late reply. I am using web3.storage 3.3.3 and ipfs-car 0.5.10.

Amanse avatar Dec 07 '21 07:12 Amanse

I changed the RAM on my VM from 4GB to 8GB, and now it can upload between 1.4GB and 2.4GB before it's killed.

Amanse avatar Dec 07 '21 07:12 Amanse

I have enough storage on the VM; is there a way I can use that directly, instead of swap?

Amanse avatar Dec 10 '21 06:12 Amanse

> I have enough storage on the VM; is there a way I can use that directly, instead of swap?

I think you're asking if there's a way to increase memory by using disk, so the web3 process isn't killed? I tried that at one point but wasn't able to see any difference. I used the following guide: https://www.digitalocean.com/community/tutorials/how-to-configure-virtual-memory-swap-file-on-a-vps

insanity54 avatar Dec 10 '21 11:12 insanity54

> I have enough storage on the VM; is there a way I can use that directly, instead of swap?

> I think you're asking if there's a way to increase memory by using disk, so the web3 process isn't killed? I tried that at one point but wasn't able to see any difference. I used the following guide: https://www.digitalocean.com/community/tutorials/how-to-configure-virtual-memory-swap-file-on-a-vps

Yeah, it didn't have much effect, but that's because the OS doesn't prioritize swap over RAM, and if we increase swappiness the whole OS starts lagging. I was asking if we can use storage for web3 without swap; as in, web3 uses disk from the JS side instead of pushing everything to RAM.

I use swap on my personal machine as well; it's not a solution for this problem, sadly.

Amanse avatar Dec 12 '21 06:12 Amanse
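
A disk-backed route along these lines did exist in the client: pack the CAR to disk first (as in the pack-only sketch above), then upload it with putCar using a reader that pulls blocks from disk on demand. A sketch, assuming the installed web3.storage version exports putCar and that @ipld/car provides CarIndexedReader (check both against your versions):

// upload-car.mjs — upload a pre-packed CAR file from disk.
import { CarIndexedReader } from '@ipld/car'
import { Web3Storage } from 'web3.storage'

const client = new Web3Storage({ token: process.env.WEB3_TOKEN })

// CarIndexedReader indexes the file and reads blocks from disk on
// demand, unlike CarReader.fromIterable, which loads the whole CAR
// into memory.
const car = await CarIndexedReader.fromFile('./my-large-video.car')

const cid = await client.putCar(car, {
  onStoredChunk: size => console.log(`stored chunk of ${size} bytes`)
})
console.log('root CID:', cid)
await car.close()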

@insanity54 Did you find a work-around? I'm facing the same problem on DigitalOcean.

gregorym avatar Dec 14 '21 18:12 gregorym

@gregorym My workaround right now is to spin up a 16GB VPS specifically for the web3 upload.

insanity54 avatar Dec 14 '21 18:12 insanity54

We can't keep enlarging our host forever to avoid this issue; are there any concrete solutions?

gulprun avatar Mar 16 '22 08:03 gulprun

Heads up folks - we're in the middle of revamping our upload flow in a super exciting, very IPFS-y way that will be way more usable (CAR generation will be streaming, so memory constraints will matter way less, among many other benefits). Please stay tuned!

dchoi27 avatar Apr 06 '22 18:04 dchoi27

Would love to know if this one is fixed, or if there's any prospect of it?

gulprun avatar May 24 '22 04:05 gulprun

any updates?

Amanse avatar Jul 22 '22 16:07 Amanse