Feature Tool: upload 100s of GBs (CLI tool)

Open bmann opened this issue 4 years ago • 8 comments

We've tried a number of different ways to get the 100GB of songs that are on @songadaymann's computer uploaded / added to IPFS.

The Upload as CAR file script looks closest to what we need.

The rough pseudo code would be:

  • can be run from the root of the songs folder
  • upload each song using the CAR file script or similar
  • loop through all of them to ensure they have been uploaded

As a check to see if it already exists in NFT storage (if the script needs to be restarted / run multiple times), we likely want to do something like:

  • make a CAR file, get the CID
  • make a call to NFT storage to see if that CID is already pinned/uploaded
  • if pinned, skip
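
A minimal sketch of the two lists above, combined into one restartable pass. It assumes three caller-supplied functions that are hypothetical and not part of any NFT.Storage client: `compute_cid` (pack a file into a CAR and return its root CID), `is_stored` (ask the service whether that CID is already there) and `upload` (send the file). Only the loop and the skip logic are shown.

```python
from pathlib import Path
from typing import Callable, Iterable


def upload_missing(
    files: Iterable[Path],
    compute_cid: Callable[[Path], str],   # hypothetical: build CAR, return root CID
    is_stored: Callable[[str], bool],     # hypothetical: is this CID already stored?
    upload: Callable[[Path], None],       # hypothetical: send the file / CAR
) -> dict[str, list[Path]]:
    """Run one pass over the files, skipping anything already stored."""
    result: dict[str, list[Path]] = {"skipped": [], "uploaded": [], "failed": []}
    for path in files:
        cid = compute_cid(path)
        if is_stored(cid):
            # Already uploaded on an earlier run, nothing to do.
            result["skipped"].append(path)
            continue
        try:
            upload(path)
            result["uploaded"].append(path)
        except OSError:
            # Network errors are OSError subclasses; record and move on.
            result["failed"].append(path)
    return result
```

Because the remote service is the only record of what exists, the script can be rerun until `failed` comes back empty, with no local state file to maintain.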

@songadaymann has also made all the songs available on Dropbox, if anyone would like to give this a try :)

bmann avatar Dec 14 '21 05:12 bmann

Posting here in the spirit of shared tooling, we'll share anything we come up with in this thread / are happy to contribute if others find it helpful.

bmann avatar Dec 14 '21 05:12 bmann

Yes, I think a script to do this would be super helpful! Our team would love to help out if you are able to submit a PR. Just out of curiosity, what's the reason for looping across files and using storeCar rather than looping over files and just using storeBlob, or using storeDirectory for larger directories of files? Guessing you might have been running into some errors using storeDirectory for really large directories.

Separately, we're thinking about large uploads holistically (tracking in this issue https://github.com/nftstorage/nft.storage/issues/928) - making the UX and toolchains better. The case you're flagging here (looping over individual files) isn't directly covered by the issue, but it's definitely something we'll need!

dchoi27 avatar Dec 14 '21 21:12 dchoi27

@dchoi27 this is my pseudo code / scope so no idea why we would use one method over the other.

I will say that there are 4000+ files, named sequentially, and we need to make sure all of them are uploaded over a home 100Mbps connection. The loop will likely need to be run multiple times to ensure that all files are correctly uploaded. Suggestions welcome.

By creating a CAR locally and then checking if the CID exists in NFT storage, we avoid having to create a place to store whether or not a file has been uploaded.

bmann avatar Dec 15 '21 02:12 bmann

Let’s create a little library/CLI tool that:

  • Takes a directory as input
  • Sends each file individually to NFT.Storage
    • bonus points: progress bar
    • bonus points: batch >100MB of individual files in CARs at a time
    • bonus points: concurrent uploads (if there’s a gain here; once you factor in rate limiting there might not be)
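
One way to read the batching bullet, as a rough sketch: group files greedily and close a batch once its total size reaches the threshold, so each batch could then be packed into a single CAR. The function name `batch_by_size`, the `(name, size)` input shape and the reading of ">100MB" as a minimum batch size are assumptions for illustration. CAR creation and upload are left out.

```python
from typing import Iterable, Iterator


def batch_by_size(
    files: Iterable[tuple[str, int]],
    min_bytes: int = 100 * 1024 * 1024,  # assumed reading of ">100MB"
) -> Iterator[list[str]]:
    """Yield lists of file names whose combined size reaches min_bytes."""
    batch: list[str] = []
    total = 0
    for name, size in files:
        batch.append(name)
        total += size
        if total >= min_bytes:
            # Batch is big enough; hand it off and start a new one.
            yield batch
            batch, total = [], 0
    if batch:
        # Leftover files that never reached the threshold.
        yield batch
```

A file larger than the threshold ends up closing whatever batch it joins, and the final batch may be smaller than the threshold.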

mikeal avatar Dec 15 '21 21:12 mikeal

yes! in addition, we can give the option for the directory to be split into smaller directories (so file names can be preserved).

the code can be reused for https://github.com/nftstorage/nft.storage/issues/981 as well

dchoi27 avatar Dec 16 '21 01:12 dchoi27

Hey! I'm new to this project (though I've messed with IPFS before). I was thinking this might be a good thing for me to work on as I spin up. Mind if I take a stab at it?

redaphid avatar Dec 16 '21 19:12 redaphid

Yup, you have a green light @redaphid to dig into this.

JeffLowe avatar Dec 16 '21 20:12 JeffLowe

@redaphid @jchris FYI - someone made this! https://github.com/factoria-org/nftp

dchoi27 avatar Dec 28 '21 16:12 dchoi27