B2_Command_Line_Tool

stdin for b2 upload_file

Open olcto opened this issue 8 years ago • 13 comments

#152

Not ready for merge yet:

  • [ ] Needs to be reviewed
  • [ ] Create unit tests

olcto avatar May 16 '16 20:05 olcto

Please run the pre-commit.sh script. It reports some errors and applies some fixes on its own.

ppolewicz avatar May 16 '16 21:05 ppolewicz

@olcto: could you please run the pre-commit script again?

ppolewicz avatar Jul 05 '16 16:07 ppolewicz

@olcto: If you have time to revisit this, I'd love to see this merged!

jmealo avatar Apr 29 '17 14:04 jmealo

Does anybody want to finish up this work?

I'm cleaning up old pull requests, and will close in a few days if nothing happens.

bwbeach avatar Nov 01 '17 23:11 bwbeach

Are there any plans to integrate this? Piping a very large file (500 GB+) from a stream is a common use case in AWS/GCP, and its omission from the B2 CLI is the only thing stopping us from using B2 more widely at the moment.

bb8 avatar Mar 06 '21 10:03 bb8

@bb8 that would be an interesting use case for many reasons. Actually, I think we've done some work recently that would make it easier to upload large files like that. I'll look into how much work this would need, given that we have more infrastructure for it now.

By the way, could you say how many files you have and what they are? Is it video, raw VM images, or something like that?

ppolewicz avatar Mar 06 '21 10:03 ppolewicz

I suspect you might be dealing with sensor data. Is it compressed? How fast is it streaming?

ppolewicz avatar Mar 06 '21 10:03 ppolewicz

@ppolewicz in our particular case, they are compressed VM snapshots, at a daily frequency. The resulting files are often larger than the remaining available space on the host, so it's not feasible to spool the entire file to disk or memory beforehand.

bb8 avatar Mar 06 '21 17:03 bb8

@bb8 is there any temporary storage that could be used to cache fragments for uploading? If you have some space (10-20GB would be ideal, but 2GB would be decent, and even 200-300MB would make things much faster than no cache at all), the performance and reliability of the upload operation could be greatly improved. If you have no space whatsoever and want to upload without any temporary files, a network connection hiccup would break the transfer, and since the data isn't written anywhere, the entire operation would fail. We could also use memory. Also, please tell me: do you upload a snapshot of the exact same machine every day? We may have another improvement in mind.

ppolewicz avatar Mar 06 '21 22:03 ppolewicz
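To make the trade-off above concrete, here is a purely illustrative shell sketch of the fragment-caching approach: the stream is cut into fixed-size fragments, each fragment is spooled to a temporary file so a failed transfer can be retried without re-reading the non-seekable stream, and the fragment is deleted once uploaded. `export-snapshot` and `upload-fragment` are hypothetical placeholder commands, not part of the b2 CLI.

```sh
# Illustrative only: cache 2 GB fragments of the stream on disk so that a
# network hiccup only forces a retry of one fragment, not the whole upload.
export-snapshot host01 | (
  n=0
  while :; do
    tmp=$(mktemp) || exit 1
    # iflag=fullblock makes dd copy exactly 32 x 64 MB = 2 GB from the pipe
    # (less on the final fragment).
    dd bs=64M count=32 iflag=fullblock of="$tmp" 2>/dev/null
    if [ ! -s "$tmp" ]; then rm -f "$tmp"; break; fi   # stream exhausted
    until upload-fragment "$tmp" "$n"; do              # hypothetical command
      echo "fragment $n failed, retrying from the cached copy" >&2
      sleep 5
    done
    rm -f "$tmp"
    n=$((n + 1))
  done
)
```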

I'm thinking about it some more and I have a couple more questions. What are the VMs hosting? Is it a big SQL database, or some more static type of content? Also, if there were a compelling reason to do it, would it be possible to leave the stream uncompressed so that the b2 CLI could take care of the compression?

ppolewicz avatar Mar 06 '21 22:03 ppolewicz

@ppolewicz The VMs are varied, and all backups are encrypted and compressed with GPG before transfer. In our case, the host server always has at least 50GB available space to allow for smooth operation, but that may not be the case for others. We currently use the AWS CLI for streaming backups to S3, and it works great, but having the option to use B2 would be advantageous.

bb8 avatar Mar 07 '21 11:03 bb8
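For context, the kind of streaming job described above usually looks like the pipeline below; the producer command, GPG recipient, bucket and key are placeholders. `aws s3 cp` reads the object body from stdin when the source is `-`, and `--expected-size` helps it choose a suitable multipart chunk size for very large streams.

```sh
# Stream a GPG-encrypted snapshot to S3 without spooling the whole file to disk.
export-snapshot host01 \
  | gpg --encrypt --recipient backups@example.com \
  | aws s3 cp - s3://example-backups/vm-snapshots/host01-2021-03-07.gpg \
      --expected-size 600000000000
```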

Nilay from Backblaze here. While you are waiting for this issue to be implemented, I just wanted to make certain you are aware that Backblaze B2 supports the AWS CLI via B2's S3-compatible API. You should be able to just add a few configuration items to switch a job from AWS S3 to Backblaze B2. Details can be found here: https://help.backblaze.com/hc/en-us/articles/360047779633-Configuring-the-AWS-CLI-for-use-with-B2?mobile_site=true

nilayp avatar Mar 07 '21 15:03 nilayp
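In practice, the change nilayp describes amounts to supplying B2 application-key credentials to the AWS CLI and pointing the existing job at B2's S3-compatible endpoint with `--endpoint-url`. The endpoint host below is only an example and depends on the bucket's region (see the linked article); the key values, producer command and bucket name are placeholders.

```sh
# Use a B2 application key ID/key in place of AWS credentials.
aws configure set aws_access_key_id <applicationKeyId>
aws configure set aws_secret_access_key <applicationKey>

# Reuse the same streaming pipeline, redirected at the B2 S3-compatible endpoint.
export-snapshot host01 \
  | gpg --encrypt --recipient backups@example.com \
  | aws s3 cp - s3://example-backups/vm-snapshots/host01.gpg \
      --endpoint-url https://s3.us-west-002.backblazeb2.com
```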

@bb8 I understand. It makes a lot of sense, actually! Thank you for sharing your usage pattern. I'm thinking a few steps ahead and I'd like to understand whether a solution I have in mind would be viable in your case.

If b2cli could see the unencrypted, uncompressed data, it could potentially use the information found within the uncompressed stream to avoid re-transferring data to the B2 server if it is already there (because the operating system part of the VM snapshot is probably the same today as it was yesterday). I understand that you need encryption, compression and gpg signatures, so in the future, b2cli could take care of those for you. It would be the same operation that you do today, except the order would be a bit different. It would reduce network transfer, CPU usage and backup duration for your machines, though, so it may be worth considering.

Does it seem potentially viable or would it not work for some reason (that I am not aware of)?

ppolewicz avatar Mar 07 '21 23:03 ppolewicz
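Independent of how b2sdk would actually implement it, the idea above boils down to hashing blocks of the uncompressed stream and skipping blocks that already exist on the server from a previous snapshot. A conceptual shell sketch, where `export-snapshot` and `upload-block` are hypothetical placeholders and the local manifest files stand in for whatever the server-side lookup would be:

```sh
# Conceptual sketch only, not b2sdk/CLI behaviour: hash fixed 64 MB blocks of
# the *uncompressed* stream, transfer only blocks missing from yesterday's
# manifest, and record today's manifest for the next run.
export-snapshot host01 \
  | split --bytes=64M --filter='
      tmp=$(mktemp) && cat > "$tmp"
      sum=$(sha1sum "$tmp" | cut -d" " -f1)
      if grep -qx "$sum" manifest-yesterday.txt; then
        echo "block $FILE unchanged, skipping transfer" >&2
      else
        upload-block "$tmp" "$sum"
      fi
      echo "$sum" >> manifest-today.txt
      rm -f "$tmp"
    ' - block-
```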

Upload from stdin (and other FIFO streams) has since been implemented in both b2sdk and the CLI. Thank you for your support and for showing that this was needed.

mjurbanski-reef avatar Nov 16 '23 12:11 mjurbanski-reef
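For reference, a hedged usage example of the feature as it landed, assuming the CLI accepts `-` as the local path to read from stdin (check `b2 upload-file --help` in your installed version for the exact syntax); the producer command, GPG recipient, bucket and file name are placeholders.

```sh
# Pipe an encrypted snapshot straight into B2 without a temporary file.
export-snapshot host01 \
  | gpg --encrypt --recipient backups@example.com \
  | b2 upload-file example-bucket - vm-snapshots/host01.gpg
```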