feat(iroh-bytes): Batch blob api
Description
This adds a new API for creating blobs.
You create a batch. Within a batch you have all the usual operations for adding data, such as add_bytes, add_stream, add_from_path, etc. Notable differences from the existing API:
- All operations work on individual blobs, so no way to add entire subdirectories with add_from_path
- All operations return a TempTag instead of taking an option to set a tag
The way to use the API is to just perform a complex operation within a batch, and then at the end assign a (non temporary) tag to the root(s) of the created data before dropping the batch.
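As a rough illustration of that lifecycle, here is a minimal in-memory sketch. It does not use the real iroh types: `Store`, `Batch`, `TempTag`, `BlobId` and `demo` are hypothetical stand-ins, the content id is a std hasher rather than a real content hash, and `gc` is a simplified model of garbage collection. It only shows that data survives once a permanent tag points at it, while data protected only by a temp tag is collected after the temp tag is dropped.

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical stand-in for a content hash.
type BlobId = u64;

/// Hypothetical in-memory store, not the real iroh store.
#[derive(Default)]
struct Store {
    blobs: HashMap<BlobId, Vec<u8>>,
    tags: HashMap<String, BlobId>,
    temp: HashMap<BlobId, usize>, // temp tag reference counts
}

impl Store {
    /// Simplified GC: keep only blobs that have a permanent or temp tag.
    fn gc(&mut self) {
        let live: HashSet<BlobId> = self
            .tags
            .values()
            .copied()
            .chain(self.temp.keys().copied())
            .collect();
        self.blobs.retain(|id, _| live.contains(id));
    }
}

/// Protects a blob from GC for as long as the value is alive.
struct TempTag {
    store: Rc<RefCell<Store>>,
    id: BlobId,
}

impl Drop for TempTag {
    fn drop(&mut self) {
        let mut s = self.store.borrow_mut();
        let remaining = match s.temp.get_mut(&self.id) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return,
        };
        if remaining == 0 {
            s.temp.remove(&self.id);
        }
    }
}

struct Batch {
    store: Rc<RefCell<Store>>,
}

impl Batch {
    /// Adds a single blob and returns a temp tag for it.
    fn add_bytes(&self, bytes: &[u8]) -> TempTag {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let id = hasher.finish();
        let mut s = self.store.borrow_mut();
        s.blobs.insert(id, bytes.to_vec());
        *s.temp.entry(id).or_insert(0) += 1;
        TempTag { store: Rc::clone(&self.store), id }
    }
}

/// Returns (tagged blob survived GC, untagged blob survived GC).
fn demo() -> (bool, bool) {
    let store = Rc::new(RefCell::new(Store::default()));
    let batch = Batch { store: Rc::clone(&store) };
    let kept = batch.add_bytes(b"root");
    let scratch = batch.add_bytes(b"scratch");
    let (kept_id, scratch_id) = (kept.id, scratch.id);
    // Assign a permanent tag to the root before dropping the batch.
    store.borrow_mut().tags.insert("my-data".to_string(), kept_id);
    drop(kept);
    drop(scratch);
    drop(batch);
    store.borrow_mut().gc();
    let s = store.borrow();
    let result = (s.blobs.contains_key(&kept_id), s.blobs.contains_key(&scratch_id));
    result
}
```

In this model, forgetting to set the permanent tag before the temp tags go away means the next GC run removes the data, which is why the tag is assigned before the batch is dropped.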
It is possible to scan a directory and create a collection purely on the client side, so the code to traverse a directory can be removed from the node.
To allow the workflow described above, the tags client has been extended to allow manually setting a tag.
Ideally this API would entirely replace the current blobs API, so all read ops would always happen within the context of a batch.
Breaking Changes
At this point this is mostly additive, but it changes the RPC API for setting tags. I might decide to remove the entire non-batch mutation API in this PR.
It works by leaving a streaming RPC call open for each batch, then performing operations in the context of a unique identifier for that RPC call.
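To make the scoping idea concrete, here is a small sketch of the bookkeeping a node might do. This is a model under stated assumptions, not the actual RPC code: `Node`, `BatchId`, `open_batch`, `add` and `close_batch` are hypothetical names, blobs are represented by plain strings, and the open streaming call is modelled simply as an entry in a map that is removed when the call ends.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for one open batch (one open streaming call).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct BatchId(u64);

/// Hypothetical node-side state: which blobs each open batch is protecting.
#[derive(Default)]
struct Node {
    next_id: u64,
    batches: HashMap<BatchId, Vec<String>>,
}

impl Node {
    /// Models opening the streaming call: allocate a fresh id.
    fn open_batch(&mut self) -> BatchId {
        let id = BatchId(self.next_id);
        self.next_id += 1;
        self.batches.insert(id, Vec::new());
        id
    }

    /// Models an add operation performed in the context of a batch id.
    fn add(&mut self, batch: BatchId, blob: &str) -> Result<(), String> {
        match self.batches.get_mut(&batch) {
            Some(held) => {
                held.push(blob.to_string());
                Ok(())
            }
            None => Err(format!("unknown batch {:?}", batch)),
        }
    }

    /// Models the streaming call ending: everything the batch held is
    /// released. Returns how many blobs were released.
    fn close_batch(&mut self, batch: BatchId) -> usize {
        self.batches.remove(&batch).map_or(0, |held| held.len())
    }
}
```

Tying the protected set to the lifetime of the call means a client that disconnects cannot leave temporary protection behind, at least in this simplified model.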
Notes & open questions
Note: if things work out, every add operation refers to a single blob, and the aggregation of many blobs can be driven from the client. That means that a lot of the complexity of the progress events, like ids etc., can be removed from the RPC. That complexity still needs to exist, but it can be confined to the client.
Todo
- [ ] ~~Add back fine grained progress~~ fine grained progress is not needed for all ops, but definitely for add_file and add_dir. Possibly for add_bytes and add_reader. Not sure if it is OK to have it in all cases despite it typically not being used.
- [ ] ~~Purge all tag setting stuff from the blobs API and the downloader.~~
I would propose that we merge this initially as an addition, and do the stripdown of the other APIs in a subsequent PR. Also, if this is an addition we can do the fine grained progress for add_dir in a subsequent PR as well.
Change checklist
- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.
- [x] All breaking changes documented.
Here is the preliminary API for batches:
pub async fn add_bytes(&self, bytes: impl Into<Bytes>, format: BlobFormat) -> Result<TempTag> {
pub async fn add_file(&self, path: PathBuf, import_mode: ImportMode, format: BlobFormat) -> Result<(TempTag, u64)> {
pub async fn add_dir(&self, root: PathBuf, import_mode: ImportMode, wrap: WrapOption) -> Result<TempTag> {
pub async fn add_collection(&self, collection: Collection) -> Result<TempTag> {
pub async fn add_stream(&self, mut input: impl Stream<Item = io::Result<Bytes>> + Send + Unpin + 'static, format: BlobFormat) -> Result<TempTag> {
pub async fn add_blob_seq(&self, iter: impl Iterator<Item = Bytes>) -> Result<TempTag> {
pub async fn temp_tag(&self, content: HashAndFormat) -> Result<TempTag> {
Basically very similar to the normal blobs API, but there are no options to create tags. Instead, every fn returns a temp tag for the thing that has been created, which the user can later assign a permanent tag to (or not).
The tags API has been extended to allow creating a tag given a hash and format.
Many of these functions are convenience functions. Probably most notably, add_dir is now traversing the file system on the client side and doing multiple add_file calls.
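The client-side traversal can be sketched roughly as follows. This is not the code from the PR: `collect_files` and `add_dir_client_side` are hypothetical helpers, the `add_file` closure stands in for a per-file batch call, and the returned `u64` stands in for whatever handle that call yields (a hash or temp tag in the real API). Symlinks and error recovery are ignored here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects all non-directory paths below `dir`.
/// Note: `is_dir` follows symlinks, so this sketch does not guard
/// against symlink cycles.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

/// Walks `root` on the client and calls `add_file` once per file.
/// Returns (relative name, id) pairs that a caller could turn into
/// a collection.
fn add_dir_client_side(
    root: &Path,
    mut add_file: impl FnMut(&Path) -> io::Result<u64>,
) -> io::Result<Vec<(String, u64)>> {
    let mut files = Vec::new();
    collect_files(root, &mut files)?;
    files.sort(); // deterministic order, independent of the file system
    let mut entries = Vec::new();
    for path in files {
        let rel = path
            .strip_prefix(root)
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
        // Use '/' as the separator so names are the same on every platform.
        let name = rel
            .components()
            .map(|c| c.as_os_str().to_string_lossy().into_owned())
            .collect::<Vec<_>>()
            .join("/");
        let id = add_file(&path)?;
        entries.push((name, id));
    }
    Ok(entries)
}
```

Because each call handles exactly one file, per-directory progress can be derived on the client from the sequence of calls instead of being reported over the RPC.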
should there be the equivalent `delete` versions as well?
WDYM? You delete stuff by ensuring that it is no longer tagged in some way, then GC will take care of it. There is blob delete, but that is really a low level function that you should rarely use directly.
delete_blob will just do its thing no matter what temp tags there are, so it can live in the blobs API, not in the batch API.
@rklaehn what's the state of this?
Just merged with main. I want to do another self-review, but currently trying to keep all the stuff up to date with main so it does not bitrot...
closing in favor of #2545