hf-hub icon indicating copy to clipboard operation
hf-hub copied to clipboard

add uploading support to the library!

Open arilotter opened this issue 1 year ago • 1 comments

This is an absolutely massive set of changes, so while this PR is far from ready, I thought I'd share my work so far to begin the process of upstreaming & discussing changes early.

This adds ~2k lines of code, more than quadrupling the repo size.

This PR currently only has an implementation for the tokio / reqwests side of things, and does not touch the sync/ureq code.

It's also quite messy & has a bunch of copy-pasted comments from the Python huggingface_hub codebase.

It's a near-direct translation of the Python code, but with a whooole bunch of structs that I best-guess translated over, since I couldn't find any public documentation for these APIs.

implemented & works:

  • small files
  • git lfs files
  • git lfs multipart files
  • model repos

not yet implemented:

  • dataset repos, spaces
  • file copies
  • file deletions
  • folder uploads
  • repo management- create, delete, etc
  • much of anything else

arilotter avatar Oct 28 '24 17:10 arilotter

We don't want to copy the python code here. This is Rust-land, the goal is to stick to the simplest possible thing.

For instance from_str is very unrusty. Having real enum makes everything simpler, user cannot even EXPRESS the wrong thing since the enum explicitly states what's valid, which is not the case of &str. Adding that layer is simply adding potential for errors. remove it.

Overall this PR is not really reviewable as it touches things EVERYWHERE, without focusing a making the simplest possible modification.

Also remove all the new dependencies, we simply do not want them, they are not necessary. Taking on dependency is a cost. Staying simple has strong benefits. Maybe the shas are necessary but then users should be able to opt-out of those.

Also remove all forms of environment dependency, we also explicitly removed every form of them, with the exception of HF_ENDPOINT and HF_HOME which are ONLY every read in from_env functions (again explicit user opting-in for those, reading environment in libraries is really a source of many issues)

You should check out this https://github.com/huggingface/hf_transfer/blob/main/src/lib.rs#L452 as it's implementing upload is only several hundred lines.

Narsil avatar Jan 07 '25 10:01 Narsil