Huggingface integration

Open ivelin opened this issue 1 year ago • 7 comments

This may be a very big stretch goal given current model size and ops limitations, but it is worth considering as an aspiration. It could unlock a lot of applications on ezkl if it were simple enough for app devs to plug ezkl into the Hugging Face transformer APIs and demo Spaces.

ivelin avatar Apr 04 '23 11:04 ivelin

This is a great idea! What kind of integration are you looking for? Maybe we could do some automatic downloading and conversion of models based on their huggingface identifiers?

jasonmorton avatar Apr 05 '23 13:04 jasonmorton

For example, being able to inject an ezkl step into an inference pipeline that outputs a proof that Hugging Face model XYZ, located at repo "mymodels/xyz" with git revision "abcd" and corresponding hash "efgh", ran on some private input data and produced public output "ijk".

A specific use case that comes to mind is running a fine-tuned DocVQA model on utility bills for KYC verification: confirm that the secret input PDF is a valid (non-fake), recent utility bill from a US-based zip code and belongs to Bob Hackman.
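
As a rough illustration of the statement described above, the public side of such a proof could be bundled into a small record. This is only a sketch under stated assumptions: `ProofStatement`, its field names, `sha256_hex`, and `build_statement` are hypothetical names made up for this example and are not part of ezkl or any Hugging Face API. The private input never appears in the record.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProofStatement:
    """Hypothetical public statement that a verifier would check a proof against."""

    repo: str           # e.g. "mymodels/xyz"
    revision: str       # git revision of the model repo
    model_hash: str     # digest of the exact model file that was run
    public_output: str  # the output revealed to the verifier

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that can itself be hashed or signed.
        return json.dumps(asdict(self), sort_keys=True)


def build_statement(repo: str, revision: str, model_bytes: bytes, public_output: str) -> ProofStatement:
    """Bind the model identity (repo, revision, file digest) to a public output."""
    return ProofStatement(repo, revision, sha256_hex(model_bytes), public_output)
```

Hashing the model file itself, rather than trusting the repo name and revision alone, is what would let a verifier confirm that the proof refers to one specific set of weights.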

ivelin avatar Apr 05 '23 15:04 ivelin

That makes sense. @JSeam2 is working on some of the prerequisites for this.

jasonmorton avatar Apr 05 '23 15:04 jasonmorton

One possibility right now is to manually extract the ONNX files and obtain the outputs from the Hugging Face models. The challenge is that Hugging Face models can be compute-heavy enough to need a distributed setup to run.

If the model is small enough, a workaround is to use Python's subprocess module to run the built ezkl binary from within Python. It's still fairly clunky, though. I used that approach for a hackathon and created a Python server for a similar purpose. I have written some code for the subprocess approach that could be useful for you: https://github.com/JSeam2/zkml-server/blob/main/app.py
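
For anyone who wants the general shape of the subprocess approach without reading the linked server, here is a minimal sketch. It assumes nothing about ezkl's actual subcommands or flags: `run_cli` is a hypothetical helper that runs any command-line tool, and the demo uses the Python interpreter itself so the snippet runs without ezkl installed.

```python
import subprocess
import sys
from typing import Sequence


def run_cli(argv: Sequence[str], timeout: float = 600.0) -> str:
    """Run a command-line tool and return its stdout, raising on a non-zero exit."""
    result = subprocess.run(
        list(argv),
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode output as str rather than bytes
        timeout=timeout,      # proving can be slow, so the default is generous
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; replace the argv list with the path to the built ezkl
    # binary and whatever arguments your ezkl version expects.
    print(run_cli([sys.executable, "-c", "print('ok')"]))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file paths, which matters once model and input paths come from user requests in a server.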

At the moment I'm still working on bindings with pyo3 on a fork of ezkl: https://github.com/jseam2/ezkl/tree/python. When this is ready for production, the goal is to merge the changes into the main repo and expose the bindings to the pyezkl repo, which would offer more expressivity within Python. The Rust bindings are limited to PEP 387 compatibility, so the feature set used for the Rust bindings would be minimal.

JSeam2 avatar Apr 05 '23 19:04 JSeam2

Hi @JSeam2, could I extend the comment by @ivelin and ask whether it's possible to embed metadata into the compiled ezkl artifact, such as the points he suggested or arbitrary-length cryptographic commitments to, e.g., a training dataset?

rmlearney-digicatapult avatar Aug 01 '24 13:08 rmlearney-digicatapult

@rmlearney-digicatapult that's a very interesting idea. We could add a metadata field like this and leave it to the user to decide how to use it initially.
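
To make the idea concrete, a user-managed metadata field could carry a salted hash commitment to the training data. This is a sketch only: ezkl has no such field today as far as this thread states, and `commit`, `verify`, `build_metadata`, and the `training_dataset_commitment` key are hypothetical names chosen for illustration. The digest is fixed-size regardless of how large the dataset is.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict, Tuple


def commit(data: bytes, salt: bytes) -> str:
    """Salted SHA-256 commitment: hides the data until the salt and data are revealed."""
    return hashlib.sha256(salt + data).hexdigest()


def verify(commitment: str, data: bytes, salt: bytes) -> bool:
    """Check a revealed (data, salt) pair against a previously published commitment."""
    return hmac.compare_digest(commitment, commit(data, salt))


def build_metadata(dataset_bytes: bytes, extra: Dict[str, str]) -> Tuple[str, bytes]:
    """Return (metadata JSON, salt). The salt stays with the committer until reveal."""
    salt = secrets.token_bytes(32)  # fixed-length salt keeps salt + data unambiguous
    metadata = dict(extra)
    metadata["training_dataset_commitment"] = commit(dataset_bytes, salt)
    return json.dumps(metadata, sort_keys=True), salt
```

Usage would be along the lines of `meta_json, salt = build_metadata(dataset_bytes, {"repo": "mymodels/xyz", "revision": "abcd"})`, with `meta_json` stored in the metadata field and `salt` kept private until the dataset needs to be audited.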

jasonmorton avatar Aug 01 '24 14:08 jasonmorton

🙏🙂

rmlearney-digicatapult avatar Aug 01 '24 15:08 rmlearney-digicatapult