
[P1] Compatibility with tooling that expects a HF transformer model

Open chris-aeviator opened this issue 1 year ago • 3 comments

I'm raising this issue because, in terms of "production readiness" (a stated goal), pyreft, designed as a very thoughtful library, will need to work together with tooling that expects a loadable vanilla transformer model. A real-world, reproducible example is loading a pyvene-trained model with https://github.com/outlines-dev/outlines in order to create structured JSON / schema-following outputs.

While the model can be accessed via pyref_model.model, it is not loadable on its own, and in any case one tool would miss the other's functionality when loaded this way. What would be an advisable strategy for integrating with other tooling? May I also suggest that different backend engines (e.g. vllm, ollama, llama.cpp) will need interfaces to pyreft. Maybe I'm overlooking some documentation here, but I'm unsure how to proceed.
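For concreteness, here is roughly what I mean (a minimal sketch following the pyreft README pattern; the model name and config values are illustrative):

```python
import torch, transformers, pyreft

# illustrative base model; any HF causal LM follows the same pattern
model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

reft_config = pyreft.ReftConfig(representations={
    "layer": 15,
    "component": "block_output",
    "low_rank_dimension": 4,
    "intervention": pyreft.LoreftIntervention(
        embed_dim=model.config.hidden_size, low_rank_dimension=4
    ),
})
reft_model = pyreft.get_reft_model(model, reft_config)

# The wrapped HF module is reachable, but persisting it alone drops the
# intervention hooks, so tools that load it see only the frozen base weights.
base = reft_model.model                 # plain transformers.PreTrainedModel
base.save_pretrained("./base-only")     # loadable elsewhere, but without ReFT behaviour
```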

Is merging a pyvene intervention into the base model possible or is pyvene/pyreft more of an active component that will require code changes in any case?

chris-aeviator avatar Apr 08 '24 12:04 chris-aeviator

Hey! So:

  1. We got similar questions on Twitter about accelerating inference with different backends (vllm, mlx, etc.). Currently, pyvene is a major dependency for which no alternative exists: it manages the torch hooks that are used to intervene on hidden representations at the token level in pyreft. To enable support for non-HF and/or non-torch models, we would need to replicate some pyvene functionality. We have thought about how to do this simply without needing to port pyvene entirely[^thoughts], but it's a long-term software engineering task that we don't immediately have the time/resources/people for. Maybe in the summer, once pyreft is known to be stable for a variety of models + tasks, we will invest time into this.
  2. The LoReFT intervention can't be merged into the base model for two reasons. (1) It is a complex function applied directly to the hidden state, so it operates differently from existing model components (which add to the hidden state via residuals) and can't be folded into them as far as we can tell. (2) It operates only on some tokens, not all, but model weights are the same for every token; a rough sketch of this is below.
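For intuition, here is a rough torch sketch (illustrative only, not pyreft's actual implementation) of a LoReFT-style edit following the ReFT paper's formula h ← h + Rᵀ(Wh + b − Rh), applied only at selected token positions. Because it rewrites the hidden state itself at those positions, there is no static change to the model's weights that reproduces it for every token.

```python
import torch

def loreft_edit(h, R, W, b, positions):
    """h: (batch, seq, d); R: (r, d) with orthonormal rows; W: (r, d); b: (r,)."""
    edited = h.clone()
    sel = h[:, positions, :]                   # intervene on the chosen tokens only
    delta = (sel @ W.T + b) - (sel @ R.T)      # W h + b - R h, shape (batch, k, r)
    edited[:, positions, :] = sel + delta @ R  # h + R^T (W h + b - R h)
    return edited

def make_hook(R, W, b, positions):
    """Wrap the edit as a torch forward hook on a decoder layer, so the layer's
    output hidden states are rewritten in place at the selected positions."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = loreft_edit(hidden, R, W, b, positions)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook
```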

So overall, using LoReFT in a model requires either torch-style hooking functionality or code changes to the model to support token-level interventions.

[^thoughts]: E.g. we could just load pyvene for the KV-cache population when processing the prompt, and then use the efficient backend for generation. But in the future, we want to support intervention on decoding steps as well which is messier.

aryamanarora avatar Apr 08 '24 21:04 aryamanarora

Assigning P1 since there is no blocker.

frankaging avatar Apr 08 '24 21:04 frankaging

An elegant solution could be providing an AutoModel import from pyreft that encapsulates the hooks while preserving compatibility with other libraries. Is this possible at a high level? If so, I'd be willing to contribute; my interest here also lies in supporting high-throughput vllm and per-request model switching, both of which are already possible with vllm. It just loads an HF AutoModel in the end.
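Something along these lines is what I have in mind (purely hypothetical; the class name is invented and nothing like this exists in pyreft today):

```python
import transformers
import pyreft

class ReftAutoModelForCausalLM:
    """Hypothetical wrapper: look like a regular HF causal LM to outlines/vllm,
    while pyreft attaches and manages its interventions internally."""

    @classmethod
    def from_pretrained(cls, base_name_or_path, reft_name_or_path, **kwargs):
        base = transformers.AutoModelForCausalLM.from_pretrained(base_name_or_path, **kwargs)
        # illustrative call into pyreft's existing loading utilities to restore
        # trained interventions onto the base model
        reft_model = pyreft.ReftModel.load(reft_name_or_path, base)
        # the hard part: expose the full PreTrainedModel surface (generate, config,
        # save_pretrained, ...) while routing forward passes through the hooks
        return reft_model
```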

chris-aeviator avatar Apr 10 '24 20:04 chris-aeviator