HuggingFaceModel using direct generate/decode calls
Provides greater control over generation, and passes lists rather than single strings to the HF transformer model for better GPU utilization (2x in my case) when running models locally.
Unfortunately, resolving the linting issues is beyond me. I am not even sure that the approach I have used (overriding _call in BaseLLM) is even allowed. However, it "works for me", and the GPU utilization improvement is significant enough for large batch jobs that I thought it might be interesting to others. If anyone can help bash this into shape, that would be appreciated.
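For context, this is roughly the shape of the idea, as a heavily hedged sketch rather than the PR's actual code: the class and field names are illustrative, and the exact BaseLLM hooks have shifted between langchain versions, so treat this as pseudocode against an early-2023 API.

```python
from typing import Any, List, Optional

from langchain.llms.base import BaseLLM
from langchain.schema import Generation, LLMResult


class HuggingFaceModel(BaseLLM):
    """Hypothetical LLM that batches all prompts through model.generate."""

    model: Any  # a transformers PreTrainedModel
    tokenizer: Any  # the matching tokenizer

    @property
    def _llm_type(self) -> str:
        return "huggingface_model"

    def _generate(
        self, prompts: List[str], stop: Optional[List[str]] = None
    ) -> LLMResult:
        # One padded batch instead of len(prompts) separate forward passes.
        inputs = self.tokenizer(prompts, return_tensors="pt", padding=True)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)
        texts = self.tokenizer.batch_decode(output_ids, skip_special_tokens=True)
        # langchain expects one list of Generations per input prompt.
        return LLMResult(generations=[[Generation(text=t)] for t in texts])

    async def _agenerate(
        self, prompts: List[str], stop: Optional[List[str]] = None
    ) -> LLMResult:
        # No real async path here; reuse the synchronous batch call.
        return self._generate(prompts, stop=stop)
```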
@seanaedmiston Tangentially related question/request: could you possibly post a repo with a toy example of using a local LLM, so the less sophisticated of us could take advantage of this? The langchain docs are pretty light on details regarding self-hosted LLMs, and a simple working example would go a long way toward helping those of us for whom Python is not our primary language.
@tensiondriven Just saw your comment. I think the docs have been improved in this regard, but it is actually deceptively simple: HF transformers accepts a relative path in place of a model name. For example, loading a model called google/flan-t5-base will pull it from the HF Hub, but loading a model called 'my_dir/to/my_model' will load from that directory if the model files are there.
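A minimal sketch of that, assuming a seq2seq checkpoint; the directory name is hypothetical, and the pipeline wrapping at the end is optional:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Hub name: downloads (and caches) the weights from the HF Hub.
# tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Local path (hypothetical): if the directory holds the model files
# (config.json, tokenizer files, weights), it loads from disk instead.
model_dir = "my_dir/to/my_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

# The result plugs into a standard transformers pipeline (and from there
# into langchain's HuggingFacePipeline) like any hub-hosted model.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
```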
Wow, delightful. As a human being, I appreciate it!
Apologies for my ignorance, but what's the difference between this and HuggingFacePipeline? Is it that it calls self.model.generate on the underlying model to batch-generate?
So yes, this calls 'generate' and then 'decode' separately rather than letting the pipeline do it. This is handy if you want to try different decoding strategies.
The other issue is performance. If you are using a local model and have a bunch of different prompts, it is SIGNIFICANTLY faster to pass them to HF transformers as a list rather than one by one (i.e. in my proposed HuggingFaceModel, the 'prompts' input is a list of strings rather than a single string, as in the HF pipeline). This particular improvement could be added to the HuggingFacePipeline class pretty easily, since pretty much all of the HF Transformers methods accept 'str or List[str]'.
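To make the pattern concrete, here is a minimal sketch of the batched generate/decode flow described above. This is not the PR's code; flan-t5-base and the decoding settings are just placeholders.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

prompts = [
    "Translate to German: Hello, how are you?",
    "Summarize: LangChain wraps LLM calls behind a common interface.",
]

# Tokenize the whole list at once; padding makes a rectangular batch,
# so a single forward pass covers every prompt.
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

# Calling generate directly exposes the decoding strategy
# (beam search here, but sampling etc. are one kwarg away).
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)

# Decode the batch in one call as well.
texts = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print(texts)
```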
@seanaedmiston Hi, could you please resolve the merge conflicts and address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!
If this PR is no longer needed, could you please let me know?
I do not have the capacity at the moment to tidy this up. I still think the idea is useful, but for now we have just moved the functionality out of the langchain library, so this PR isn't really needed atm.
Closing.