Loading 8-bit models throws device_map = auto errors
System Info
Hi @hwchase17 @agola11
In langchain v0.0.171 there is no way to load 8-bit models, because doing so requires device_map = "auto" to be set, which I am unable to set through the HuggingFacePipeline.
For clarity: I am trying to load the model in 8 bit in order to save memory and load it faster, if that's achievable.
Who can help?
@hwchase17 @agola11
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [X] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
---> 64 task="text-generation", model_kwargs={"temperature":0, "max_new_tokens":256, "load_in_8bit": True, device_map:'auto'})
66 chain = LLMChain(llm=llm, prompt=PROMPT, output_key="nda_1", verbose=True)
68 prompt_template = """<my prompt>:
69 Context: {nda_1}
70 NDA:"""
NameError: name 'device_map' is not defined
Expected behavior
The model should load in 8 bit, using less memory and loading faster.
Not 100% sure if this is the intended solution, but it should at least give you the result you are after. (The NameError itself is just Python telling you that device_map is an unquoted key in your model_kwargs dict; it would have to be the string "device_map". The workaround below avoids model_kwargs entirely.)
You first load the model directly with the Hugging Face Transformers library; this is where you can set those parameters:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
# load_in_8bit requires the bitsandbytes and accelerate packages
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1", device_map="auto", load_in_8bit=True
)
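If you want to confirm that the quantized load actually worked, transformers models expose get_memory_footprint(), and models loaded with a device_map should get an hf_device_map attribute; a quick sanity check, not required for the rest of the setup:

# An 8-bit load should report a much smaller footprint than a
# float16/float32 load of the same checkpoint.
print(f"model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
print(model.hf_device_map)  # where accelerate placed each module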
Then you wrap it in a HF transformers pipeline:
from transformers import pipeline

bloom_pipeline = pipeline(
    task="text-generation",
    model=model,
    temperature=0,
    max_length=256,
    tokenizer=tokenizer,
)
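Before handing the pipeline to LangChain, you can smoke-test it directly (the prompt string here is just a placeholder):

# text-generation pipelines return a list of dicts with a
# "generated_text" key
result = bloom_pipeline("The purpose of an NDA is")
print(result[0]["generated_text"])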
Now you create the LLM in LangChain using HuggingFacePipeline:
from langchain.llms import HuggingFacePipeline
llm_bloom = HuggingFacePipeline(pipeline=bloom_pipeline)
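From there the rest of your original code should work unchanged. A minimal sketch, with a placeholder template standing in for your elided "<my prompt>" text:

from langchain import LLMChain, PromptTemplate

# Placeholder template; substitute your real prompt here.
prompt = PromptTemplate(
    input_variables=["context"],
    template="Context: {context}\nNDA:",
)
chain = LLMChain(llm=llm_bloom, prompt=prompt, output_key="nda_1", verbose=True)
print(chain.run(context="..."))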