
Loading 8-bit models throws device_map = auto errors

Open zubair-ahmed-ai opened this issue 1 year ago • 1 comment

System Info

Hi @hwchase17 @agola11

In langchain v0.0.171 there is no way to load 8-bit models, because specifying load_in_8bit=True requires device_map="auto" to be set, which I am unable to do through the HuggingFacePipeline.

For clarity, I am trying to load the model in 8-bit in order to save memory and load it faster, if that's achievable.

---> 64 task="text-generation", model_kwargs={"temperature":0, "max_new_tokens":256, "load_in_8bit": True, device_map:'auto'})
     66 chain = LLMChain(llm=llm, prompt=PROMPT, output_key="nda_1", verbose=True)
     68 prompt_template = """<my prompt>:
     69     Context: {nda_1}        
     70     NDA:"""

NameError: name 'device_map' is not defined
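
(As an aside, the NameError itself is plain Python: device_map is unquoted inside the dict literal, so it is evaluated as a variable name. Quoting the key clears that particular error. A minimal sketch, assuming HuggingFacePipeline.from_model_id is the call being made and reusing the model id from the answer below; whether langchain then forwards device_map cleanly to from_pretrained is a separate question, which the workaround below sidesteps.)

from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="bigscience/bloom-7b1",  # placeholder; any causal LM id
    task="text-generation",
    model_kwargs={
        "temperature": 0,
        "max_new_tokens": 256,
        "load_in_8bit": True,
        "device_map": "auto",  # string key, not a bare name
    },
)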

Who can help?

@hwchase17 @agola11

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [X] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

See the code snippet and NameError traceback under System Info above.

Expected behavior

The model should load in 8-bit mode, using less memory and loading faster.

zubair-ahmed-ai · May 18 '23

Not 100% sure if this is the intended solution, but at least this should give the result you are after.

First, you load the model using the Hugging Face Transformers library directly. Here you can set the parameters:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    device_map="auto",   # let the weights be placed across devices
    load_in_8bit=True,   # 8-bit quantization
)
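
Note that load_in_8bit relies on the bitsandbytes package and device_map="auto" on accelerate, so both need to be installed (pip install bitsandbytes accelerate) for this to work.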

Then you create an HF Transformers pipeline:

from transformers import pipeline

bloom_pipeline = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    temperature=0,
    max_length=256,
)
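
(As far as I know, with the pipeline's default do_sample=False the temperature setting has no effect anyway, so this is greedy decoding either way; temperature only matters once do_sample=True.)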

Now you create the LLM in LangChain using the HuggingFacePipeline wrapper:

from langchain.llms import HuggingFacePipeline

llm_bloom = HuggingFacePipeline(pipeline=bloom_pipeline)
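
From here the LLM plugs into a chain exactly as in your original snippet. A minimal usage sketch, where the template text and the nda_1 output key are placeholders taken from the issue:

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

PROMPT = PromptTemplate(
    input_variables=["context"],
    template="Context: {context}\nNDA:",  # placeholder template
)
chain = LLMChain(llm=llm_bloom, prompt=PROMPT, output_key="nda_1", verbose=True)
print(chain.run("<your context here>"))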

jwnelen · May 22 '23