
[Bug] mpt-30b-chat answering questions based on langchain does not work

Open gzusgw opened this issue 2 years ago • 0 comments

Is this a new bug?

  • [X] I believe this is a new bug
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Run the code in examples/learn/generation/llm-field-guide/mpt/mpt-30b-chatbot.ipynb:

res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])

It only returns the prompt itself:

Explain to me the difference between nuclear fission and fusion.

with no answer from the model.
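
To help narrow down whether the problem is in the model itself or in the pipeline wrapper, here is a minimal check (not from the notebook; it reuses the model and tokenizer built in "Steps To Reproduce" below) that calls model.generate directly and decodes only the newly generated tokens:

prompt = "Explain to me the difference between nuclear fission and fusion."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
# decode only the tokens generated after the prompt
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

If this prints an answer, the problem is presumably in the pipeline / stopping-criteria setup rather than in the model weights.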

Expected Behavior

The model's answer should be returned after the prompt.

Steps To Reproduce

import torch
import transformers
from transformers import StoppingCriteria, StoppingCriteriaList
from torch import cuda, bfloat16

device = f'cuda:0' if cuda.is_available() else 'cpu'

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True,
    load_in_8bit=True,  # this requires the bitsandbytes library
    max_seq_len=8192,
    init_device=device,
    device_map="auto"
)
model.eval()
# model.to(device)
print(f"Model loaded on {device}")

tokenizer = transformers.AutoTokenizer.from_pretrained("mosaicml/mpt-30b-chat")

stop_token_ids = [
    tokenizer.convert_tokens_to_ids(x) for x in [
        ['Human', ':'], ['AI', ':']
    ]
]

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]

generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    top_p=0.15,  # select from top tokens whose probability add up to 15%
    top_k=0,  # select from top 0 tokens (because zero, relies on top_p)
    max_new_tokens=128,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
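
One thing I am not sure about (my own guess, not something confirmed by the notebook) is whether the custom stopping criteria fire on the very first generated token: with return_full_text=True the prompt is always echoed, so an immediate stop would produce exactly the output above. Printing the stop token ids and rerunning with a pipeline that has no stopping criteria should make this visible:

# if 'Human' or 'AI' is not a single token in the vocabulary,
# these lists may contain the unknown-token id
print(stop_token_ids)

generate_no_stop = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task='text-generation',
    return_full_text=True,
    max_new_tokens=128
)
res = generate_no_stop("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])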

Relevant log output

[2023-08-30 17:16:12,664] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-08-30 17:16:13.203919: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Instantiating an MPTForCausalLM model from /root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/54f33278a04aa4e612bca482b82f801ab658e890/modeling_mpt.py
You are using config.init_device='cuda:0', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████| 7/7 [01:07<00:00,  9.62s/it]
Model loaded on cuda:0
The model 'MPTForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Explain to me the difference between nuclear fission and fusion.

Environment

- **OS**: ubuntu20.04
- **Language version**:  Python 3.8.16
- **Pinecone client version**: not used
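
It might also help to include the versions of the libraries involved (not asked for in the issue template, but probably relevant here); they can be printed with:

import torch
import transformers
import accelerate
import bitsandbytes

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("bitsandbytes:", bitsandbytes.__version__)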

Additional Context

No response

gzusgw · Aug 30 '23 09:08