[Good First Issue]: Verify mpt-7b-chat with GenAI text_generation
Context
This task concerns enabling tests for mpt-7b-chat. You can find more details in the openvino_notebooks LLM chatbot README.md.
Please ask general questions in the main issue at https://github.com/openvinotoolkit/openvino.genai/issues/259
What needs to be done?
Described in the main Discussion issue at: https://github.com/openvinotoolkit/openvino.genai/issues/259
Example Pull Requests
Described in the main Discussion issue at: https://github.com/openvinotoolkit/openvino.genai/issues/259
Resources
- Contribution guide - start here!
- Intel DevHub Discord channel - engage in discussions, ask questions and talk to OpenVINO developers
Contact points
Described in the main Discussion issue at: https://github.com/openvinotoolkit/openvino.genai/issues/259
Ticket
No response
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
.take
Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.
Hello @qxprakash, are you still working on this? Is there anything we could help you with?
Hello @p-wysocki, yes, I am working on it. I ran into an error while trying to convert mpt-7b-chat:
```
python3 ../../../llm_bench/python/convert.py --model_id mosaicml/mpt-7b-chat --output_dir ./MPT_CHAT --precision FP16
[ INFO ] Removing bias from module=LPLayerNorm((4096,), eps=1e-05, elementwise_affine=True).
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:02<00:00, 31.27s/it]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 121/121 [00:00<00:00, 781kB/s]
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:87: UserWarning: Propagating key_padding_mask to the attention module and applying it within the attention module can cause unnecessary computation/memory usage. Consider integrating into attn_bias once and passing that to each attention module instead.
warnings.warn('Propagating key_padding_mask to the attention module ' + 'and applying it within the attention module can cause ' + 'unnecessary computation/memory usage. Consider integrating ' + 'into attn_bias once and passing that to each attention ' + 'module instead.')
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py:311: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert S <= self.config.max_seq_len, f'Cannot forward input with seq_len={S}, this model only supports seq_len<={self.config.max_seq_len}'
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py:253: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
_s_k = max(0, attn_bias.size(-1) - s_k)
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:78: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
_s_q = max(0, attn_bias.size(2) - s_q)
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:79: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
_s_k = max(0, attn_bias.size(3) - s_k)
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:81: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_bias.size(-1) != 1 and attn_bias.size(-1) != s_k or (attn_bias.size(-2) != 1 and attn_bias.size(-2) != s_q):
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:89: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if is_causal and (not q.size(2) == 1):
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:90: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
s = max(s_q, s_k)
[ WARNING ] Failed to send event with the following error: <urlopen error EOF occurred in violation of protocol (_ssl.c:2426)>
[ WARNING ] Failed to send event with the following error: <urlopen error EOF occurred in violation of protocol (_ssl.c:2426)>
```
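For context on the next step: once the conversion completes, one quick way to confirm the IR is usable before moving to the C++ sample is to load it with optimum-intel and generate a few tokens. This is a minimal sketch, not the project's prescribed test; the `model_dir` path follows llm_bench's usual output layout (`<output_dir>/pytorch/dldt/<precision>`) and may differ on your setup.

```python
# Minimal smoke test for the converted IR.
# model_dir is an assumption based on llm_bench's usual layout;
# point it at wherever convert.py actually wrote the FP16 IR.
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./MPT_CHAT/pytorch/dldt/FP16"

# mpt-7b-chat ships custom modeling code, hence trust_remote_code
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If this prints a sensible continuation, the IR itself is fine and any remaining failures are on the sample side.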
cc @pavel-esir
@pavel-esir I hope this is not a memory issue?
@qxprakash thanks for your update. I see only connection warnings in your logs. Did you get the converted IR? If so, did you run it with the C++ sample?
@qxprakash what is the progress?
Hello @qxprakash, are you still working on that issue? Do you need any help?
Hi @p-wysocki, I was stuck; currently I'm not working on it.
.take
Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.
@p-wysocki I'd like to take upon this issue as well.
I reassigned this to @rk119 because I'm not sure if @Utkarsh-2002 is still here. If you still want to work on it, let us know. If @rk119 confirms they hadn't started working on it by the time @Utkarsh-2002 replies, @Utkarsh-2002 can take the task.
Hi @Wovchena,
Before raising a PR, I want to mention that for the prompt_lookup_decoding_lm and speculative_decoding_lm samples, I am getting the following error: `Exception from src/inference/src/cpp/infer_request.cpp:193: Check '::getPort(port, name, {_impl->get_inputs(), _impl->get_outputs()})' failed at src/inference/src/cpp/infer_request.cpp:193: Port for tensor name position_ids was not found.`
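For reference, this error typically means the sample feeds a tensor named position_ids while the exported model exposes no input with that name. A quick diagnostic sketch (the XML path below is an assumption; point it at your exported IR):

```python
# Print the input tensor names of the exported model to check whether
# position_ids is among them (path below is hypothetical).
import openvino as ov

core = ov.Core()
model = core.read_model("./MPT_CHAT/pytorch/dldt/FP16/openvino_model.xml")

for port in model.inputs:
    # each input port can carry several tensor names
    print(port.get_names(), port.get_partial_shape())
```

If position_ids is missing from the printed names, the model was likely exported without it, and the export settings and the sample need to be brought into agreement.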
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.