lullabies777

Results: 7 issues by lullabies777

Gemma 7B: https://huggingface.co/google/gemma-7b
Gemma 2B: https://huggingface.co/google/gemma-2b
Blog: https://blog.google/technology/developers/gemma-open-models/
Paper: https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf

I found that the Gemma scripts do not support Gemma 2. Is there any plan to add support for Gemma 2?

feature request
new model

I quantized the Qwen2-0.5B model, which is approximately 800 MB on disk. However, inference needs about 6 GB of GPU memory, likely due to the KV cache. Can I disable the KV cache...

duplicate
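The KV-cache question above can be sanity-checked with a back-of-the-envelope estimate. The sketch below is illustrative only: the architecture numbers (24 layers, 2 KV heads with head dim 64 under grouped-query attention, fp16 storage) are assumptions for a Qwen2-0.5B-like model, not read from the actual checkpoint's config.json.

```python
# Rough KV-cache size estimate. All architecture numbers are
# assumptions for illustration; check the model's config.json.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """Bytes used by the K and V tensors across all layers."""
    # factor 2 = one K tensor plus one V tensor per layer
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size

# One 4096-token sequence for the assumed 0.5B-scale config:
size = kv_cache_bytes(num_layers=24, num_kv_heads=2, head_dim=64,
                      seq_len=4096, batch_size=1)
print(f"{size / 2**20:.1f} MiB")  # -> 48.0 MiB
```

Under these assumptions the cache itself is tens of MiB, so a 6 GB footprint more likely comes from the serving engine pre-allocating a large KV-cache pool up front (many engines reserve a fixed fraction of GPU memory by default) rather than from the cache this one request actually fills.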

### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- ...

Following the tutorial at https://modelscope-agent.readthedocs.io/en/stable/llms/qwen2_tool_calling.html, I served qwen2-7b with vLLM. It told me to add --enable-auto-tool-choice to the vLLM launch command, but --enable-auto-tool-choice requires a matching --tool-call-parser, and vLLM currently only ships mistral and hermes parsers. How can this be done? Error: "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set

Work in Progress
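One possible answer to the question above: Qwen2's chat template emits Hermes-style tool calls, so the existing hermes parser may already work. A minimal launch sketch, assuming a recent vLLM build that ships the hermes parser; the model name is illustrative.

```shell
# Sketch of a vLLM launch for Qwen2 tool calling.
# Assumes the hermes parser handles Qwen2's Hermes-style tool-call
# format; verify against the vLLM version you are running.
vllm serve Qwen/Qwen2-7B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```

If the installed version rejects the parser, the alternative is implementing a custom tool-call parser plugin, which newer vLLM releases support.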

I tried running the Nemo-12B 4-bit model on one T4 GPU, but inference is very slow. Additionally, the 'forward' function takes much longer than 'generate'. Is there a speedup...
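One reason 'forward' can cost far more than 'generate' is the KV cache: 'generate' reuses cached keys and values, while repeatedly calling 'forward' on the growing sequence recomputes attention over the full prefix every step. A toy cost model (unit = one query-key dot product, single layer), not a real benchmark:

```python
# Toy attention-cost model: total query-key dot products to decode
# seq_len tokens, with and without a KV cache. Illustrative only.

def attn_dot_products(seq_len, cached):
    if cached:
        # each new token attends once to every token so far: O(n^2) total
        return sum(t for t in range(1, seq_len + 1))
    # re-running attention over the whole prefix at every step: O(n^3) total
    return sum(t * t for t in range(1, seq_len + 1))

n = 512
print(attn_dot_products(n, cached=True))   # -> 131328
print(attn_dot_products(n, cached=False))  # -> 44870400
```

Under this model, uncached decoding of 512 tokens does roughly 340x the attention work, which is consistent with 'forward' loops feeling much slower than 'generate' with use_cache enabled.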

### Your current environment INFO 06-20 10:55:55 [__init__.py:244] Automatically detected platform cuda. Collecting environment information... ============================== System Info ============================== OS : Ubuntu 22.04.5 LTS (x86_64) GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04)...

bug