AllentDan

Results 21 issues of AllentDan

Open http://xxxx:23333/metrics/ to view the metrics.
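
A minimal sketch of reading that page from a script instead of a browser. It assumes the endpoint returns Prometheus-style plain text (`name{labels} value` lines, with `#` comment lines), which the issue does not confirm. `parse_metrics` and `fetch_metrics` are hypothetical helper names, not part of LMDeploy.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus-style text into {metric_with_labels: float}.

    Assumption: one sample per line, value in the last field, no timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces stay intact.
        name, _, value = line.rpartition(" ")
        try:
            samples[name.strip()] = float(value)
        except ValueError:
            continue  # ignore lines that do not match the assumed layout
    return samples


def fetch_metrics(url, timeout=5.0):
    """Download the metrics page and parse it (requires a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With a server running, `fetch_metrics("http://xxxx:23333/metrics/")` would return a dictionary of samples, where `xxxx` is the host placeholder from the issue.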

enhancement

## Motivation

- Decouple dialogue templates from the inference engine.
- Reduce the barrier to adding new dialogue templates.
- Remove `model_name` from EngineConfig to avoid redundant specification.
- Support...

RFC

[LMDeploy](https://github.com/InternLM/lmdeploy), as an AI deployment platform supporting multiple backend services, has always been committed to providing fast and stable AI model deployment services. Now, it supports accelerating the inference and...

[This lambda expression](https://github.com/CoffeeBeforeArch/mmul/blob/c624ef730ef0b14ad040d0444e4c4af5f1e60fab/src/baseline/benchmark.cpp#L109) is pushed to the vector, but only some of the threads actually execute.

Hi, Nick. I am confused about why you used [num_threads - 1](https://github.com/CoffeeBeforeArch/mmul/blob/c624ef730ef0b14ad040d0444e4c4af5f1e60fab/src/baseline/benchmark.cpp#L106) instead of `num_threads`.

- [x] deepseek vl
- [x] llava
- [x] internvl
- [x] xcomposer (did not quant plora)
- [x] minigemini
- [x] yi
- [x] qwen
- [x] internvl-llava

enhancement