Eric Buehler

Results 136 issues of Eric Buehler

Refs:
- https://github.com/deepseek-ai/DeepSeek-V3/blob/4cc6253d5c225e2c5fea32c54573449c1c46470a/inference/model.py#L443
- https://github.com/sgl-project/sglang/pull/905/files#diff-5b9e34dd492bd8a14702a18b594721091092276fad1cf8736fba6ef1f33c1b04
- https://github.com/InternLM/lmdeploy/pull/1621/files#diff-daef4154c2a77eba9f2e444df958cc19b318ce248c09995080b344b174522dc5

- [x] Config
- [ ] Qwen2_5OmniThinkerForConditionalGeneration (text + image + audio **in**, text **out**)
- [ ] Qwen2_5OmniAudioEncoder
- [ ] Qwen2_5OmniVisionEncoder
- [x] Qwen2_5OmniThinkerTextModel
- [ ] Qwen2_5OmniTalkerForConditionalGeneration...

models

At its core, this is a ring-based all-reduce backend. It enables tensor parallelism for Metal users!
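For readers unfamiliar with the technique, here is a minimal single-process sketch of a ring all-reduce (sum), written in Python purely for illustration. It is not the backend's actual code: the function name `ring_all_reduce` and the list-of-lists representation of per-device buffers are assumptions made for this example, and real device-to-device transfers are replaced by in-memory copies.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) over equally sized per-rank buffers.

    `buffers[r]` is the local data of rank r. Returns a new list in which
    every rank holds the element-wise sum of all inputs.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must hold buffers of the same length")

    # Work on copies so the caller's data is left untouched.
    data = [list(b) for b in buffers]
    # Split each buffer into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def snapshot(rank, idx):
        # Copy of chunk `idx` as currently held by `rank`.
        return data[rank][bounds[idx]:bounds[idx + 1]]

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Collect all messages first so the sends behave as simultaneous.
        msgs = [(r, (r - step) % n) for r in range(n)]
        payloads = [snapshot(r, idx) for r, idx in msgs]
        for (src, idx), payload in zip(msgs, payloads):
            dst = (src + 1) % n
            lo = bounds[idx]
            for k, value in enumerate(payload):
                data[dst][lo + k] += value

    # After phase 1, rank r owns the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) mod n
    # to its right neighbour, which overwrites its stale copy.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [snapshot(r, idx) for r, idx in msgs]
        for (src, idx), payload in zip(msgs, payloads):
            dst = (src + 1) % n
            data[dst][bounds[idx]:bounds[idx + 1]] = payload

    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
    for rank, buf in enumerate(ring_all_reduce(ranks)):
        print(rank, buf)
```

Each rank only ever talks to its right neighbour and sends one chunk per step, so the whole operation takes 2(n - 1) steps regardless of buffer size. That property is what makes the ring layout attractive for summing partial results across devices in tensor parallelism.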

Removing the `contiguous` calls in the RMSNorm and RoPE implementations might help.

Support FlashMLA for improved throughput for MLA models (DeepSeek V2, V3/R1) on CUDA.
- https://github.com/EricLBuehler/candle/pull/74
- https://github.com/deepseek-ai/FlashMLA

- Add the Whisper model