Alex Cheema
## Describe the bug
When launching an instance of any model with pipeline and RDMA, it gets stuck on WARMING UP.

## To Reproduce
Steps to reproduce the behavior: 1....
- [ ] Basic model support (auto parallel with pipeline)
- [ ] Tensor parallel
Sparkle supports a `` tag in the appcast, which can contain HTML. We should include our patch notes in each release. See https://sparkle-project.org/documentation/publishing/
## Motivation
Enable the runner to process multiple concurrent inference requests efficiently. Previously, requests were processed sequentially: one had to complete before the next could start. With continuous batching,...
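The scheduling change described above can be sketched as a toy loop. This is illustrative only: the function name, the `(request_id, tokens_to_generate)` request model, and the batch-slot logic are assumptions for the sketch, not exo's runner code.

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy continuous-batching loop.

    Instead of draining one request before starting the next, the loop keeps
    a live batch: every step it generates one token for each active request,
    retires finished ones, and immediately admits waiting requests into the
    freed slots.
    """
    waiting = deque(requests)  # (request_id, tokens remaining to generate)
    active = {}                # request_id -> tokens remaining
    completion_order = []

    while waiting or active:
        # Admit new requests into free slots without waiting for the batch to drain.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        # One decode step: generate one token per active request.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                completion_order.append(rid)
                del active[rid]
    return completion_order
```

Note that a short request admitted late can still finish before a long request admitted early, which is the throughput win over strictly sequential processing.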
See https://platform.openai.com/docs/api-reference/batch. This is useful for evals.
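For reference, the Batch API consumes a JSONL file where each line is one standalone request whose body mirrors a normal `/v1/chat/completions` call. A minimal sketch of building such a file (the model name and IDs are placeholders):

```python
import json

def batch_line(custom_id, prompt, model="gpt-4o-mini"):
    # One line of the batch input file. `custom_id` lets you match
    # results back to inputs once the batch completes.
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

def write_batch_file(path, prompts):
    # JSONL: one request per line.
    with open(path, "w") as f:
        for i, p in enumerate(prompts):
            f.write(batch_line(f"request-{i}", p) + "\n")
```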
## Motivation
Add support for the Claude Messages API and the OpenAI Responses API to allow users to interact with exo using these popular API formats. This enables broader compatibility with existing...
## Motivation
Adds uncertainty visualization to the chat interface, allowing users to see token-level confidence scores and regenerate responses from any point in the generation. This enables users to: -...
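A minimal sketch of where a per-token confidence score could come from: the softmax probability of the sampled token. The function names here are hypothetical, not the actual exo implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidence(logits, sampled_index):
    """Confidence of a generated token = its softmax probability.

    A chat UI can color tokens by this value and offer regeneration
    from low-confidence positions. (Sketch, under the assumptions above.)
    """
    return softmax(logits)[sampled_index]
```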
## Motivation
Users processing long prompts have no visibility into when token generation will start. This feature adds a progress bar showing prefill progress, giving users real-time feedback during prompt...
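The progress value itself is just the fraction of prompt tokens already run through prefill; a minimal sketch, where the field names are assumptions rather than exo's actual API:

```python
def prefill_progress(processed_tokens, total_prompt_tokens):
    """Fraction of the prompt processed so far, clamped to [0, 1].

    Emitted periodically during prefill so a UI can render a progress
    bar before the first generated token arrives.
    """
    if total_prompt_tokens <= 0:
        return 1.0  # nothing to prefill
    return min(processed_tokens / total_prompt_tokens, 1.0)
```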
## Motivation
For thinking models like GLM-4.7, the `` tag is inserted by the tokenizer's `apply_chat_template()` into the **prompt** (input). The model generates tokens starting *after* this tag, so ``...
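A toy illustration of the behavior described above, assuming a `<think>`-style tag (the real tag and template live in the model's tokenizer config): because the open tag ends the prompt, the raw completion starts inside the thinking block, and a parser must re-add the tag before splitting thinking from the answer.

```python
def apply_chat_template_toy(messages, think_tag="<think>"):
    # Toy stand-in for a thinking model's chat template. The key behavior
    # mirrored here: the template ends the PROMPT with the open think tag,
    # so the model's first generated token is already inside the thinking block.
    prompt = "".join(f"<|{m['role']}|>{m['content']}" for m in messages)
    return prompt + "<|assistant|>" + think_tag

def split_thinking(generated, think_open="<think>", think_close="</think>"):
    # The open tag was part of the prompt, so it never appears in the
    # completion; prepend it before separating thinking from the answer.
    full = think_open + generated
    thinking, _, answer = full.partition(think_close)
    return thinking[len(think_open):], answer
```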