Alex Cheema

Results: 117 issues by Alex Cheema

## Describe the bug

When launching an instance of any model with pipeline and RDMA, it gets stuck on WARMING UP.

## To Reproduce

Steps to reproduce the behavior: 1....

bug

- [ ] Basic model support with auto parallel with pipeline
- [ ] Tensor parallel

enhancement

- [ ] Basic model support (auto parallel with pipeline)
- [ ] Tensor parallel

enhancement

Sparkle supports a `` tag in the appcast, and it may contain HTML. We should include our patch notes in each release. See https://sparkle-project.org/documentation/publishing/
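As a rough sketch of what this could look like (version numbers, URLs, and the note contents are placeholders), HTML release notes can be embedded directly in an appcast item via a CDATA block:

```xml
<!-- Hypothetical appcast item with inline HTML release notes -->
<rss version="2.0"
     xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle">
  <channel>
    <item>
      <title>Version 1.2.3</title>
      <description><![CDATA[
        <h2>What's new in 1.2.3</h2>
        <ul>
          <li>Example patch note one</li>
          <li>Example patch note two</li>
        </ul>
      ]]></description>
      <enclosure url="https://example.com/Exo-1.2.3.zip"
                 sparkle:version="123"
                 sparkle:shortVersionString="1.2.3" />
    </item>
  </channel>
</rss>
```

Generating this block from each release's changelog at publish time would keep the appcast and the patch notes from drifting apart.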

enhancement

## Motivation

Enable the runner to process multiple concurrent inference requests efficiently. Previously, requests were processed sequentially - one had to complete before the next could start. With continuous batching,...
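A toy sketch of the scheduling idea (not the runner's actual implementation; `Request` and the one-token-per-step decode are stand-ins): finished requests leave the batch immediately, so queued requests join mid-flight instead of waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    max_new_tokens: int
    tokens: list = field(default_factory=list)

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler: each step decodes one token for
    every active request; a finished request frees its slot immediately,
    letting a waiting request join on the very next step."""
    waiting = deque(requests)
    active, done, step = [], [], 0
    while waiting or active:
        # admit queued requests whenever a batch slot is free
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        step += 1
        for r in active:
            r.tokens.append(step)  # stand-in for a real decode step
        # retire requests that hit their token budget
        still = [r for r in active if len(r.tokens) < r.max_new_tokens]
        done += [r for r in active if r not in still]
        active = still
    return done
```

With sequential processing, a queued request would wait for the longest request in the batch; here it starts as soon as any slot opens.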

See https://platform.openai.com/docs/api-reference/batch (useful for evals).
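For context, the Batch API takes a JSONL file where each line is one request carrying a `custom_id` for correlating results. A minimal sketch of building that input (the model name and questions are placeholders):

```python
import json

def build_batch_line(custom_id, model, messages):
    """One JSONL line for a Batch API input file: a custom_id to match
    results back to requests, the target endpoint, and the request body."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })

questions = ["2+2?", "Capital of France?"]
lines = [
    build_batch_line(f"eval-{i}", "gpt-4o-mini",
                     [{"role": "user", "content": q}])
    for i, q in enumerate(questions)
]
batch_input = "\n".join(lines)  # write to a .jsonl file, then upload it
```

The file is then uploaded and a batch job created against it; results come back keyed by `custom_id`, which is what makes this convenient for evals.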

enhancement

## Motivation

Add support for Claude Messages API and OpenAI Responses API to allow users to interact with exo using these popular API formats. This enables broader compatibility with existing...

## Motivation

Adds uncertainty visualization to the chat interface, allowing users to see token-level confidence scores and regenerate responses from any point in the generation. This enables users to: -...
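One plausible way to derive the per-token scores (a sketch, not the issue's actual implementation): the sampled token's log-probability maps to a probability in [0, 1], and low values mark positions worth offering a regenerate button for.

```python
import math

def token_confidences(logprobs):
    """Map per-token log-probabilities (of the sampled tokens) to
    confidence scores in [0, 1] via exp(logprob)."""
    return [math.exp(lp) for lp in logprobs]

def uncertain_positions(logprobs, threshold=0.5):
    """Indices where the model was less than `threshold` confident -
    candidate points to regenerate from."""
    return [i for i, c in enumerate(token_confidences(logprobs))
            if c < threshold]
```

The UI could then shade each token by its confidence and attach a regenerate action to the flagged positions.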

## Motivation

Users processing long prompts have no visibility into when token generation will start. This feature adds a progress bar showing prefill progress, giving users real-time feedback during prompt...
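A minimal sketch of the mechanism, assuming the prompt is prefilled in chunks (the chunk size and callback names are illustrative): report the fraction of prompt tokens processed after each chunk so the UI can drive a progress bar.

```python
def prefill_with_progress(prompt_tokens, chunk_size, process_chunk, on_progress):
    """Prefill the prompt in fixed-size chunks, invoking `on_progress`
    with the fraction of tokens processed after each chunk."""
    n = len(prompt_tokens)
    for start in range(0, n, chunk_size):
        process_chunk(prompt_tokens[start:start + chunk_size])
        on_progress(min(start + chunk_size, n) / n)
```

For short prompts the bar jumps straight to 100%; for long prompts the user sees steady movement instead of a silent stall before the first token.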

## Motivation

For thinking models like GLM-4.7, the `` tag is inserted by the tokenizer's `apply_chat_template()` into the **prompt** (input). The model generates tokens starting *after* this tag, so ``...
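A sketch of the consequence (the tag name `<think>` is an assumption based on common thinking models; the original tag is elided above): because the opening tag lives in the prompt, the generated text starts *inside* the thinking block, so a parser must re-attach the tag before splitting reasoning from the final answer.

```python
def split_thinking(generated, open_tag="<think>", close_tag="</think>"):
    """The chat template emits `open_tag` at the end of the prompt, so the
    model's output begins inside the thinking block. Re-prepend the tag,
    then split reasoning from the final answer."""
    text = open_tag + generated
    if close_tag in text:
        thinking, answer = text.split(close_tag, 1)
        return thinking[len(open_tag):].strip(), answer.strip()
    # the model never closed the tag: everything so far is reasoning
    return text[len(open_tag):].strip(), ""
```

A parser that instead searches the raw output for the opening tag would misclassify all of the reasoning as answer text, which is presumably the bug this issue addresses.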