William Lin
Hi, will the driver for AXI HBICAP be added to this repo? If not, can someone please point me to where I can find it? I'm using Vitis+Vivado to work with...
Hi, I'm wondering whether the TFLOPs/MFU numbers in Table 5 of the paper use activation checkpointing. I've looked through the MS-AMP-Examples repo and it seems like GPT3 megatron does...
Adds initial multi-step scheduling support to vLLM. RFC: https://github.com/vllm-project/vllm/issues/6854 **Current Status**: **8/8: multi-node working** 8/6: PP+TP working; PP+ray fixed; ~~a few single GPU perf regressions (easy fix)~~ 8/2 PP...
`GPUExecutor` has a different API and does not define `_run_workers`. Another way to fix this would be to define the `_run_workers` API (it would only call the `driver_worker`) in `GPUExecutor`...
@WoosukKwon FILL IN THE PR DESCRIPTION HERE FIX #xxxx (*link existing issues this PR will resolve*) **BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE** ---...
I don't have AMD GPUs and cannot test locally. We can also consider moving the `advance_step` inside flash_attn.py and rocm_flash_attn.py to `AttentionMetadata` as a default implementation since the code is...
**Why these changes are needed** -- We have created a set of custom ComfyUI nodes around [FastVideo](https://github.com/hao-ai-lab/FastVideo), a framework for multi-GPU video generation using sequence parallelism. This fixes `import` failures...
### Motivation Contributions are welcome! ## Focus - Diverse model support with training and inference - Launch RL training infra and an effective training recipe - New Distillation Recipes -...
## Description This PR adds a dummy `.result()` API to `DeploymentResponseGenerator`. `DeploymentResponseGenerator` currently doesn't support `.result()`. However, when calling `.remote()` on a DeploymentHandle, the return type is a Union of...