Driss Guessous
# Summary

This PR adds FlexAttention as a new unified_attention backend for the V1 engine. This requires torch 2.7+, since we fixed a number of dynamic shapes issues that show...
Stacked PRs:
* __->__ #88

---

hacked up

```Shell
❯ torchfix auto_deprecate.py
auto_deprecate.py:8:15: TOR101 [*] Use of deprecated function torch.nn.functional.soft_margin_loss
--- /home/drisspg/meta/scripts/misc/auto_deprecate.py
+++ /home/drisspg/meta/scripts/misc/auto_deprecate.py
@@ -3,7 +3,7 @@
 import...
```
Stacked PRs:
* __->__ #1190

---

Add mxfp8 path

```Shell
with-proxy CONFIG_FILE="torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.print_after_conversion --training.compile --training.steps 50 --model.converters mxfloat8 --float8.recipe_name "mxfp8"
```

## Review highlight

I wish we...
See https://github.com/pytorch/pytorch/issues/147551#issuecomment-2683700299
Stacked PRs:
* #2258
* __->__ #2256
* #2253

---

Fixes: https://github.com/pytorch/ao/issues/2182

Add a way to do power-of-2 scaling.
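As a rough illustration of what power-of-2 scaling means here (this is not the torchao implementation, and the function name is hypothetical): a quantization scale can be rounded down to the nearest power of 2 in pure Python using `math.frexp`:

```python
import math

def round_scale_down_to_power_of_2(scale: float) -> float:
    # Hypothetical helper: round a positive scale down to the nearest power of 2.
    # frexp returns (m, e) with scale == m * 2**e and 0.5 <= m < 1,
    # so the largest power of 2 that is <= scale is 2**(e - 1).
    m, e = math.frexp(scale)
    return math.ldexp(1.0, e - 1)
```

The appeal of power-of-2 scales is that multiplying or dividing by them only shifts the floating-point exponent, so the scaling step itself introduces no mantissa rounding error.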
Stacked PRs:
* __->__ #2219

---

Manually specify flags if no arch is set.
# VLLM Torch.compile Issue Tracker

## Summary

This document tracks the existing issue with the way VLLM uses `torch.compile` and tensor subclasses.

**TLDR**: VLLM doesn't set up `aotdispatch` correctly, causing subclass...
# MXFP Inference and Performance Tracking

## Summary

This issue tracks performance and E2E integration of MXFP formats (MXFP8, MXFP4, NVFP4) on B200 and other devices.

## Status Overview

|...
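For context on what the MX formats above share: each small block of elements gets one power-of-2 scale. As a loose sketch (not the torchao kernels, and simplified relative to the OCP MX specification), a per-block scale chosen so the block's absolute maximum fits in the fp8 e4m3 range might look like:

```python
import math

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8 e4m3

def mx_block_scale(block: list[float]) -> float:
    # Sketch of a shared power-of-2 scale for one block of values.
    # Picks the smallest power of 2 such that amax / scale <= FP8_E4M3_MAX,
    # so every element in the block quantizes without saturating.
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    exp = math.ceil(math.log2(amax / FP8_E4M3_MAX))
    return math.ldexp(1.0, exp)
```

Because the scale is a pure power of 2 (E8M0-style), it can be stored as a single exponent byte per block, which is what makes the format cheap to apply in hardware.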
# Summary

See: https://github.com/pypa/pip/issues/6334