Driss Guessous
# Summary

This PR adds FlexAttention as a new unified_attention backend for the V1 engine. This requires torch 2.7+, since we fixed a number of dynamic shapes issues that show...
Stacked PRs:
* __->__ #88

---

hacked up

```Shell
❯ torchfix auto_deprecate.py
auto_deprecate.py:8:15: TOR101 [*] Use of deprecated function torch.nn.functional.soft_margin_loss
--- /home/drisspg/meta/scripts/misc/auto_deprecate.py
+++ /home/drisspg/meta/scripts/misc/auto_deprecate.py
@@ -3,7 +3,7 @@
 import...
```
Stacked PRs:
* __->__ #1190

---

Add mxfp8 path

```Shell
with-proxy CONFIG_FILE="torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.print_after_conversion --training.compile --training.steps 50 --model.converters mxfloat8 --float8.recipe_name "mxfp8"
```

## Review highlight

I wish we...
See https://github.com/pytorch/pytorch/issues/147551#issuecomment-2683700299
Stacked PRs:
* #2258
* __->__ #2256
* #2253

---

Fixes: https://github.com/pytorch/ao/issues/2182

Add a way to do power-of-2 scaling.
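As a rough illustration of what power-of-2 scaling means here (this is not the torchao implementation, and the function name is hypothetical): a quantization scale can be rounded down to the nearest power of 2 in pure Python using `math.frexp`:

```python
import math

def round_scale_down_to_power_of_2(scale: float) -> float:
    # Hypothetical helper: round a positive scale down to the nearest power of 2.
    # frexp returns (m, e) with scale == m * 2**e and 0.5 <= m < 1,
    # so the largest power of 2 that is <= scale is 2**(e - 1).
    m, e = math.frexp(scale)
    return math.ldexp(1.0, e - 1)
```

The appeal of power-of-2 scales is that multiplying or dividing by them only shifts the floating-point exponent, so the scaling step itself introduces no mantissa rounding error.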
Stacked PRs:
* __->__ #2219

---

Manually specify flags if no arch is set.
# VLLM Torch.compile Issue Tracker

## Summary

This document tracks the existing issue with the way VLLM uses `torch.compile` and tensor subclasses.

**TLDR**: VLLM doesn't set up `aotdispatch` correctly, causing subclass...
# MXFP Inference and Performance Tracking

## Summary

This issue tracks performance and E2E integration of MXFP formats (MXFP8, MXFP4, NVFP4) on B200 and other devices.

## Status Overview

|...
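For context on what the MX formats above share: each small block of elements gets one power-of-2 scale. As a loose sketch (not the torchao kernels, and simplified relative to the OCP MX specification), a per-block scale chosen so the block's absolute maximum fits in the fp8 e4m3 range might look like:

```python
import math

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8 e4m3

def mx_block_scale(block: list[float]) -> float:
    # Sketch of a shared power-of-2 scale for one block of values.
    # Picks the smallest power of 2 such that amax / scale <= FP8_E4M3_MAX,
    # so every element in the block quantizes without saturating.
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    exp = math.ceil(math.log2(amax / FP8_E4M3_MAX))
    return math.ldexp(1.0, exp)
```

Because the scale is a pure power of 2 (E8M0-style), it can be stored as a single exponent byte per block, which is what makes the format cheap to apply in hardware.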
# Summary

See: https://github.com/pypa/pip/issues/6334