Vadim Kantorov
For FSDP2, should Verl include more explicit controls for TP? E.g. using `parallelize_module(...)` as in torchtune https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py, and provide PyTorch's native context-parallel SDPA variant (as an alternative to Ulysses) - curious...
Any plans for contributing this great fused linear cross entropy loss implementation directly into PyTorch core? - https://github.com/pytorch/pytorch/issues/124480 - https://github.com/pytorch/pytorch/issues/139908 (so far it shows that Inductor under-performs compared to Liger,...
I wonder if both the Triton code and the Triton-produced cubin / PTX can be included in the core distribution for populating the local artifact cache. That way we can both preserve Triton's hackability...
Any plans for a path forward for including Triton code and torch.compile-d code directly in core?
Exactly, I wonder what is needed for Triton code to be used for more ops in core. If the blocker is cold-start time for eager, then shipping pre-cached / pre-generated /...
Is it possible to somehow precompile / pregenerate from Triton some version of PTX that would at least run on all relevant hardware? Or are some new features in Triton...
> Wanted to let you know that a fused linear cross entropy in core is on our roadmap, we plan to work on it in the next month or so...
Also curious whether the pattern of fusing a chunked Linear with some loss computation could also be implemented as a more generic/compilable higher-order op in PyTorch core. Seems the same pattern...
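To make the chunking pattern concrete, here is a minimal pure-Python sketch (no torch dependency; the function name, shapes, and chunking scheme are illustrative assumptions, not the Liger or PyTorch implementation): the loss for each chunk of rows is computed from per-row logits that are built and discarded on the fly, so the full `[N, vocab]` logits matrix is never materialized at once.

```python
import math

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2):
    # Hypothetical sketch of fused "chunked Linear + cross entropy":
    #   hidden  : list of N rows, each of dim D
    #   weight  : list of V vocab rows, each of dim D (the Linear projection)
    #   targets : list of N target class indices
    # Rows are processed chunk by chunk; only one row's logits live at a time.
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        for h, t in zip(hidden[start:start + chunk_size],
                        targets[start:start + chunk_size]):
            # logits for this row only: logits[v] = <h, weight[v]>
            logits = [sum(hi * wi for hi, wi in zip(h, w)) for w in weight]
            # numerically stable logsumexp
            m = max(logits)
            logsumexp = m + math.log(sum(math.exp(l - m) for l in logits))
            total += logsumexp - logits[t]  # -log softmax at the target class
    return total / len(hidden)
```

The result is independent of `chunk_size`, which is what makes the chunking purely a memory/scheduling decision and a plausible candidate for a generic higher-order op.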
The idea of adding tested/fast colorspace conversions was not taken up at the time in torchvision: https://github.com/pytorch/vision/issues/4029 But maybe torchaudio could then host such functions
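As one example of the kind of conversion meant here, a pure-Python sketch of RGB-to-grayscale using the ITU-R BT.601 luma weights (the convention I believe torchvision's `rgb_to_grayscale` follows; treat the exact coefficients as an assumption):

```python
def rgb_to_grayscale(r, g, b):
    # ITU-R BT.601 luma: Y = 0.299 R + 0.587 G + 0.114 B.
    # Illustrative scalar version; a library implementation would be a
    # vectorized, dtype-aware tensor op with a tested reference like this.
    return 0.299 * r + 0.587 * g + 0.114 * b
```

The point of hosting such conversions in a core library is exactly that these small, convention-laden formulas get one well-tested implementation instead of being re-derived (with slightly different coefficients) in every project.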
Maybe some of these could even be upstreamed later into torch core to be available across the board... Regarding `torio`, really hoping some torch-prefixed name can be invented before...