Vadim Kantorov
For FSDP2, should Verl include more explicit controls for TP? E.g. using `parallelize_module(...)` as in torchtune https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py, and provide PyTorch's native context-parallel SDPA variant (as an alternative to Ulysses) - curious...
Any plans for contributing this great fused linear cross entropy loss implementation directly into PyTorch core? - https://github.com/pytorch/pytorch/issues/124480 - https://github.com/pytorch/pytorch/issues/139908 (so far it shows that Inductor under-performs compared to Liger,...
I wonder if both the Triton code and the Triton-produced cubin / PTX can be included in the core distribution for populating the local artifact cache. That way we can both preserve Triton's hackability...
Any plans for a path forward for including Triton code and torch.compile-d code directly in core?
Exactly, I wonder what is needed for Triton code to be used for more ops in core. If the blocker is cold-start time for eager, then shipping pre-cached / pre-generated /...
Is it possible to somehow precompile / pregenerate from Triton some version of PTX that would at least run on all relevant hardware? Or are some new features in Triton...
> Wanted to let you know that a fused linear cross entropy in core is on our roadmap, we plan to work on it in the next month or so...
Also curious whether the pattern of fusing a chunked Linear with some loss computation could also be implemented as a more generic/compilable higher-order op in PyTorch core. Seems the same pattern...
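To make the chunking pattern concrete, here is a minimal pure-Python sketch (no torch dependency; the function name, shapes, and chunking scheme are illustrative assumptions, not the Liger or PyTorch implementation): the loss for each chunk of rows is computed from per-row logits that are built and discarded on the fly, so the full `[N, vocab]` logits matrix is never materialized at once.

```python
import math

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2):
    # Hypothetical sketch of fused "chunked Linear + cross entropy":
    #   hidden  : list of N rows, each of dim D
    #   weight  : list of V vocab rows, each of dim D (the Linear projection)
    #   targets : list of N target class indices
    # Rows are processed chunk by chunk; only one row's logits live at a time.
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        for h, t in zip(hidden[start:start + chunk_size],
                        targets[start:start + chunk_size]):
            # logits for this row only: logits[v] = <h, weight[v]>
            logits = [sum(hi * wi for hi, wi in zip(h, w)) for w in weight]
            # numerically stable logsumexp
            m = max(logits)
            logsumexp = m + math.log(sum(math.exp(l - m) for l in logits))
            total += logsumexp - logits[t]  # -log softmax at the target class
    return total / len(hidden)
```

The result is independent of `chunk_size`, which is what makes the chunking purely a memory/scheduling decision and a plausible candidate for a generic higher-order op.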
The idea of adding tested/fast colorspace conversions was not taken up at the time in torchvision: https://github.com/pytorch/vision/issues/4029 But maybe torchaudio could then host such functions
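As one example of the kind of conversion meant here, a pure-Python sketch of RGB-to-grayscale using the ITU-R BT.601 luma weights (the convention I believe torchvision's `rgb_to_grayscale` follows; treat the exact coefficients as an assumption):

```python
def rgb_to_grayscale(r, g, b):
    # ITU-R BT.601 luma: Y = 0.299 R + 0.587 G + 0.114 B.
    # Illustrative scalar version; a library implementation would be a
    # vectorized, dtype-aware tensor op with a tested reference like this.
    return 0.299 * r + 0.587 * g + 0.114 * b
```

The point of hosting such conversions in a core library is exactly that these small, convention-laden formulas get one well-tested implementation instead of being re-derived (with slightly different coefficients) in every project.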
Maybe some of these could even be upstreamed later into torch core to be available across the board... Regarding `torio`, really hoping some torch-prefixed name can be invented before...