
FP8 rowwise scaling

Open • drisspg opened this pull request 1 year ago

Summary

Still need to figure out this symbol (`cuTensorMapEncodeTiled`).

The current workaround is to set `LD_PRELOAD=/usr/lib64/libcuda.so`. The lazyNVRTC approach should be the correct one, but the base symbol still shows up as undefined in `libtorch_cuda.so`, and I'm not sure why it is still being added:

```
❯ nm /home/drisspg/meta/pytorch/torch/lib/libtorch_cuda.so | grep cuT
                 U cuTensorMapEncodeTiled
0000000000f6d670 t _ZN2at4cuda6detail6_stubs22cuTensorMapEncodeTiledEP14CUtensorMap_st24CUtensorMapDataType_enumjPvPKmS8_PKjSA_26CUtensorMapInterleave_enum23CUtensorMapSwizzle_enum27CUtensorMapL2promotion_enum28CUtensorMapFloatOOBfill_enum
0000000000d25e47 t _ZN2at4cuda6detail6_stubs22cuTensorMapEncodeTiledEP14CUtensorMap_st24CUtensorMapDataType_enumjPvPKmS8_PKjSA_26CUtensorMapInterleave_enum23CUtensorMapSwizzle_enum27CUtensorMapL2promotion_enum28CUtensorMapFloatOOBfill_enum.cold
```
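The lazyNVRTC approach avoids linking against libcuda at build time: driver entry points are looked up at runtime and cached in a table of function pointers whose slots initially point at stubs. Below is a minimal sketch of that stub-table pattern, assuming Linux with glibc (`dlopen`/`dlsym`; glibc older than 2.34 also needs `-ldl` at link time). libm's `cos` stands in for `cuTensorMapEncodeTiled` so the sketch runs without a GPU, and `lazy::Table`, `cos_stub`, and `library()` are hypothetical names, not PyTorch's actual `at::cuda::detail` code.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <stdexcept>

namespace lazy {

// Signature of the entry point we want to call. In the real case this would be
// the driver's cuTensorMapEncodeTiled; libm's cos() is a stand-in (assumption)
// so the sketch runs on a machine without CUDA.
using CosFn = double (*)(double);

double cos_stub(double x);

// Table of entry points. Every slot starts out pointing at a stub, so nothing
// here references the real symbol at link time.
struct Table {
  CosFn cos_fn = &cos_stub;
};

Table& table() {
  static Table t;
  return t;
}

// Opens the shared library on first use instead of linking against it.
void* library() {
  static void* handle = dlopen("libm.so.6", RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    throw std::runtime_error("could not dlopen libm.so.6");
  }
  return handle;
}

// First call: resolve the real symbol, patch the table slot, then forward.
// Not thread-safe: concurrent first calls would race on the table slot.
double cos_stub(double x) {
  void* sym = dlsym(library(), "cos");
  if (sym == nullptr) {
    throw std::runtime_error("symbol cos not found");
  }
  table().cos_fn = reinterpret_cast<CosFn>(sym);
  assert(table().cos_fn != &cos_stub);  // later calls bypass the stub
  return table().cos_fn(x);
}

}  // namespace lazy
```

The first call to `lazy::table().cos_fn(0.0)` goes through `cos_stub`, which resolves `cos`, patches the slot, and returns `1.0`; later calls jump straight to libm. Because the real symbol is only reached through `dlsym`, this code leaves no undefined (`U`) reference to it in the shared object, but any remaining direct call to the name elsewhere in the library still creates one.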

drisspg • Apr 30 '24

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125204

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (31 Unrelated Failures)

As of commit 44483972bdd3dcd0c047020694817210846b5d70 with merge base 44483972bdd3dcd0c047020694817210846b5d70:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] • Apr 30 '24

```
❯ nm -C /home/drisspg/meta/pytorch/torch/lib/libtorch_cuda.so | grep cuT
0000000002561b90 T cuTensorMapEncodeTiled
0000000000ef1110 t at::cuda::detail::_stubs::cuTensorMapEncodeTiled(CUtensorMap_st*, CUtensorMapDataType_enum, unsigned int, void*, unsigned long const*, unsigned long const*, unsigned int const*, unsigned int const*, CUtensorMapInterleave_enum, CUtensorMapSwizzle_enum, CUtensorMapL2promotion_enum, CUtensorMapFloatOOBfill_enum)
0000000000cf4fa7 t at::cuda::detail::_stubs::cuTensorMapEncodeTiled(CUtensorMap_st*, CUtensorMapDataType_enum, unsigned int, void*, unsigned long const*, unsigned long const*, unsigned int const*, unsigned int const*, CUtensorMapInterleave_enum, CUtensorMapSwizzle_enum, CUtensorMapL2promotion_enum, CUtensorMapFloatOOBfill_enum) [clone .cold]
```

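This symbol shadowing doesn't seem right: `libtorch_cuda.so` now carries its own global (`T`) definition of `cuTensorMapEncodeTiled`, which can shadow the one exported by libcuda.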

drisspg • May 23 '24

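After some preprocessor shenanigans, I think I've got it into a better state (the wrapper is now a local `t` symbol, `nvrtc_cuTensorMapEncodeTiled`, and nothing named `cuTensorMapEncodeTiled` is exported), but I'd love some feedback from packaging experts: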

```
❯ nm -C /home/drisspg/meta/pytorch/torch/lib/libtorch_cuda.so | grep cuT;
0000000002561680 t nvrtc_cuTensorMapEncodeTiled(CUtensorMap_st*, CUtensorMapDataType_enum, unsigned int, void*, unsigned long const*, unsigned long const*, unsigned int const*, unsigned int const*, CUtensorMapInterleave_enum, CUtensorMapSwizzle_enum, CUtensorMapL2promotion_enum, CUtensorMapFloatOOBfill_enum) [clone .constprop.1]
0000000000ef10c0 t at::cuda::detail::_stubs::cuTensorMapEncodeTiled(CUtensorMap_st*, CUtensorMapDataType_enum, unsigned int, void*, unsigned long const*, unsigned long const*, unsigned int const*, unsigned int const*, CUtensorMapInterleave_enum, CUtensorMapSwizzle_enum, CUtensorMapL2promotion_enum, CUtensorMapFloatOOBfill_enum)
0000000000cf4f57 t at::cuda::detail::_stubs::cuTensorMapEncodeTiled(CUtensorMap_st*, CUtensorMapDataType_enum, unsigned int, void*, unsigned long const*, unsigned long const*, unsigned int const*, unsigned int const*, CUtensorMapInterleave_enum, CUtensorMapSwizzle_enum, CUtensorMapL2promotion_enum, CUtensorMapFloatOOBfill_enum) [clone .cold]
```

drisspg • May 24 '24

This is great! The API looks good; I'll defer to others for the CUTLASS part.

vkuzo • May 28 '24

@pytorchbot merge

drisspg • May 31 '24

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

pytorchmergebot • May 31 '24

@pytorchmergebot revert -c nosignal -m "Broke nightlies and internal tests"

atalman • Jun 03 '24

Observing a failure on Windows builds: https://github.com/pytorch/pytorch/actions/runs/9346105085/job/25720265925

```
C:/cb/pytorch_1000000000000/work/aten/src/ATen/../../../third_party/cutlass/include\cutlass/uint128.h(189): error: calling a __host__ function("_udiv128") from a __host__ __device__ function("cutlass::uint128_t::operator / const") is not allowed

1 error detected in the compilation of "C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/RowwiseScaledMM.cu".
```
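The error says that CUTLASS's `uint128_t::operator/`, which is marked `__host__ __device__`, calls `_udiv128`, an MSVC host-only intrinsic, so nvcc rejects it on Windows. One general way around this class of error is to keep host-only intrinsics out of code that can be compiled for the device, falling back to a portable routine there. The sketch below is such a fallback: plain shift-subtract long division with the same contract as `_udiv128` (divide `hi:lo` by `d`, return the quotient, store the remainder; the quotient fits in 64 bits when `hi < d`). `udiv128_portable` and `HOST_DEVICE` are hypothetical names, and this is not necessarily how CUTLASS or this PR resolved the failure.

```cpp
#include <cassert>
#include <cstdint>

// Under nvcc, mark the helper callable from both host and device code;
// under a plain C++ compiler the macro expands to nothing.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Portable 128-by-64-bit unsigned division with the same contract as MSVC's
// _udiv128: divides (hi:lo) by d, returns the 64-bit quotient, and stores the
// remainder in *rem. Requires hi < d so the quotient fits in 64 bits.
// Uses no host-only intrinsics, at the cost of 64 loop iterations.
HOST_DEVICE inline std::uint64_t udiv128_portable(std::uint64_t hi, std::uint64_t lo,
                                                  std::uint64_t d, std::uint64_t* rem) {
  assert(d != 0 && hi < d);
  std::uint64_t q = 0;
  std::uint64_t r = hi;  // running remainder, always < d at loop entry
  for (int i = 63; i >= 0; --i) {
    // Shift the next dividend bit into r, remembering the bit shifted out.
    const bool overflow = (r >> 63) != 0;
    r = (r << 1) | ((lo >> i) & 1u);
    q <<= 1;
    // If a bit overflowed, the true value 2^64 + r is >= d, and the
    // wrapping subtraction r - d still yields the correct remainder.
    if (overflow || r >= d) {
      r -= d;
      q |= 1u;
    }
  }
  *rem = r;
  return q;
}
```

For example, `udiv128_portable(0, 100, 7, &r)` returns `14` and sets `r` to `2`. Since a bit-by-bit loop is much slower than a hardware divide, a real fix would more likely keep the intrinsic for MSVC host compilation (for instance behind `#if !defined(__CUDA_ARCH__)`, which is only defined during device compilation) and use a fallback like this one only on the device.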

atalman • Jun 03 '24

@pytorchbot successfully started a revert job. Check the current status here. Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot • Jun 03 '24

@drisspg your PR has been successfully reverted.

pytorchmergebot • Jun 03 '24

@Skylion007 I wonder if this was also fixed in the newer version of CUTLASS:

```
C:/cb/pytorch_1000000000000/work/aten/src/ATen/../../../third_party/cutlass/include\cutlass/uint128.h(189): error: calling a __host__ function("_udiv128") from a __host__ __device__ function("cutlass::uint128_t::operator / const") is not allowed
```

drisspg • Jun 03 '24

@pytorchbot merge

drisspg • Jun 05 '24

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

pytorchmergebot • Jun 05 '24

@pytorchbot -i

drisspg • Jun 05 '24

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -i

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

pytorch-bot[bot] • Jun 05 '24

@pytorchbot merge -i

drisspg • Jun 05 '24

@pytorchbot merge -f "I don't think these failures are related"

drisspg • Jun 05 '24

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command. For more information, see the pytorch-bot wiki.

pytorchmergebot • Jun 05 '24

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

pytorchmergebot • Jun 05 '24

@pytorchmergebot revert -c ghfirst -m "Sorry, need to revert this: it is failing on internal CI. I suggest reimporting this and trying to land it internally, resolving all issues."

atalman • Jun 06 '24

@pytorchbot successfully started a revert job. Check the current status here. Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot • Jun 06 '24

@drisspg your PR has been successfully reverted.

pytorchmergebot • Jun 06 '24

@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot • Jun 06 '24

@drisspg how should we resolve this for now on the extension side? `<ATen/cuda/nvrtc_stub/ATenNVRTC.h>` cannot be used by C++ extensions.

cora-codes • Jul 11 '24