Jury duty

Results 10 comments of Jury duty

```
CMAKE_PREFIX_PATH=/opt/rocm-6.0.0/lib/llvm/lib/cmake/llvm cmake ..
-- The C compiler identification is GNU 11.4.1
-- The CXX compiler identification is GNU 11.4.1
-- Detecting C compiler ABI info
-- Detecting C compiler...
```

On CentOS, I see the following errors, now in 5.6/5.7 and 6.0. I will check on clang.
```
-- Couldn't find Lightning build in compute directory. Searching LLVM_DIR then defaulting to...
```

llvm/clang is from ROCm. For 5.4.3, the build works with:
```
CMAKE_PREFIX_PATH=/opt/rocm-5.4.3/llvm/lib/cmake/llvm/ cmake .. && make -j32
....
[ 97%] Building CXX object CMakeFiles/kfdtest.dir/src/RDMATest.cpp.o
[100%] Linking CXX executable kfdtest...
```

OK, that worked, thanks for the assistance on this one. However, how would one use it so that tracing is captured on start/stop? I used "rocprof --hip-trace ./a.out" and the resulting json...

OK, that appears to work, as well as roctx. Thanks.

One more question: can Python use those APIs? So far I see mostly C++ code, so I am assuming there is no Python support.
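One possible way to reach roctx from Python is to load the C library with `ctypes`. The sketch below is not confirmed anywhere in this thread. It assumes the roctx shared library is installed as `libroctx64.so` and is on the loader path, and that it exports `roctxRangePushA` and `roctxRangePop` as declared in `roctx.h`. The helper names `load_roctx` and `roctx_range` are hypothetical.

```python
import ctypes
from contextlib import contextmanager


def load_roctx(path="libroctx64.so"):
    """Return a ctypes handle to the roctx library, or None if it cannot be loaded.

    The default library name is an assumption; adjust it for the local ROCm install.
    """
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Declare the C signatures from roctx.h that are used below.
    lib.roctxRangePushA.argtypes = [ctypes.c_char_p]
    lib.roctxRangePushA.restype = ctypes.c_int
    lib.roctxRangePop.argtypes = []
    lib.roctxRangePop.restype = ctypes.c_int
    return lib


@contextmanager
def roctx_range(name, lib):
    """Wrap the body in a nested roctx range; does nothing when lib is None."""
    if lib is not None:
        lib.roctxRangePushA(name.encode("utf-8"))
    try:
        yield
    finally:
        if lib is not None:
            lib.roctxRangePop()


if __name__ == "__main__":
    roctx = load_roctx()
    with roctx_range("training_step", roctx):
        total = sum(range(1000))  # placeholder for the work to be annotated
```

If the library loads, running the script under `rocprof --roctx-trace python script.py` should show the range in the trace. If it does not load, the context manager becomes a no-op and the script still runs.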

MAX_JOBS=4 failed with a similar error; I don't believe it is OOM.

Hmm, is there a way you can forward this to someone who can? If no one here can help, where else can I get help?

As of today, the build starts OK but takes forever. Any idea? /usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr...