Guoliang He
Hello, I am trying to understand the generator as in the PET paper. To my understanding, the `threshold` parameter in the Generator::run() function defines the number of random test point...
Currently the interpreter doesn't handle device_print: https://github.com/openai/triton/blob/dc45d2640f8b2e5edcf322025a6b78e70625fea7/python/triton/runtime/interpreter.py#L347 A quick fix might be to uncomment the line and add a handler, but I am not sure what the desired print format is.
Error when assembling instruction "[B------:R3:W4:-:S06] @P3 LDG.E.EL.128 R12, desc[UR8][R16.64] ;":
Hi, many thanks for releasing this assembler! I was trying to disassemble a PyTorch kernel, and then I saw this error: Error when assembling instruction "[B------:R3:W4:-:S06] @P3 LDG.E.EL.128 R12, desc[UR8][R16.64]...
tl.dot does not seem to support accumulator tile sizes < 16 (https://github.com/openai/triton/blob/854677046383bb3f0a30f3b2ba981b91fb9fb29f/python/triton/language/semantic.py#L1355C47-L1357C124). May I know the reason?
One useful scan op for top-k is tl.sort(); however, it doesn't return the indices as torch.sort does (https://pytorch.org/docs/stable/generated/torch.sort.html). May I know if there's a plan to essentially achieve `argsort` kinds...
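To make the request concrete, here is a minimal pure-Python sketch of the behaviour being asked for: a sort that returns both the sorted values and the original positions, in the spirit of torch.sort. The name `sort_with_indices` is hypothetical and is not part of Triton or PyTorch; this only illustrates the semantics, not how a GPU kernel would implement it.

```python
def sort_with_indices(values, descending=False):
    """Return (sorted_values, indices), where indices[k] is the position in
    `values` of the k-th sorted element. Hypothetical helper for illustration."""
    # Sort the positions by the value stored at each position (an argsort).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    # Gather the values in that order to get the sorted sequence.
    return [values[i] for i in order], order


def topk(values, k):
    """Top-k built on the argsort above: the k largest values and their positions."""
    sorted_vals, order = sort_with_indices(values, descending=True)
    return sorted_vals[:k], order[:k]
```

With indices available, top-k reduces to slicing the first k entries of both outputs, which is why an `argsort`-style result is useful here.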
I solved the long softmax puzzle, but I have to store the intermediate results to z_ptr, which may cause unnecessary memory I/O. Essentially, I would like to know if there's...
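For context, one commonly used way to avoid writing intermediate exponentials to memory is the "online softmax" trick: keep a running maximum and a rescaled running sum while streaming over blocks, then normalize in a second pass. The sketch below is plain Python rather than Triton, and the function name and `block` parameter are assumptions made for illustration; it is not the puzzle's reference solution.

```python
import math


def online_softmax(xs, block=4):
    """Softmax over a long row processed in fixed-size blocks.
    Only two scalars (running max and running sum) are carried between blocks."""
    running_max = float("-inf")
    running_sum = 0.0
    # Pass 1: stream over blocks, updating the max and the rescaled sum.
    for start in range(0, len(xs), block):
        chunk = xs[start:start + block]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new max before adding this block's terms.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in chunk)
        running_max = new_max
    # Pass 2: recompute each exponential and normalize; nothing was stored in between.
    return [math.exp(x - running_max) / running_sum for x in xs]
```

The trade-off in this sketch is that the inputs are read twice and the exponentials recomputed, in exchange for not storing per-element intermediates.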