ravil-mobile
Hi @francescomosconi, I ran your scenario on a local CPU server, but 850k elements was too much for it. Could you halve the mesh size by coarsening it?...
Hi @Thomas-Ulrich and @sebwolf-de. This issue also applies to the SeisSol-CPU version. Do you have any ideas?
> We should not switch inside `triton.language.extra.cuda` for hip. That's not the proper layering. We can create `triton.language.extra.hip` for the hip libdevice cases. And use a switch at `triton.language.extra.libdevice` that...
@zhanglx13, @antiagainst, could you take another look at the changes?
> btw, you can use
>
> https://github.com/openai/triton/blob/23d5c2115f06e6920580c84603c9f08a084186af/.github/workflows/integration-tests.yml#L144
>
> to help with the format

Thanks!
Hi @zhanglx13 and @antiagainst, could you please take another look at the PR and share your comments? There are some Ops for which I couldn't...
Hi @zhanglx13

### Question 1

> Why you need to hack ExternElementwiseOp for llvm intrinsics? Why do llvm intrinsics work differently than ocml_xxx functions?

### 1.1

I replaced `__nv_floorf` and...
Hi @zhanglx13 @antiagainst, the last commit adds AOT support. Based on my tests, the current solution works for both AOT and JIT.
> Also please fix the format error reported by bots.

Done.
> How did you test AOT?

Hi @zhanglx13, I tested AOT using the following test:

https://github.com/triton-lang/triton/blob/25b4212a9b7afd9bade743ba803264b28b84e599/python/test/unit/tools/test_aot.py#L436-L440

Here is an example:

```python
@triton.jit
def kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    ...
```