Da Li (李达)
Here is a mini-benchmark (I guess you forgot this?) https://github.com/numba/numba/pull/9520#issuecomment-2033243981
On second thought, perhaps this approach is too tricky to maintain? I won't mind if you don't want to merge this into the next release.
Hi, @guilhermeleobas. Sure, I should provide an example to show it. I will give one after I open the corresponding PR, and then we can compare the performance changes with and...
Any update on this feature? cc @gmarkall @spenczar. It sounds like it would be very useful if it were possible to use.
Hi, I tried to install everything and run the demo in https://github.com/spenczar/numba_stap_demo, but I ran into an import issue with `from bcc import USDT` in `find_stap_lib.py`. I searched PyPI for bcc-related...
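For reference, a minimal sanity check of the import that fails for me (assuming the intended form is `from bcc import USDT`; the `bcc` Python module is normally provided by the system BPF Compiler Collection packages, e.g. `python3-bpfcc`, rather than anything on PyPI):

```python
# Hypothetical quick check, not from the demo repo: verify the bcc bindings
# that find_stap_lib.py relies on are importable at all.
try:
    from bcc import USDT  # noqa: F401  # provided by the system bcc package
    print("bcc and USDT import fine")
except ImportError as exc:
    print("bcc is not importable:", exc)
```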
I would like to give this a try. After a first look, I found some useful comments on the `cuda.jit` decorator:
```python
:param debug: If True, check for exceptions thrown when executing...
```
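As a rough illustration of that option, a hedged sketch of a kernel compiled with `debug=True` (the kernel body, launch configuration, and the `opt=False` pairing are my own assumptions for the example, not taken from the PR):

```python
import numpy as np
from numba import cuda

# With debug=True, exceptions raised inside the kernel are checked when it
# executes; opt=False keeps the generated code closer to the source.
@cuda.jit(debug=True, opt=False)
def scale(out, x, factor):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] * factor

x = np.arange(16, dtype=np.float32)
out = np.zeros_like(x)
scale[1, 32](out, x, np.float32(2.0))
```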
I also tested this case locally. If just using `@cuda.jit`, the test passes, so the issue comes with the `lineinfo=True` option. It also seems related to this complex branch structure,...
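To make the shape of the failing case concrete, a hedged sketch (this is not the actual test from the issue; the nested branches below are only an assumed stand-in for the "complex branch structure" mentioned above):

```python
import numpy as np
from numba import cuda

# Per the comment above, the same kernel is reported to compile fine with a
# bare @cuda.jit; the problem appears once lineinfo=True is added to a kernel
# with nested branching like this.
@cuda.jit(lineinfo=True)
def classify(out, x):
    i = cuda.grid(1)
    if i < x.size:
        if x[i] > 0:
            if x[i] > 10:
                out[i] = 2
            else:
                out[i] = 1
        elif x[i] < -10:
            out[i] = -2
        else:
            out[i] = 0

x = np.array([-20, -5, 0, 5, 20], dtype=np.int32)
out = np.zeros_like(x)
classify[1, 32](out, x)
```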
BTW, when I want to see the optimized NVVM IR, which envvar should I use? I tried:
```python
# os.environ["NUMBA_CUDA_DEBUGINFO"] = "1"
# os.environ["NUMBA_DEBUG_TYPEINFER"] = "1"
os.environ["NUMBA_DUMP_LLVM"] = "1"
# ...
```
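Separately, a hedged alternative to environment variables that I could fall back on: the compiled kernel's inspection helpers (assuming the CUDA dispatcher's `inspect_llvm()` / `inspect_asm()` behave as documented, they show the IR Numba hands to NVVM and the resulting PTX, not the post-optimization NVVM IR itself):

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_one(out, x):
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] + 1

x = np.arange(8, dtype=np.float32)
out = np.zeros_like(x)
add_one[1, 32](out, x)

# LLVM IR as generated by the Numba frontend, keyed by compiled signature.
for sig, ir in add_one.inspect_llvm().items():
    print(sig, ir.splitlines()[0])

# PTX produced by NVVM from that IR.
for sig, ptx in add_one.inspect_asm().items():
    print(sig, ptx.splitlines()[0])
```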
So I took a quick look at https://github.com/numba/numba/blob/df07de114404225e64eea3c0622d3aee4a12e0c8/numba/cuda/codegen.py#L138-L150. I think `llvm_strs` should be the unoptimized LLVM IR from the Numba frontend? The CUDA codegen then converts it directly to PTX, so in...
Dropping this for now, since the main PR in numba-cuda is stalled.