Joshua Chia

Results: 54 comments by Joshua Chia

@wangleis Do you also think that 24k cycles for the THREADING=SEQ case is excessive and that there is room for improvement? I don't have convenient access to a machine with an...

@wenjiew I ran the experiment on an i9-13900H running Debian 12, and the results were similar: 24k cycles for SEQ and 27k cycles for TBB. I don't think the high...

I'm suggesting reducing the fixed latency of the THREADING=SEQ case if it's not too complicated. This will be helpful for cases where the computation graph has relatively few operations so...

Did you try first importing torch, having installed a version of the torch package for the same CUDA version as the CUDA kernel package you are trying to import? For...