tangjinchuan

Results 39 comments of tangjinchuan

[Apple M2 MAX.txt](https://github.com/artyom-beilis/pytorch_dlprim/files/13939622/Apple.M2.MAX.txt) According to its website, HDF5_cpp is not built by default, so Homebrew does not provide that library. I therefore disabled the HDF5 library in the makefile to get the build through.

> > I am also supervising an undergraduate student to do a comparative study on the performance of different GPU random number generators. May I ask if there is any...

Dear Artyom, https://github.com/artyom-beilis/dlprimitives/blob/46b9d17b76c40d05b323a1a0ea484d61ac5f17b2/src/core/pointwise.cpp#L77C24-L77C25 Would it be quite expensive to handle the \n to \\\\\\n conversion with a for loop, or is this necessary? I did not find similar lines to specifically handle this...
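
To make the cost question concrete, here is a minimal sketch of a newline-replacement loop. It assumes the goal is simply to substitute every newline in a source string with some escaped sequence. The helper name `escape_newlines` and its `replacement` parameter are hypothetical and are not taken from dlprimitives, so this is an illustration of the general technique rather than the code at the linked line.

```cpp
#include <string>

// Hypothetical helper (not part of dlprimitives): returns a copy of `src`
// in which every '\n' is replaced by `replacement`.
// The loop visits each input character exactly once, so the running time
// grows linearly with the input length plus the total inserted text.
std::string escape_newlines(const std::string &src, const std::string &replacement)
{
    std::string out;
    out.reserve(src.size());  // starting capacity; the string grows if needed
    for (char c : src) {
        if (c == '\n')
            out += replacement;  // substitute the escaped sequence
        else
            out += c;            // copy every other character unchanged
    }
    return out;
}
```

For example, `escape_newlines("a\nb", "\\n")` yields the four-character string `a\nb` with a literal backslash. Because the pass is a single linear scan over the text, its cost depends only on the length of the string being processed.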

I appreciate the smart way you organised the global kernel parameters with different lengths. For me, I would write independent kernels for the above operations as well as prescan and...

> The independent kernels are mostly for some computationally intensive/non-standard tasks like pooling, convolutions etc. There is another way which uses a fixed-length kernel. All the broadcasting size info regarding...

[Intel Arc 770.txt](https://github.com/artyom-beilis/pytorch_dlprim/files/14196526/Intel.Arc.770.txt) As promised. This one really took time because it can freeze while executing some cases.

[clinfo770.txt](https://github.com/artyom-beilis/pytorch_dlprim/files/14228960/clinfo770.txt) You are welcome. Platform Name: Intel(R) OpenCL Graphics

In the meantime, there is a website https://compubench.com/device.jsp?benchmark=compu20d&os=Windows&api=cl&D=Intel%28R%29+Arc%28TM%29+A770+Graphics&testgroup=info which could provide more results for new devices.

[7800xt.txt](https://github.com/user-attachments/files/15748334/7800xt.txt) Using the latest AMD ROCm 6.1.2 on Ubuntu 24.04 with the repository at commit ed32af0ea236216468653d92ddcf6f219eebc5dd and AMD shuffle. No problems were reported. [tuningresults.zip](https://github.com/user-attachments/files/15748579/tuningresults.zip) In terms of tuning performance, if I...

@fancyIX Thank you very much for this message. It's fine. I believe the community highly appreciates any commitment you have been making.