Zheyong Fan comments

Results: 69 comments of Zheyong Fan

Is there a CUDA discussion group?

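> Hello, Professor Fan! I have finished reading your book and am now using CUDA to develop a high-performance algorithm that can be faster than one of NVIDIA's official libraries. Do you have an HPC discussion group, something like a readers' group? I am a master's student at a 985 university and should be able to contribute to the community. Many thanks!

Very nice. There is a group you can join:

- "CUDA Professional" QQ group: 45157483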

How are the arithmetic intensity, the theoretical register bandwidth, and the number of operands per FMA on page 51 derived?

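I do not have much expertise in this area. If you want to cite this in an academic paper, it is better to look for journal literature or more authoritative documentation (especially Nvidia's technical documents). This book is aimed at an introductory level; it emphasizes practicality and does not pursue theoretical depth.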

Nep4 forces

Don't you need me to first make dq/d r_ij available to you?

Nep4 forces

OK, then perhaps this is not ready for a PR. Or you might need to resolve conflicts later, after my PR.

Nep4 forces

Ok, I see your points!

Nep4 forces

I will make the PR soon, and might also change something about memory usage. You followed my previous NEP2/NEP3 style to use a lot of `local memory` in CUDA...

Nep4 forces

To clarify:

* `global memory` is also called device memory in CUDA, and is allocated from the host (CPU) using `cudaMalloc()` or `cudaMallocManaged()` in GPUMD.
* `local memory` refers to large...
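The distinction between the two memory kinds can be illustrated with a minimal sketch. It is not taken from GPUMD: the kernel names, the sizes `N` and `DIM`, and the helper `max_kernel_difference()` are all hypothetical. It assumes that a per-thread array indexed with a run-time value is typically placed in local memory by the compiler, and contrasts that with a scratch buffer in global memory allocated from the host with `cudaMalloc()`.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sizes, chosen only for illustration.
constexpr int N = 1024; // number of threads
constexpr int DIM = 64; // scratch entries per thread

#define CHECK(call)                                             \
  do {                                                          \
    cudaError_t err = (call);                                   \
    if (err != cudaSuccess) {                                   \
      std::printf("CUDA error: %s\n", cudaGetErrorString(err)); \
      std::exit(1);                                             \
    }                                                           \
  } while (0)

// Per-thread array with run-time indexing: the compiler will typically
// place `scratch` in local memory (an assumption; check the compiler output).
__global__ void with_local_scratch(const float* in, float* out)
{
  const int n = blockIdx.x * blockDim.x + threadIdx.x;
  if (n >= N) return;
  float scratch[DIM];
  for (int d = 0; d < DIM; ++d) scratch[d] = in[n] * d;
  float sum = 0.0f;
  for (int d = 0; d < DIM; ++d) sum += scratch[(d + n) % DIM];
  out[n] = sum;
}

// Same computation, but the scratch space is global memory allocated
// from the host. The layout d * N + n keeps accesses of neighboring
// threads adjacent in memory.
__global__ void with_global_scratch(const float* in, float* scratch, float* out)
{
  const int n = blockIdx.x * blockDim.x + threadIdx.x;
  if (n >= N) return;
  for (int d = 0; d < DIM; ++d) scratch[d * N + n] = in[n] * d;
  float sum = 0.0f;
  for (int d = 0; d < DIM; ++d) sum += scratch[((d + n) % DIM) * N + n];
  out[n] = sum;
}

// Runs both kernels on an all-ones input and returns the largest
// absolute difference between their outputs.
float max_kernel_difference()
{
  std::vector<float> h_in(N, 1.0f), h_a(N), h_b(N);
  float *in, *scratch, *out_a, *out_b;
  CHECK(cudaMalloc((void**)&in, N * sizeof(float)));
  CHECK(cudaMalloc((void**)&scratch, N * DIM * sizeof(float)));
  CHECK(cudaMalloc((void**)&out_a, N * sizeof(float)));
  CHECK(cudaMalloc((void**)&out_b, N * sizeof(float)));
  CHECK(cudaMemcpy(in, h_in.data(), N * sizeof(float), cudaMemcpyHostToDevice));

  with_local_scratch<<<N / 128, 128>>>(in, out_a);
  with_global_scratch<<<N / 128, 128>>>(in, scratch, out_b);
  CHECK(cudaGetLastError());

  CHECK(cudaMemcpy(h_a.data(), out_a, N * sizeof(float), cudaMemcpyDeviceToHost));
  CHECK(cudaMemcpy(h_b.data(), out_b, N * sizeof(float), cudaMemcpyDeviceToHost));

  float max_diff = 0.0f;
  for (int n = 0; n < N; ++n) {
    max_diff = std::fmax(max_diff, std::fabs(h_a[n] - h_b[n]));
  }

  CHECK(cudaFree(in));
  CHECK(cudaFree(scratch));
  CHECK(cudaFree(out_a));
  CHECK(cudaFree(out_b));
  return max_diff;
}
```

Both kernels compute the same sums, so `max_kernel_difference()` is expected to return 0 for this input. Whether the per-thread array actually ends up in local memory depends on the compiler and the target architecture; compiling with `nvcc -Xptxas -v` reports the local memory used per kernel.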

Nep4 forces

> > I will make the PR soon, and might also change something about memory usage. You followed my previous NEP2/NEP3 style to use a lot of `local memory`...

Nep4 forces

> There are two things that I'm unsure about. 1) I'm not sure that I'm accessing dq_dr correctly in `apply_gnn_compute_messages` and 2) I'm unsure of how to include `find_force_angular` and...

Nep4 forces

You can also remove the definition and calling of `find_q_scaler` and `apply_q_scaler`. Currently you have scaled the descriptor but have not scaled the relevant derivatives. From my test of NEP3,...
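The point about scaling the descriptor but not its derivatives can be written out with the chain rule. This is a sketch that assumes the scaler is a constant multiplicative factor per descriptor component; the symbols $s_\nu$ and $\tilde{q}$ are notation introduced here, not names from the code.

```latex
\tilde{q}^{\,i}_{\nu} = s_{\nu}\, q^{\,i}_{\nu}
\quad\Longrightarrow\quad
\frac{\partial \tilde{q}^{\,i}_{\nu}}{\partial \mathbf{r}_{ij}}
= s_{\nu}\, \frac{\partial q^{\,i}_{\nu}}{\partial \mathbf{r}_{ij}}
```

Under that assumption, if the network is fed the scaled descriptor $\tilde{q}$ but the force terms are built from the unscaled derivative, they are missing the factor $s_\nu$.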