Zheyong Fan

69 comments by Zheyong Fan

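> Hello, Professor Fan! I have finished reading your book and am now using CUDA to develop a high-performance algorithm that can run faster than one of NVIDIA's official libraries. Do you have an HPC discussion group, something like a readers' group? I am a master's student at a 985 university and should be able to contribute to the community. Many thanks!

Very nice. There is a group you can join:
- 《CUDA Professional》 QQ group: 45157483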

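I do not know much about the specialized knowledge in this area. If you need something to cite in an academic paper, it is best to look for journal articles or more authoritative documentation (especially Nvidia's technical documentation). This book is aimed at the introductory level; it emphasizes practicality and does not pursue theoretical depth.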

Don't you need me to first make `dq/dr_ij` available to you?

OK, then perhaps this is not ready for a PR. Or you might need to resolve conflicts later, after my PR.

Ok, I see your points!

I will make the PR soon, and might also change something about memory usage. You followed my previous NEP2/NEP3 style to use a lot of `local memory` in CUDA...

To clarify:
* `global memory` is also called device memory in CUDA. It resides on the GPU and is allocated from host (CPU) code, using `cudaMalloc()` or `cudaMallocManaged()` in GPUMD.
* `local memory` refers to large...
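The distinction between the two memory kinds can be illustrated with a minimal sketch. It is not taken from GPUMD: the kernel name, the `SCRATCH` size and the `run_demo` helper are all hypothetical. The buffers `d_in` and `d_out` live in global memory and are allocated from host code with `cudaMalloc()`, while `scratch` is a thread-private array declared inside the kernel. The compiler may place such an array in local memory, typically when it is too large for registers or is indexed with a value not known at compile time.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(1);                                                   \
    }                                                                 \
  } while (0)

// Hypothetical size of the per-thread scratch array.
constexpr int SCRATCH = 64;

// in and out point to global (device) memory.
__global__ void scale_by_index(const float* in, float* out, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;

  // Thread-private array: a candidate for local memory, depending on
  // what the compiler decides for this size and access pattern.
  float scratch[SCRATCH];
  for (int k = 0; k < SCRATCH; ++k) {
    scratch[k] = in[i] * k;
  }

  int idx = i % SCRATCH;  // index computed at run time
  out[i] = scratch[idx];
}

// Host-side driver: allocates global memory, runs the kernel, copies back.
std::vector<float> run_demo(int n)
{
  std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);
  size_t bytes = n * sizeof(float);

  float* d_in = nullptr;
  float* d_out = nullptr;
  CHECK(cudaMalloc(&d_in, bytes));   // global memory, allocated from the host
  CHECK(cudaMalloc(&d_out, bytes));
  CHECK(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

  int block = 128;
  int grid = (n + block - 1) / block;
  scale_by_index<<<grid, block>>>(d_in, d_out, n);
  CHECK(cudaGetLastError());
  CHECK(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));

  CHECK(cudaFree(d_in));
  CHECK(cudaFree(d_out));
  return h_out;
}

int main()
{
  std::vector<float> out = run_demo(256);
  std::printf("out[0] = %g, out[65] = %g\n", out[0], out[65]);
  return 0;
}
```

Whether `scratch` actually ends up in local memory depends on the compiler and target architecture. Compiling with `nvcc -Xptxas -v` prints per-kernel resource usage, including stack frame and spill sizes, which is one way to check.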

> > I will make the PR soon, and might also change something about memory usage. You followed my previous NEP2/NEP3 style to use a lot of `local memory`...

> There are two things that I'm unsure about. 1) I'm not sure that I'm accessing dq_dr correctly in `apply_gnn_compute_messages` and 2) I'm unsure of how to include `find_force_angular` and...

You can also remove the definitions of `find_q_scaler` and `apply_q_scaler` and the calls to them. Currently you have scaled the descriptor but have not scaled the relevant derivatives. From my test of NEP3,...
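The point about unscaled derivatives follows from the chain rule. As a sketch, assume (for illustration only; the thread does not spell out the scaler's exact form) that each descriptor component $q_\nu$ is multiplied by a constant factor $s_\nu$, giving $\tilde{q}_\nu = s_\nu q_\nu$, and let $U$ denote a hypothetical model output evaluated on the scaled descriptors. Then:

$$
\frac{\partial \tilde{q}_\nu}{\partial \mathbf{r}_{ij}} = s_\nu \frac{\partial q_\nu}{\partial \mathbf{r}_{ij}},
\qquad
\frac{\partial U}{\partial \mathbf{r}_{ij}} = \sum_\nu \frac{\partial U}{\partial \tilde{q}_\nu} \, s_\nu \, \frac{\partial q_\nu}{\partial \mathbf{r}_{ij}}
$$

Under that assumption, any quantity built from $\partial q_\nu / \partial \mathbf{r}_{ij}$ picks up the same factor $s_\nu$ as the descriptor itself, so scaling one without the other gives inconsistent derivatives.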