Jiarui Fang（方佳瑞） comments

Results: 220 comments of Jiarui Fang（方佳瑞）

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

The third-party library used for memory management, cub, may be unstable. Try changing the naive allocator so that everything is allocated directly in GPU memory.

```
return allocate_impl(size, kDLGPU);
allocate_free(mem, kDLGPU);
```

https://github.com/Tencent/TurboTransformers/blob/master/turbo_transformers/core/allocator/naive_allocator.h#L48
https://github.com/Tencent/TurboTransformers/blob/master/turbo_transformers/core/allocator/naive_allocator.h#L73

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

Yes.

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

`free_impl(mem, kDLGPU);` Take a look at the CPU API and change it to use kDLGPU.

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

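Check whether your GPU memory consumption is stable. It may be that a lot of unreleased memory gradually accumulates and eventually crashes the program.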
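One way to check whether GPU memory consumption stays stable is to sample it periodically while the model runs. The following is only a minimal sketch, not part of TurboTransformers: the helper names `parse_memory_used`, `sample_memory_used_mib` and `looks_like_leak` are hypothetical, the 100 MiB threshold is an arbitrary assumption, and the sampler assumes `nvidia-smi` is on the PATH.

```python
import subprocess
from typing import List, Sequence


def parse_memory_used(nvidia_smi_output: str) -> List[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`,
    which prints one integer (MiB) per GPU, one GPU per line."""
    return [int(line.strip())
            for line in nvidia_smi_output.splitlines()
            if line.strip()]


def sample_memory_used_mib(gpu_index: int = 0) -> int:
    """Return the memory currently used on one GPU, in MiB.
    Requires an NVIDIA driver with nvidia-smi installed."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(out)[gpu_index]


def looks_like_leak(samples: Sequence[int], min_growth_mib: int = 100) -> bool:
    """Heuristic only: report True when usage never drops between
    consecutive samples and the total growth reaches the threshold."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and (samples[-1] - samples[0]) >= min_growth_mib
```

In practice, one could call `sample_memory_used_mib()` every few hundred decoder steps, append the result to a list, and pass that list to `looks_like_leak` once the run finishes or crashes. Steady growth would point to accumulated unreleased memory rather than a single bad access.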

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

Try checking the memory usage with cuda-memcheck.

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

It doesn't matter. You can turn it off with the following statement: `turbo_transformers.set_stderr_verbose_level(0)`

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

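> Hello! Following your suggestion, I changed lines 49 to 63 and lines 74 to 80 of naive_allocator.h to `return allocate_impl(size, kDLGPU);` and `free_impl(mem, kDLGPU);` respectively. After recompiling, the same "an illegal memory access was encountered" error still appears at the same place at runtime, so the change had no effect. Is there anything else that needs to be modified? Thanks!

You also only ran into this problem after running a lot of steps, right?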

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

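> I ran into this problem after only a few hundred steps. I then modified the code above as you suggested, but it still had no effect. Besides `return allocate_impl(size, kDLGPU);` and `free_impl(mem, kDLGPU);`, is there anything else that needs to be changed? Thanks!

The CUDA implementation of multiheadedattention probably has a memory leak. Could you extract a simple unit test for me to debug?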

[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33

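You can randomly initialize an encoder-decoder model and then force its decoder to run 10000 steps to see what happens. The decoders I tested ran at most a hundred or so steps, so I may not have noticed a memory leak.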

Select GPU device id other than 0 using C++

Currently, no such API is provided. We can schedule this feature for the future if you insist. Or you can make your own contribution by simply adding a `SetCudaDeviceId` API here....