Kaizhi Qian
By the way, I found that after removing the `with torch.no_grad()` wrapper around the inference function, it is less likely to crash and runs longer. Not sure if this information is helpful.
I spent another two days debugging. I swapped the gcc version, the PyTorch version, and the Docker image, and tried every other approach I could think of, but once the number of calls grows large enough, the error still occurs. The basic pattern is: the fewer operations I perform on the turbo decoder's inputs and outputs, the longer it runs. Both inputs and outputs have to be deep-copied before any operation on them, otherwise it crashes very quickly. Even so, this does not fundamentally solve the problem; after roughly ten thousand calls at most, it still errors out. My guess is that turbo itself may be unstable, and repeated calls tend to trigger the problem?
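The deep-copy workaround described above can be sketched as a small wrapper; `decoder_fn` here is a hypothetical stand-in for the actual TurboTransformers decoder call, not the library's real API:

```python
import torch

def call_decoder_safely(decoder_fn, inputs):
    """Call a decoder with defensive copies of inputs and outputs.

    Cloning detaches the tensors from any memory the library may still
    reference, mirroring the deep-copy workaround described above.
    """
    # Clone inputs so the library never aliases tensors we later mutate.
    safe_inputs = [t.detach().clone() for t in inputs]
    outputs = decoder_fn(*safe_inputs)
    # Clone outputs before any further ops on them.
    if isinstance(outputs, torch.Tensor):
        return outputs.detach().clone()
    return [o.detach().clone() for o in outputs]
```

This only avoids aliasing between our tensors and whatever the library holds internally; it does not fix the underlying allocator issue.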
May I confirm: should I replace lines 49 to 63 and lines 74 to 80 with the two lines of code you suggested above, and then recompile?
It errored out… help, please! ``` [180/270] Building CXX object turbo_transformers...eFiles/tt_core.dir/allocator/allocator_api.cpp.o FAILED: turbo_transformers/core/CMakeFiles/tt_core.dir/allocator/allocator_api.cpp.o /usr/bin/c++ -DLOGURU_WITH_STREAMS=1 -DTT_BLAS_USE_MKL -DTT_WITH_CUDA -D__CLANG_SUPPORT_DYN_ANNOTATION__ -I/usr/local/cuda/include -I/mnt/TurboTransformers/3rd/cub -I/mnt/TurboTransformers/3rd/FP16/include -I/mnt/TurboTransformers -I/opt/miniconda3/include -I/mnt/TurboTransformers/3rd/abseil -I/mnt/TurboTransformers/3rd/dlpack/include -I/mnt/TurboTransformers/3rd/loguru -Wall -m64 -fopenmp -O3 -DNDEBUG -fPIC -std=gnu++14...
It does look like the instability is related to GPU memory allocation: after modifying the naive allocator as you suggested, it ran for more than forty thousand calls before erroring out. I also observed two things. First, the dataloader outputs 8 tensors, each with a tensor.to(gpu) operation, but only two of them are currently used; removing the .to(gpu) calls for the unused tensors makes it more stable and it runs longer. Second, wrapping the encoder call in `with torch.no_grad()` is unstable, but if I also add `.data.clone()` to every encoder output, things improve a lot. Overall, the probability of the error has dropped substantially, but it still crashes after enough calls. Are there any other ways to improve the stability of repeated calls?
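The second observation above can be written as a small helper, a minimal sketch assuming a generic PyTorch-style `encoder` callable:

```python
import torch

def encode(encoder, x):
    """Run the encoder without autograd, then clone outputs immediately.

    Cloning detaches the result from any buffer the inference path may
    reuse, mirroring the `.data.clone()` workaround described above.
    """
    with torch.no_grad():
        out = encoder(x)
    return out.data.clone()
```

The clone costs one extra copy per call, but it guarantees downstream ops never touch memory the library might recycle.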
I have watched for this before: even at the moment of the crash, GPU memory usage never exceeded 50%. I also tried enqueuing large batches to deliberately overflow GPU memory; in that case it reports an out-of-memory error directly, rather than this "an illegal memory access was encountered". That said, although memory never overflowed, it did grow slowly over time; possibly dequeuing could not keep up with enqueuing.
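To track the slow growth mentioned above, one can log allocated memory every N calls; a minimal sketch using PyTorch's built-in counters (these only cover memory allocated through PyTorch, not allocations made internally by other libraries):

```python
import torch

def gpu_mem_mb():
    """Currently allocated CUDA memory in MB (0.0 when no GPU is visible)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 2**20
```

Printing `gpu_mem_mb()` every few thousand calls makes a slow leak visible long before any crash.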
This is printed every time before the program starts; may I ask what it means? Could it be related to the earlier errors? ``` date time ( uptime ) [ thread name/id ] file:line v| 2020-11-13 17:15:14.559 ( 0.000s) [main thread ] loguru.cpp:610 INFO| arguments: turbo_transformers_cxx 2020-11-13 17:15:14.559 ( 0.000s) [main thread...
It's a long story. I have many other packages installed in the same image, and I have used this for development since the very beginning. If I switch to conda,...
After hours of struggling with strange errors, I finally got the following output. It is compiled using gcc/g++ 6.5.0 and running on a V100 GPU. Do these numbers look reasonable to...