
Autotune for CUDA tensors not on device 0 does not work

Open · seongwook-ham opened this issue · 0 comments

Tensor Comprehensions Github Issues Guidelines

If you have a feature request or a bug report (build issue), please open an issue on Github and fill the template below so we can help you better and faster. If you have some general questions about Tensor Comprehensions, please visit our slack channel or email us at [email protected]

For build issues, please add [Build] at the beginning of issue title.

When submitting a bug report, please include the following information (where relevant):

  • OS: ubuntu 16.04 LTS

  • How you installed TC (docker, conda, source): conda

  • Python version: 3.6

  • CUDA/cuDNN version: 8/6.0

  • Conda version (if using conda): 3

  • Docker image (if using docker):

  • GCC/GXX version (if compiling from source):

  • LLVM/Tapir git hash used (if compiling from source):

    To get the hash, run: $HOME/clang+llvm-tapir5.0/bin/clang --version

  • Commit hash of our repo and submodules (if compiling from source):

In addition, including the following information will also be very helpful for us to diagnose the problem:

  • A script to reproduce the issue (highly recommended if its a build issue)
  • Error messages and/or stack traces of the issue (create a gist)
  • Context around what you are trying to do

```python
mat1 = torch.ones(batch, total_length, width, height).cuda(device=1)
mat2 = torch.ones(total_length).cuda(device=1)

mat3 = torch.IntTensor(temp_list)
mat3.t_()
mat3 = mat3.cuda(device=1)

function.autotune(mat1, mat2, mat3, options=tc.Options("conv"),
                  cache="function.tc", pop_size=100, generations=25,
                  gpus="0,1,2,3", threads=15)
```

prints the following error:

```
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/THCTensorCopy.cu line=85 error=77 : an illegal memory access was encountered
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0320 22:49:40.423420 30986 rtc.cc:44] Error at: /opt/conda/conda-bld/tensor_comprehensions_1520457708651/work/src/core/rtc.cc:44: CUDA_ERROR_ILLEGAL_ADDRESS
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error at: /opt/conda/conda-bld/tensor_comprehensions_1520457708651/work/src/core/rtc.cc:44: CUDA_ERROR_ILLEGAL_ADDRESS
```

But with

```python
mat1 = torch.ones(batch, total_length, width, height).cuda(device=0)
mat2 = torch.ones(total_length).cuda(device=0)

mat3 = torch.IntTensor(temp_list)
mat3.t_()
mat3 = mat3.cuda(device=0)

function.autotune(mat1, mat2, mat3, options=tc.Options("conv"),
                  cache="function.tc", pop_size=100, generations=25,
                  gpus="0,1,2,3", threads=15)
```

it works well.

However, when I want to autotune a function with very large inputs, requiring the inputs to live on device 0 causes imbalanced GPU memory usage, which is not helpful (example below).

[screenshot: GPU memory usage, with device 0 far more loaded than the others]

It would be better if the autotuner accepted non-CUDA tensors and automatically converted them to CUDA tensors, copying them to each tuning device itself.
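The requested behavior could look roughly like the sketch below. This is a CPU-only illustration of the placement logic, not the TC API: `parse_gpus` and `replicate_inputs` are hypothetical helper names, and the `(name, device)` tags stand in for real `tensor.cuda(device=g)` copies.

```python
# Hypothetical sketch: an autotuner that takes host (CPU) tensors and
# replicates them onto every device in the `gpus` list itself, instead of
# requiring the caller to place inputs on device 0.

def parse_gpus(gpus):
    """Parse a TC-style gpus string such as "0,1,2,3" into device ids."""
    return [int(g) for g in gpus.split(",")]

def replicate_inputs(inputs, gpus):
    """Simulate copying each host tensor to every tuning device.

    Each (name, device) pair stands in for `tensor.cuda(device=g)`, so the
    placement logic can be exercised without a GPU.
    """
    devices = parse_gpus(gpus)
    return {g: [(name, g) for name in inputs] for g in devices}

placement = replicate_inputs(["mat1", "mat2", "mat3"], gpus="0,1,2,3")
# Every device ends up with its own copy of all three inputs, so no single
# device (and no single tensor placement) dominates memory usage.
```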

seongwook-ham · Mar 20 '18 14:03