
With the cu112 package installed, importing torch before oneflow and then using CUDA raises an error

Open shangguanshiyuan opened this issue 3 years ago • 4 comments

import torch, oneflow

x = oneflow.tensor(1).cuda()

Error message:

F20220819 12:38:12.392885 860212 cuda_stream.cpp:103] Check failed: cublasSetMathMode(cublas_handle_, CUBLAS_TF32_TENSOR_OP_MATH) : CUBLAS_STATUS_INVALID_VALUE (7)
*** Check failure stack trace: ***
    @     0x7f27f35d2dfa  google::LogMessage::Fail()
    @     0x7f27f35d30e2  google::LogMessage::SendToLog()
    @     0x7f27f35d2967  google::LogMessage::Flush()
    @     0x7f27f35d54d9  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f27e91eeb25  oneflow::ep::CudaStream::CudaStream()
    @     0x7f27e91e9a53  oneflow::ep::CudaDevice::CreateStream()
    @     0x7f27eb2209e6  oneflow::vm::EpStreamPolicyBase::stream()
    @     0x7f27ec9f8cae  oneflow::vm::OpCallInstructionPolicy::Compute()
    @     0x7f27ec9f695f  oneflow::vm::EpStreamPolicyBase::Run()
    @     0x7f27eca01f8f  oneflow::vm::ThreadCtx::TryReceiveAndRun()
    @     0x7f27eca02d20  oneflow::(anonymous namespace)::WorkerLoop()
    @     0x7f27eca02f1d  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7oneflow2vm9ThreadCtxERKSt8functionIFvS6_EEES6_ZNS3_14VirtualMachine15CreateThreadCtxENS3_6SymbolINS3_6DeviceEEENS3_10StreamTypeEEUlS6_E2_EEEEE6_M_runEv
    @     0x7f298da0bde4  (unknown)
    @     0x7f2994cda609  start_thread
    @     0x7f2994e14133  clone
Aborted (core dumped)

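The cu112 package triggers the error; the cu102 package does not. It reproduces with both the stable and nightly builds. Importing oneflow first and then torch does not trigger the error. Environment: Python 3.8.10, CUDA driver 515.65.01, oneflow-16.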

shangguanshiyuan · Aug 19 '22 13:08

How did you come across this issue?

yuanms2 · Aug 19 '22 13:08

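This is most likely because the cublas.so that PyTorch links against is a different version from the one OneFlow uses, and OneFlow's is newer. We could consider adding a check that requires the runtime cuBLAS and cuDNN versions to be no lower than the versions OneFlow was compiled against.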

liujuncheng · Aug 19 '22 13:08
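One way to test the hypothesis that OneFlow ends up using PyTorch's cublas.so is to list which cuBLAS/cuDNN shared objects the process has actually mapped after the imports. The sketch below is a diagnostic under stated assumptions: it is Linux-only (it reads `/proc/self/maps`), and the helper name `loaded_libraries` is hypothetical, not part of OneFlow or PyTorch.

```python
import re
from typing import List


def loaded_libraries(pattern: str = r"lib(cublas|cublasLt|cudnn)[^/]*\.so") -> List[str]:
    """Hypothetical helper: return the sorted, de-duplicated paths of shared
    objects mapped into this process whose file name matches `pattern`.
    Linux-only: reads /proc/self/maps; returns [] where procfs is absent."""
    regex = re.compile(pattern)
    paths = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                # Fields: address, perms, offset, dev, inode, [pathname].
                fields = line.split(maxsplit=5)
                if len(fields) == 6:
                    path = fields[5].strip()
                    # Match against the file name only, not the directory.
                    if regex.search(path.rsplit("/", 1)[-1]):
                        paths.add(path)
    except FileNotFoundError:
        # No procfs (e.g. not Linux): nothing to report.
        return []
    return sorted(paths)
```

Calling `loaded_libraries()` right after `import torch, oneflow` and printing the result shows which cuBLAS copies are mapped. If only the copy shipped with PyTorch appears, that would be consistent with the explanation above: OneFlow is then running against a library other than the one it was built with.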

How did you come across this issue?

I wanted to run a comparison while testing, so I imported torch and oneflow in the same process and ran into this problem.

shangguanshiyuan · Aug 19 '22 14:08

We could consider adding a check that requires the runtime cuBLAS and cuDNN versions to be no lower than the compile-time versions.

OK, I'll look into how to add it tomorrow. Would it be something like `cublasGetVersion() >= CUBLAS_VERSION`?

shangguanshiyuan · Aug 19 '22 15:08
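The "runtime version must not be lower than compile-time version" rule discussed above can be sketched as follows. This is an illustration in Python, not OneFlow's actual C++ implementation. It assumes the runtime cuBLAS version is read from the loaded library via `cublasGetProperty()` with the `MAJOR_VERSION`/`MINOR_VERSION`/`PATCH_LEVEL` property values, and that the compile-time versions come from the corresponding header macros. Versions are compared as `(major, minor, patch)` tuples to avoid depending on how the encoded integers are packed. The function names and the default soname `libcublas.so.11` are hypothetical.

```python
import ctypes
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]

# libraryPropertyType values from CUDA's library_types.h.
MAJOR_VERSION, MINOR_VERSION, PATCH_LEVEL = 0, 1, 2


def runtime_cublas_version(soname: str = "libcublas.so.11") -> Optional[Version]:
    """Hypothetical helper: query the cuBLAS the dynamic linker resolves for
    `soname`. If a library with that soname is already loaded (for example by
    PyTorch), that copy is typically the one returned. None if unavailable."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None
    parts = []
    for prop in (MAJOR_VERSION, MINOR_VERSION, PATCH_LEVEL):
        value = ctypes.c_int()
        # cublasStatus_t cublasGetProperty(libraryPropertyType type, int *value);
        # CUBLAS_STATUS_SUCCESS == 0.
        if lib.cublasGetProperty(prop, ctypes.byref(value)) != 0:
            return None
        parts.append(value.value)
    return (parts[0], parts[1], parts[2])


def check_not_older(compiled: Dict[str, Version],
                    runtime: Dict[str, Optional[Version]]) -> None:
    """Hypothetical check: raise if any library's runtime version is lower than
    (or cannot be compared with) the version the binary was compiled against."""
    problems = []
    for name, built in compiled.items():
        found = runtime.get(name)
        if found is None:
            problems.append(f"{name}: runtime version unknown (compiled against {built})")
        elif found < built:  # tuple comparison: major, then minor, then patch
            problems.append(f"{name}: runtime {found} < compiled {built}")
    if problems:
        raise RuntimeError("incompatible CUDA libraries:\n  " + "\n  ".join(problems))
```

Running a check like this at startup would turn a version mismatch into a readable error message instead of the `CUBLAS_STATUS_INVALID_VALUE` abort in the stack trace above. Treating an unknown runtime version as a failure is a deliberately strict choice; a real implementation might only warn in that case.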

Addressed by https://github.com/Oneflow-Inc/oneflow/pull/9257

shangguanshiyuan · Oct 31 '22 05:10