Error after setting the environment variable ONEFLOW_CONV_ALLOW_HALF_PRECISION_ACCUMULATION

Open lovejing0306 opened this issue 10 months ago • 12 comments

Describe the bug

Setting the following environment variables causes an error:

os.environ['ONEFLOW_CONV_ALLOW_HALF_PRECISION_ACCUMULATION'] = '0'
os.environ['ONEFLOW_MATMUL_ALLOW_HALF_PRECISION_ACCUMULATION'] = '0'
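For reference, a minimal sketch of how these two variables might be set from a script. It assumes the variables should be in place before `oneflow` is imported (the import is left commented out so the snippet runs without OneFlow installed). The helper name `set_half_precision_accumulation` is hypothetical and not part of any library.

```python
import os

# The two variable names come from this report; the helper itself is hypothetical.
HALF_ACCUM_VARS = (
    "ONEFLOW_CONV_ALLOW_HALF_PRECISION_ACCUMULATION",
    "ONEFLOW_MATMUL_ALLOW_HALF_PRECISION_ACCUMULATION",
)


def set_half_precision_accumulation(enabled: bool) -> dict:
    """Set both variables to '1' or '0' and return what was written."""
    value = "1" if enabled else "0"
    for name in HALF_ACCUM_VARS:
        os.environ[name] = value
    return {name: os.environ[name] for name in HALF_ACCUM_VARS}


# '0' is the value used in this report.
settings = set_half_precision_accumulation(False)
print(settings)

# Assumption: the variables are set before the library is imported.
# import oneflow
```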

The error message is as follows:

F20240326 18:44:23.906044  1387 fused_matmul_bias_kernel.cu:84] Check failed: cublasLtMatmul( cuda_stream->cublas_lt_handle(), matmul_cache->operation_desc, &sp_alpha, weight->dptr(), matmul_cache->cublas_a_desc, x->dptr(), matmul_cache->cublas_b_desc, &sp_beta, (_add_to_output == nullptr) ? y_ptr : _add_to_output->dptr(), matmul_cache->cublas_c_desc, y_ptr, matmul_cache->cublas_c_desc, &matmul_cache->cublas_algo, cuda_stream->cublas_workspace(), cuda_stream->cublas_workspace_size(), cuda_stream->cuda_stream()) : CUBLAS_STATUS_NOT_SUPPORTED (15)
*** Check failure stack trace: ***
    @     0x7fe5b15ef96a  google::LogMessage::Fail()
    @     0x7fe5b15f28a1  google::LogMessage::SendToLog()
    @     0x7fe5b15ef499  google::LogMessage::Flush()
    @     0x7fe5b15f3189  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fe5aacfb2fb  oneflow::(anonymous namespace)::FusedMatmulBiasKernel::Compute()
    @     0x7fe5acd9ec45  oneflow::one::StatefulOpKernel::Compute()
    @     0x7fe5a97d56ea  _ZZN7oneflow2vm21OpCallInstructionUtil7ComputeEPNS0_23OpCallInstructionPolicyEPNS0_6StreamEbbENKUlvE_clEv
    @     0x7fe5a97d7018  oneflow::vm::OpCallInstructionUtil::Compute()
    @     0x7fe5a97d3c0d  _ZZN7oneflow2vm23OpCallInstructionPolicy7ComputeEPNS0_11InstructionEENKUlPKcE_clES5_.constprop.0
    @     0x7fe5a97d4469  oneflow::vm::OpCallInstructionPolicy::Compute()
    @     0x7fe5a97cd0c8  oneflow::vm::Instruction::Compute()
    @     0x7fe5a97c9de5  oneflow::vm::EpStreamPolicyBase::Run()
    @     0x7fe5a9827529  oneflow::vm::ThreadCtx::TryReceiveAndRun()
    @     0x7fe5a982bcad  oneflow::(anonymous namespace)::WorkerLoop()
    @     0x7fe5a982c438  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7oneflow2vm9ThreadCtxERKSt8functionIFvS6_EEES6_ZNS3_14VirtualMachine15CreateThreadCtxENS3_6SymbolINS3_6DeviceEEENS3_10StreamTypeEmEUlS6_E3_EEEEE6_M_runEv
    @     0x7fe5b1604f20  execute_native_thread_routine
    @     0x7fe6e198d609  start_thread
    @     0x7fe6e1758133  clone
Stack trace (most recent call last) in thread 1387:
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5b1604f1f, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a982c437, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a982bcac, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a9827528, in vm::ThreadCtx::TryReceiveAndRun()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a97c9de4, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a97cd0c7, in vm::Instruction::Compute()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a97d4468, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a97d3c0c, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a97d7017, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a97d56e9, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)::{lambda()#1}::operator()() const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5acd9ec44, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5aacfb2fa, in (anonymous namespace)::FusedMatmulBiasKernel::Compute(user_op::KernelComputeContext*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5b15f3188, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5b15ef498, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5b15f28a0, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5b15ef969, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-3e0702bd.so", at 0x7fe5a1821e78, in

Aborted (Signal sent by tkill() 1119 0)
Aborted (core dumped)

lovejing0306 avatar Mar 26 '24 10:03 lovejing0306

@chengzeyi @hjchen2 Let's take a look

strint avatar Mar 27 '24 02:03 strint

Is this problem easy to fix?

lovejing0306 avatar Mar 27 '24 10:03 lovejing0306

We are working on this issue.

strint avatar Apr 01 '24 09:04 strint

The latest version fixes this issue. Please install the latest oneflow:

python3 -m pip install -U --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu121

hjchen2 avatar Apr 03 '24 13:04 hjchen2

I installed the latest version you mentioned, but a new error appeared.

Stack trace (most recent call last) in thread 1098:
W20240408 19:33:49.514078   994 cudnn_conv_util.cpp:105] Currently available alogrithm (algo=0, require memory=0, idx=1) meeting requirments (max_workspace_size=2147483648, determinism=0) is not fastest. Fastest algorithm (1) requires memory 2149842960
W20240408 19:33:49.514624   994 cudnn_conv_util.cpp:105] Currently available alogrithm (algo=0, require memory=0, idx=1) meeting requirments (max_workspace_size=2147483648, determinism=0) is not fastest. Fastest algorithm (1) requires memory 2148663312
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f418d635f1f, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f4185850be7, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f418585045c, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f418584bcd8, in vm::ThreadCtx::TryReceiveAndRun()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f41857ee474, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f41857f1777, in vm::Instruction::Compute()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f4185878acf, in vm::FuseInstructionPolicy::Compute(vm::Instruction*)
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f41857f1777, in vm::Instruction::Compute()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f41857f8b58, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f41857f8829, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f41857f397a, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f417cfe3d3c, in

My environment is as follows:

onediff                   0.13.0.dev202404080125
onediffx                  0.13.0.dev0             /var/onediff/onediff_diffusers_extensions
oneflow                   0.9.1.dev20240406+cu121
onefx                     0.0.3
torch                     2.2.2

GPU model: A10 24G
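A version list like the one above can be gathered with a short script. This is a sketch using only the standard library; the package names are the ones listed in this report, and the helper name `collect_versions` is hypothetical.

```python
from importlib import metadata

# Package names taken from the environment listing in this report.
PACKAGES = ("onediff", "onediffx", "oneflow", "onefx", "torch")


def collect_versions(names):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in collect_versions(PACKAGES).items():
    print(f"{name:<10} {version or 'not installed'}")
```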

lovejing0306 avatar Apr 08 '24 11:04 lovejing0306

There is also the following error:

F20240408 20:02:20.552623  1965 fused_matmul_bias_kernel.cu:84] Check failed: cublasLtMatmul( cuda_stream->cublas_lt_handle(), matmul_cache->operation_desc, &sp_alpha, weight->dptr(), matmul_cache->cublas_a_desc, x->dptr(), matmul_cache->cublas_b_desc, &sp_beta, (_add_to_output == nullptr) ? y_ptr : _add_to_output->dptr(), matmul_cache->cublas_c_desc, y_ptr, matmul_cache->cublas_c_desc, &matmul_cache->cublas_algo, cuda_stream->cublas_workspace(), cuda_stream->cublas_workspace_size(), cuda_stream->cuda_stream()) : CUBLAS_STATUS_NOT_SUPPORTED (15)
*** Check failure stack trace: ***
    @     0x7f225502096a  google::LogMessage::Fail()
    @     0x7f22550238a1  google::LogMessage::SendToLog()
    @     0x7f2255020499  google::LogMessage::Flush()
    @     0x7f2255024189  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f224e726d5b  oneflow::(anonymous namespace)::FusedMatmulBiasKernel::Compute()
    @     0x7f22507cbdf5  oneflow::one::StatefulOpKernel::Compute()
    @     0x7f224d1f9dda  _ZZN7oneflow2vm21OpCallInstructionUtil7ComputeEPNS0_23OpCallInstructionPolicyEPNS0_6StreamEbbENKUlvE_clEv
    @     0x7f224d1fb708  oneflow::vm::OpCallInstructionUtil::Compute()
    @     0x7f224d1f82fd  _ZZN7oneflow2vm23OpCallInstructionPolicy7ComputeEPNS0_11InstructionEENKUlPKcE_clES5_.constprop.0
    @     0x7f224d1f8b59  oneflow::vm::OpCallInstructionPolicy::Compute()
    @     0x7f224d1f1778  oneflow::vm::Instruction::Compute()
    @     0x7f224d1ee475  oneflow::vm::EpStreamPolicyBase::Run()
    @     0x7f224d24bcd9  oneflow::vm::ThreadCtx::TryReceiveAndRun()
    @     0x7f224d25045d  oneflow::(anonymous namespace)::WorkerLoop()
    @     0x7f224d250be8  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7oneflow2vm9ThreadCtxERKSt8functionIFvS6_EEES6_ZNS3_14VirtualMachine15CreateThreadCtxENS3_6SymbolINS3_6DeviceEEENS3_10StreamTypeEmEUlS6_E3_EEEEE6_M_runEv
    @     0x7f2255035f20  execute_native_thread_routine
    @     0x7f2397ab6609  start_thread
    @     0x7f2397881353  clone
Stack trace (most recent call last) in thread 1965:
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f2255035f1f, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d250be7, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d25045c, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d24bcd8, in vm::ThreadCtx::TryReceiveAndRun()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d1ee474, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d1f1777, in vm::Instruction::Compute()
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d1f8b58, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d1f82fc, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d1fb707, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224d1f9dd9, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)::{lambda()#1}::operator()() const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f22507cbdf4, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f224e726d5a, in (anonymous namespace)::FusedMatmulBiasKernel::Compute(user_op::KernelComputeContext*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f2255024188, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f2255020498, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f22550238a0, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f2255020969, in
   Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-bfe31c8d.so", at 0x7f22452393f0, in

Aborted (Signal sent by tkill() 1795 0)
Aborted (core dumped)

lovejing0306 avatar Apr 08 '24 12:04 lovejing0306

Could this be caused by running out of GPU memory? You can monitor GPU memory usage while the model runs. Also, which model are you running? Is it SVD?
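One way to watch memory usage during a run is to poll `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH and returns an empty list otherwise; the function names `parse_memory_csv` and `query_gpu_memory` are hypothetical.

```python
import shutil
import subprocess


def parse_memory_csv(text: str) -> list:
    """Parse 'used, total' lines (MiB) into a list of (used, total) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> list:
    """Return (used_mib, total_mib) per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_memory_csv(out)


for index, (used, total) in enumerate(query_gpu_memory()):
    print(f"GPU {index}: {used} / {total} MiB")
```

Calling `query_gpu_memory()` in a loop from a second terminal while inference runs shows whether usage approaches the 24 GB limit before the crash.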

hjchen2 avatar Apr 08 '24 12:04 hjchen2

Are both errors caused by insufficient GPU memory?

It is either SVD or SDXL. I have noticed that GPU memory usage is much higher when accelerating with onediff. Is there a solution for this? My A10 has only 24 GB of GPU memory.

lovejing0306 avatar Apr 09 '24 01:04 lovejing0306

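First, try turning off compilation acceleration for the VAE. Also, what resolution are you using, and how much GPU memory does inference normally use without onediff acceleration?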
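A minimal sketch of what "turning off VAE compilation" could look like in a script: compile only the UNet and leave the VAE decoder as it is. The helper `compile_pipeline` and the stand-in objects are hypothetical. In a real script `compile_fn` would presumably be onediff's `oneflow_compile` (imported from `onediff.infer_compiler`), which is an assumption to check against the installed onediff version.

```python
from types import SimpleNamespace


def compile_pipeline(pipe, compile_fn, compile_vae=False):
    """Compile the UNet, and the VAE decoder only when compile_vae is True."""
    pipe.unet = compile_fn(pipe.unet)
    if compile_vae:
        pipe.vae.decoder = compile_fn(pipe.vae.decoder)
    return pipe


# Stand-in objects so the sketch runs without diffusers or onediff installed.
dummy_pipe = SimpleNamespace(unet="unet", vae=SimpleNamespace(decoder="vae_decoder"))


def fake_compile(module):
    """Placeholder for the real compile function."""
    return f"compiled({module})"


compile_pipeline(dummy_pipe, fake_compile, compile_vae=False)
print(dummy_pipe.unet)         # compiled(unet)
print(dummy_pipe.vae.decoder)  # vae_decoder
```

Keeping the VAE switch as an explicit flag makes it easy to compare memory usage with and without VAE compilation on the same machine.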

hjchen2 avatar Apr 09 '24 02:04 hjchen2

With VAE compilation turned off and without setting ONEFLOW_CONV_ALLOW_HALF_PRECISION_ACCUMULATION and ONEFLOW_MATMUL_ALLOW_HALF_PRECISION_ACCUMULATION, inference works.

The resolution is 1024x1024.

However, inference previously ran normally on the same machine with VAE compilation set, and setting ONEFLOW_CONV_ALLOW_HALF_PRECISION_ACCUMULATION and ONEFLOW_MATMUL_ALLOW_HALF_PRECISION_ACCUMULATION also triggers the error.

Do you have an official Docker image that I can use?

lovejing0306 avatar Apr 10 '24 04:04 lovejing0306

We do not provide a Docker image at the moment.

strint avatar Apr 15 '24 15:04 strint

This looks like a problem caused by VAE compilation. You can turn off VAE compilation for now.

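Compiling the VAE increases GPU memory usage noticeably, so it is not a good idea to enable it if your GPU has little memory. In 1.2, we will try to solve the VAE memory problem.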

strint avatar Apr 15 '24 15:04 strint