
Problem when using two different models back to back in a single OneFlow script

Open yaoxinthu opened this issue 3 years ago • 4 comments

What is your OneFlow installation (pip, source, dockerhub): pip
OS: Linux version 5.8.0-63-generic (buildd@lgw01-amd64-035) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34)
OneFlow version (run python3 -m oneflow --doctor):
version: '0.8.1.dev20221115+cu112'
git_commit: https://github.com/Oneflow-Inc/oneflow/commit/5610333a971c8da36f44690e9672be84c0094bdf
cmake_build_type: Release
rdma: True
mlir: True
Python version: 3.7.13
CUDA driver version: 11.6
GPU models: NVIDIA A30
Other info:
My setup is as follows. I first generate an image with the Stable Diffusion model through OneFlowStableDiffusionPipeline, then upscale it with a OneFlow port of Real-ESRGAN-colab. Using OneFlowStableDiffusionPipeline alone or Real-ESRGAN alone works fine. However, when both models are loaded onto the GPU at the same time, running the Real-ESRGAN model raises the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_1752/2202719188.py in <module>
      3 image = Image.open(path_to_image).convert('RGB')
      4 
----> 5 sr_image = realesr_model.predict(image)
      6 
      7 sr_image

~/diffusion_server/oneflowrealesr/oneflowrealesrgan.py in predict(self, lr_image, batch_size, patches_size, padding, pad_size)
     39         with torch.autocast("cuda"):
     40             with torch.no_grad():
---> 41                 res = self.model(img[0:batch_size])
     42                 for i in range(batch_size, img.shape[0], batch_size):
     43                     res = torch.cat((res, self.model(img[i:i+batch_size])), 0)

~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
    156                 args = result
    157 
--> 158         res = self.forward(*args, **kwargs)
    159 
    160         for hook in itertools.chain(self._forward_hooks.values()):

~/diffusion_server/oneflowrealesr/oneflowrrdbnet_arch.py in forward(self, x)
    111             feat = x
    112         feat = self.conv_first(feat)
--> 113         body_feat = self.conv_body(self.body(feat))
    114         feat = feat + body_feat
    115         # upsample

~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
    156                 args = result
    157 
--> 158         res = self.forward(*args, **kwargs)
    159 
    160         for hook in itertools.chain(self._forward_hooks.values()):

~/.local/lib/python3.7/site-packages/oneflow/nn/utils/container.py in forward(self, input)
     97         def forward(self, input):
     98             for module in self:
---> 99                 input = module(input)
    100             return input
    101 

~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
    156                 args = result
    157 
--> 158         res = self.forward(*args, **kwargs)
    159 
    160         for hook in itertools.chain(self._forward_hooks.values()):

~/diffusion_server/oneflowrealesr/oneflowrrdbnet_arch.py in forward(self, x)
     56 
     57     def forward(self, x):
---> 58         out = self.rdb1(x)
     59         out = self.rdb2(out)
     60         out = self.rdb3(out)

~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
    156                 args = result
    157 
--> 158         res = self.forward(*args, **kwargs)
    159 
    160         for hook in itertools.chain(self._forward_hooks.values()):

~/diffusion_server/oneflowrealesr/oneflowrrdbnet_arch.py in forward(self, x)
     30 
     31     def forward(self, x):
---> 32         x1 = self.lrelu(self.conv1(x))
     33         x2 = self.lrelu(self.conv2(torch.cat((x, x1), 1)))
     34         x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))

~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
    156                 args = result
    157 
--> 158         res = self.forward(*args, **kwargs)
    159 
    160         for hook in itertools.chain(self._forward_hooks.values()):

~/.local/lib/python3.7/site-packages/oneflow/nn/modules/activation.py in forward(self, x)
    860 
    861     def forward(self, x):
--> 862         return flow._C.leaky_relu(x, alpha=self.negative_slope, inplace=self.inplace)
    863 
    864     def extra_repr(self):

RuntimeError: Check failed: tensor_impl->tensor_meta()->dtype() == output_tensor_metas.at(i)->dtype() 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/functional/impl/activation_functor.cpp", line 496, in operator()
    OpInterpUtil::Dispatch(*op_, {x}, outputs.get(), attrs)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 123, in NaiveInterpret
    
Error Type: oneflow.ErrorProto.check_failed_error

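I then tried loading the SD model into GPU memory without running it, and had the Real-ESRGAN model process a local image opened with Image.open. The same error still occurs. Sample code: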

import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline
pipe = OneFlowStableDiffusionPipeline.from_pretrained(path)
pipe = pipe.to("cuda:1")
prompt = "1 girl"
with torch.autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]



from oneflowrealesrgan import RealESRGAN
realesr_model = RealESRGAN("cuda:1", scale=2)
realesr_model.load_weights('xxxx/RealESRGAN/x2')
from PIL import Image
"""
path_to_image = '../1.png'
image = Image.open(path_to_image).convert('RGB')
"""
sr_image = realesr_model.predict(image)

The error occurs whether the image is one I generated or one opened directly with Image.open, yet the two pieces of code each run fine on their own.

yaoxinthu avatar Nov 16 '22 09:11 yaoxinthu

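Hello, the only change you made to Real-ESRGAN-colab was import oneflow as torch, with no other modifications, correct? We will try to reproduce your problem. Alternatively, you can push the code you are running to your own account.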

shangguanshiyuan avatar Nov 16 '22 10:11 shangguanshiyuan

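The Real-ESRGAN-colab code I use is linked below. The model I converted to OneFlow myself is in the archive weights/RealESRGANx2.zip. Unzip it and change the model path in run.py, and it will run. This model does 2x upscaling. Alternatively, you can download the original model from Real-ESRGAN-colab and convert it yourself with torch2oneflow.py from my code. As mentioned above, this code runs fine on its own, but it fails if the OneFlowStableDiffusionPipeline model is also loaded in the same script. Code: https://github.com/yaoxinthu/oneflowrealesr Thank you very much.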

yaoxinthu avatar Nov 16 '22 12:11 yaoxinthu

OK, I will try to reproduce the problem you ran into. Thank you for the feedback.

shangguanshiyuan avatar Nov 16 '22 12:11 shangguanshiyuan

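To make this easier to reproduce, I have provided a simple reproduction script: https://github.com/yaoxinthu/oneflowrealesr/blob/master/error_run.py As long as you have SD v1.4 on your system and check that RealESRGAN_path on line 14 of the script is correct, you can reproduce the error I encountered.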

yaoxinthu avatar Nov 16 '22 13:11 yaoxinthu

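Hello, thank you for the feedback. The cause is that an optimization in OneFlowStableDiffusionPipeline affects the model you use afterwards. You can resolve the problem by disabling that optimization with os.environ["ONEFLOW_KERNEL_ENABLE_CUDNN_FUSED_CONV_BIAS"] = "0" after the line from diffusers import OneFlowStableDiffusionPipeline in your code. We will handle this optimization in a more robust way in future development. Thanks again for the feedback.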
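
For readers applying this workaround, here is a minimal sketch of where the flag would go. Only the environment variable name and the "set it after the diffusers import" ordering come from the reply above. The helper name disable_fused_conv_bias is hypothetical, and the pipeline import is left commented out so the snippet runs without OneFlow installed. The reason for the ordering is presumably that the import itself switches the optimization on, but the thread does not confirm this.

```python
import os

# In the real script this import comes first; per the reply above, the flag
# is overridden only after it.
# from diffusers import OneFlowStableDiffusionPipeline

# Environment variable named in the maintainer's reply.
FLAG = "ONEFLOW_KERNEL_ENABLE_CUDNN_FUSED_CONV_BIAS"


def disable_fused_conv_bias() -> str:
    """Hypothetical helper: set the flag to "0" and return its previous value.

    Returns an empty string if the flag was not set before.
    """
    previous = os.environ.get(FLAG, "")
    os.environ[FLAG] = "0"
    return previous


previous = disable_fused_conv_bias()
print(f"{FLAG}: {previous!r} -> {os.environ[FLAG]!r}")

# After this point the script would build the pipeline and the Real-ESRGAN
# model as in the sample code from the original report.
```

Returning the previous value is only a convenience for logging what changed. The thread does not say whether restoring the flag later has any effect.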

shangguanshiyuan avatar Nov 17 '22 05:11 shangguanshiyuan

Solved, thank you very much. Looking forward to the improvements in future versions!

yaoxinthu avatar Nov 17 '22 06:11 yaoxinthu