oneflow
Problem when using two different models back to back in a single OneFlow script
What is your OneFlow installation (pip, source, dockerhub): pip
OS: Linux version 5.8.0-63-generic (buildd@lgw01-amd64-035) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34)
OneFlow version (run python3 -m oneflow --doctor):
version: '0.8.1.dev20221115+cu112'
git_commit: https://github.com/Oneflow-Inc/oneflow/commit/5610333a971c8da36f44690e9672be84c0094bdf
cmake_build_type: Release
rdma: True
mlir: True
Python version: 3.7.13
CUDA driver version: 11.6
GPU models: NVIDIA A30
Other info:
My setup works like this: first, the sd model in OneFlowStableDiffusionPipeline generates an image; then the OneFlow port of Real-ESRGAN-colab upscales it. Used on its own, either OneFlowStableDiffusionPipeline or Real-ESRGAN runs without problems. However, when both models are loaded on the GPU at the same time, running the Real-ESRGAN model raises the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_1752/2202719188.py in <module>
3 image = Image.open(path_to_image).convert('RGB')
4
----> 5 sr_image = realesr_model.predict(image)
6
7 sr_image
~/diffusion_server/oneflowrealesr/oneflowrealesrgan.py in predict(self, lr_image, batch_size, patches_size, padding, pad_size)
39 with torch.autocast("cuda"):
40 with torch.no_grad():
---> 41 res = self.model(img[0:batch_size])
42 for i in range(batch_size, img.shape[0], batch_size):
43 res = torch.cat((res, self.model(img[i:i+batch_size])), 0)
~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
156 args = result
157
--> 158 res = self.forward(*args, **kwargs)
159
160 for hook in itertools.chain(self._forward_hooks.values()):
~/diffusion_server/oneflowrealesr/oneflowrrdbnet_arch.py in forward(self, x)
111 feat = x
112 feat = self.conv_first(feat)
--> 113 body_feat = self.conv_body(self.body(feat))
114 feat = feat + body_feat
115 # upsample
~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
156 args = result
157
--> 158 res = self.forward(*args, **kwargs)
159
160 for hook in itertools.chain(self._forward_hooks.values()):
~/.local/lib/python3.7/site-packages/oneflow/nn/utils/container.py in forward(self, input)
97 def forward(self, input):
98 for module in self:
---> 99 input = module(input)
100 return input
101
~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
156 args = result
157
--> 158 res = self.forward(*args, **kwargs)
159
160 for hook in itertools.chain(self._forward_hooks.values()):
~/diffusion_server/oneflowrealesr/oneflowrrdbnet_arch.py in forward(self, x)
56
57 def forward(self, x):
---> 58 out = self.rdb1(x)
59 out = self.rdb2(out)
60 out = self.rdb3(out)
~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
156 args = result
157
--> 158 res = self.forward(*args, **kwargs)
159
160 for hook in itertools.chain(self._forward_hooks.values()):
~/diffusion_server/oneflowrealesr/oneflowrrdbnet_arch.py in forward(self, x)
30
31 def forward(self, x):
---> 32 x1 = self.lrelu(self.conv1(x))
33 x2 = self.lrelu(self.conv2(torch.cat((x, x1), 1)))
34 x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
~/.local/lib/python3.7/site-packages/oneflow/nn/module.py in __call__(self, *args, **kwargs)
156 args = result
157
--> 158 res = self.forward(*args, **kwargs)
159
160 for hook in itertools.chain(self._forward_hooks.values()):
~/.local/lib/python3.7/site-packages/oneflow/nn/modules/activation.py in forward(self, x)
860
861 def forward(self, x):
--> 862 return flow._C.leaky_relu(x, alpha=self.negative_slope, inplace=self.inplace)
863
864 def extra_repr(self):
RuntimeError: Check failed: tensor_impl->tensor_meta()->dtype() == output_tensor_metas.at(i)->dtype()
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/functional/impl/activation_functor.cpp", line 496, in operator()
OpInterpUtil::Dispatch(*op_, {x}, outputs.get(), attrs)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
internal_->Apply(op_expr, inputs, outputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_local_op_interpreter.cpp", line 123, in NaiveInterpret
Error Type: oneflow.ErrorProto.check_failed_error
I then tried loading the sd model into GPU memory without running it, opening an image locally with Image.open, and feeding that to the Real-ESRGAN model. The same error is still raised. Sample code:
import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline
pipe = OneFlowStableDiffusionPipeline.from_pretrained(path)
pipe = pipe.to("cuda:1")
prompt = "1 girl"
with torch.autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]

from oneflowrealesrgan import RealESRGAN
realesr_model = RealESRGAN("cuda:1", scale=2)
realesr_model.load_weights('xxxx/RealESRGAN/x2')

from PIL import Image
"""
path_to_image = '../1.png'
image = Image.open(path_to_image).convert('RGB')
"""
sr_image = realesr_model.predict(image)
The error occurs whether image is generated by the pipeline or opened directly with Image.open, yet the two pieces of code run fine when executed separately.
Hi, the only change you made to Real-ESRGAN-colab is import oneflow as torch, with no other modifications, correct? We will try to reproduce your issue. Alternatively, you can push the code you are running to your own account.
The Real-ESRGAN-colab code I use is linked below; the model I converted to OneFlow myself is in the weights/RealESRGANx2.zip archive. Just unzip it and update the model path in run.py to run it. This model does 2x upscaling. Alternatively, you can download the original model from Real-ESRGAN-colab and convert it yourselves with the torch2oneflow.py in my code. As mentioned above, this code runs fine on its own, but fails once the OneFlowStableDiffusionPipeline model is also loaded in the same script. Code: https://github.com/yaoxinthu/oneflowrealesr Many thanks.
OK, I will try to reproduce the problem you ran into. Thanks for the feedback.
To make reproduction easier, I have provided a simple reproduction script: https://github.com/yaoxinthu/oneflowrealesr/blob/master/error_run.py As long as sd v1.4 is available on your system and the RealESRGAN_path on line 14 of that script is correct, it reproduces the error I encountered.
Hi, thanks for the report. The cause is that an optimization in OneFlowStableDiffusionPipeline interferes with the model you run afterwards. Disabling that optimization by setting os.environ["ONEFLOW_KERNEL_ENABLE_CUDNN_FUSED_CONV_BIAS"] = "0" after from diffusers import OneFlowStableDiffusionPipeline in your code will resolve the problem you encountered.
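A minimal sketch of where the switch goes, based on the reproduction script above (the surrounding imports and model loading are shown as comments, since the rest of the script stays as you wrote it):

```python
import os

# 1) from diffusers import OneFlowStableDiffusionPipeline  (import happens first)
# 2) then disable the fused conv+bias optimization before any other model runs:
os.environ["ONEFLOW_KERNEL_ENABLE_CUDNN_FUSED_CONV_BIAS"] = "0"
# 3) load and run the Real-ESRGAN model as usual; with the flag off, the
#    dtype-mismatch check in leaky_relu no longer fires.
```

The flag must be set before the Real-ESRGAN model executes its first forward pass, since the kernel selection happens at that point.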
We will handle this optimization more gracefully in future development. Thanks for the feedback.
Solved, many thanks. Looking forward to the improvements in future releases!