OSError: (External) Cublas error, `CUBLAS_STATUS_INVALID_VALUE`

Open jeffzhengye opened this issue 3 years ago • 9 comments

TensorFlow runs fine on my machine, so I assume the environment is configured correctly. What could the problem be, and how can I fix it?

    results = self.detector.object_detection(images = [ori_img], use_gpu = self.use_gpu, score_thresh = self.threshold, visualization=False)
  File "/home/user/anaconda3/lib/python3.7/site-packages/paddlehub/compat/paddle_utils.py", line 220, in runner
    return func(*args, **kwargs)
  File "/home/user/.paddlehub/modules/yolov3_resnet50_vd_coco2017/module.py", line 211, in object_detection
    [image_tensor, im_size_tensor])
OSError: (External) Cublas error, CUBLAS_STATUS_INVALID_VALUE. An unsupported value or parameter was passed to the function (a negative vector size, for example). (at /paddle/paddle/fluid/platform/cuda_helper.h:93)

jeffzhengye avatar Mar 22 '21 07:03 jeffzhengye

Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You can also look for an answer in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

paddle-bot-old[bot] avatar Mar 22 '21 07:03 paddle-bot-old[bot]

This is probably a PaddleHub problem.

result = predictor.object_detection(images=[cv2.imread('best-face-oil.png')], use_gpu=True)

AttributeError: 'YOLOv3ResNet50Coco2017' object has no attribute 'gpu_predictor'

I don't know why GPU prediction cannot be used.

jeffzhengye avatar Mar 22 '21 09:03 jeffzhengye

Hi developers, I ran into the same issue today. Here is my log. My Paddle version is 2.0.1.post110.

2021-03-22 22:45:35,513 - INFO - Found /home/bob/anaconda3/envs/dev/lib/python3.7/site-packages/ppgan/apps/photo2cartoon_genA2B_weight.pdparams
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-3-6da9bba74531> in <module>
----> 1 a = apps.Photo2CartoonPredictor()

~/anaconda3/envs/dev/lib/python3.7/site-packages/ppgan/apps/photo2cartoon_predictor.py in __init__(self, output_path, weight_path)
     41             weight_path = get_path_from_url(P2C_WEIGHT_URL, cur_path)
     42 
---> 43         self.genA2B = ResnetUGATITP2CGenerator()
     44         params = paddle.load(weight_path)
     45         self.genA2B.set_state_dict(params)

~/anaconda3/envs/dev/lib/python3.7/site-packages/ppgan/models/generators/resnet_ugatit_p2c.py in __init__(self, input_nc, output_nc, ngf, img_size, n_blocks, light)
     41         DownBlock += [
     42             nn.Pad2D([3, 3, 3, 3], 'reflect'),
---> 43             nn.Conv2D(input_nc, ngf, kernel_size=7, stride=1, bias_attr=False),
     44             nn.InstanceNorm2D(ngf, weight_attr=False, bias_attr=False),
     45             nn.ReLU()

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/nn/layer/conv.py in __init__(self, in_channels, out_channels, kernel_size, stride, padding, dilation, groups, padding_mode, weight_attr, bias_attr, data_format)
    637             weight_attr=weight_attr,
    638             bias_attr=bias_attr,
--> 639             data_format=data_format)
    640 
    641     def forward(self, x):

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/nn/layer/conv.py in __init__(self, in_channels, out_channels, kernel_size, transposed, dims, stride, padding, padding_mode, output_padding, dilation, groups, weight_attr, bias_attr, data_format)
    133             shape=filter_shape,
    134             attr=self._param_attr,
--> 135             default_initializer=_get_default_param_initializer())
    136         self.bias = self.create_parameter(
    137             attr=self._bias_attr, shape=[self._out_channels], is_bias=True)

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in create_parameter(self, shape, attr, dtype, is_bias, default_initializer)
    406             temp_attr = None
    407         return self._helper.create_parameter(temp_attr, shape, dtype, is_bias,
--> 408                                              default_initializer)
    409 
    410     @deprecated(

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/fluid/layer_helper_base.py in create_parameter(self, attr, shape, dtype, is_bias, default_initializer, stop_gradient, type)
    370                 type=type,
    371                 stop_gradient=stop_gradient,
--> 372                 **attr._to_kwargs(with_initializer=True))
    373         else:
    374             self.startup_program.global_block().create_parameter(

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/fluid/framework.py in create_parameter(self, *args, **kwargs)
   2982                 pass
   2983             else:
-> 2984                 initializer(param, self)
   2985         return param
   2986 

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/fluid/initializer.py in __call__(self, var, block)
    362                 "use_mkldnn": False
    363             },
--> 364             stop_gradient=True)
    365 
    366         if var.dtype == VarDesc.VarType.FP16:

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/fluid/framework.py in _prepend_op(self, *args, **kwargs)
   3098                                        kwargs.get("outputs", {}), attrs
   3099                                        if attrs else {},
-> 3100                                        kwargs.get("stop_gradient", False))
   3101         else:
   3102             op_desc = self.desc._prepend_op()

~/anaconda3/envs/dev/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py in trace_op(self, type, inputs, outputs, attrs, stop_gradient)
     43         self.trace(type, inputs, outputs, attrs,
     44                    framework._current_expected_place(), self._has_grad and
---> 45                    not stop_gradient)
     46 
     47     def train_mode(self):

OSError: (External)  Cublas error, `CUBLAS_STATUS_INVALID_VALUE`. An unsupported value or parameter was passed to the function (a negative vector size, for example).  (at /paddle/paddle/fluid/platform/cuda_helper.h:93)
  [operator < gaussian_random > error]

Bob-AFei avatar Mar 22 '21 14:03 Bob-AFei
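
The traceback above fails in the `gaussian_random` operator while `nn.Conv2D` initialises its weights, before any model-specific code runs. That suggests the error can be reproduced without PaddleHub or PaddleGAN. A minimal sketch, assuming Paddle 2.x; the `smoke_test` helper name is hypothetical and the layer sizes are arbitrary:

```python
def smoke_test():
    """Try one tiny GPU convolution and report what happened as a string."""
    try:
        import paddle
    except ImportError:
        return "paddle is not installed"

    if not paddle.is_compiled_with_cuda():
        return "this paddle build has no CUDA support"

    try:
        paddle.set_device("gpu:0")
        # Building the layer initialises its weights on the GPU, which is
        # the step that raised the error in the traceback above.
        conv = paddle.nn.Conv2D(3, 8, kernel_size=3)
        out = conv(paddle.randn([1, 3, 16, 16]))
        return f"ok, output shape {out.shape}"
    except Exception as exc:  # surface whatever the GPU call raised
        return f"GPU op failed: {exc}"


if __name__ == "__main__":
    print(smoke_test())
```

If this small script fails with the same cuBLAS error, the cause is more likely the Paddle/CUDA installation than the model being loaded.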

Hi all,

Just to note, PyTorch also runs well on my machine.

By the way, I noticed that Prof. Ye and I both use Python 3.7 and conda environments. Could that be causing this issue? Professor Ye, have you tried running this code in Docker?

Many thanks. : )

Bob-AFei avatar Mar 23 '21 14:03 Bob-AFei

Which versions of paddle, paddlehub, python, cudnn, and cuda are you using? I verified with paddlepaddle-gpu=2.0.1, paddlehub=2.0.0, cuda=10.2, cudnn=7.6.5, python=3.8.5 and did not see the problem you describe. Verification script and method:

import paddlehub as hub
import cv2

object_detector = hub.Module(name="yolov3_resnet50_vd_coco2017")
result = object_detector.object_detection(images=[cv2.imread('2007_000175.jpg')],use_gpu=True)
print(result)

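Then run this on the command line: CUDA_VISIBLE_DEVICES=0 python test.py

My preliminary assessment is that your problem has one of two causes. Either the environment is misconfigured, with paddle, cuda, and cudnn versions that do not match, or CUDA_VISIBLE_DEVICES was not set properly when the script was run.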

haoyuying avatar Mar 24 '21 02:03 haoyuying
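
Both suspected causes (mismatched Paddle/CUDA/cuDNN versions, or `CUDA_VISIBLE_DEVICES` not being set) can be checked from Python before running a model. A minimal sketch, assuming Paddle 2.x, where `paddle.version.cuda()` and `paddle.version.cudnn()` report the versions the installed wheel was built against; the `collect_env` helper name is hypothetical:

```python
import os
import platform


def collect_env():
    """Gather the version details relevant to a Paddle/CUDA mismatch."""
    info = {
        "python": platform.python_version(),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    }
    try:
        import paddle
    except ImportError:
        info["paddle"] = "<not installed>"
        return info

    info["paddle"] = paddle.__version__
    info["compiled_with_cuda"] = paddle.is_compiled_with_cuda()
    # Assumption: these report the CUDA/cuDNN versions the wheel was built for,
    # not the versions installed on the machine.
    info["built_for_cuda"] = paddle.version.cuda()
    info["built_for_cudnn"] = paddle.version.cudnn()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing the `built_for_*` values with the CUDA and cuDNN versions actually installed on the machine should show whether they differ. Paddle 2.x also ships `paddle.utils.run_check()`, which runs a small self-test of the installation.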

This problem really was caused by a version mismatch. The versions must match exactly; even a minor version number higher or lower will not work. Read the documentation very carefully and follow it exactly, since it is not very well written.

Overall, configuration is much harder than with TensorFlow. I believe it is the platform's responsibility to output more useful debugging information (I would guess a mismatch is easy to check for).

jeffzhengye avatar Apr 02 '21 06:04 jeffzhengye

Although this problem has been fixed, this thread does not provide any useful information for other users. That makes Paddle hard to use when you run into a problem.

jeffzhengye avatar Apr 02 '21 06:04 jeffzhengye

I ran into the same problem:

2022-04-29 18:05:40 [INFO] Starting to read file list from dataset...
2022-04-29 18:05:40 [INFO] 1981 samples in file MyDataset/train.json, including 1981 positive samples and 0 negative samples.
loading annotations into memory...
Done (t=0.07s)
creating index...
index created!
2022-04-29 18:05:40 [INFO] Starting to read file list from dataset...
2022-04-29 18:05:40 [INFO] 495 samples in file MyDataset/val.json, including 495 positive samples and 0 negative samples.
W0429 18:05:40.970075 12979 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 11.0
W0429 18:05:40.975056 12979 device_context.cc:422] device: 0, cuDNN Version: 7.6.
Traceback (most recent call last):
  File "train.py", line 38, in <module>
    model = pdx.det.FasterRCNN(num_classes=num_classes, backbone='ResNet101')
  File "/root/.local/lib/python3.7/site-packages/paddlex/cv/models/detector.py", line 854, in __init__
    dcn_v2_stages=dcn_v2_stages)
  File "/root/.local/lib/python3.7/site-packages/paddlex/cv/models/detector.py", line 100, in _get_backbone
    backbone = getattr(ppdet.modeling, backbone_name)(**params)
  File "/root/.local/lib/python3.7/site-packages/paddlex/ppdet/modeling/backbones/resnet.py", line 528, in __init__
    lr=1.0))
  File "/root/.local/lib/python3.7/site-packages/paddlex/ppdet/modeling/backbones/resnet.py", line 69, in __init__
    bias_attr=False)
  File "/root/.local/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 646, in __init__
    data_format=data_format)
  File "/root/.local/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 135, in __init__
    default_initializer=_get_default_param_initializer())
  File "/root/.local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 412, in create_parameter
    default_initializer)
  File "/root/.local/lib/python3.7/site-packages/paddle/fluid/layer_helper_base.py", line 374, in create_parameter
    **attr._to_kwargs(with_initializer=True))
  File "/root/.local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2895, in create_parameter
    initializer(param, self)
  File "/root/.local/lib/python3.7/site-packages/paddle/fluid/initializer.py", line 366, in __call__
    stop_gradient=True)
  File "/root/.local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2925, in append_op
    kwargs.get("stop_gradient", False))
  File "/root/.local/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 45, in trace_op
    not stop_gradient)
OSError: (External) Cublas error, CUBLAS_STATUS_INVALID_VALUE. An unsupported value or parameter was passed to the function (a negative vector size, for example). (at /paddle/paddle/fluid/platform/cuda_helper.h:107)

Maojianzeng avatar Apr 29 '22 10:04 Maojianzeng

I solved the problem by replacing the post110 build of paddlepaddle-gpu with the post101 build:

Successfully uninstalled paddlepaddle-gpu-2.1.2.post110
Successfully installed paddlepaddle-gpu-2.1.2.post101

Maybe the project is suited to CUDA 10.1, not 11.0.

Maojianzeng avatar Apr 29 '22 10:04 Maojianzeng
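
The fix above swaps a `post110` wheel for a `post101` one. Assuming the three-digit `.postNNN` suffix on `paddlepaddle-gpu` 2.x wheels encodes the CUDA version the wheel was built for (`110` for CUDA 11.0, `101` for CUDA 10.1), the comparison can be sketched in plain Python. Both helper names are hypothetical, and the toolkit version is a string you supply yourself, for example from `nvcc --version`:

```python
import re


def cuda_from_wheel_version(version):
    """Return the (major, minor) CUDA version encoded in a '.postNNN' suffix.

    Assumption: a three-digit suffix such as 'post110' means CUDA 11.0.
    Returns None when the version string has no such suffix.
    """
    match = re.search(r"\.post(\d{2})(\d)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_matches_toolkit(wheel_version, toolkit_version):
    """Check whether the wheel's CUDA tag equals the installed toolkit version."""
    wheel_cuda = cuda_from_wheel_version(wheel_version)
    if wheel_cuda is None:
        return False
    major, minor = toolkit_version.split(".")[:2]
    return wheel_cuda == (int(major), int(minor))


if __name__ == "__main__":
    for wheel in ("2.1.2.post110", "2.1.2.post101"):
        print(wheel, cuda_from_wheel_version(wheel), wheel_matches_toolkit(wheel, "10.1"))
```

This only compares version strings. It does not inspect the driver or the libraries actually loaded at runtime, so treat a match as a first sanity check rather than proof that the environment is correct.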