
Floating point exception (core dumped) occurred when using the CUDA engine

Open • Frozenmad opened this issue 2 years ago • 1 comment

What are the problems? (screenshots or detailed error messages)

When using the CUDA engine, kernel compilation crashes with: Floating point exception (core dumped)

code snippet:

from pyppl import nn as pplnn
from pyppl import common as pplcommon

# Create a CUDA engine with default options
cuda_options = pplnn.cuda.EngineOptions()
cuda_engine = pplnn.cuda.EngineFactory.Create(cuda_options)
runtime_builder = pplnn.onnx.RuntimeBuilderFactory.Create()

# Load the ONNX model from disk
onnx_model_file = 'model.onnx'
status = runtime_builder.LoadModelFromFile(onnx_model_file)

# Register the CUDA engine as the only engine for this builder
resources = pplnn.onnx.RuntimeBuilderResources()
resources.engines = [cuda_engine]
runtime_builder.SetResources(resources)

# Preprocess the loaded model before creating the runtime
status = runtime_builder.Preprocess()
runtime = runtime_builder.CreateRuntime()

output:

[INFO][2022-08-13 08:19:05.686][utils.cc:456] total partition(s) of graph[torch_jit]: 1.
[INFO][2022-08-13 08:19:05.712][opt_graph.cc:324] added 74 new bridge kernels
[INFO][2022-08-13 08:19:05.991][algo_conv_hmma.cc:137] Compiling Conv_3
[INFO][2022-08-13 08:19:16.544][algo_conv_hmma.cc:142] select kernel nvIdxnSm80Fp16Conv_hmma16816_nhwc_b128x64_w128x16_k16_s16
[INFO][2022-08-13 08:19:16.759][algo_conv_hmma.cc:137] Compiling Conv_6
Floating point exception (core dumped)
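
The log stops inside native code, so it does not show which Python call was running when the signal arrived. One way to narrow that down, offered as a sketch rather than a fix, is Python's standard `faulthandler` module, which prints the Python traceback when a fatal signal such as SIGFPE is received. The helper name `enable_crash_traceback` is hypothetical and not part of pyppl.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Print the Python traceback to `stream` if a fatal signal arrives.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
    and SIGILL. `stream` must be backed by a real file descriptor.
    Returns True once the handler is installed.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_traceback()` at the top of the snippet, before the engine is created, should make the crash report include the Python line that was executing. Running the script with `python -X faulthandler` has the same effect without code changes.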

Which version (commit id or tag) of ppl.nn is used?

38289e9

What's the operating system ppl.nn runs on?

Ubuntu 18.04.6 LTS

What are the commands used to build ppl.nn?

./build.sh -DPPLNN_USE_X86_64=ON -DPPLNN_USE_CUDA=ON -DPPLNN_ENABLE_PYTHON_API=ON -DPYTHON3_INCLUDE_DIRS=/path/to/my/python3.9/include

models and inputs for reproducing these problems (send them to [email protected] if necessary)

A ResNet-18 model trained on CIFAR-10. Sent by email.

Frozenmad, Aug 13 '22 08:08

Hi, which GPU did you use?

Si-XU, Aug 14 '22 07:08
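
For anyone answering the GPU question on a similar report, a small sketch like the one below collects the GPU name and driver version by calling `nvidia-smi`. It assumes `nvidia-smi` is on the PATH; the function name `query_gpus` is hypothetical and unrelated to ppl.nn.

```python
import shutil
import subprocess


def query_gpus():
    """Return one "name, driver_version" string per GPU.

    Returns None if nvidia-smi is not installed or the query fails.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # Keep only non-empty lines, one per detected GPU.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(query_gpus())
```

Pasting the printed list into the issue gives maintainers the device model alongside the commit id and build command already listed above.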