
./build/test/test_all.testbin drops core

Open emerth opened this issue 6 years ago • 0 comments

Issue summary

I'm reposting this after closing my original issue because I am now quite confident the install was completely canonical.

Running ./build/test/test_all.testbin ultimately drops core. It drops core in the same place with either an RX 470 or an RX Vega 64.

A few tests fail along the way, but the run always drops core at the same point.

Steps to reproduce

  • Clean install Ubuntu 18.04.1 LTS Server.
  • Use only Ubuntu's stock kernels as delivered by the apt-get update / upgrade mechanism (i.e. the kernel in use is the one Ubuntu provides after apt-get dist-upgrade; I have not installed an upstream kernel).
  • Install ROCm 2.0 & hipCaffe per the hipCaffe instructions.
  • Run several of the examples without error.
  • Run ./build/test/test_all.testbin... drops core.
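To narrow the last step down, the test binary can be rerun with a googletest filter. The sketch below only builds the command lines. It assumes test_all.testbin accepts the standard `--gtest_filter` flag (the log output is in googletest format); the helper name `gtest_command` is hypothetical, and the test name is the one that aborts in the log below.

```python
# Path taken from the reproduction steps above.
TEST_BINARY = "./build/test/test_all.testbin"

# Test that aborts in the log below.
CRASHING_TEST = "NetTest/3.TestBackwardWithAccuracyLayer"


def gtest_command(binary, only=None, skip=None):
    """Build an argv list for a googletest binary (hypothetical helper).

    googletest filter syntax: positive patterns joined by ':', optionally
    followed by '-' and negative patterns joined by ':'.
    """
    pattern = ":".join(only) if only else "*"
    if skip:
        pattern += "-" + ":".join(skip)
    return [binary, "--gtest_filter=" + pattern]


# Run only the crashing test, to confirm it aborts in isolation.
print(" ".join(gtest_command(TEST_BINARY, only=[CRASHING_TEST])))

# Run everything except the crashing test, to see how far the suite gets.
print(" ".join(gtest_command(TEST_BINARY, skip=[CRASHING_TEST])))
```

The printed commands contain `/` and `*`, so quote the filter argument when pasting them into a shell.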

Problem:

...
[ RUN      ] NetTest/2.TestForcePropagateDown
[       OK ] NetTest/2.TestForcePropagateDown (2 ms)
[ RUN      ] NetTest/2.TestAllInOneNetTrain
[       OK ] NetTest/2.TestAllInOneNetTrain (3 ms)
[ RUN      ] NetTest/2.TestAllInOneNetVal
[       OK ] NetTest/2.TestAllInOneNetVal (4 ms)
[ RUN      ] NetTest/2.TestAllInOneNetDeploy
[       OK ] NetTest/2.TestAllInOneNetDeploy (1 ms)
[----------] 26 tests from NetTest/2 (772 ms total)

[----------] 26 tests from NetTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN      ] NetTest/3.TestHasBlob
[       OK ] NetTest/3.TestHasBlob (4 ms)
[ RUN      ] NetTest/3.TestGetBlob
[       OK ] NetTest/3.TestGetBlob (4 ms)
[ RUN      ] NetTest/3.TestHasLayer
[       OK ] NetTest/3.TestHasLayer (4 ms)
[ RUN      ] NetTest/3.TestGetLayerByName
[       OK ] NetTest/3.TestGetLayerByName (4 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackward
[       OK ] NetTest/3.TestBottomNeedBackward (4 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackwardForce
[       OK ] NetTest/3.TestBottomNeedBackwardForce (4 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackwardEuclideanForce
[       OK ] NetTest/3.TestBottomNeedBackwardEuclideanForce (1 ms)
[ RUN      ] NetTest/3.TestBottomNeedBackwardTricky
[       OK ] NetTest/3.TestBottomNeedBackwardTricky (5 ms)
[ RUN      ] NetTest/3.TestLossWeight
[       OK ] NetTest/3.TestLossWeight (21 ms)
[ RUN      ] NetTest/3.TestLossWeightMidNet
[       OK ] NetTest/3.TestLossWeightMidNet (16 ms)
[ RUN      ] NetTest/3.TestComboLossWeight
[       OK ] NetTest/3.TestComboLossWeight (18 ms)
[ RUN      ] NetTest/3.TestBackwardWithAccuracyLayer
MIOpen Error: /home/dlowell/MIOpenPrivate/src/ocl/softmaxocl.cpp:59: Only alpha=1 and beta=0 is supported
F0116 04:57:34.752313 24321 cudnn_softmax_layer_hip.cpp:27] Check failed: status == miopenStatusSuccess (7 vs. 0)  miopenStatusUnknownError
*** Check failure stack trace: ***
    @     0x7f2ab9f720cd  google::LogMessage::Fail()
    @     0x7f2ab9f73f33  google::LogMessage::SendToLog()
    @     0x7f2ab9f71c28  google::LogMessage::Flush()
    @     0x7f2ab9f74999  google::LogMessageFatal::~LogMessageFatal()
    @          0x15364ce  caffe::CuDNNSoftmaxLayer<>::Forward_gpu()
    @           0x4cb540  caffe::Layer<>::Forward()
    @          0x1ea2073  caffe::SoftmaxWithLossLayer<>::Forward_gpu()
    @           0x4cb540  caffe::Layer<>::Forward()
    @          0x1b459d7  caffe::Net<>::ForwardFromTo()
    @          0x1b458f0  caffe::Net<>::Forward()
    @           0x967796  caffe::NetTest_TestBackwardWithAccuracyLayer_Test<>::TestBody()
    @          0x108be34  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x108bcf6  testing::Test::Run()
    @          0x108ceb1  testing::TestInfo::Run()
    @          0x108d5c7  testing::TestCase::Run()
    @          0x1093967  testing::internal::UnitTestImpl::RunAllTests()
    @          0x10933a4  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x1093359  testing::UnitTest::Run()
    @          0x201545a  main
    @     0x7f2ab4b00b97  __libc_start_main
    @          0x20148fa  _start
Aborted (core dumped)
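When the full log is long, the test that was running at the time of the abort can be picked out mechanically. This is a minimal sketch, assuming the log uses the `[ RUN ]` / `[ OK ]` / `[ FAILED ]` markers shown above; the function name `unfinished_tests` is hypothetical. It reports tests that started but never printed a result line.

```python
import re

# Marker lines as they appear in the log above.
RUN_RE = re.compile(r"^\[ RUN\s+\] (\S+)")
DONE_RE = re.compile(r"^\[\s+(?:OK|FAILED)\s+\] (\S+)")


def unfinished_tests(log_text):
    """Return names of tests that started but never reported OK or FAILED."""
    started = []
    finished = set()
    for line in log_text.splitlines():
        match = RUN_RE.match(line)
        if match:
            started.append(match.group(1))
            continue
        match = DONE_RE.match(line)
        if match:
            finished.add(match.group(1))
    return [name for name in started if name not in finished]


# Shortened excerpt of the log above.
sample = """[ RUN      ] NetTest/3.TestComboLossWeight
[       OK ] NetTest/3.TestComboLossWeight (18 ms)
[ RUN      ] NetTest/3.TestBackwardWithAccuracyLayer
Aborted (core dumped)
"""
print(unfinished_tests(sample))  # ['NetTest/3.TestBackwardWithAccuracyLayer']
```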

Your system configuration

I can provide whatever info you need; just tell me what you want.

Operating system: Ubuntu 18.04.1 LTS
Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CUDA version (if applicable): N/A
CUDNN version (if applicable): N/A
BLAS: rocblas 2.0.0.0
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 2.7.15rc1

Hardware:

  • GPU: RX Vega 64 or RX 470 4 GB
  • CPU: Ryzen 5 2600X
  • RAM: 16 GB
  • Motherboard: X470
  • SR-IOV: turned off
  • IOMMU: same result whether enabled or disabled

emerth, Jan 16 '19 05:01