./build/test/test_all.testbin drops core
Issue summary
I'm reposting this after closing my original issue because I am now quite confident the install was completely canonical.
Running ./build/test/test_all.testbin ultimately drops core. It drops core in the same place with either an RX 470 or an RX Vega 64.
A few tests fail along the way, but the run always drops core at the same point.
Steps to reproduce
- Clean install Ubuntu 18.04.1 LTS Server.
- Use only Ubuntu's stock kernels, as provided by the apt-get update / upgrade mechanism (i.e. the kernel in use is the one Ubuntu provides after apt-get dist-upgrade; I have not installed an upstream kernel).
- Install ROCm 2.0 & hipCaffe per the hipCaffe instructions.
- Run several of the examples without error.
- Run ./build/test/test_all.testbin... drops core.
Problem:
...
[ RUN ] NetTest/2.TestForcePropagateDown
[ OK ] NetTest/2.TestForcePropagateDown (2 ms)
[ RUN ] NetTest/2.TestAllInOneNetTrain
[ OK ] NetTest/2.TestAllInOneNetTrain (3 ms)
[ RUN ] NetTest/2.TestAllInOneNetVal
[ OK ] NetTest/2.TestAllInOneNetVal (4 ms)
[ RUN ] NetTest/2.TestAllInOneNetDeploy
[ OK ] NetTest/2.TestAllInOneNetDeploy (1 ms)
[----------] 26 tests from NetTest/2 (772 ms total)
[----------] 26 tests from NetTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN ] NetTest/3.TestHasBlob
[ OK ] NetTest/3.TestHasBlob (4 ms)
[ RUN ] NetTest/3.TestGetBlob
[ OK ] NetTest/3.TestGetBlob (4 ms)
[ RUN ] NetTest/3.TestHasLayer
[ OK ] NetTest/3.TestHasLayer (4 ms)
[ RUN ] NetTest/3.TestGetLayerByName
[ OK ] NetTest/3.TestGetLayerByName (4 ms)
[ RUN ] NetTest/3.TestBottomNeedBackward
[ OK ] NetTest/3.TestBottomNeedBackward (4 ms)
[ RUN ] NetTest/3.TestBottomNeedBackwardForce
[ OK ] NetTest/3.TestBottomNeedBackwardForce (4 ms)
[ RUN ] NetTest/3.TestBottomNeedBackwardEuclideanForce
[ OK ] NetTest/3.TestBottomNeedBackwardEuclideanForce (1 ms)
[ RUN ] NetTest/3.TestBottomNeedBackwardTricky
[ OK ] NetTest/3.TestBottomNeedBackwardTricky (5 ms)
[ RUN ] NetTest/3.TestLossWeight
[ OK ] NetTest/3.TestLossWeight (21 ms)
[ RUN ] NetTest/3.TestLossWeightMidNet
[ OK ] NetTest/3.TestLossWeightMidNet (16 ms)
[ RUN ] NetTest/3.TestComboLossWeight
[ OK ] NetTest/3.TestComboLossWeight (18 ms)
[ RUN ] NetTest/3.TestBackwardWithAccuracyLayer
MIOpen Error: /home/dlowell/MIOpenPrivate/src/ocl/softmaxocl.cpp:59: Only alpha=1 and beta=0 is supported
F0116 04:57:34.752313 24321 cudnn_softmax_layer_hip.cpp:27] Check failed: status == miopenStatusSuccess (7 vs. 0) miopenStatusUnknownError
*** Check failure stack trace: ***
@ 0x7f2ab9f720cd google::LogMessage::Fail()
@ 0x7f2ab9f73f33 google::LogMessage::SendToLog()
@ 0x7f2ab9f71c28 google::LogMessage::Flush()
@ 0x7f2ab9f74999 google::LogMessageFatal::~LogMessageFatal()
@ 0x15364ce caffe::CuDNNSoftmaxLayer<>::Forward_gpu()
@ 0x4cb540 caffe::Layer<>::Forward()
@ 0x1ea2073 caffe::SoftmaxWithLossLayer<>::Forward_gpu()
@ 0x4cb540 caffe::Layer<>::Forward()
@ 0x1b459d7 caffe::Net<>::ForwardFromTo()
@ 0x1b458f0 caffe::Net<>::Forward()
@ 0x967796 caffe::NetTest_TestBackwardWithAccuracyLayer_Test<>::TestBody()
@ 0x108be34 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x108bcf6 testing::Test::Run()
@ 0x108ceb1 testing::TestInfo::Run()
@ 0x108d5c7 testing::TestCase::Run()
@ 0x1093967 testing::internal::UnitTestImpl::RunAllTests()
@ 0x10933a4 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x1093359 testing::UnitTest::Run()
@ 0x201545a main
@ 0x7f2ab4b00b97 __libc_start_main
@ 0x20148fa _start
Aborted (core dumped)
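For anyone triaging a long run like the one above, the crashing test can be picked out of the log automatically. The following is only a minimal sketch, not part of hipCaffe or googletest: the helper name `find_unfinished_test` and the `sample` string are hypothetical, and it assumes the log uses the `[ RUN ]` / `[ OK ]` / `[ FAILED ]` progress lines shown in the output above.

```python
import re

# Matches googletest progress lines such as "[ RUN ] Suite/3.Name" and
# "[ OK ] Suite/3.Name (4 ms)". Spacing inside the brackets may vary.
_STATUS = re.compile(r"^\[\s*(RUN|OK|FAILED)\s*\]\s+(\S+)")


def find_unfinished_test(log_text):
    """Return the last test that started but never reported a result, or None.

    Hypothetical helper for illustration: a test that aborts the process
    prints "[ RUN ]" but never prints a matching "[ OK ]" or "[ FAILED ]".
    """
    running = None
    for line in log_text.splitlines():
        match = _STATUS.match(line.strip())
        if not match:
            continue  # glog output, stack frames, separators, etc.
        status, name = match.groups()
        if status == "RUN":
            running = name
        elif name == running:
            running = None  # the test finished, whether it passed or failed
    return running


# Sample built from lines in the log above.
sample = """\
[ RUN ] NetTest/3.TestComboLossWeight
[ OK ] NetTest/3.TestComboLossWeight (18 ms)
[ RUN ] NetTest/3.TestBackwardWithAccuracyLayer
Aborted (core dumped)
"""

print(find_unfinished_test(sample))
```

The reported name can then be passed to the test binary through googletest's `--gtest_filter` option to rerun only that test instead of the whole suite.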
Your system configuration
I can provide whatever info you need; just tell me what you want.
Operating system: Ubuntu 18.04.1 LTS
Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CUDA version (if applicable): N/A
CUDNN version (if applicable): N/A
BLAS: rocblas 2.0.0.0
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 2.7.15rc1
Hardware:
- RX Vega 64 or RX 470 4GB
- Ryzen 5 2600X
- 16 GB RAM
- X470 mobo
- SR-IOV is turned off
- IOMMU enabled or disabled: same result