hipErrorSharedObjectInitFailed when testing default example

Open flint-stone opened this issue 3 years ago • 8 comments

Hello! I was trying to test Tensile by following the example provided at this link, running ../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_asm_only.yaml ./, but I get a hipErrorSharedObjectInitFailed error. Here are the details:

Compiling source kernels: Done.
# Kernel Building elapsed time = 950.7 secs
# Actual Solutions: 192 / 192 after KernelWriter
+ set +e
+ ERR1=0
+ /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/0_Build/client/tensile_client --config-file /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
loading config file /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
Loading /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/source/library/Kernels.so-000-gfx1012.hsaco
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error 303(hipErrorSharedObjectInitFailed) /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Source/client/main.cpp:323:
retError
hipErrorSharedObjectInitFailed
/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/run.sh: line 6:  1976 Aborted                 (core dumped) /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/0_Build/client/tensile_client --config-file /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
+ ERR2=134
+ ERR=0
+ [[ 0 -ne 0 ]]
+ [[ 134 -ne 0 ]]
+ echo two
two
+ ERR=134
+ exit 134
Tensile::warning: ClientWriter Benchmark Process exited with code 134
Tensile::warning: BenchmarkProblems: Benchmark Process exited with code 134
################################################################################
# Cijk_Ailk_Bljk_SB_00
# 00_Final: End - 965.701s
################################################################################
clientExit=1 (ERROR) for /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Configs/test.yaml
Traceback (most recent call last):
  File "../Tensile/bin/Tensile", line 36, in <module>
    Tensile.main()
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Tensile.py", line 282, in main
    Tensile(sys.argv[1:])
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Tensile.py", line 239, in Tensile
    executeStepsInConfig(config)
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Tensile.py", line 51, in executeStepsInConfig
    BenchmarkProblems.main( config["BenchmarkProblems"] )
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/BenchmarkProblems.py", line 366, in main
    shutil.copy( resultsFileName, newResultsFileName )
  File "/usr/lib/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/Data/00_Final.csv'

Any suggestions on what could be the problem here? Thanks in advance!
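
One detail in the log may be worth checking: the client loads Kernels.so-000-gfx1012.hsaco, and gfx1012 is the target name of the RX 5500 XT, whereas the Radeon VII is gfx906. Below is a minimal sketch for pulling the target out of a saved log. The function name, the shortened sample path, and the EXPECTED_TARGET constant are illustrative assumptions, not part of Tensile.

```python
import re

# Matches the code-object line printed by tensile_client, for example
# "Loading <path>/Kernels.so-000-gfx1012.hsaco".
HSACO_RE = re.compile(r"^Loading\s+(\S+?-(gfx[0-9a-f]+)\.hsaco)$")

# Assumption: the GPU of interest is the Radeon VII, whose target is gfx906.
EXPECTED_TARGET = "gfx906"


def code_object_targets(log_text):
    """Return the gfx targets named in 'Loading ... .hsaco' log lines."""
    targets = []
    for line in log_text.splitlines():
        match = HSACO_RE.match(line.strip())
        if match:
            targets.append(match.group(2))
    return targets


# Shortened, hypothetical paths; only the file-name suffix matters here.
sample_log = """loading config file build/source/ClientParameters.ini
Loading build/source/library/Kernels.so-000-gfx1012.hsaco
terminate called after throwing an instance of 'std::runtime_error'
"""

for target in code_object_targets(sample_log):
    if target != EXPECTED_TARGET:
        print("code object built for", target, "but expected", EXPECTED_TARGET)
```

A mismatch like this would not by itself prove the cause of the error, but it suggests the benchmark may be running against the other GPU in the machine.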

flint-stone • Mar 08 '22 20:03

Here is some more detailed information about the environment we are trying.

ROCm version: 5.0.0
GPUs: Radeon VII + RX5500XT, but we only care about the Radeon VII.
Tensile version: the current commit on the master branch (commit ID d5eea38).

Let me know if you need more information.
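
Since two different GPUs are installed, it might be worth limiting the run to a single device. The HIP runtime reads the HIP_VISIBLE_DEVICES environment variable to decide which GPUs it exposes. The sketch below only builds the command and environment; the helper name and the device index 0 are assumptions (the index of the Radeon VII on this machine is not stated in the thread), and whether Tensile's own architecture detection honours this variable is not confirmed here.

```python
import os


def tensile_invocation(config_path, output_dir, device_index):
    """Build (command, environment) for a Tensile run limited to one GPU."""
    env = dict(os.environ)
    # HIP_VISIBLE_DEVICES restricts which GPUs the HIP runtime exposes.
    env["HIP_VISIBLE_DEVICES"] = str(device_index)
    command = ["../Tensile/bin/Tensile", config_path, output_dir]
    return command, env


# Device index 0 is a placeholder; substitute the index of the Radeon VII.
command, env = tensile_invocation(
    "../Tensile/Configs/rocblas_sgemm_asm_only.yaml", "./", 0
)
print(" ".join(command))
print("HIP_VISIBLE_DEVICES=" + env["HIP_VISIBLE_DEVICES"])
```

The pair can then be passed to subprocess.run(command, env=env) from the build directory used in the original report.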

syifan • Mar 09 '22 18:03

Thanks for filing the issue. Please find the updated config file attached. The original config file contains some obsolete parameters that result in this error.

Please let me know if that solves the problem.

babakpst • Mar 10 '22 23:03

I will update the sample Configs files in my next PR.

babakpst • Mar 10 '22 23:03

@flint-stone @syifan

babakpst • Mar 11 '22 23:03

Hi @babakpst, thanks for letting us know. I tried the new configuration file, and it seems I'm still getting a similar error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Error 303(hipErrorSharedObjectInitFailed) /home/lexu/Tensile/repo/Tensile/Source/client/main.cpp:323: 
retError
hipErrorSharedObjectInitFailed

/home/lexu/Tensile/repo/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/run.sh: line 6: 28243 Aborted                 (core dumped) /home/lexu/Tensile/repo/build/0_Build/client/tensile_client --config-file /home/lexu/Tensile/repo/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
+ ERR2=134
+ ERR=0
+ [[ 0 -ne 0 ]]
+ [[ 134 -ne 0 ]]
+ echo two
two
+ ERR=134
+ exit 134
Tensile::WARNING: ClientWriter Benchmark Process exited with code 134
Tensile::WARNING: BenchmarkProblems: Benchmark Process exited with code 134
################################################################################
# Cijk_Ailk_Bljk_SB_00
# 00_Final: End - 172.577s
################################################################################

clientExit=1 (ERROR) for /home/lexu/rocblas_sgemm_asm_only_ChangeMyExtensionTo_yaml.txt
Traceback (most recent call last):
  File "../Tensile/bin/Tensile", line 36, in <module>
    Tensile.main()
  File "/home/lexu/Tensile/repo/Tensile/Tensile.py", line 282, in main
    Tensile(sys.argv[1:])
  File "/home/lexu/Tensile/repo/Tensile/Tensile.py", line 239, in Tensile
    executeStepsInConfig(config)
  File "/home/lexu/Tensile/repo/Tensile/Tensile.py", line 51, in executeStepsInConfig
    BenchmarkProblems.main( config["BenchmarkProblems"] )
  File "/home/lexu/Tensile/repo/Tensile/BenchmarkProblems.py", line 366, in main
    shutil.copy( resultsFileName, newResultsFileName )
  File "/usr/lib/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/lexu/Tensile/repo/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/Data/00_Final.csv'

I simply replaced the yaml file from the instructions with the new file. Let me know if I need to provide more information.

Thanks.
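
For anyone reading the log: the Python traceback looks like a secondary symptom. The client exits with status 134, which is 128 + 6, the shell's encoding for a process killed by signal 6 (SIGABRT on Linux), matching the "Aborted (core dumped)" line. Because the client aborted, 00_Final.csv was never written, so the later shutil.copy fails. A minimal sketch of that reasoning follows; describe_exit and copy_results are hypothetical helpers, not code from BenchmarkProblems.py.

```python
import os
import shutil
import signal


def describe_exit(code):
    """Explain a shell-style exit status; values above 128 mean 'killed by a signal'."""
    if code > 128:
        number = code - 128
        try:
            return "terminated by " + signal.Signals(number).name
        except ValueError:
            return "terminated by signal %d" % number
    return "exited with status %d" % code


def copy_results(src, dst, client_exit):
    """Copy the benchmark CSV only if the client succeeded and produced it."""
    if client_exit != 0 or not os.path.exists(src):
        return "no results to copy: client " + describe_exit(client_exit)
    shutil.copy(src, dst)
    return "copied " + src


print(describe_exit(134))
# The paths below are placeholders for the results file and its destination.
print(copy_results("Data/00_Final.csv", "Data/00_Final_copy.csv", 134))
```

Guarding the copy this way would only make the failure message clearer; the underlying hipErrorSharedObjectInitFailed still needs to be resolved.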

flint-stone • Mar 15 '22 03:03

Hi @flint-stone, and sorry for the late reply. I ran that yaml file on a couple of newer architectures and did not get any error messages. I have found a Radeon VII node and am updating it so that I can run Tensile there. It has been some time since we tuned Tensile for that architecture, so there might be other parameters in the yaml file that are not compatible with the Radeon VII architecture. I will update you once I can run Tensile on the Radeon VII. Thanks for your patience.

babakpst • Mar 25 '22 22:03