open-gpu-kernel-modules

"NVRM RmInitAdapter: Cannot initialize GSP firmware RM" error found

Open jacksonsshen opened this issue 1 year ago • 1 comments

NVIDIA Open GPU Kernel Modules Version

520.56.06

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 20.04.6 LTS

Kernel Release

5.10.14

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [X] I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 3080

Describe the bug

We deployed an Ubuntu machine with the NVIDIA Open GPU Kernel Modules driver, version 520. The machine often reports errors such as the following:

NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
2024-07-02 18:46:08.681559 kernel:[ 21.766727] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
2024-07-02 18:46:08.681562 kernel:[ 21.766734] NVRM nvAssertFailedNoLog: Assertion failed: rmStatus == NV_OK @ osinit.c:1982

ecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 1731.589314] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 1731.589317] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 1731.589323] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 1731.591779] NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
[ 1731.593977] NVRM: GPU 0000:86:00.0: rm_init_adapter failed, device minor number 0
[ 1731.777872] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 1731.777876] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 1731.800951] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 1731.800957] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 1731.800963] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 1731.800965] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 1731.800970] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 1731.803388] NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
[ 1731.805517] NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 1
[ 1731.989155] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 1731.989160] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 1732.012716] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 1732.012722] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
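
Since the log above shows the same failure on more than one GPU, it can help to tally the `RmInitAdapter failed!` entries per PCI address before digging further. The following is only a minimal sketch: the names `INIT_FAILURE_RE` and `summarize_init_failures` are made up for illustration, and the triple in parentheses is treated as an opaque string because its meaning is not explained in this thread.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
# \s* tolerates a line break between "failed!" and the parenthesised triple.
INIT_FAILURE_RE = re.compile(
    r"NVRM: GPU (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"RmInitAdapter failed!\s*\((?P<triple>[^)]*)\)"
)


def summarize_init_failures(log_text):
    """Map each PCI address to the list of failure triples logged for it."""
    failures = defaultdict(list)
    for match in INIT_FAILURE_RE.finditer(log_text):
        failures[match.group("pci")].append(match.group("triple"))
    return dict(failures)


# Two entries taken from the log in this report (hypothetical variable name).
sample = (
    "[ 1731.591779] NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x63:0xffff:1684)\n"
    "[ 1731.803388] NVRM: GPU 0000:af:00.0: RmInitAdapter failed! \n"
    "(0x63:0xffff:1684)\n"
)
for pci, triples in summarize_init_failures(sample).items():
    print(pci, triples)
```

The pattern scans the whole text rather than individual lines, so it should also cope with logs pasted as a single run of text.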

To Reproduce

Boot the machine with the 520.56.06 open-source NVIDIA driver installed.

Bug Incidence

Sometimes

nvidia-bug-report.log.gz

More Info

No response

jacksonsshen, Jul 10 '24 14:07

I think you should also try this with a newer version, since 520 is no longer supported.

The current branches are:

  • 535 Production Stable
  • 550 Stable
  • 555 New Feature

ptr1337, Jul 10 '24 15:07

RmInitAdapter: Cannot initialize GSP firmware RM
[ 387.346751] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x56:1993)
[ 387.355516] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 387.562971] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kernel_sec2_nvoc.h:792
[ 387.562991] NVRM: nvAssertFailedNoLog: Assertion failed: pBinArchive != NULL @ kernel_gsp_booter.c:487
[ 387.562998] NVRM: nvCheckOkFailedNoLog: Check failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from kgspAllocateScrubberUcodeImage(pGpu, pKernelGsp, &pKernelGsp->pScrubberUcode) @ kernel_gsp.c:3486
[ 387.563000] NVRM: nvCheckOkFailedNoLog: Check failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from _kgspPrepareScrubberImageIfNeeded(pGpu, pKernelGsp) @ kernel_gsp.c:3635
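
For triage, the useful part of the `nvCheckOkFailedNoLog` entries above is the status name, its numeric code, the function that returned it, and the source location. A rough sketch of pulling those fields out is shown here; `FailedCall`, `FAILED_CALL_RE`, and `extract_failed_calls` are hypothetical names, and the pattern is inferred only from the entries quoted in this thread, so other driver versions may format them differently.

```python
import re
from typing import List, NamedTuple


class FailedCall(NamedTuple):
    status_name: str  # e.g. "NV_ERR_NOT_SUPPORTED"
    status_code: int  # e.g. 0x56
    function: str     # function that returned the status
    source_file: str  # file reporting the failed check
    source_line: int


# Matches "[NV_ERR_X] (0x...) returned from func(args) @ file.c:123".
FAILED_CALL_RE = re.compile(
    r"\[(?P<name>NV_ERR_\w+)\] \((?P<code>0x[0-9A-Fa-f]+)\) returned from "
    r"(?P<func>\w+)\(.*?\) @ (?P<file>[\w.]+):(?P<line>\d+)"
)


def extract_failed_calls(log_text: str) -> List[FailedCall]:
    """Return one FailedCall per 'returned from' entry, in log order."""
    return [
        FailedCall(
            m.group("name"),
            int(m.group("code"), 16),
            m.group("func"),
            m.group("file"),
            int(m.group("line")),
        )
        for m in FAILED_CALL_RE.finditer(log_text)
    ]


# The two check failures from the log in this comment (hypothetical variable name).
sample_checks = (
    "[ 387.562998] NVRM: nvCheckOkFailedNoLog: Check failed: Call not supported "
    "[NV_ERR_NOT_SUPPORTED] (0x00000056) returned from "
    "kgspAllocateScrubberUcodeImage(pGpu, pKernelGsp, &pKernelGsp->pScrubberUcode) "
    "@ kernel_gsp.c:3486\n"
    "[ 387.563000] NVRM: nvCheckOkFailedNoLog: Check failed: Call not supported "
    "[NV_ERR_NOT_SUPPORTED] (0x00000056) returned from "
    "_kgspPrepareScrubberImageIfNeeded(pGpu, pKernelGsp) @ kernel_gsp.c:3635\n"
)
for call in extract_failed_calls(sample_checks):
    print(call.function, call.status_name, hex(call.status_code))
```

Read in log order, the entries give the call chain from the innermost failure outward: here `kgspAllocateScrubberUcodeImage` fails first and `_kgspPrepareScrubberImageIfNeeded` propagates the same status.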

I saw this thread and am facing a similar issue. I'm using driver 580.65.06 on an RK3588-based SBC (Axon) with a discrete RTX 3080.

hrushirajg23, Oct 31 '25 18:10

Dear @hrushirajg23, thank you for reporting the issue. Could you please generate a bug report while the system is in the failing state and attach it for triage?

amrit1711, Nov 10 '25 06:11

@amrit1711

  1. Open driver: open-nvidia-bug-report.log.gz

  2. Proprietary driver: nvidia-bug-report.log.gz

Regarding the open driver, I resolved the "chipset not recognized" problem by adding my chip info. I'm still clueless about the "Cannot initialize GSP firmware RM" issue.

Do let me know if you need any more information. Thanks

hrushirajg23, Nov 10 '25 07:11

Thank you, we will analyze the logs and get back to you.

amrit1711, Nov 10 '25 07:11