MIOpen Error: hipoc_kernel.cpp:106: Failed to launch kernel: invalid configuration argument
- When I run the t5layernorm command MIOpenDriver t5layernorm --input 128x256x512x11 -F 1 -m 0 -t 1 -i 1, I get this error. Details are as follows:
MIOpenDriver t5layernorm --input 128x256x512x11 -F 1 -m 0 -t 1 -i 1
MIOpen(HIP): Info2 [CheckHipnnVersion] MIOpen and HIPNN Version matching was successful
MIOpen(HIP): Info [get_device_name] Raw device name: gfx906:sramecc-:xnack-
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 5.1.24472, MIOpen version 3.2.0.0b0f297-dirty
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl]
MIOpen(HIP): Info2 [GetPerfDbPathFile] inexact perf database search
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 1.02762 ms
MIOpen(HIP): Info [Handle] stream: 0x7221ba0, device_id: 0
MIOpen(HIP): Info2 [GetWorkspaceSizes] T5LayernormBackward: 180224
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad4ec200000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 44 at 0x2ad483c00000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad518400000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 67108864 at 0x2ad498c00000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad544600000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad570800000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 44 at 0x2ad483c01000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 180224 at 0x2ad483c02000 Ok
PRNG seed: 12345678
MIOpen(HIP): Command [LogCmdT5LayerNorm] ./bin/MIOpenDriver t5layernormfp32 -n 128 -c 256 -H 512 -W 11 -F 1 -m 0
MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem dtype1normalized_dim0outer_size1inner_size184549376 and algorithm T5LayerNormForward
MIOpen(HIP): Info2 [GetFound1_0] No invokers found for dtype1normalized_dim0outer_size1inner_size184549376
MIOpen(HIP): Info [FindSolutionImpl] T5LayernormForward (not searchable)
MIOpen(HIP): Info2 [SearchForSolutions] T5LayernormForward: Success.
MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: T5LayernormFwdContiguous
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file ""
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file "./db3/conv2d-fp16/kdb/gfx906_64.ukdb"
MIOpen(HIP): Trace [Exec] 47072929652800:PRAGMA journal_mode=WAL;
MIOpen(HIP): Trace [Exec] 47072929652800:CREATE TABLE IF NOT EXISTS kern_db(id INTEGER PRIMARY KEY ASC,kernel_name TEXT NOT NULL,kernel_args TEXT NOT NULL,kernel_blob BLOB NOT NULL,kernel_hash TEXT NOT NULL,uncompressed_size INT NOT NULL);CREATE UNIQUE INDEX IF NOT EXISTS idx_kern_db ON kern_db(kernel_name, kernel_args);
MIOpen(HIP): Info2 [KernDb] Database created successfully
MIOpen(HIP): Trace [Exec] 47072929652800:PRAGMA table_info(kern_db);
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: "MIOpenLayerNorm.cpp.o"; args: -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'MIOpenLayerNorm.cpp.o') AND (kernel_args = '-DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.709541 ms
MIOpen(HIP): Info2 [LoadBinary] Unable to load binary for: "MIOpenLayerNorm.cpp.o"; args: -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906
MIOpen(HIP): Trace [LoadProgram] HIPOCProgram MIOpenLayerNorm.cpp
MIOpen(HIP): Info2 [SaveBinary] Saving binary for: "MIOpenLayerNorm.cpp.o"; args: -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906
MIOpen(HIP): Info2 [Prepare] INSERT OR REPLACE INTO kern_db(kernel_name, kernel_args, kernel_blob, kernel_hash, uncompressed_size) VALUES(?, ?, ?, ?, ?);
MIOpen(HIP): Info2 [Measure] Db::StoreRecord time: 11.9996 ms
MIOpen(HIP): Info2 [Register] Invoker registered for algorithm dtype1normalized_dim0outer_size1inner_size184549376 and solver T5LayernormForward
MIOpen(HIP): Info2 [SetAsFound1_0] Solver T5LayernormForward registered as find 1.0 best for T5LayerNormForward in dtype1normalized_dim0outer_size1inner_size184549376
MIOpen(HIP): Info2 [run] kernel_name = T5LayernormFwdContiguous, global_work_dim = { 4294967296, 1, 1 }, local_work_dim = { 256, 1, 1 }
MIOpen Error: hipoc_kernel.cpp:106: Failed to launch kernel: invalid configuration argument
GPU Kernel Time Forward T5LayerNorm Elapsed: 0 ms
Forward T5LayerNorm FAILED: 0.321133 > 1.5e-06
This seems to happen because the amount of data to be processed exceeds the maximum grid size per dimension: the computed value is 4294967296 (the global_work_dim reported in the log).
But when I use another set of parameters (where the computed amount is still greater than the limit), this error doesn't show up. Why is that?
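For reference, the numbers in the log can be checked with a few lines of arithmetic. This is only a sketch in Python: the shape and the two work-size values are copied from the command and the [run] log line, while the 32-bit bound is an assumption about what the HIP runtime accepts per dimension and is not confirmed anywhere in this thread. It also does not try to explain how the solver gets from 184549376 input elements to a global size of 4294967296.

```python
import math

# Shape from the reported command: --input 128x256x512x11
shape = (128, 256, 512, 11)
elements = math.prod(shape)  # matches inner_size184549376 in the log

# Values copied from the "[run]" log line.
global_work_size = 4294967296
local_work_size = 256

# ASSUMPTION (not confirmed in the thread): the per-dimension global size
# must fit in an unsigned 32-bit integer for the launch to be accepted.
ASSUMED_MAX_GLOBAL = 2**32 - 1

blocks = global_work_size // local_work_size
exceeds = global_work_size > ASSUMED_MAX_GLOBAL

print("input elements:   ", elements)
print("global work size: ", global_work_size, "= 2**32:", global_work_size == 2**32)
print("blocks of 256:    ", blocks)
print("exceeds assumed limit:", exceeds)
```

Under that assumption the launch is over the bound by exactly one, which would be consistent with the "invalid configuration argument" error, although the block count on its own (16777216) is comparatively small.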
Hi @GeezYuven,
Thanks for reporting the issue. I was able to reproduce it. This code appears to be a contribution from an outside collaborator, so I wasn't able to get much information from the internal team, but it does seem to be related to the dimensions reaching a maximum. The second example, whose dimensions also surpass the maximum, still appears to fail, albeit with a different error message.
Is there a particular use case you are trying to use this for?
There is no specific use case; this issue occurred during my testing.
Hi @GeezYuven,
Thanks for the response. The issue seems to be related to the dimension sizes, so I would recommend staying within the limits you found. I had a chat with the internal team, and since this is from an outside collaborator, I don't have much additional information on these limitations. Please let me know if you have more questions, thanks!
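One way to see whether a given test shape stays within those limits is to scan the driver log for [run] lines and flag oversized launches. The sketch below is an illustration in Python, not part of MIOpen: find_oversized_launches is a hypothetical helper, the regular expression is modelled only on the single [run] line shown in this issue, and the default threshold of 2**32 - 1 is an assumption rather than a documented MIOpen limit.

```python
import re

# Pattern modelled on the one "[run]" line quoted in this issue.
RUN_LINE = re.compile(
    r"kernel_name = (?P<kernel>\w+), "
    r"global_work_dim = \{ (?P<gws>[\d, ]+) \}, "
    r"local_work_dim = \{ (?P<lws>[\d, ]+) \}"
)

def find_oversized_launches(log_text, limit=2**32 - 1):
    """Return (kernel, global_dims, local_dims) for launches over `limit`.

    `limit` is an assumed per-dimension bound on the global work size.
    """
    hits = []
    for match in RUN_LINE.finditer(log_text):
        gws = [int(v) for v in match.group("gws").split(",")]
        lws = [int(v) for v in match.group("lws").split(",")]
        if any(g > limit for g in gws):
            hits.append((match.group("kernel"), gws, lws))
    return hits

# The "[run]" line from the log in this issue.
sample = (
    "MIOpen(HIP): Info2 [run] kernel_name = T5LayernormFwdContiguous, "
    "global_work_dim = { 4294967296, 1, 1 }, local_work_dim = { 256, 1, 1 }"
)

for kernel, gws, lws in find_oversized_launches(sample):
    print(kernel, "global:", gws, "local:", lws)
```

Passing a different limit makes it easy to test other candidate bounds against logs from shapes that do and do not fail.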
Thanks a lot