Ioannis Assiouras

32 comments by Ioannis Assiouras

I can reproduce the hipErrorInvalidValue error with 5.6 as well. It looks like it comes from a check in hipGraphAddKernelNode that requires the kernelParams and extra arguments not to be set to nullptr...

I have created an internal PR to fix this: kernelParams and extra should both be allowed to be nullptr if the kernel does not expect any arguments...
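The relaxed rule described in the comment above can be sketched roughly as follows. This is a minimal illustration in plain C++, not the actual HIP/clr code: `KernelNodeArgs`, `argsAcceptable`, and `expectedArgCount` are hypothetical names introduced here, and the real check in hipGraphAddKernelNode may look quite different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two argument fields discussed above.
// It is NOT the real hipKernelNodeParams layout.
struct KernelNodeArgs {
    void** kernelParams;  // array of pointers to arguments, or nullptr
    void** extra;         // alternative packed-argument description, or nullptr
};

// Hypothetical validation: leaving both fields as nullptr is acceptable
// only when the kernel expects no arguments at all.
bool argsAcceptable(const KernelNodeArgs& args, std::size_t expectedArgCount) {
    const bool nothingSupplied =
        args.kernelParams == nullptr && args.extra == nullptr;
    return !nothingSupplied || expectedArgCount == 0;
}
```

Under this assumed rule, a node for a zero-argument kernel with both fields left as nullptr would pass, while the same node for a kernel that takes arguments would still be rejected.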

@kotee4ko I am sorry for the delay. I think it would help if we could get a debug build and see from its backtrace exactly where it fails...

@kotee4ko could you please check that the call to dev->createMemory() https://github.com/ROCm-Developer-Tools/clr/blob/develop/rocclr/platform/memory.cpp#L339 returns a valid pointer in all cases? I think it could also help to `export AMD_LOG_LEVEL=4` and then re-run...

@kotee4ko Yes, sorry, that was just a reference to the dev->createMemory call in addDeviceMemory. Please continue to use the 5.7.x branches of hip/clr, as the develop branch is ahead and...

Hi @kotee4ko, so it looks like we can confirm that dev->createMemory returns a nullptr at some point. I do not have access to a multi-GPU Vega 10 yet, I am...

@kotee4ko I think it's worth looking into NumDevicesWithP2P() and why it reports 1 in your case. As you have two physical devices, I think this should have returned 2 unless...

Hi @kotee4ko, the reason why DtoD works is that it follows a different path that ends up performing a 2-step transfer with a staging buffer. It gets here https://github.com/ROCm-Developer-Tools/clr/blob/rocm-5.7.x/rocclr/device/rocm/rocvirtual.cpp#L2016 and I think...
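For readers unfamiliar with the idea, a 2-step transfer through a staging buffer can be modelled as below. This is only a sketch in plain C++ under stated assumptions: the "device" buffers are ordinary host arrays, and `stagedCopy` and `stagingSize` are hypothetical names. It does not reproduce the rocvirtual.cpp code linked above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of a staged copy. In a real runtime, src and dst would
// live on two different devices and each step would be a separate transfer.
void stagedCopy(unsigned char* dst, const unsigned char* src,
                std::size_t size, std::size_t stagingSize) {
    assert(stagingSize > 0);
    std::vector<unsigned char> staging(stagingSize);  // intermediate buffer
    for (std::size_t offset = 0; offset < size; offset += stagingSize) {
        const std::size_t chunk = std::min(stagingSize, size - offset);
        // Step 1: source -> staging buffer.
        std::memcpy(staging.data(), src + offset, chunk);
        // Step 2: staging buffer -> destination.
        std::memcpy(dst + offset, staging.data(), chunk);
    }
}
```

Because neither side touches the other's memory directly, a path like this does not depend on peer-to-peer access between the two devices, at the cost of an extra copy per chunk.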

Thank you for sharing @kotee4ko, I am glad it works. Let me take this internally and see how it can be consolidated into a permanent fix. There may be some...

I think that the dependence on the number of created streams (even if you do not use any of these) can be explained by the fact that during hipEventRecord(events[i] ,...