Edgar Gabriel

137 comments by Edgar Gabriel

@denisbertini thank you for the report. What GPU is this, if I may ask? This looks like an environment setup issue, something like the GPUDirect kernel component, or a BIOS...

@denisbertini is there an easy way to run a simple test that reproduces the issue?

What is your IOMMU setting, if I may ask? Another thing that comes to mind is whether ACS is disabled.

For the first one, try: cat /proc/cmdline | grep iommu
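A minimal sketch of that check wrapped in a shell function, so it can be run against any saved command line as well as the live one. The helper name `iommu_params` is hypothetical. It only lists the tokens that mention "iommu" and does not interpret their values. If nothing is printed, no such parameter was passed at boot and the kernel default applies.

```shell
# iommu_params (hypothetical helper): print every kernel command-line
# token that mentions "iommu", one per line.
# Returns non-zero when no such token is present.
iommu_params() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -i 'iommu'
}

# Usage on a live system: inspect the boot parameters of the running kernel.
if [ -r /proc/cmdline ]; then
    iommu_params "$(cat /proc/cmdline)" \
        || echo "no iommu parameter on the kernel command line"
fi
```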

@denisbertini what about PCI ACS? Is that also disabled on the nodes? I am 99% confident that this is a system setup issue, not a UCX issue since we...
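One possible way to check the ACS state is sketched below. It assumes that `lspci -vvv` prints an `ACSCtl:` line for each device with the ACS capability, and that control bits which are set are marked with `+`. The helper name `acs_enabled_lines` is hypothetical.

```shell
# acs_enabled_lines (hypothetical helper): read `lspci -vvv` output on stdin
# and print the ACSCtl lines on which at least one control bit is set ("+").
# Returns non-zero when no ACSCtl line has a bit set.
acs_enabled_lines() {
    grep 'ACSCtl:' | grep '+'
}

# Usage on a live system (reading PCI capabilities usually requires root):
#   sudo lspci -vvv | acs_enabled_lines || echo "no ACS control bits set"
```

An empty result suggests ACS is disabled on every listed device. Any printed line identifies a device that still has ACS bits set.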

Do you have a script/recipe that I could use to reproduce the run on one of our internal systems? e.g. how to compile and run the code, what input files...

What MOFED version are you running on that system, by the way?

> we do not use mellanox official MOFED but the linux rdma-core library

Could you in that case check whether the ib_peer_mem kernel module is running/used?
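A minimal sketch of such a check, assuming the module would show up in /proc/modules under exactly the name `ib_peer_mem`. The helper name `module_loaded` is hypothetical, and a negative result only means that no module with that exact name is currently loaded.

```shell
# module_loaded (hypothetical helper): succeed if the named kernel module
# appears in a /proc/modules-style listing supplied on stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on a live system:
if [ -r /proc/modules ]; then
    if module_loaded ib_peer_mem < /proc/modules; then
        echo "ib_peer_mem is loaded"
    else
        echo "ib_peer_mem is not loaded"
    fi
fi
```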

@denisbertini thank you! I will give it a try, but it might take me a few days until I get to it.

@denisbertini I have successfully compiled the application, but I have trouble running it. The system here does not have singularity installed, but independent of that I get the following error...