Edgar Gabriel

Results: 138 comments by Edgar Gabriel

Ok, I can confirm that inside the Docker image I can reproduce the issue with the fcoll/vulcan component. Using the other fcoll components (i.e. individual, dynamic_gen2) produces the correct...

I know what is triggering the issue. I just need to decide whether an if-statement in the code is erroneous or whether I need to add some locking protection around...

Luckily, the commit message from 5 years ago was helpful: the if-statement is correct in that it does what it was supposed to do.

Yes, it could, but how likely it is depends on the file system. I will have a fix ready either later today or tomorrow, and I will backport it...

@tpadioleau I filed a PR that fixes the issue. I spent quite some time thinking about the issue and the various options; I am 99% sure that real application scenario...

I haven't tested with this version of ROMIO, but the --with-lustre flag was historically not propagated to ROMIO (nor was --with-pvfs2 or other file system information). We always had...

@jinz2014 this is most likely a system setup / permission issue on your side, since UCX 1.15 has been used extensively with numerous applications on MI100. Can you please check...

Could you please provide the full command line that you used? I see that the put_zcopy protocol is being utilized, which is not the default with 1.15; it should be...

So just for a test, could you change the command line to the following:
```
$HOME/ompi_for_gpu/ompi/bin/mpirun -x UCX_RNDV_SCHEME=get_zcopy -n 2 ./main
```
to see whether it makes a difference?

Hm. Ok, I will see whether I can reproduce the issue locally. Are there instructions in the GitHub repo on how to compile the test code?