Narate Taerat

27 comments by Narate Taerat

The change is part of https://github.com/ovis-hpc/ovis/commit/e74153b165917bd8abfc3bb6d34deefafd6d10c4 in v5 (master). Let me port it over so that the `fabric` transport has a way to specify the provider and domain.

@baallan I've just pushed `v4-ldms-fabric` to my repo on gitlab (https://gitlab.opengridcomputing.com/narate/ovis/-/commits/v4-ldms-fabric). Could you please give it a try? Thx

@baallan I've just added a fix for the buffer overrun problem you found, rebased on top of OVIS-4, and force-pushed to my repo on gitlab on the same...

@baallan Unfortunately, no updates yet. However, I just thought of something. I'm guessing that your test uses the `-x` option and not the `listen` command. So, could you please give a...

@tom95858 Ben tested my branch `narate/v4-ldms-fabric`. In short, ldms over fabric works when listening on a specific address (using `listen xprt=fabric port=BLA host=OMNIPATH_IP_ADDR`). During the session, we noticed a couple...

@eric-roman, I'd like to try to reproduce it, but I need more info: - What is a Slurm job container? Is it a feature in Slurm that I need to...

@eric-roman Thanks! I'll look into it. Last question: What is the version of Slurm you're using?

@tom95858 @eric-roman FYI, the error is confirmed. I've tested with Slurm 20.11.8 and OVIS-4.3.7 on the OGC cygnus cluster. ``` [salloc] (git-branch: --) narate@cygnus-08 ~/slurm $ srun bash srun: error: cygnus-01:...

@eric-roman I think you're right. Multi-threading is the cause. According to the `setns(2)` man page: > A multithreaded process may not change user namespace with setns(). I also verified this...
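
To illustrate the `setns(2)` restriction quoted above, here is a minimal sketch of a pre-check a process could run before attempting to change its user namespace. It is not taken from the OVIS or Slurm code. The helper names `thread_count_from_status` and `can_change_user_namespace` are hypothetical, and the sketch assumes a Linux `/proc/<pid>/status` file with a `Threads:` field.

```python
def thread_count_from_status(status_text: str) -> int:
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def can_change_user_namespace(status_text: str) -> bool:
    """Hypothetical pre-check based on the man page note that a
    multithreaded process may not change user namespace with setns()."""
    return thread_count_from_status(status_text) == 1


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/self/status exists.
    with open("/proc/self/status") as f:
        status = f.read()
    print("threads:", thread_count_from_status(status))
    print("single-threaded:", can_change_user_namespace(status))
```

A check like this only reports whether the calling process is single-threaded at that moment. It does not call `setns()` itself, and other conditions from the man page may still cause the call to fail.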