nichamon

Results 35 comments of nichamon

@baallan What was the symptom that indicated the race? Did you see an aggregator that received a DIR request before the sampler sent the DIR_ADD? Both `dir_update()` and `process_dir_request()` send...

Here is an example of ldms_ls's output with the new Slurm set. The set contains two lists: job_list and task_list. ![image](https://user-images.githubusercontent.com/22683157/182291982-e35ca670-ec67-4560-b5b1-1445f407d894.png)

> @nichamon, is this still a draft or is it ready to merge?

It was tested. It is ready to be merged. The test script accompanying the patch is here....

@tom95858 @baallan I found out why L1 does not clean up the set and keeps looking it up. The timeline is as follows.
L0: create set A
L0: publish set...

@tom95858 I told you on the phone that the bug has something to do with a missing RBD. I was wrong. This has nothing to do with the existence...

> @baallan, @nichamon why are we doing this exactly? What is wrong with the expanded auth configuration supported in the configuration files?

@tom95858 Yes, the configuration commands in the configuration...

@baallan I cannot reproduce the error. All daemons exited gracefully. I looked at ldmstest/many/run/revconf.1: `prdcr_del` is sent to the aggregator right after `prdcr_stop`. A possibility that resulted in the "prdcr...

@baallan The procstat fix is unrelated to the in-use producer message. However, without the fix, your test cannot run successfully. As I mentioned, it is possible that users will see...

@baallan Thanks for the info! I assume that no L2 aggregated from L1.

@tom95858

> Also for those same nodes, the messages that the pids are gone do not get delivered to the L1, even though all such messages are published at L0...