nichamon
@baallan What was the symptom that indicated the race? Did you see an aggregator that received a DIR request before the sampler sent the DIR_ADD? Both `dir_update()` and `process_dir_request()` send...
Here is an example of `ldms_ls` output with the new Slurm set. The set contains two lists: `job_list` and `task_list`.
> @nichamon, is this still a draft or is it ready to merge? It was tested. It is ready to be merged. The test script accompanying the patch is here....
@tom95858 @baallan I found out why L1 does not clean up the set and keeps looking it up. The timeline is as follows. L0: create set A L0: publish set...
@tom95858 I told you on the phone that the bug had something to do with a missing RBD. I was wrong. This has nothing to do with the existence...
> @baallan, @nichamon why are we doing this exactly? What is wrong with the expanded auth configuration supported in the configuration files? @tom95858 Yes, the configuration commands in the configuration...
@baallan I cannot reproduce the error. All daemons exited gracefully. I looked at ldmstest/many/run/revconf.1: `prdcr_del` is sent to the aggregator right after `prdcr_stop`. A possibility that resulted in the "prdcr...
@baallan The procstat fix is unrelated to the in-use producer message. However, without the fix, your test cannot run successfully. As I mentioned, it is possible that users will see...
@baallan Thanks for the info! I assume that no L2 aggregated from L1.
@tom95858 > Also for those same nodes, the messages that the pids are gone do not get delivered to the L1, even though all such messages are published at L0...