Add squeue-based Slurm data collector
Summary
This PR adds a simple data collector that extracts information about the Slurm queue. It parses the output of `squeue` and currently extracts only the number of running and pending jobs.
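For readers who want a feel for the approach, here is a minimal sketch of the counting step. It is not the code in this PR: it assumes the collector calls something like `squeue --noheader --format=%T`, which prints one job state per line, and the function names `count_job_states` and `summarize` are made up for illustration.

```python
from collections import Counter


def count_job_states(squeue_output):
    """Count jobs per state, given one state name per line.

    Assumes output shaped like `squeue --noheader --format=%T`
    (an assumption for this sketch, not necessarily what the PR runs).
    """
    states = (line.strip() for line in squeue_output.splitlines())
    # Skip blank lines so trailing newlines do not count as jobs.
    return Counter(state for state in states if state)


def summarize(squeue_output):
    """Reduce the per-state counts to the two values the PR describes."""
    counts = count_job_states(squeue_output)
    return {
        "running": counts.get("RUNNING", 0),
        "pending": counts.get("PENDING", 0),
    }


# Canned sample text standing in for real squeue output.
sample = "RUNNING\nPENDING\nRUNNING\nCOMPLETING\n"
print(summarize(sample))  # {'running': 2, 'pending': 1}
```

Keeping the parsing separate from the command invocation means the counting logic can be exercised on plain strings.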
Test Plan
I am not yet sure how to test this change. Basically we would need:
- a running Slurm (client) providing the `squeue` command
- a Slurm control daemon which provides the contents for `squeue`
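One possible way around needing a live cluster, sketched under assumptions: if the code that runs `squeue` accepts an injectable callable, a test can feed it canned output. The names `collect`, `SQUEUE_COMMAND`, and `fake_squeue` are hypothetical and do not come from this PR, and the command line shown is an assumed invocation.

```python
import subprocess

# Assumed invocation: no header, one job state per line.
SQUEUE_COMMAND = ["squeue", "--noheader", "--format=%T"]


def collect(run=None):
    """Return running/pending job counts.

    `run` is a zero-argument callable returning squeue's text output.
    When omitted, the real command is executed; tests can pass a fake.
    """
    if run is None:
        def run():
            return subprocess.check_output(SQUEUE_COMMAND, text=True)
    states = [line.strip() for line in run().splitlines() if line.strip()]
    return {
        "running": states.count("RUNNING"),
        "pending": states.count("PENDING"),
    }


def fake_squeue():
    """Stand-in for the real command, so no Slurm daemon is needed."""
    return "PENDING\nRUNNING\nPENDING\n"


# The check runs without any Slurm installation.
assert collect(fake_squeue) == {"running": 1, "pending": 2}
```

This only covers the parsing path; verifying the real `squeue` call would still need a Slurm client and control daemon as listed above.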
Additional Information
We use netdata to monitor several machines running a compute cluster managed by Slurm. It would thus be nice to also see the utilization of the overall system in a dashboard.
I am not familiar with the structure of netdata, so I used the postfix example in python.d as a guide. Let me know if anything is missing or could be solved in a better way.
Servus Max! 🥨 Thank you for contributing to Netdata! I had a look through the documentation and made some suggestions. Kind regards, Tina, Technical Writer at Netdata
@kickoke thanks for the review!
I further updated the README, and reduced some of the boilerplate configuration code I had copied from the postfix example.
Made some further structural improvements to the documentation. @ilyam8 You might want to take a look to ensure what I've written is still correct in this case.
Had a final look and I am happy docs-wise. Thank you for your contribution @mberr 🏅
@ilyam8 Do you want to take another look?
I know we've been busy with the new release in the last 2 weeks. Is this PR now ready to merge?
Is there anything still open on my side?
@mberr Docs-wise, we are good to go. @DShreve2 Can you ping someone from engineering?