[Docs][Core] Add head-node memory growth and OOM guidance
Description
This PR adds a new documentation page, Head Node Memory Management, under the Ray Core advanced topics section.
Related issues
Closes #58621
Additional information
@israbbani Thanks for the review. Here’s a quick summary of what’s been updated.
I added a short explanation of the Ray Dashboard with a link to the official docs, and included a permalink to the event caching implementation. The “Why Head Node Memory Grows” section is now a simple bullet list, and the redundant subsections beneath it have been removed.
As suggested, I removed the entire “Metrics and Reporting Overhead” section. I also removed the “Enable Resource Isolation” section, along with every related mention in Best Practices and Troubleshooting.
I added a dedicated subsection that explains why tasks and actors shouldn’t be scheduled on the head node, with a reference to the large-cluster head-node configuration guide. I also simplified the section on disabling the dashboard.
Lastly, I added a link to the official Ray memory troubleshooting guide and updated the PR description accordingly.
Let me know if you’d like any further adjustments!
@nadongjun Thanks for the update. I'll review it again tomorrow.
This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs. Thank you for your contributions.
You can always ask for help on our discussion forum or Ray's public slack channel.
If you'd like to keep this open, just leave any comment, and the stale label will be removed.