[Docs][Core] Add head-node memory growth and OOM guidance

Open nadongjun opened this issue 1 month ago • 3 comments

Description

This PR adds a new documentation page, Head Node Memory Management, under the Ray Core advanced topics section.

Related issues

Closes #58621

Additional information

(Two images attached.)

nadongjun · Nov 17 '25 05:11

@israbbani Thanks for the review. Here’s a quick summary of what’s been updated.

I added a short explanation of the Ray Dashboard along with a link to the official docs, and included a permalink to the event caching implementation. The “Why Head Node Memory Grows” section has been rewritten into a simple bullet list, and the old redundant subsections underneath it have been removed.

As suggested, the entire “Metrics and Reporting Overhead” section has been removed, and the “Enable Resource Isolation” section (along with all related mentions in Best Practices and Troubleshooting) has also been taken out.

I added a dedicated subsection explaining why tasks and actors shouldn’t be scheduled on the head node, along with a reference to the large-cluster head-node configuration guide. The dashboard disable section was also simplified for clarity.
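For readers of this thread, a minimal sketch of the usual way to keep tasks and actors off the head node: start the head with zero logical CPUs, so the scheduler has no CPU resources to place work there. The worker command's address is a placeholder, and the wording on the new page may differ.

```shell
# Start the head node advertising zero CPUs, so CPU-requesting
# tasks and actors are not scheduled on it.
ray start --head --num-cpus=0

# On each worker node, join the cluster as usual.
# <head-node-ip> is a placeholder; 6379 is the default port.
ray start --address=<head-node-ip>:6379
```

This only stops work that requests CPU resources; tasks or actors declared with `num_cpus=0` could still land on the head node.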

Lastly, I added a link to the official ray memory troubleshooting guide and updated the description accordingly.

Let me know if you’d like any further adjustments!

nadongjun · Nov 21 '25 01:11

@nadongjun thanks for the update. I'll take another look tomorrow for review.

israbbani · Nov 25 '25 23:11

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs. Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

github-actions[bot] · Dec 10 '25 00:12