Production Checklist: Expand NUMA best practices
Lauren Singh (lnhsingh) commented:
For hypervisor configurations that have more than one socket, NUMA best practices should be followed for optimal memory performance.
@drewdeally: This can be more complex: if 2 NUMA nodes appear in an OS, which is possible with large containers on cloud platforms, we would need to test it. What would be the best practice for 2 NUMA nodes in one container: 1 CR, or 2 CR with each one NUMA-bound to its own node? This would need to be tested. I will remove it for now and we can look at this in a future doc.
Related to #4153
Jira Issue: DOC-220
linville (mdlinville) commented: I don’t fully understand the request for the Docs team here. Does someone have the guidance that we should provide?
shannonb (shannonbradshaw) commented: Michael Wang do we have guidance on this?
Richard Loveland (rmloveland) commented: linville I was asked to sync with Jeffrey White about this NUMA stuff via another channel before I realized this was really for you (aka Deployment & Operations per https://cockroachlabs.atlassian.net/wiki/x/RQG-dg ). Sorry to have blundered in!
The tl;dr seems to be that we will work on NUMA machines, but performance will not be good without some tuning.
Matt and Jeff, can y’all please coordinate on what docs work is needed here?
linville (mdlinville) commented: Jeffrey White is investigating this and will draft something and then loop me in. I’ve put this in May for now just so it’s easy to find.
linville (mdlinville) commented: Jeffrey White Do you have an update on this? If we don’t have time to do an in-depth reference architecture, what is the minimum guidance? For example, make sure you pin each {{cockroach}} process to a separate NUMA domain?
Shannon Bradshaw (shannonbradshaw) commented: Jeffrey White any follow up here? This came up again today.