dagr
CPU and memory resource detection should be Docker container aware.
Background:
Docker can place memory and CPU limits on processes running inside a container, restricting them to a subset of the host's resources. However, the current resource detection in dagr is not aware of the control groups (cgroups) that enforce those limits.
As a result, a dagr pipeline can over-allocate resources, which can cause the container to exit with an out-of-memory error.
Potential Solutions
The current resource detection uses the oshi library, whose recommended "fix" is to read the limits from the files on the underlying OS that contain them. https://github.com/oshi/oshi/issues/893
This requires the resource detection to have logic to determine if it is running in a container environment and where to look for those files if it is.
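For illustration, here is a minimal Java sketch of what that file-based approach might look like. It is not dagr's or oshi's code. The class and method names are hypothetical, and the two paths are the conventional cgroup v2 and v1 mount points, which a given container may not use. The caller passes in the physical memory it has already detected, and the sketch falls back to that value when no usable limit is found.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Hypothetical helper; names are illustrative only.
final class CgroupMemoryLimit {
    // Conventional locations (assumption: cgroups are mounted at /sys/fs/cgroup).
    static final Path CGROUP_V2 = Paths.get("/sys/fs/cgroup/memory.max");
    static final Path CGROUP_V1 = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");

    /** Parses the content of a limit file; empty means "no usable limit". */
    static OptionalLong parseLimit(String content, long physicalBytes) {
        String value = content.trim();
        // cgroup v2 writes the literal "max" when no limit is set.
        if (value.isEmpty() || value.equals("max")) {
            return OptionalLong.empty();
        }
        try {
            long limit = Long.parseLong(value);
            // cgroup v1 reports a very large number when unlimited, so any
            // value at or above physical memory is treated as "no limit".
            if (limit <= 0 || limit >= physicalBytes) {
                return OptionalLong.empty();
            }
            return OptionalLong.of(limit);
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** Checks the v2 file first, then v1; empty if neither yields a limit. */
    static OptionalLong detectLimit(long physicalBytes) {
        for (Path path : new Path[] {CGROUP_V2, CGROUP_V1}) {
            if (!Files.isReadable(path)) {
                continue;
            }
            try {
                String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
                OptionalLong limit = parseLimit(content, physicalBytes);
                if (limit.isPresent()) {
                    return limit;
                }
            } catch (IOException e) {
                // Unreadable file: try the next candidate.
            }
        }
        return OptionalLong.empty();
    }

    /** Memory the pipeline should plan against: the cgroup limit if any, else physical. */
    static long effectiveMemory(long physicalBytes) {
        return detectLimit(physicalBytes).orElse(physicalBytes);
    }
}
```

Outside a container, or on a host without these files, `effectiveMemory` simply returns the value it was given, so the same code path would work in both environments. CPU limits would need similar handling with their own files.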
Alternatively, JDK 10 has Docker-aware resource detection, which has been back-ported to JDK 8u191 and later:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8196595
These features are on by default, so resource querying does not need any logic to determine whether it is running in a container.
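As a rough sketch of the JDK-based option, the standard `Runtime` API can be queried directly. The class and method names below are hypothetical. Note that `maxMemory()` reports the maximum heap size, which a container-aware JVM derives from the container's memory limit; it is not the container limit itself.

```java
// Hypothetical helper; relies only on the standard java.lang.Runtime API.
final class JvmResources {
    /** Processors available to the JVM; container-aware on a JDK with container support. */
    static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Maximum heap in bytes the JVM will attempt to use. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("cores=" + availableCores() + " maxHeapBytes=" + maxHeapBytes());
    }
}
```

Running this inside a container started with CPU and memory limits, once on an older JDK 8 build and once on 8u191 or later, should show whether the limits are being picked up.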
I am +1 on this! The bug is rarely hit for me, but when it appears, the entire job comes crashing down.