[WIP] Report disk usage, fix low disk space recovery, update EVE overhead limits to avoid running out of space
This PR addresses several issues related to trying to use "all" storage on the device. They include:
- report all of the used directory paths and zvols in /persist; this is important for tracking down disk usage in the field
- fix the recovery logic in onboot.sh to work with ZFS, and change it to trigger when less than 4 GB of space is left (rather than at 70% used as today)
- make the EVE overhead more dynamic: reserve 2 GB for EVE plus the configured limit for newlog, and raise that number if we are currently using more in /persist, e.g., for netdump (see the sketch after this list)
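To make the overhead rule and the new low-space trigger concrete, here is a minimal, self-contained Go sketch. It is an illustration only: the function names, the hard-coded 2 GB/4 GB values, and the statfs-based free-space check are my assumptions, not the actual pillar or onboot.sh code (which, among other things, has to handle /persist being a ZFS pool).

```go
// Hypothetical sketch of the two space rules described above; names and the
// statfs-based check are assumptions, not the real EVE implementation.
package main

import (
	"fmt"
	"syscall" // Statfs_t field layout here assumes Linux
)

const gib = uint64(1 << 30)

// eveOverheadBytes models the "more dynamic" EVE overhead: 2 GiB for EVE
// plus the configured newlog limit, bumped up if EVE's own directories in
// /persist (netdump output, etc.) already use more than that.
func eveOverheadBytes(newlogLimit, currentEVEUsage uint64) uint64 {
	overhead := 2*gib + newlogLimit
	if currentEVEUsage > overhead {
		overhead = currentEVEUsage
	}
	return overhead
}

// persistLowOnSpace models the new recovery trigger: fire when fewer than
// minFree bytes remain in /persist, instead of the old "70% used" rule.
func persistLowOnSpace(path string, minFree uint64) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, err
	}
	free := st.Bavail * uint64(st.Bsize) // bytes available to non-root users
	return free < minFree, nil
}

func main() {
	// Example: 1 GiB newlog limit, but EVE already uses 3.5 GiB in /persist,
	// so the larger current usage wins.
	fmt.Println(eveOverheadBytes(1*gib, 3584<<20))

	if low, err := persistLowOnSpace("/persist", 4*gib); err == nil {
		fmt.Println("low on space:", low)
	}
}
```

The point of the max()-style shape is that the reservation only grows when EVE is already over its nominal budget, so we never advertise more space to apps than actually exists.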
Separately, we should probably also make it so that the system containerd can run without needing to allocate space in /persist.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 17.51%. Comparing base (2c5fb18) to head (04892db). Report is 72 commits behind head on master.
:exclamation: Current head 04892db differs from pull request most recent head 81f192d
Please upload reports for the commit 81f192d to get more accurate results.
Additional details and impacted files
```
@@           Coverage Diff           @@
##           master    #3667   +/-   ##
=======================================
  Coverage   17.51%   17.51%
=======================================
  Files           3        3
  Lines         805      805
=======================================
  Hits          141      141
  Misses        629      629
  Partials       35       35
```
@andrewd-zededa since you've looked at some of the space calculations, how about reviewing this one? Please ignore the split into many commits - I'll squash away some of them later.
I've realized that one extra dir should be added to the monitoring: /persist/memory-monitor/output. It contains up to 100 MB of logs. I'll do it a little bit later.
@eriknordmark, I have a point to discuss, not directly related to this PR. There is a request from testers to add a command that can trigger storage clean up manually. I'm thinking about adding something similar to what we have in onboot.sh in the eve.sh script. Any idea about it?
> @eriknordmark, I have a point to discuss, not directly related to this PR. There is a request from testers to add a command that can trigger storage clean up manually. I'm thinking about adding something similar to what we have in `onboot.sh` in the `eve.sh` script. Any idea about it?
Let's find a time to chat. Not clear what they would want to remove (e.g., is it removing everything in /persist?)
@OhmSpectator @rouming I've taken care of all of the comments (and sanitized the set of commits). Please review and approve as appropriate.
Also, we should presumably use a directory in /persist for the eve-info to make it easier to clean that up. I can do a separate PR for that if folks agree.
> Let's find a time to chat. Not clear what they would want to remove (e.g., is it removing everything in /persist?)
@eriknordmark, as far as I understand, they want to clean up some space during the tests. Initially, they only wanted to clean up the logs. I recommended removing the contents of the directories mentioned in the onboot.sh script, and that was fine for them. So, if we add it as an eve command, it could help (a rough sketch of the idea follows below).
Yep, we can discuss it briefly during the call on Thursday (the Community Call) or on Friday (we have a 1-to-1 call scheduled, as I remember).
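Purely as a hypothetical sketch of such a manual cleanup command, one possible shape in Go is below. The directory list is a placeholder (newlog and the memory-monitor output are just examples mentioned in this thread); nothing here is the agreed design or the actual set of paths onboot.sh prunes.

```go
// Hypothetical manual storage-cleanup helper; the directory list is a
// placeholder, not what onboot.sh actually prunes.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanDirContents removes everything inside dir but keeps dir itself,
// so services that expect the directory to exist keep working.
func cleanDirContents(dir string) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if err := os.RemoveAll(filepath.Join(dir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Placeholder paths; substitute whatever directories we agree are safe to wipe.
	for _, dir := range []string{"/persist/newlog", "/persist/memory-monitor/output"} {
		if err := cleanDirContents(dir); err != nil {
			fmt.Fprintf(os.Stderr, "cleanup of %s failed: %v\n", dir, err)
		}
	}
}
```

Whether this should sit behind an eve.sh subcommand, and which directories are actually safe to wipe, is exactly the open question above.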
We do have some regression here:
Smoke tests with tpm=true fail here:
> test eden.lim.test -test.v -timewait 5m -test.run TestInfo -out InfoContent.dinfo.HSMStatus 'InfoContent.dinfo.HSMStatus:ENABLED'
HSMStatus continues to be published as DISABLED even though we run with (sw)tpm.
(You can see what EVE published in the info.log file inside the artifact.)
Edit: I see this failed already here: https://github.com/lf-edge/eve/pull/4115 @shjala Can you please investigate?