eve [WIP] Report disk usage, fix low disk usage recovery, update EVE overhead limits to avoid running out of space

This PR addresses several issues related to trying to use "all" storage on the device. They include:

Reporting all of the used directory paths and zvols in /persist; this is important for tracking down disk usage in the field.
Fixing the recovery logic in onboot.sh to work with ZFS, and changing it to trigger at <4 GB of space left (not at 70% used, as today).
Making the EVE overhead more dynamic (2 GB for EVE + the configured limit for newlog), and updating the EVE number if we are currently using more in /persist, e.g., for netdump or whatever.

Separately we should probably also make it so that the system containerd can run without needing to allocate space in /persist.

Dec 13 '23 21:12 eriknordmark

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 17.51%. Comparing base (2c5fb18) to head (04892db). Report is 72 commits behind head on master.

:exclamation: Current head 04892db differs from pull request most recent head 81f192d

Please upload reports for the commit 81f192d to get more accurate results.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3667   +/-   ##
=======================================
  Coverage   17.51%   17.51%           
=======================================
  Files           3        3           
  Lines         805      805           
=======================================
  Hits          141      141           
  Misses        629      629           
  Partials       35       35


Dec 13 '23 21:12 codecov[bot]

@andrewd-zededa since you've looked at some of the space calculations, how about reviewing this one? Please ignore the split into many commits - I'll squash away some of them later.

Apr 19 '24 21:04 eriknordmark

I've realized that one extra dir should be added to the monitoring: /persist/memory-monitor/output. It contains up to 100 MB of logs. I'll do it a little bit later.

Jul 22 '24 10:07 OhmSpectator

@eriknordmark, I have a point to discuss, not directly related to this PR. There is a request from testers to add a command that can trigger storage clean up manually. I'm thinking about adding something similar to what we have in onboot.sh in the eve.sh script. Any idea about it?

Jul 29 '24 11:07 OhmSpectator

> @eriknordmark, I have a point to discuss, not directly related to this PR. There is a request from testers to add a command that can trigger storage clean up manually. I'm thinking about adding something similar to what we have in onboot.sh in the eve.sh script. Any idea about it?

Let's find a time to chat. Not clear what they would want to remove (e.g., is it removing everything in /persist?)

Jul 31 '24 20:07 eriknordmark

@OhmSpectator @rouming I've taken care of all of the comments (and sanitized the set of commits). Please review and approve as appropriate.

Also, we should presumably use a directory in /persist for the eve-info to make it easier to clean that up. I can do a separate PR for that if folks agree.

Jul 31 '24 23:07 eriknordmark

> Let's find a time to chat. Not clear what they would want to remove (e.g., is it removing everything in /persist?)

@eriknordmark, as far as I understand, they want to clean up some space during the tests. Initially, they only wanted to clean up the logs. I recommended removing the contents of the directories mentioned in the onboot.sh script, and it was fine for them. So, if we add it as an eve command, it could help.

Yep, we can discuss it briefly during the call on Thursday (the Community Call) or on Friday (we have a 1-to-1 call scheduled, as I remember).

Aug 01 '24 12:08 OhmSpectator

We do have a regression here: smoke tests with tpm=true fail on this step:

> test eden.lim.test -test.v -timewait 5m -test.run TestInfo -out InfoContent.dinfo.HSMStatus 'InfoContent.dinfo.HSMStatus:ENABLED'

HSMStatus continues to be published as DISABLED even though we run with (sw)tpm. (You can see what EVE published in the file info.log inside the artifact.)

Edit: I see this failed already here: https://github.com/lf-edge/eve/pull/4115 @shjala Can you please investigate?

Aug 02 '24 11:08 milan-zededa