content/content-sqlite: what to do on ENOSPC

Open chu11 opened this issue 5 months ago • 18 comments

This continues the conversation from PR #6217. From @garlick:

After consulting with @kkier a bit on this, we think it may be better to present admins with a choice of

  • ensure flux never runs out of space on rank 0 (whether by partition or whatever), or
  • flux goes down hard on ENOSPC and, when manually restarted, recovers what it can but doesn't stop for manual intervention

That is a trade-off they are most qualified to make based on their judgement of the relative impacts. From their perspective I think it's more of a question of whether they want to reserve some amount of disk for flux or share it among multiple consumers, rather than whether or not they want to make a partition. If this PR successfully provided a mechanism to reserve space without a partition, it wouldn't change much in that calculus.

IOW: I think we should invest some development effort in the second option for now rather than this space reservation scheme.

and later:

Well it does seem like content-sqlite should just close the database after we get an ENOSPC. I think we probably need to talk through what can happen next. Ideally it minimizes the need to manually intervene.
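
To make the "close the database after ENOSPC" idea concrete, here is a minimal sketch in Python rather than in content-sqlite's own C code. It assumes the out-of-space condition surfaces from SQLite as `SQLITE_FULL` ("database or disk is full"). The class `ClosingStore`, its `objects` table, and the `max_pages` parameter are hypothetical names invented for illustration; `PRAGMA max_page_count` is used only to simulate a full disk and is not a suggestion for how flux should detect one.

```python
import errno
import sqlite3

# SQLITE_FULL is 13 in the SQLite C API; the named constant exists in
# Python 3.11+, so fall back to the literal on older interpreters.
SQLITE_FULL = getattr(sqlite3, "SQLITE_FULL", 13)


def is_disk_full(exc):
    """Return True if a sqlite3 exception reports 'database or disk is full'."""
    code = getattr(exc, "sqlite_errorcode", None)  # available in Python 3.11+
    if code is not None:
        return (code & 0xFF) == SQLITE_FULL
    return "full" in str(exc).lower()


class ClosingStore:
    """Hypothetical store that closes its database on the first ENOSPC."""

    def __init__(self, path, max_pages=None):
        self.conn = sqlite3.connect(path)
        if max_pages is not None:
            # Assumption for the demo: cap the file size to simulate a full disk.
            self.conn.execute(f"PRAGMA max_page_count = {int(max_pages)}")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS objects (hash TEXT PRIMARY KEY, data BLOB)"
        )
        self.conn.commit()

    def store(self, key, data):
        if self.conn is None:
            # Already closed after an earlier failure: fail fast, touch nothing.
            raise OSError(errno.ENOSPC, "store closed after disk full")
        try:
            with self.conn:  # commits on success, rolls back on exception
                self.conn.execute(
                    "INSERT OR REPLACE INTO objects (hash, data) VALUES (?, ?)",
                    (key, data),
                )
        except sqlite3.Error as exc:
            if is_disk_full(exc):
                # Close so no further writes are attempted on this database.
                self.conn.close()
                self.conn = None
                raise OSError(errno.ENOSPC, "disk full, database closed") from exc
            raise
```

In this sketch every store after the first failure returns ENOSPC immediately, which leaves the on-disk file untouched until something reopens it. What should happen at that point (restart, recovery, how much manual intervention) is exactly the open question in this issue, and the sketch does not try to answer it.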

chu11, Aug 26 '24 22:08