bug: CPU full of lsof

Open rotu opened this issue 3 months ago • 8 comments

Bug description

It seems BlueOS is overwhelming the CPU with repeated abuse of lsof.

Steps to reproduce

Run BlueOS core 1a44a7a23907. SSH into the machine and run top, or look at the Processes tab in System Information.

Primary pain point(s)

No response

Additional context

No response

Prerequisites

  • [x] I have checked to make sure that a similar request has not already been filed or fixed.

rotu avatar Sep 23 '25 17:09 rotu

I think lsof is only used to check if a file is open before deleting it in https://github.com/bluerobotics/BlueOS/blob/3de218531003b5842d0e9dec5128252243f3a025/core/libs/commonwealth/src/commonwealth/utils/general.py#L157

It's unclear why this check is necessary, and maybe a call to shutil.rmtree would suffice instead.

rotu avatar Sep 23 '25 17:09 rotu

Right, this was necessary because if we delete files that are still open by the log writers, the writer keeps writing to an invalid node without knowing, maybe only recovering when the rotation is triggered.
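The behavior described above can be reproduced with a minimal sketch, assuming a POSIX filesystem such as the Linux one BlueOS runs on. The function name `write_after_delete` and the file name `service.log` are made up for illustration and are not taken from BlueOS. Deleting the path only removes the directory entry; the writer's file descriptor stays valid, so later writes succeed silently and land in a file nobody can reach by name.

```python
import os
import tempfile


def write_after_delete() -> dict:
    """Delete a log file while a writer still holds it open (POSIX only)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "service.log")  # hypothetical log file
        with open(path, "w") as writer:
            writer.write("before delete\n")
            writer.flush()

            # Removes the name only; the open descriptor keeps the data alive.
            os.remove(path)

            # No exception is raised here: the writer cannot tell the file is gone.
            writer.write("after delete\n")
            writer.flush()

            return {
                "path_exists": os.path.exists(path),
                "bytes_in_orphan": os.fstat(writer.fileno()).st_size,
            }


if __name__ == "__main__":
    print(write_after_delete())
```

Both writes end up in the orphaned file, which is freed once the writer closes it, so everything logged after the delete is lost.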

joaoantoniocardoso avatar Sep 23 '25 18:09 joaoantoniocardoso

Right, this was necessary because if we delete files that are still open by the log writers, the writer keeps writing to an invalid node without knowing, maybe only recovering when the rotation is triggered.

Got it. I think the best approach (for now) would be to use the retention feature of loguru https://loguru.readthedocs.io/en/stable/overview.html#easier-file-logging-with-rotation-retention-compression

rotu avatar Sep 23 '25 22:09 rotu

A little research shows that fuser is a better choice than lsof for checking whether a particular file is open. https://www.man7.org/linux/man-pages/man1/fuser.1.html
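A rough sketch of how such a check could look from Python, assuming the `fuser` binary (from psmisc) is on the PATH. The helper names `is_file_open` and `delete_if_unused` and the injectable `run` parameter are hypothetical and not part of BlueOS. Per the man page, `fuser` exits with status 0 when at least one process is accessing the file and non-zero otherwise.

```python
import subprocess
from pathlib import Path
from typing import Callable


def is_file_open(
    path: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if fuser reports at least one process using `path`."""
    result = run(["fuser", str(path)], capture_output=True, check=False)
    # Exit status 0 means some process has the file open. A non-zero status
    # also covers fatal errors, so treat "False" as "not known to be open".
    return result.returncode == 0


def delete_if_unused(
    path: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Delete `path` only when fuser sees no user; return True if deleted."""
    if is_file_open(path, run):
        return False
    path.unlink(missing_ok=True)
    return True
```

Two caveats apply to this sketch: the check and the delete are not atomic, so a writer can still open the file in between, and `fuser` may not see processes owned by other users unless it runs with sufficient privileges.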

rotu avatar Sep 23 '25 23:09 rotu

Right, this was necessary because if we delete files that are still open by the log writers, the writer keeps writing to an invalid node without knowing, maybe only recovering when the rotation is triggered.

Got it. I think the best approach (for now) would be to use the retention feature of loguru https://loguru.readthedocs.io/en/stable/overview.html#easier-file-logging-with-rotation-retention-compression

Yes, that could be used if the log packing-and-download code were the responsibility of loguru... it's unfortunately not the same thing 😕

A little research shows that fuser is a better choice than lsof for checking whether a particular file is open. https://www.man7.org/linux/man-pages/man1/fuser.1.html

We can try that, it may work in this case, too. I don't recall specifically why lsof was the chosen tool here. Maybe @patrickelectric has some clue?

Here's more context: https://github.com/bluerobotics/BlueOS/pull/1523

joaoantoniocardoso avatar Sep 23 '25 23:09 joaoantoniocardoso

This makes BlueOS beta pretty unusable, and I'm still seeing it in 1.5.0-beta.15. Are there any settings which can make BlueOS cool it with the lsof?

rotu avatar Oct 21 '25 21:10 rotu

I just released beta.16, it should help a lot with that

patrickelectric avatar Oct 21 '25 21:10 patrickelectric

Confirmed - so far 1.5.0-beta.16 is running a lot cooler

rotu avatar Oct 22 '25 16:10 rotu