lightning icon indicating copy to clipboard operation
lightning copied to clipboard

pyln-testing: TailableProc leaking file descriptors

Open s373nZ opened this issue 1 year ago • 3 comments

Issue and Steps to Reproduce

Running pytest locally leaks file descriptors for TailableProc stdout, read/write handles opened in the __init__() constructor. This resulted in sporadic failures and was effectively addressed by raising the ulimit. Current testing strategy is:

  1. make pytest
  2. In a 2nd terminal window get the process idea with ps aux | grep pytest
  3. In a 3rd terminal window, observe the number of opened files grow with watch -n 1 'lsof -p ${PID} | wc
  4. In a 4th terminal window, periodically run lsof -p ${PID} | less +G to see the file descriptors grow for each test.

lsof reports four file descriptors hanging around for each test which are initialized in the TailableProc constructor here.

getinfo output

N/A

s373nZ avatar Mar 07 '24 11:03 s373nZ

So the symptom appears to be a missing teardown of some fixtures in case of an error? I think we might be able to make things more flexible by using finalizers from pytest: https://docs.pytest.org/en/6.2.x/fixture.html#adding-finalizers-directly

They are essentially callbacks that are guaranteed to be called during teardown. That could be used as a last-ditch attempt to close the FDs even though the node_factory fixture was not torn down completely due to a cleanup error.

cdecker avatar Mar 07 '24 13:03 cdecker

So the symptom appears to be a missing teardown of some fixtures in case of an error?

Actually, the FDs stick around even if the test passed successfully. lsof reports 4 FDs for each test (in a deleted state), but they hang out while the entire test suite runs synchronously. When running in parallel using PYTEST_PAR, they clean themselves up.

I will review the finalizer documentation and try some experiments.

edit: If it was unclear, the subset of local test failures reported in the issue were against my initial changes which closed the files in the TailableProc::stop() function, not the current master branch.

s373nZ avatar Mar 07 '24 13:03 s373nZ

Updated issue description for clarity by removing references to earlier prototype code cross-referenced in https://github.com/ElementsProject/lightning/pull/7108#issuecomment-1981399218 and describing test results.

s373nZ avatar Mar 08 '24 09:03 s373nZ