
unexpected response from server [500]: Too many open files

Open pdreiter opened this issue 5 years ago • 3 comments

On a single program that I'm running with algorithm type genetic, I'm seeing the following exception occur a few times:

<title>requests.exceptions.ConnectionError: ('Connection aborted.', OSError(24, 'Too many open files')) // Werkzeug Debugger</title>

I captured the STDOUT to a debug.log, so if this is something that is of interest to debug, I can attach that log.

Note: I have a large population size and number of generations, which probably contribute to this error:

algorithm:
  type: genetic
  population: 200
  generations: 200
  tournament-size: 20
  mutation-rate: 0.8
  crossover-rate: 0.4
  # look at entire test suite for test sampling [subset of testsuite is 100%]
  test-sample-size: null

It looks like this ConnectionError ultimately causes the darjeeling process to hang. I'm guessing that it's waiting for evaluation data from the lost Candidates.


submitted from GitQ

pdreiter avatar Jun 26 '19 19:06 pdreiter

Attached is the STDOUT from the darjeeling run [large file, gzip'd] anonymized.debug.log.gz

pdreiter avatar Jun 26 '19 19:06 pdreiter

Follow-up: reducing the population size from 200 to 50 did not prevent the ConnectionError, but it did delay its occurrence.

pdreiter avatar Jun 27 '19 19:06 pdreiter
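
An error that is delayed rather than avoided by a smaller population is consistent with file descriptors accumulating over time (OSError 24 is EMFILE, the per-process limit on open files). One way to check this is to log the descriptor count of the suspected process as the run progresses. The following is only a rough sketch: it assumes Linux (it reads `/proc/self/fd`), the helper names `open_fd_count` and `fd_report` are made up for illustration, and the thread does not establish which process actually holds the leak.

```python
import os
import resource


def open_fd_count():
    """Return the number of open file descriptors for this process.

    Assumes a Linux-style /proc filesystem; returns None elsewhere.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    # listdir briefly opens one descriptor of its own to read the
    # directory, so the count is inflated by one on every call.
    return len(os.listdir(fd_dir))


def fd_report():
    """Format the current descriptor count against the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return "open fds: {} (soft limit {}, hard limit {})".format(
        open_fd_count(), soft, hard
    )
```

Calling `fd_report()` periodically (for example, once per generation) and seeing the count climb steadily toward the soft limit would support the leak hypothesis. A flat count would point at a different process.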

This sounds like a resource leak related to the requests API calls that are used to talk to the BugZoo server. From reading a few similar issue reports for requests, it sounds like it may be necessary to explicitly close requests.

I'll put together a potential fix now.

ChrisTimperley avatar Jun 28 '19 00:06 ChrisTimperley
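
For readers unfamiliar with the idea of explicitly closing requests mentioned above, here is a minimal sketch of the general pattern with the `requests` library. It is not the fix that was applied to Darjeeling or BugZoo. The helper names `fetch_status` and `check_server` are hypothetical, and the URL is whatever the caller supplies. The pattern has two parts: reuse one `Session` instead of creating a new one per call, and close every response once it has been read, so its connection is released.

```python
from contextlib import closing


def fetch_status(session, url, timeout=10.0):
    """Issue a GET and release the connection before returning.

    `session` is expected to behave like requests.Session.
    """
    # closing() calls response.close() on exit, even if an exception
    # is raised inside the block, so the socket is not left dangling.
    with closing(session.get(url, timeout=timeout)) as response:
        response.raise_for_status()
        return response.status_code


def check_server(url):
    """Hypothetical caller: one short-lived session, closed on exit."""
    import requests  # third-party; imported here to keep the helper generic

    # Using the session as a context manager closes its connection
    # pool (and the sockets it holds) when the block ends.
    with requests.Session() as session:
        return fetch_status(session, url)
```

Whether this pattern addresses the leak in this issue depends on where the descriptors are actually being held, which the thread does not settle.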