Jonathan Karlsen

62 comments by Jonathan Karlsen

@xjules I am closing this as it has not been observed for a while.

Resolve conflicts, squash commits, then we are ready to roll! :ship:

> Since this out-of-memory happens during lsf_driver.poll() (happening every 2 seconds), I think it makes sense to keep calm and carry on, that is ignore the OSError and either let...
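The "keep calm and carry on" idea in the quote above could look roughly like the sketch below. It is only an illustration: `poll_forever`, `poll_once`, `period`, and `max_iterations` are hypothetical names, not the actual `lsf_driver` API, and the real driver may structure its loop differently.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def poll_forever(poll_once, period=2.0, max_iterations=None):
    """Call poll_once() repeatedly, skipping iterations that raise OSError.

    poll_once is a hypothetical coroutine function standing in for one
    polling round. Returns the number of rounds skipped due to OSError.
    """
    skipped = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            await poll_once()
        except OSError as err:
            # A transient failure (e.g. out of memory when spawning a
            # subprocess) should not kill the loop; log it and retry
            # on the next tick.
            skipped += 1
            logger.warning("poll failed, retrying on next tick: %s", err)
        await asyncio.sleep(period)
    return skipped
```

Only `OSError` is swallowed here, so any other exception still propagates and stops the loop.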

> I guess I might close this one then: #8867 But there I got another error, despite the same symptoms.
>
> ```
> Couldn't establish connection with the...

I managed to reproduce this locally and these are some of the logs:

```
ERROR ert.scheduler.job:job.py:263 Realization: 34 failed after reaching max submit (1):
ERROR ert.scheduler.job:job.py:315 job poly_eval failed with:...
```

The logs make sense: the poly_eval.py here is supposed to exit with exit code 1, and that is exactly what `test_that_update_works_with_failed_realizations` tests.

```
#!/usr/bin/env python
import numpy as np
import sys...
```

It seems like it could be the verify_checksum method in job.py holding onto the forward_model_ok lock for two minutes if it cannot find the checksum file. Will investigate.
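A minimal sketch of one way to avoid that, assuming the lock is an `asyncio.Lock` and the check only needs the file to exist: wait for the file without the lock, then take the lock briefly. `wait_for_file` and `verify_then_finalize` are hypothetical helpers for illustration, not the actual code in job.py.

```python
import asyncio
from pathlib import Path


async def wait_for_file(path: Path, timeout: float, interval: float = 0.1) -> bool:
    """Poll until path exists or timeout seconds have passed."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if path.exists():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)


async def verify_then_finalize(path: Path, lock: asyncio.Lock, timeout: float) -> bool:
    """Wait for the file outside the lock, then hold the lock only briefly."""
    # The potentially long wait happens here, with the lock still free
    # for other tasks.
    found = await wait_for_file(path, timeout)
    async with lock:
        # Only the short critical section runs under the lock.
        return found
```

With this shape, a missing checksum file delays only the task that is waiting for it, rather than every task that needs the same lock.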

```
2024-10-25 14:14:09,350 - ert.plugins.plugin_manager - MainThread - DEBUG - ERT Plugin manager: ert.plugins.hook_implementations
2024-10-25 14:14:09,350 - ert.plugins.plugin_manager - MainThread - DEBUG - Creating temporary directory for site-config
2024-10-25 14:14:09,351...
```

The last thing we get is `Ensemble ran with maximum memory usage for a single realization job` from the `stopped_handler`, then the monitor times out after 2 minutes (which it...