
Add FAQ to documentation

Open bouthilx opened this issue 2 years ago • 1 comments

We should list frequently asked questions here and eventually add a FAQ to the documentation.

  • FailedUpdate followed by RuntimeError ("RuntimeError: Reservation for trial {trial.id} has been lost."). This likely has one of three causes: the heartbeat process crashed (check the logs to confirm), the CPU load was so high that the heartbeat process stalled for longer than twice the heartbeat value (120 seconds by default), or the storage was too slow (e.g. a slow PickledDB or an overloaded MongoDB). For MongoDB, if the issue occurred because the pacemaker failed to connect to the DB, increasing connectTimeoutMS (20 seconds by default) can help.
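For the MongoDB case, one way to raise the timeout is through the connection string, since connectTimeoutMS is a standard MongoDB URI option. The sketch below only builds such a URI with the Python standard library. The helper name `with_connect_timeout`, the host, and the database name are hypothetical, and whether a given Orion configuration forwards URI options to the driver is an assumption to verify, not something this issue confirms.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_connect_timeout(uri: str, timeout_ms: int) -> str:
    """Return `uri` with its connectTimeoutMS option set to `timeout_ms`.

    Hypothetical helper for illustration; it is not part of Orion or PyMongo.
    """
    parts = urlsplit(uri)
    # Keep every existing option except a previous connectTimeoutMS value
    # (MongoDB URI option names are case-insensitive).
    options = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "connecttimeoutms"
    ]
    options.append(("connectTimeoutMS", str(timeout_ms)))
    # A MongoDB URI needs a "/" between the host list and the options.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(options), parts.fragment)
    )


# Hypothetical host and database name; 60_000 ms is three times the default.
print(with_connect_timeout("mongodb://localhost:27017/orion", 60_000))
```

The value is in milliseconds, so the 20-second default corresponds to 20000. A larger value only helps when the connection attempt itself is what times out; it does not address a crashed or stalled heartbeat process.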

bouthilx Nov 30 '21 16:11

If an algorithm lock is lost, the user gets orion.storage.base.LockAcquisitionTimeout. The FAQ should have a section that points to orion db release for this case.

bouthilx Sep 07 '22 19:09