yapapi
yapapi copied to clipboard
yagna: "Allocation [...] not found. Release failed." -> yapapi: "Error when shutting down Golem engine: (404)"
Description
OS: ubuntu 18.04 yagna daemon version: 0.10.0 Python version: 3.6.9 yapapi library version: 0.9.0
Description of the issue: Computation fails with
[2022-04-05T15:38:09.425+0000 INFO yapapi.executor] All jobs have finished
[2022-04-05T15:38:09.425+0000 INFO yapapi.executor] 135 agreements still unpaid, waiting for invoices...
[2022-04-05T15:38:39.427+0000 WARNING yapapi.executor] Unpaid agreements: {'adb905298e386e8445a7c882a5e762f46da0cbf7f905107b1af74e620ea41511', '349e51d9fd9228f9a3834cb61539087d23c6d49882398eeffbf69b5b85f63c97', '384ca0392b4282be3e92bd6faeed41707d624036cfea9645ef302623101
[2022-04-05T15:38:39.428+0000 INFO yapapi.executor] Waiting for Golem services to finish...
[2022-04-05T15:38:39.467+0000 INFO yapapi.summary] Total cost: 0
[2022-04-05T15:38:39.468+0000 ERROR yapapi.summary] Error when shutting down Golem engine: (404)
Reason: Not Found
HTTP response headers: <CIMultiDictProxy('Content-Length': '16', 'Content-Type': 'application/json', 'Date': 'Tue, 05 Apr 2022 15:38:39 GMT')>
HTTP response body: {"message":null}
yapapi version: 0.9.0
Using subnet: public-beta, payment driver: erc20, and network: polygon
Task computed: Task(id=8, data=TaskData(i=7, output_dir=PosixPath('[...]'), task_time=datetime.timedelta(0, 3600))), time: 1:00:02.270346
Task computed: Task(id=2, data=TaskData(i=1, output_dir=PosixPath('[...]'), task_time=datetime.timedelta(0, 3600))), time: 1:00:02.100750
tasks computed: 2, total time: 2:14:08.739454
Traceback (most recent call last):
File "drone/run.py", line 243, in <module>
main()
File "drone/run.py", line 223, in main
loop.run_until_complete(task)
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "drone/run.py", line 188, in async_main
print(f"tasks computed: {num_tasks_computed}, "
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/golem.py", line 291, in __aexit__
return await self._stop_with_exc_info(*exc_info)
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/golem.py", line 295, in _stop_with_exc_info
res = await self._engine.stop(*exc_info)
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/engine.py", line 222, in stop
return await self._stack.__aexit__(None, None, None)
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/async_exit_stack/_async_exit_stack.py", line 208, in __aexit__
raise exc_details[1]
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/async_exit_stack/_async_exit_stack.py", line 191, in __aexit__
cb_suppress = await cb(*exc_details)
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/rest/payment.py", line 111, in __aexit__
await self._api.release_allocation(self._id)
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/api_client.py", line 206, in __call_api
raise e
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/api_client.py", line 202, in __call_api
_request_timeout=_request_timeout,
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/rest.py", line 255, in DELETE
body=body,
File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/rest.py", line 180, in request
raise ApiException(http_resp=r)
ya_payment.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: <CIMultiDictProxy('Content-Length': '16', 'Content-Type': 'application/json', 'Date': 'Tue, 05 Apr 2022 15:38:39 GMT')>
HTTP response body: {"message":null}
At the same time, yagna says
[2022-04-05T15:38:39.460+0000 WARN ya_payment::dao::allocation] Allocation [...] not found. Release failed.
Actual result: I already wrote it.
Steps To Reproduce
Run the example drone.py, but with long (1h) tasks.
Expected behavior
It should not fail with tracebacks.
Logs and any additional context
When analysing logs please bear in mind that "vague log" ->
[2022-04-05T15:38:39.460+0000 WARN ya_payment::dao::allocation] Allocation 979bf31e-6361-4b53-ac88-3fa86a936abc not found. Release failed.
might be the result of allocation release due to timeout of the allocation set when it was created.
Allocation was already released when its timeout passed ->
[2022-04-05T13:54:00.631+0000 INFO ya_payment::dao::allocation] Allocation 979bf31e-6361-4b53-ac88-3fa86a936abc released.
This "vague log" issue has been detected and resolved here https://github.com/golemfactory/yagna/issues/1885 (check task for more details). I hope the implemented solution will be released with the nearest patch. I don't know if this has any effect on yapapi
error which you described above.
so, most likely a matter of catching the relevant 404 and just maybe issuing a warning to the log...
plus, it might warrant an integration test...
We are going to change error code to HTTP Gone (410) in case allocation was already released https://github.com/golemfactory/yagna/issues/1931
Fix: https://github.com/golemfactory/yapapi/pull/925 @nieznanysprawiciel This fix ignores both 404 and 410.
PR that fixes allocation timeout: https://github.com/golemfactory/yapapi/pull/926
PR that fixes allocation timeout: #926
it does not fix the issue, it's a work-around...
What is the proper solution here?
In regards to changes in yagna: for those already released allocations we will be returning http gone (410) for any subsequent get/release on them.
What is the proper solution here?
yapapi
should create a new allocation after yagna
releases the old allocation (if the budget was not spent). This functionality can be used to add spending limits, i.e. instead of total budget for computation, the user can set budget per hour.