aim
Flap on initing Run
🐛 Bug
I have a very strange bug: sometimes everything is OK, and sometimes I get this:
AttributeError: 'str' object has no attribute 'class_'
return "<%s at 0x%x>" % (state.class_.__name__, id(state.obj()))
"row is otherwise not present." % base.state_str(state)
File "/usr/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/base.py", line 264, in state_str
File "/usr/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/exc.py", line 138, in __init__
raise exception(*args) if args else exception()
File "/usr/python3.8/lib/python3.8/site-packages/aim/ext/transport/message_utils.py", line 76, in raise_exception
raise_exception(response.exception)
File "/usr/python3.8/lib/python3.8/site-packages/aim/ext/transport/client.py", line 225, in get_resource_handler
handler = self._rpc_client.get_resource_handler(self, self.resource_type, args=self.init_args)
File "/usr/python3.8/lib/python3.8/site-packages/aim/storage/structured/proxy.py", line 28, in __init__
return StructuredRunProxy(self._client, hash_, read_only)
File "/usr/python3.8/lib/python3.8/site-packages/aim/sdk/repo.py", line 351, in request_props
self._props = self.repo.request_props(self.hash, self.read_only)
File "/usr/python3.8/lib/python3.8/site-packages/aim/sdk/run.py", line 440, in props
self.props
File "/usr/python3.8/lib/python3.8/site-packages/aim/sdk/run.py", line 325, in __init__
super().__init__(run_hash, repo=repo, read_only=read_only, experiment=experiment, force_resume=force_resume)
File "/usr/python3.8/lib/python3.8/site-packages/aim/sdk/run.py", line 828, in __init__
return func(*args, **kwargs)
File "/usr/python3.8/lib/python3.8/site-packages/aim/ext/exception_resistant.py", line 68, in wrapper
raise e
File "/usr/python3.8/lib/python3.8/site-packages/aim/ext/exception_resistant.py", line 47, in reraise_exception
_SafeModeConfig.exception_callback(e, func)
File "/usr/python3.8/lib/python3.8/site-packages/aim/ext/exception_resistant.py", line 70, in wrapper
run = Run(
File "run.py", line 71, in _init_aim
self.run = self._init_aim()
To reproduce
I initialise the run like this:
run = Run(
repo="aim://127.0.0.1:53800",
experiment=project_name,
)
Expected behavior
No error, or at least a different one :)
Environment
- Aim Version 3.16.2
- 3.8
- 3.8
- Ubuntu 20.04
Hey @Alexponomarev7! Thanks a lot for the report. Can I ask which version of sqlalchemy you are using on the server side?
And also, is your server-side aim on version 3.16.2?
@mihran113 Thank you for the fast response!
>>> sqlalchemy.__version__
'1.4.46'
~ $ docker run --network host --entrypoint aim aimstack/aim version
Aim v3.16.2
Seems you're running aim server on docker? Could you please provide some more details about your setup, so I can reproduce it on my end?
And one more question: did this start happening with version 3.16.2, or was it happening earlier as well?
Yes, sorry for leaving this out of the environment information. We haven't tried earlier versions, because we are only now moving our infra from the old system to Aim. All I can say about our environment is that I use docker to bring up the UI and the server. They use a custom repo path, something like /home/aim, though I guess that doesn't matter. Over the last hour I made about 10 runs with simple training, all identical. 2 of the 10 ended with the error I described here.
Hmm, pretty strange, I'll try to reproduce it on my end, doesn't seem to be something obvious or setup related. Will ping you once any updates.
@mihran113 Hi! We have been using Aim for 2 days and now have more context on this problem.
On the client side we got this:
sqlalchemy.exc.InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction.
On the backend side we got this:
This one is not about initialization; it seems to be related to tracking. It is also an sqlalchemy problem, though, so the two may be connected.
I'm experiencing the same issue; I'm using the lightning adapter and this is happening both when runs fail and/or succeed!
The only way I've managed to consistently overcome the issue is by running:
pip uninstall sqlalchemy
pip install "sqlalchemy<2.0.0" -U
If I had to guess, this could be an sqlalchemy caching issue?
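For anyone comparing environments against the pin above, here is a minimal sketch that reports which sqlalchemy is installed and whether it is below 2.0. The helper names (`sqlalchemy_is_pre_2`, `installed_sqlalchemy_version`) are made up for illustration and are not part of Aim or sqlalchemy. Note that other reports in this thread hit the error on 1.4.x as well, so this only helps with diagnosis and does not confirm a cause.

```python
from importlib import metadata


def sqlalchemy_is_pre_2(version: str) -> bool:
    """Return True if a dotted version string has a major component below 2."""
    major = int(version.split(".")[0])
    return major < 2


def installed_sqlalchemy_version():
    """Return the installed sqlalchemy version string, or None if it is absent."""
    try:
        return metadata.version("sqlalchemy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sqlalchemy_version()
    if found is None:
        print("sqlalchemy is not installed in this environment")
    else:
        print(f"sqlalchemy {found}, pre-2.0: {sqlalchemy_is_pre_2(found)}")
```

Running it on both the client and inside the server container makes it easier to see whether the two sides differ.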
Hey @Alexponomarev7! I wasn't able to reproduce it on my end. If possible could you please share an example script that might help to reproduce it? The warnings on server side indicate that it might have something to do with adding tags to runs.
I haven't been able to create a simple example script but I'm running into the same issue with sqlalchemy 1.4.39 when adding tags to a run immediately after creation.
```python
run = Run(
    repo=repo_path,
    experiment=experiment,
    **self._aim_run_kwargs,
)
for t in self._tags:
    run.add_tag(t)
```
However, it doesn't seem to occur every time, and I haven't been able to determine the exact conditions under which it does. If I do, I'll post an update here.
Here is the relevant portion of the stacktrace:
Traceback (most recent call last):
File "/py_env/quickstart/lib/python3.11/site-packages/ray/tune/execution/tune_controller.py", line 853, in _on_result
on_result(trial, *args, **kwargs)
File "/py_env/quickstart/lib/python3.11/site-packages/ray/tune/execution/tune_controller.py", line 1187, in _on_trial_reset
self._actor_started(tracked_actor, log="REUSED")
File "/py_env/quickstart/lib/python3.11/site-packages/ray/tune/execution/tune_controller.py", line 764, in _actor_started
self._callbacks.on_trial_start(
File "/py_env/quickstart/lib/python3.11/site-packages/ray/tune/callback.py", line 384, in on_trial_start
callback.on_trial_start(**info)
File "/py_env/quickstart/lib/python3.11/site-packages/ray/tune/logger/logger.py", line 145, in on_trial_start
self.log_trial_start(trial)
File "/py_env/quickstart/lib/python3.11/site-packages/pln/fitting/tune_callbacks.py", line 96, in log_trial_start
self._trial_to_run[trial] = self._create_run(trial)
^^^^^^^^^^^^^^^^^^^^^^^
File "/py_env/quickstart/lib/python3.11/site-packages/pln/fitting/tune_callbacks.py", line 75, in _create_run
run.add_tag(t)
File "/py_env/quickstart/lib/python3.11/site-packages/aim/sdk/run.py", line 254, in add_tag
return self.props.add_tag(value)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/py_env/quickstart/lib/python3.11/site-packages/aim/storage/structured/proxy.py", line 86, in add_tag
return self._rpc_client.run_instruction(self._hash, self._handler, 'add_tag', (value,))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/py_env/quickstart/lib/python3.11/site-packages/aim/ext/transport/client.py", line 260, in run_instruction
return self._run_read_instructions(queue_id, resource, method, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/py_env/quickstart/lib/python3.11/site-packages/aim/ext/transport/client.py", line 285, in _run_read_instructions
raise_exception(status_msg.header.exception)
File "/py_env/quickstart/lib/python3.11/site-packages/aim/ext/transport/message_utils.py", line 76, in raise_exception
raise exception(*args) if args else exception()
sqlalchemy.exc.InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction.
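Since the failure in the trace above is intermittent, one possible stopgap is to retry the failing call a few times. This is only a sketch under the assumption that the error is transient, which nobody in this thread has confirmed; it is not a fix for the underlying problem. `call_with_retry` is a hypothetical helper, not an Aim API.

```python
import time


def call_with_retry(func, *args, attempts=3, delay=0.5, retry_on=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying on the given exception types.

    Hypothetical helper: re-raises the last error once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on as error:
            last_error = error
            # Sleep only between attempts, not after the final failure.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error


# Possible usage with the snippet above (assumes the client re-raises
# sqlalchemy.exc.InvalidRequestError, as the traceback suggests):
#
#     from sqlalchemy.exc import InvalidRequestError
#     for t in self._tags:
#         call_with_retry(run.add_tag, t, retry_on=(InvalidRequestError,))
```

Restricting `retry_on` to the specific exception keeps unrelated errors visible instead of silently retrying them.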