aiida-core
aiida-core copied to clipboard
extra can not be set on the fly when query the database
The behavior is changed between 2.0.3
and 2.0.4
on query. Setting the extra attributes of the node by query.iterall()
.
As reported in https://github.com/aiidateam/aiida-tutorials/issues/435.
The query for the structure data and set the extras
from aiida.orm import QueryBuilder
query = QueryBuilder()
query.append(StructureData, filters={'extras':{'!has_key':'formula'}})
for structure, in query.iterall():
structure.base.extras.set('formula', structure.get_formula(mode='count'))
Will cause the error:
09/27/2022 10:08:53 AM <2081> sqlalchemy.pool.impl.QueuePool: [ERROR] Error closing cursor
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py", line 1099, in fetchmany
new = dbapi_cursor.fetchmany(size - lb)
psycopg2.ProgrammingError: named cursor isn't valid anymore
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1995, in _safe_close_cursor
cursor.close()
psycopg2.ProgrammingError: named cursor isn't valid anymore
---------------------------------------------------------------------------
ProgrammingError Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in fetchmany(self, result, dbapi_cursor, size)
1098 try:
-> 1099 new = dbapi_cursor.fetchmany(size - lb)
1100 except BaseException as e:
ProgrammingError: named cursor isn't valid anymore
The above exception was the direct cause of the following exception:
ProgrammingError Traceback (most recent call last)
<ipython-input-1-916cde5a3649> in <module>
134 structure.set_extra('formula', structure.get_formula(mode='count'))
135
--> 136 store_formula_in_extra()
<ipython-input-1-916cde5a3649> in store_formula_in_extra()
131 query = QueryBuilder()
132 query.append(StructureData, filters={'extras':{'!has_key':'formula'}})
--> 133 for structure, in query.iterall():
134 structure.set_extra('formula', structure.get_formula(mode='count'))
135
/opt/conda/lib/python3.9/site-packages/aiida/orm/querybuilder.py in iterall(self, batch_size)
1048 :returns: a generator of lists
1049 """
-> 1050 for item in self._impl.iterall(self.as_dict(), batch_size):
1051 # Convert to AiiDA frontend entities (if they are such)
1052 for i, item_entry in enumerate(item):
/opt/conda/lib/python3.9/site-packages/aiida/storage/psql_dos/orm/querybuilder/main.py in iterall(self, data, batch_size)
167 stmt = build.query.statement.execution_options(yield_per=batch_size)
168
--> 169 for resultrow in self.get_session().execute(stmt):
170 yield [self.to_backend(rowitem) for rowitem in resultrow]
171
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/result.py in iterrows(self)
380
381 def iterrows(self):
--> 382 for row in self._fetchiter_impl():
383 row = make_row(row) if make_row else row
384 if post_creational_filter:
/opt/conda/lib/python3.9/site-packages/sqlalchemy/orm/loading.py in chunks(size)
140
141 if yield_per:
--> 142 fetch = cursor.fetchmany(yield_per)
143
144 if not fetch:
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/result.py in fetchmany(self, size)
1095 """
1096
-> 1097 return self._manyrow_getter(self, size)
1098
1099 def all(self):
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/result.py in manyrows(self, num)
540 num = real_result._yield_per
541
--> 542 rows = self._fetchmany_impl(num)
543 if make_row:
544 rows = [make_row(row) for row in rows]
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in _fetchmany_impl(self, size)
1806
1807 def _fetchmany_impl(self, size=None):
-> 1808 return self.cursor_strategy.fetchmany(self, self.cursor, size)
1809
1810 def _raw_row_iterator(self):
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in fetchmany(self, result, dbapi_cursor, size)
1099 new = dbapi_cursor.fetchmany(size - lb)
1100 except BaseException as e:
-> 1101 self.handle_exception(result, dbapi_cursor, e)
1102 else:
1103 if not new:
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in handle_exception(self, result, dbapi_cursor, err)
939
940 def handle_exception(self, result, dbapi_cursor, err):
--> 941 result.connection._handle_dbapi_exception(
942 err, None, None, dbapi_cursor, result.context
943 )
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
2122 util.raise_(newraise, with_traceback=exc_info[2], from_=e)
2123 elif should_wrap:
-> 2124 util.raise_(
2125 sqlalchemy_exception, with_traceback=exc_info[2], from_=e
2126 )
/opt/conda/lib/python3.9/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
206
207 try:
--> 208 raise exception
209 finally:
210 # credit to
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in fetchmany(self, result, dbapi_cursor, size)
1097 if size > lb:
1098 try:
-> 1099 new = dbapi_cursor.fetchmany(size - lb)
1100 except BaseException as e:
1101 self.handle_exception(result, dbapi_cursor, e)
ProgrammingError: (psycopg2.ProgrammingError) named cursor isn't valid anymore
(Background on this error at: https://sqlalche.me/e/14/f405)
As discussed, this appears to be a result of #5654, whereby iterall
has an open connection, which is pulling data from the database, but then something like set_extra
is writing to the database during this, and apparently closing the transaction, so then when iterall
goes to pull more it can't.
If it worked before though, then obviously something has changed.
The logic in iterall
itself, I don't feel has changed:
![image](https://user-images.githubusercontent.com/2997570/192698736-17406939-99e2-4b81-95ba-84dfaafd2965.png)
and also use_query
to query_session
seems to have no changes 🤔
@contextmanager
def use_query(self, data: QueryDictType) -> Iterator[Query]:
"""Yield the built query."""
query = self._update_query(data)
try:
yield query
except Exception:
self.get_session().close()
raise
@contextmanager
def query_session(self, data: QueryDictType) -> Iterator[BuiltQuery]:
"""Yield the built query, ensuring the session is closed on an exception."""
query = self.get_query(data)
try:
yield query
except Exception:
self.get_session().close()
raise
The confusion is understandable. This bug was not introduced by #5654 , it has existed for all of v2.0. In fact, it even existed for 1.6.9
and probably earlier. This is simply a problem with the sqlalchemy implementation. The following test fails from at least v1.6.9 and upwards (and most likely it also fails for many versions older than that, but that I haven't confirmed):
def test_iterall_with_mutation(self):
"""Test that nodes can be mutated while being iterated using ``QueryBuilder.iterall``."""
for node in range(2):
orm.Data().store()
for [node] in orm.QueryBuilder().append(orm.Data).iterall():
node.base.extras.set('key', 'value')
The same test on v1.6.9 with Django works just fine though.
Based on the answer in this thread (by the author of sqlalchemy
) we shouldn't be calling commit on a cursor while we haven't yet fully fetched all rows. This is what is happening however. In our ModelWrapper
, when we mutate a field on a stored node (as is being done by setting an extra), it commits unless it is in a transaction. But we are not in an explicit transaction here.
So fundamentally our current implementation does not allow mutating nodes during an iterall
of the QueryBuilder
. The direct solution is that the user opens a transaction before starting the iteration, but this is not really something we can enforce on the user. I am not sure if we could always pre-emptively open a transaction when iterating over rows of a query.
In normal applications, the changes would be committed once at the end of the iteration. Alternatively, we might get rid of the commit in the ModelWrapper.save
and instead set autoflush=True
on the sessionmaker
of sqlalchemy
. With that enabled, the session will automatically flush data periodically when it needs to. This will solve this particular example, but it is not clear whether it causes problems elsewhere.