aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

extra can not be set on the fly when query the database

Open unkcpz opened this issue 1 year ago • 1 comments

The behavior is changed between 2.0.3 and 2.0.4 on query. Setting the extra attributes of the node by query.iterall(). As reported in https://github.com/aiidateam/aiida-tutorials/issues/435. The query for the structure data and set the extras

    from aiida.orm import QueryBuilder
    query = QueryBuilder()
    query.append(StructureData, filters={'extras':{'!has_key':'formula'}})
    for structure, in query.iterall():
        structure.base.extras.set('formula', structure.get_formula(mode='count'))

Will cause the error:

09/27/2022 10:08:53 AM <2081> sqlalchemy.pool.impl.QueuePool: [ERROR] Error closing cursor
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py", line 1099, in fetchmany
    new = dbapi_cursor.fetchmany(size - lb)
psycopg2.ProgrammingError: named cursor isn't valid anymore

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1995, in _safe_close_cursor
    cursor.close()
psycopg2.ProgrammingError: named cursor isn't valid anymore
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in fetchmany(self, result, dbapi_cursor, size)
   1098             try:
-> 1099                 new = dbapi_cursor.fetchmany(size - lb)
   1100             except BaseException as e:

ProgrammingError: named cursor isn't valid anymore

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
<ipython-input-1-916cde5a3649> in <module>
    134         structure.set_extra('formula', structure.get_formula(mode='count'))
    135 
--> 136 store_formula_in_extra()

<ipython-input-1-916cde5a3649> in store_formula_in_extra()
    131     query = QueryBuilder()
    132     query.append(StructureData, filters={'extras':{'!has_key':'formula'}})
--> 133     for structure, in query.iterall():
    134         structure.set_extra('formula', structure.get_formula(mode='count'))
    135 

/opt/conda/lib/python3.9/site-packages/aiida/orm/querybuilder.py in iterall(self, batch_size)
   1048         :returns: a generator of lists
   1049         """
-> 1050         for item in self._impl.iterall(self.as_dict(), batch_size):
   1051             # Convert to AiiDA frontend entities (if they are such)
   1052             for i, item_entry in enumerate(item):

/opt/conda/lib/python3.9/site-packages/aiida/storage/psql_dos/orm/querybuilder/main.py in iterall(self, data, batch_size)
    167             stmt = build.query.statement.execution_options(yield_per=batch_size)
    168 
--> 169             for resultrow in self.get_session().execute(stmt):
    170                 yield [self.to_backend(rowitem) for rowitem in resultrow]
    171 

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/result.py in iterrows(self)
    380 
    381             def iterrows(self):
--> 382                 for row in self._fetchiter_impl():
    383                     row = make_row(row) if make_row else row
    384                     if post_creational_filter:

/opt/conda/lib/python3.9/site-packages/sqlalchemy/orm/loading.py in chunks(size)
    140 
    141             if yield_per:
--> 142                 fetch = cursor.fetchmany(yield_per)
    143 
    144                 if not fetch:

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/result.py in fetchmany(self, size)
   1095         """
   1096 
-> 1097         return self._manyrow_getter(self, size)
   1098 
   1099     def all(self):

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/result.py in manyrows(self, num)
    540                     num = real_result._yield_per
    541 
--> 542                 rows = self._fetchmany_impl(num)
    543                 if make_row:
    544                     rows = [make_row(row) for row in rows]

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in _fetchmany_impl(self, size)
   1806 
   1807     def _fetchmany_impl(self, size=None):
-> 1808         return self.cursor_strategy.fetchmany(self, self.cursor, size)
   1809 
   1810     def _raw_row_iterator(self):

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in fetchmany(self, result, dbapi_cursor, size)
   1099                 new = dbapi_cursor.fetchmany(size - lb)
   1100             except BaseException as e:
-> 1101                 self.handle_exception(result, dbapi_cursor, e)
   1102             else:
   1103                 if not new:

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in handle_exception(self, result, dbapi_cursor, err)
    939 
    940     def handle_exception(self, result, dbapi_cursor, err):
--> 941         result.connection._handle_dbapi_exception(
    942             err, None, None, dbapi_cursor, result.context
    943         )

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   2122                 util.raise_(newraise, with_traceback=exc_info[2], from_=e)
   2123             elif should_wrap:
-> 2124                 util.raise_(
   2125                     sqlalchemy_exception, with_traceback=exc_info[2], from_=e
   2126                 )

/opt/conda/lib/python3.9/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    206 
    207         try:
--> 208             raise exception
    209         finally:
    210             # credit to

/opt/conda/lib/python3.9/site-packages/sqlalchemy/engine/cursor.py in fetchmany(self, result, dbapi_cursor, size)
   1097         if size > lb:
   1098             try:
-> 1099                 new = dbapi_cursor.fetchmany(size - lb)
   1100             except BaseException as e:
   1101                 self.handle_exception(result, dbapi_cursor, e)

ProgrammingError: (psycopg2.ProgrammingError) named cursor isn't valid anymore
(Background on this error at: https://sqlalche.me/e/14/f405)

unkcpz avatar Sep 27 '22 12:09 unkcpz

As discussed, this appears to be a result of #5654, whereby iterall has an open connection, which is pulling data from the database, but then something like set_extra is writing to the database during this, and apparently closing the transaction, so then when iterall goes to pull more it can't. If it worked before though, then obviously something has changed.

The logic in iterall itself, I don't feel has changed:

image

and also use_query to query_session seems to have no changes 🤔

    @contextmanager
    def use_query(self, data: QueryDictType) -> Iterator[Query]:
        """Yield the built query."""
        query = self._update_query(data)
        try:
            yield query
        except Exception:
            self.get_session().close()
            raise
    @contextmanager
    def query_session(self, data: QueryDictType) -> Iterator[BuiltQuery]:
        """Yield the built query, ensuring the session is closed on an exception."""
        query = self.get_query(data)
        try:
            yield query
        except Exception:
            self.get_session().close()
            raise

chrisjsewell avatar Sep 28 '22 05:09 chrisjsewell

The confusion is understandable. This bug was not introduced by #5654 , it has existed for all of v2.0. In fact, it even existed for 1.6.9 and probably earlier. This is simply a problem with the sqlalchemy implementation. The following test fails from at least v1.6.9 and upwards (and most likely it also fails for many versions older than that, but that I haven't confirmed):

def test_iterall_with_mutation(self):
    """Test that nodes can be mutated while being iterated using ``QueryBuilder.iterall``."""
    for node in range(2):
        orm.Data().store()

    for [node] in orm.QueryBuilder().append(orm.Data).iterall():
        node.base.extras.set('key', 'value')

The same test on v1.6.9 with Django works just fine though.

Based on the answer in this thread (by the author of sqlalchemy) we shouldn't be calling commit on a cursor while we haven't yet fully fetched all rows. This is what is happening however. In our ModelWrapper, when we mutate a field on a stored node (as is being done by setting an extra), it commits unless it is in a transaction. But we are not in an explicit transaction here.

So fundamentally our current implementation does not allow mutating nodes during an iterall of the QueryBuilder. The direct solution is that the user opens a transaction before starting the iteration, but this is not really something we can enforce on the user. I am not sure if we could always pre-emptively open a transaction when iterating over rows of a query.

In normal applications, the changes would be committed once at the end of the iteration. Alternatively, we might get rid of the commit in the ModelWrapper.save and instead set autoflush=True on the sessionmaker of sqlalchemy. With that enabled, the session will automatically flush data periodically when it needs to. This will solve this particular example, but it is not clear whether it causes problems elsewhere.

sphuber avatar Oct 28 '22 20:10 sphuber