datajoint-python
datajoint-python copied to clipboard
`unite_master_parts` breaks topological order of descendants and thus breaks cascading of table drops
Bug Report
Description
unite_master_parts does not always preserve topological order when reordering the list of tables and causes issues when dropping tables.
Reproducibility
- Environment: datajoint codebook (python 3.9.5, datajoint 0.13.8)
- Minimum steps to reproduce the issue:
import datajoint as dj
schema = dj.Schema('zhuokun_test')
@schema
class A(dj.Manual):
definition = """
a: int
"""
@schema
class M1(dj.Manual):
definition = """
-> A
"""
@schema
class M(dj.Manual):
definition = """
-> A
"""
class Part(dj.Part):
definition = """
-> master
-> M1
"""
A.drop()
- Logs:
[2022-09-30 21:17:15,746][INFO]: Connecting [email protected]:3306
[2022-09-30 21:17:15,864][INFO]: Connected [email protected]:3306
`zhuokun_test`.`a` (0 tuples)
`zhuokun_test`.`m` (0 tuples)
`zhuokun_test`.`m__part` (0 tuples)
`zhuokun_test`.`m1` (0 tuples)
Proceed? [yes, No]: yes
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
Input In [1], in <cell line: 28>()
23 class Part(dj.Part):
24 definition = """
25 -> master
26 -> M1
27 """
---> 28 A.drop()
File /opt/conda/lib/python3.9/site-packages/datajoint/table.py:646, in Table.drop(self)
644 if do_drop:
645 for table in reversed(tables):
--> 646 FreeTable(self.connection, table).drop_quick()
647 print("Tables dropped. Restart kernel.")
File /opt/conda/lib/python3.9/site-packages/datajoint/table.py:605, in Table.drop_quick(self)
603 if self.is_declared:
604 query = "DROP TABLE %s" % self.full_table_name
--> 605 self.connection.query(query)
606 logger.info("Dropped table %s" % self.full_table_name)
607 self._log(query[:255])
File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:340, in Connection.query(self, query, args, as_dict, suppress_warnings, reconnect)
338 cursor = self._conn.cursor(cursor=cursor_class)
339 try:
--> 340 self._execute_query(cursor, query, args, suppress_warnings)
341 except errors.LostConnectionError:
342 if not reconnect:
File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:296, in Connection._execute_query(cursor, query, args, suppress_warnings)
294 cursor.execute(query, args)
295 except client.err.Error as err:
--> 296 raise translate_query_error(err, query)
File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:294, in Connection._execute_query(cursor, query, args, suppress_warnings)
291 if suppress_warnings:
292 # suppress all warnings arising from underlying SQL library
293 warnings.simplefilter("ignore")
--> 294 cursor.execute(query, args)
295 except client.err.Error as err:
296 raise translate_query_error(err, query)
File /opt/conda/lib/python3.9/site-packages/pymysql/cursors.py:148, in Cursor.execute(self, query, args)
144 pass
146 query = self.mogrify(query, args)
--> 148 result = self._query(query)
149 self._executed = query
150 return result
File /opt/conda/lib/python3.9/site-packages/pymysql/cursors.py:310, in Cursor._query(self, q)
308 self._last_executed = q
309 self._clear_result()
--> 310 conn.query(q)
311 self._do_get_result()
312 return self.rowcount
File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:548, in Connection.query(self, sql, unbuffered)
546 sql = sql.encode(self.encoding, "surrogateescape")
547 self._execute_command(COMMAND.COM_QUERY, sql)
--> 548 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
549 return self._affected_rows
File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:775, in Connection._read_query_result(self, unbuffered)
773 else:
774 result = MySQLResult(self)
--> 775 result.read()
776 self._result = result
777 if result.server_status is not None:
File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:1156, in MySQLResult.read(self)
1154 def read(self):
1155 try:
-> 1156 first_packet = self.connection._read_packet()
1158 if first_packet.is_ok_packet():
1159 self._read_ok_packet(first_packet)
File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:725, in Connection._read_packet(self, packet_type)
723 if self._result is not None and self._result.unbuffered_active is True:
724 self._result.unbuffered_active = False
--> 725 packet.raise_for_error()
726 return packet
File /opt/conda/lib/python3.9/site-packages/pymysql/protocol.py:221, in MysqlPacket.raise_for_error(self)
219 if DEBUG:
220 print("errno =", errno)
--> 221 err.raise_mysql_exception(self._data)
File /opt/conda/lib/python3.9/site-packages/pymysql/err.py:143, in raise_mysql_exception(data)
141 if errorclass is None:
142 errorclass = InternalError if errno < 1000 else OperationalError
--> 143 raise errorclass(errno, errval)
OperationalError: (3730, "Cannot drop table 'm1' referenced by a foreign key constraint 'm__part_ibfk_2' on table 'm__part'.")
Expected Behavior
Notice that datajoint was trying to drop M1 before dropping M.Part. The expected behavior should be dropping M.Part, M, and then M1.
Additional Research and Context
I dig into the code a little bit and realized datajoint would first collect all descendants of the table to drop, sort them by topological order, reorder the sorted list to unite master and part tables (unite_master_parts, which should not disrupt the topological sort), and then drop the tables in topological order.
However, unite_master_parts does not actually preserve topological order.
In the above example, the topologically sorted list of tables happens to be:
['`zhuokun_test`.`a`',
'`zhuokun_test`.`m`',
'`zhuokun_test`.`m1`',
'`zhuokun_test`.`m_part`']
and unite_master_parts would reorder the list to be:
['`zhuokun_test`.`a`',
'`zhuokun_test`.`m`',
'`zhuokun_test`.`m_part`'
'`zhuokun_test`.`m1`',]
which is not topologically sorted and would cause issues when dropping tables. This issue is probably also related to the behavior mentioned in https://github.com/datajoint/datajoint-python/issues/927#issuecomment-854385872
Great catch!
Thank you for reproducing the problem so well. Yes, the current unite_master_part is based on a faulty assumption that failed in this case. The algorithm meant to bring parts closer to their masters for transaction processing without breaking topological sorting. Fixing...