No fast-track for bulk inserts in Cursor.executemany with the INSERT ... AS row-alias syntax introduced in MySQL 8.0.19
asyncmy does not handle the new bulk-insert syntax correctly. The same issue exists in aiomysql, where there is a good explanation of what is happening: https://github.com/aio-libs/aiomysql/issues/968
MySQL is deprecating this syntax:
INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);
and asyncmy emits a warning about it. However, if you try to use the new syntax:
INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6) AS new(m,n,p)
ON DUPLICATE KEY UPDATE c = m+n;
the insertion speed becomes extremely slow.
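My understanding of why (see the aiomysql issue linked above): the executemany() fast track in asyncmy, as in PyMySQL and aiomysql, relies on a regex that recognizes the VALUES clause and rewrites the statement into a single multi-row INSERT. The row alias (AS new(m,n,p)) sits between the VALUES list and ON DUPLICATE KEY UPDATE, so the pattern no longer matches and executemany() quietly falls back to one execute() per row. Below is a small self-contained illustration; the pattern is adapted from PyMySQL's RE_INSERT_VALUES and may differ slightly from asyncmy's actual source, so treat it as a sketch of the mechanism rather than the library's code.

```python
import re

# Pattern adapted from PyMySQL's RE_INSERT_VALUES; asyncmy/aiomysql use a
# similar check to decide whether executemany() can rewrite the statement
# into one multi-row INSERT. Illustration only, not asyncmy's source.
RE_INSERT_VALUES = re.compile(
    r"\s*((?:INSERT|REPLACE)\b.+\bVALUES?\s*)"
    r"(\(\s*(?:%s|%\(.+\)s)\s*(?:,\s*(?:%s|%\(.+\)s)\s*)*\))"
    r"(\s*(?:ON DUPLICATE.*)?);?\s*\Z",
    re.IGNORECASE | re.DOTALL,
)

deprecated = (
    "INSERT INTO t1 (a,b,c) VALUES (%s,%s,%s) "
    "ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b)"
)
row_alias = (
    "INSERT INTO t1 (a,b,c) VALUES (%s,%s,%s) AS new(m,n,p) "
    "ON DUPLICATE KEY UPDATE c = m+n"
)

print(bool(RE_INSERT_VALUES.match(deprecated)))  # True  -> multi-row fast track
print(bool(RE_INSERT_VALUES.match(row_alias)))   # False -> per-row execute() fallback
```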
I'm attaching the same example program I used in the aiomysql bug.
I am seeing this issue with asyncmy = "0.2.10".
From my test app:
DEBUG:root:On deprecated named col duplicate insert of 8000 rows took 0.275493860244751 seconds
DEBUG:root:On named col duplicate insert of 8000 rows took 24.421008110046387 seconds
DEBUG:root:On placeholder duplicate insert of 8000 rows took 1.0064198970794678 seconds
I'll attach a sample program demonstrating the issue. For ease of use, add a .env file with the following information for your database and then run dbtest2.py:
#.env
DATABASE_USER=xxx
DATABASE_PASSWORD=yyy
DATABASE_HOST=aaa
DATABASE_NAME=zzzz
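For readers who don't want to download the attachment, the comparison has roughly the following shape. This is a minimal sketch, not the attached dbtest2.py; the my_table schema, the connect() arguments, and the row data here are illustrative assumptions.

```python
import asyncio
import os
import time

import asyncmy

# Assumed table for illustration: my_table(c1 INT PRIMARY KEY, c2 VARCHAR(32)).
DEPRECATED_SQL = (
    "INSERT INTO my_table (c1, c2) VALUES (%s, %s) "
    "ON DUPLICATE KEY UPDATE c2=VALUES(c2)"
)
ROW_ALIAS_SQL = (
    "INSERT INTO my_table (c1, c2) VALUES (%s, %s) AS new "
    "ON DUPLICATE KEY UPDATE c2=new.c2"
)


async def main():
    # Export the .env values (or load them with python-dotenv) before running.
    conn = await asyncmy.connect(
        host=os.environ["DATABASE_HOST"],
        user=os.environ["DATABASE_USER"],
        password=os.environ["DATABASE_PASSWORD"],
        database=os.environ["DATABASE_NAME"],
    )
    rows = [(i, f"value-{i}") for i in range(8000)]
    async with conn.cursor() as cursor:
        await cursor.execute(
            "CREATE TABLE IF NOT EXISTS my_table "
            "(c1 INT PRIMARY KEY, c2 VARCHAR(32))"
        )
        for label, sql in (("deprecated", DEPRECATED_SQL), ("row alias", ROW_ALIAS_SQL)):
            start = time.perf_counter()
            await cursor.executemany(sql, rows)
            await conn.commit()
            print(f"{label}: {time.perf_counter() - start:.3f}s for {len(rows)} rows")


asyncio.run(main())
```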
Additional note: in my example, changing the syntax from something like:
INSERT INTO my_table (c1, c2)
VALUES (%(c1)s, %(c2)s)
AS new
ON DUPLICATE KEY UPDATE
c2=new.c2
and using named values (dicts) as the input, to something like:
INSERT INTO my_table (c1, c2)
VALUES (%s, %s )
AS new
ON DUPLICATE KEY UPDATE
c2=new.c2
and using plain rows (tuples) as the input, made the query perform almost as fast as the deprecated syntax (a sketch of the two call styles follows below). However, when I tried to create a generic table to demonstrate the phenomenon, I couldn't reproduce it, so something about the example table structure is also special. That could be related to the underlying issue, or it could be a separate issue.
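For clarity, here is a hedged sketch of the two call styles being compared, using the table and columns from the snippets above; the helper names and data shapes are made up for illustration.

```python
# Sketch only: "cursor" is an open asyncmy cursor and my_table(c1, c2) is the
# example table from above; helper names and row data are illustrative.

async def insert_with_named_params(cursor, dict_rows):
    # dict_rows: e.g. [{"c1": 1, "c2": "a"}, ...] -- this was the slow case.
    await cursor.executemany(
        "INSERT INTO my_table (c1, c2) VALUES (%(c1)s, %(c2)s) AS new "
        "ON DUPLICATE KEY UPDATE c2=new.c2",
        dict_rows,
    )


async def insert_with_positional_params(cursor, tuple_rows):
    # tuple_rows: e.g. [(1, "a"), ...] -- this ran almost as fast as the
    # deprecated VALUES() form in my test.
    await cursor.executemany(
        "INSERT INTO my_table (c1, c2) VALUES (%s, %s) AS new "
        "ON DUPLICATE KEY UPDATE c2=new.c2",
        tuple_rows,
    )
```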
Another note: the mysql-connector-python library handles the new syntax without problems.
I made a fix and added a test for the regex, but perhaps we could enhance the test, per the comment in the PR, by mocking execute to ensure executemany doesn't fall back to it.
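A rough sketch of what that enhanced test could look like. Instead of a Mock object, this counts execute() calls via a small Cursor subclass passed through conn.cursor(cursor=...) (the same hook DictCursor uses); the connection settings, table, and assertion threshold are assumptions, and whether the compiled cursor class dispatches its internal execute() calls to the subclass is something to verify before relying on this.

```python
import asyncio

import asyncmy
from asyncmy.cursors import Cursor

ROW_ALIAS_SQL = (
    "INSERT INTO my_table (c1, c2) VALUES (%s, %s) AS new "
    "ON DUPLICATE KEY UPDATE c2=new.c2"
)


class CountingCursor(Cursor):
    """Cursor that records every execute() call so a test can assert that
    executemany() did not fall back to one execute() per parameter row."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.execute_calls = 0

    async def execute(self, query, args=None):
        self.execute_calls += 1
        return await super().execute(query, args)


async def test_executemany_uses_bulk_path():
    # Assumed local test database and my_table(c1 INT PRIMARY KEY, c2 VARCHAR(32)).
    conn = await asyncmy.connect(
        host="127.0.0.1", user="root", password="", database="test"
    )
    rows = [(i, f"value-{i}") for i in range(100)]
    async with conn.cursor(cursor=CountingCursor) as cursor:
        await cursor.executemany(ROW_ALIAS_SQL, rows)
        await conn.commit()
        # On the bulk path execute() runs once per generated multi-row
        # statement; on the fallback it runs once per row.
        assert cursor.execute_calls < len(rows), "fell back to per-row execute()"


asyncio.run(test_executemany_uses_bulk_path())
```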