dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Syntax Error Occurs When Using AS Clause with ON DUPLICATE KEY UPDATE

Open koh789 opened this issue 1 year ago • 5 comments

Using version: v0.18.0

When executing the following query, an error ( syntax error at position 134 near 'AS') occurs. )

INSERT INTO test_campaigns (email, end_time, name, registered_time) VALUES (?,?,?,?) AS new ON DUPLICATE KEY UPDATE email = new.email, end_time = new.end_time

What could be the cause of this?

Starting with MySQL 8.0.19, it is possible to use an alias for the row, with, optionally, one or more of its columns to be inserted, following the VALUES or SET clause, and preceded by the AS keyword. Using the row alias new, the statement shown previously using VALUES() to access the new column values can be written in the form shown here:

https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html

If possible, I would be grateful if you could support version 8.0.20 and later. Thank you for your consideration.

koh789 avatar Mar 17 '24 10:03 koh789

This SQL syntax extension will get better visibility in the Dolt repo, so I'm going to move it over there.

timsehn avatar Mar 22 '24 18:03 timsehn

@koh789 , thank you again for letting us know. We'd love to learn about your Dolt use case too. Feel free to join our Discord or send me an email if you'd like to share.

bpf120 avatar Apr 01 '24 22:04 bpf120

Hi Noguchi!

Unfortunately, it looks like we don't currently support the "row alias" syntax for insert statements.

As a workaround, we do support the previous syntax (using the VALUES function), so your example in the previous syntax would look like this:

INSERT INTO test_campaigns (email, end_time, name, registered_time) VALUES (?,?,?,?) ON DUPLICATE KEY UPDATE email = VALUE(new), end_time = VALUE(end_time)

I agree that supporting newer MySQL syntax is valuable. This doesn't seem like a difficult add either: we already support referencing subquery aliases in update statements, which can be demonstrated with the following equivalent statement:

insert into test_campaigns (email, end_time, name, registered_time) (select * from (select ? as email, ? as end_time, ? as name, ? as registered_time) as new) ON DUPLICATE KEY UPDATE email = new.email, end_time = new.end_time;

So adding support for this should be straightforward. I can do it today.

nicktobey avatar Apr 08 '24 19:04 nicktobey

I've been digging into this a bit more.

The VALUES() function is deprecated and due to be removed in a future version of MySQL. If that happens, we'll likely see a spike in users relying on row aliases like this. We definitely want to make sure that we support this before then.

Unfortunately, there are some quirks that make this less straightforward:

  • As demonstrated by the example, MySQL will accept expressions like email = new.email in the ON UPDATE clause. This is not ambiguous, because the engine will prefer to resolve the unqualified email to the table instead of the row alias. Our plan builder allows for this kind of preferential resolution in specific cases, but not generally.
  • The values alias may list column aliases (VALUES (?,?,?,?) AS new(email, end_time, name, registered_time)), but it's not required to. If it doesn't, then we still need to assign names to these value columns based on the columns listed in the insert statement. (But those are also optional, in which case we'd have to fall back on the destination table columns.)
  • We'd want to test for situations where the correct behavior is non-obvious (What happens if the insert columns have different names from the row alias columns? What if they have the same name but different order?, etc) All of these are doable, but it makes the process more complicated than just updating our parser.

nicktobey avatar Apr 12 '24 22:04 nicktobey

Update: I actually misunderstood how the column aliases work in this case. From the docs:

If, in addition, you use the column aliases m, n, and p, you can omit the row alias in the assignment clause and write the same statement like this:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6) AS new(m,n,p) ON DUPLICATE KEY UPDATE c = m+n;

nicktobey avatar Apr 12 '24 23:04 nicktobey

This is an issue for us as well. We recently updated our query builders to use the new alias syntax instead of the old VALUES() syntax to avoid the deprecation warnings emitted by MySQL, but it turns out that change broke Dolt compatibility instead.

The fact that it's now supported by the parser also makes the resulting error message a bit confusing, since the following:

INSERT INTO tbl (id, name) VALUES (1, 'test1'), (2, 'test2'), (3, 'test3') AS tbl_new
ON DUPLICATE KEY UPDATE tbl.name = tbl_new.name;

results in a table not found: tbl_new error message as opposed to an error that more clearly indicates that aliases aren't implemented here to begin with.

arvidfm avatar Jun 05 '24 19:06 arvidfm

I apologize for the confusing error message: when we added support to the parser, we added a new error message that was supposed to be more clear:

error: insert row aliases are not currently supported; use the VALUES() function instead

If you're not seeing that error message, that's a bug.

Last time, we held off on implementing this because the name resolution rules were deceptively tricky, and we could point people to the old syntax as a workaround. But given that the old syntax is deprecated and people are migrating, we need to be able to handle the new synatx. I'm going to take another stab at it today.

nicktobey avatar Jun 05 '24 20:06 nicktobey

Update: I have a draft PR that gets the simplest case working (row alias, no column alias): https://github.com/dolthub/go-mysql-server/pull/2534

So with this PR, both @arvidfm and @koh789's sample queries should work.

I haven't fully tested it, and it does have some caveats:

  • It doesn't currently recognize column aliases (ie. VALUES (1, 'test1'), (2, 'test2'), (3, 'test3') AS tbl_new(new_id, new_name))
  • If there's a column name ambiguity between the row alias and another table, the engine is supposed to favor the insert source. So for instance, in ON DUPLICATE KEY UPDATE tbl.name = name, the unqualified name is supposed to resolve to tbl.name, not tbl_new.name. Currently with my draft PR, this returns an ambiguous column error.

I'm currently working on both of these issues.

nicktobey avatar Jun 05 '24 23:06 nicktobey

@nicktobey If it helps, that name resolution behaviour doesn't actually seem to be the case in MySQL, or at least not with Percona Server 8.3. I get an error message complaining that the column name is ambiguous if I do either name = tbl_new.name or tbl.name = name. Though perhaps different MySQL versions behave differently here.

arvidfm avatar Jun 05 '24 23:06 arvidfm

@arvidfm You're absolutely right! I was mistaken.

I got column aliases working last night too. I'm adding tests and will make the final PR shortly.

nicktobey avatar Jun 06 '24 15:06 nicktobey

PR is ready. It took some finesse to make it work properly with out-of-order insert statements, but everything should be working now.

nicktobey avatar Jun 06 '24 23:06 nicktobey

@nicktobey I noticed the related PR is merged while this issue is still open, is there any further work required for this or is it ready to be used?

arvidfm avatar Jun 12 '24 09:06 arvidfm

This is in 1.39.5:

https://github.com/dolthub/dolt/releases/tag/v1.39.5

timsehn avatar Jun 12 '24 15:06 timsehn

Perfect, thank you for the confirmation!

arvidfm avatar Jun 12 '24 15:06 arvidfm

@nicktobey

Sorry for the late reply. Wow! After updating to the following versions, I was able to successfully execute the alias query! Thank you for your prompt response! I will start using it right away.

koh789 avatar Jul 26 '24 10:07 koh789