vitess icon indicating copy to clipboard operation
vitess copied to clipboard

Feature Request: UPDATE with JOINs with all the same vindexes does not need OperatorType DMLWithInput

Open wiebeytec opened this issue 1 year ago • 2 comments

Overview of the Issue

Vitess recently got UPDATE with JOIN support, but there may be some inefficiency/deficiency. It seems to resort to OperatorType DMLWithInput (which rewrites to SELECT ... FOR UPDATE; UPDATE ... ) when it doesn't have to. This is not only inefficient, but also, when does that rewrite, it causes deadlocks if you have multiple connections doing it an the same time (in transactions). I ran into this.

Expected result: when using fields from joined tables from the same shard/vindex in the SET expression, it can still use OperatorType Update.

Reproduction Steps

As an example that is treated well: I use hard-coded values in SET, it uses a normal update:

vexplain UPDATE
alarmLog AS al
INNER JOIN lastLogData as lld
ON al.idSite = lld.idSite
SET al.valueEnum = 2
WHERE lld.valueEnum != 1
AND al.idSite = 98478 \G
{                                                                                       
        "OperatorType": "Update",                                                             
        "Variant": "EqualUnique",                                                             
        "Keyspace": {                                                                         
                "Name": "sites2024",                                                                                  
                "Sharded": true                                                                                                                                                             
        },                                                                                                                                                                                  
        "TargetTabletType": "PRIMARY",                                                                                                                                                                                                      
        "Query": "update alarmLog as al, lastLogData as lld set al.valueEnum = 2 where al.idSite = 98478 and lld.valueEnum != 1 and al.idSite = lld.idSite",                                                                                
        "Table": "alarmLog",                                                                                          
        "Values": [                    
                "98478"                
        ],                                                                                    
        "Vindex": "a_standard_hash"                                                           
}

But when I take the value to set from the JOIN, it uses DMLWithInput:

vexplain UPDATE
alarmLog AS al
INNER JOIN lastLogData as lld
ON al.idSite = lld.idSite
SET al.valueEnum = lld.valueEnum
WHERE lld.valueEnum != 1
AND al.idSite = 98478 
\G
{
        "OperatorType": "DMLWithInput",
        "TargetTabletType": "PRIMARY",
        "BindVars": [
                "0:[lld_valueEnum:1]"
        ],
        "Offset": [
                "0:[0]"
        ],
        "Inputs": [
                {
                        "OperatorType": "Route",
                        "Variant": "EqualUnique",
                        "Keyspace": {
                                "Name": "sites2024",
                                "Sharded": true
                        },
                        "FieldQuery": "select al.idAlarm, lld.valueEnum from alarmLog as al, lastLogData as lld where 1 != 1",
                        "Query": "select al.idAlarm, lld.valueEnum from alarmLog as al, lastLogData as lld where al.idSite = 98478 and lld.valueEnum != 1 and al.idSite = lld.idSite for update",
                        "Table": "alarmLog, lastLogData",
                        "Values": [
                                "98478"
                        ],
                        "Vindex": "a_standard_hash"
                },
                {
                        "OperatorType": "Update",
                        "Variant": "Scatter",
                        "Keyspace": {
                                "Name": "sites2024",
                                "Sharded": true
                        },
                        "TargetTabletType": "PRIMARY",
                        "Query": "update alarmLog as al set al.valueEnum = :lld_valueEnum where al.idAlarm in ::dml_vals",
                        "Table": "alarmLog"
                }
        ]
}

The relevant table entries in the VSchema:

"alarmLog": {         
      "column_vindexes": [
        {                    
          "column": "idSite",      
          "name": "a_standard_hash"
        }
      ],
      "auto_increment" : {
        "column": "idAlarm",
        "sequence": "alarmLog_seq"                                                                                    
      }
    },

And:

"lastLogData": {              
      "column_vindexes": [
        {
          "column": "idSite",
          "name": "a_standard_hash"
        }
      ]                      
    },

Binary Version

vtgate version Version: 20.0.2 (Git revision 2592c5932b3036647868299b6df76f8ef28dfbc8 branch 'HEAD') built on Wed Sep 11 08:15:20 UTC 2024 by runner@fv-az1152-369 using go1.22.7 linux/amd64

Operating System and Environment details

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS"

Log Fragments

No response

wiebeytec avatar Oct 02 '24 13:10 wiebeytec

This looks like a reasonable improvement. It does not seem like a bug but an enhancement.

harshit-gangal avatar Oct 09 '24 07:10 harshit-gangal

Is this a V20 issue only, or does this impact V19, too?

GrahamCampbell avatar Oct 16 '24 11:10 GrahamCampbell

This happens only from v20 onwards. Earlier release does not have the support for it.

harshit-gangal avatar Dec 03 '24 16:12 harshit-gangal

Did vitess 19 previously push down this query if it could be entirely pushed down on a sharded keyspace, or did it reject it entirely. Does V20 still do this, or does it always do the plan of select for update + update now?

GrahamCampbell avatar Dec 03 '24 16:12 GrahamCampbell

Did vitess 19 previously push down this query if it could be entirely pushed down on a sharded keyspace, or did it reject it entirely

If you need a fully authoritative answer I would have to put some effort into it (our dev env my not boostrap on 19), but I can at least say that before support for update with join, I had rewritten all my updates with joins to manual selects + updates, so as far as I know, it was not supported at all.

Does V20 still do this, or does it always do the plan of select for update + update now?

It doesn't always do 'select for update' + 'update'. You can see in the original report that when I use a hard-coded value, the type is Update (instead of DMLWithInput).

wiebeytec avatar Dec 24 '24 11:12 wiebeytec