temporal
temporal copied to clipboard
sqlplugins/mysql: default interpolateParams in DSN
By default, the go-sql-driver/mysql driver wraps every query in a prepared statement, and this extra prepared statement means every single query is effectively two round trips. A request of ComPrepare to prepare the query, then ComStmtExecute with the statement id and the parameters.
In theory, this would be be fine if Temporal was explicitly aware of this and prepared their own statements explicitly and reused the prepared statements over and over through the lifetime.
Short of that, wrapping every query in a prepared statement just effectively doubles the queries and double the latency.
I had originally considered exposing this as a configuration knob, since this can be set through connectAttributes manually, but I don't think there's value in having this be configurable and we can provide a better default.
See: https://github.com/go-sql-driver/mysql#interpolateparams
How did you test it?
I have tested locally, and we are starting to tell our customers (PlanetScale) to set this manually in their connectAttributes.
Potential risks
There's a potential for some reason, the behavior to change and not work with client side interpolation, but it's unknown currently and I think would be considered a bug in the go-sql-driver/mysql library if anything is uncovered.
Is hotfix candidate?
I'd say Yes, but I personally have little experience with the Temporal community. :)
I guess there are a few failing test cases, so this might be tricky for me to debug. :/ But if we can make this work, or at minimum exposed more easily as a config option, that'd be great.
I guess there are a few failing test cases, so this might be tricky for me to debug. :/ But if we can make this work, or at minimum exposed more easily as a config option, that'd be great.
I'll take a look at it and see if I can help you resolve those. I saw in our history that we've had other issues with ? and mysql8 and this seems related
I saw in our history that we've had other issues with ? and mysql8 and this seems related
I'd be interested in hearing about it.
For context, I'm sorta a middleman here trying to get a better experience for mutual users. :)
I'm quite familiar with mysql protocol and the mysql driver side of things, and I know how this behavior works in PlanetScale and vitess. We have some mutual customers that would benefit quite drastically from either, using prepared statements explicitly so they are saved and reued for the lifetime, or avoid them and stick to client side interpolation. It's just a lot of wasted CPU cycles on the server to handle the extra Prepare + Execute for every query.
My assumption is that Temporal didn't explicitly choose this behavior, since it's the default of the mysql driver, and I think from the mysql driver, they are erring on the side of caution. We've recommended for most applications to set ?interpolateParams=true and I've been doing this for many years for performance things without any issues.
I'm not actually sure what query patterns Temporal is testing and if it's a side effect of an explicit test, or if it's really a bug we'll need to work around and figure out how to do a query differently.
I'm willing to offer up my help how I can, but not being a Temporal user puts me in a slightly challenging position here. This would definitely be mutually beneficial to avoid the overhead of 2 round trip per query if we can make it happen.
I saw in our history that we've had other issues with ? and mysql8 and this seems related
I'd be interested in hearing about it.
For context, I'm sorta a middleman here trying to get a better experience for mutual users. :)
I'm quite familiar with mysql protocol and the mysql driver side of things, and I know how this behavior works in PlanetScale and vitess. We have some mutual customers that would benefit quite drastically from either, using prepared statements explicitly so they are saved and reued for the lifetime, or avoid them and stick to client side interpolation. It's just a lot of wasted CPU cycles on the server to handle the extra Prepare + Execute for every query.
My assumption is that Temporal didn't explicitly choose this behavior, since it's the default of the mysql driver, and I think from the mysql driver, they are erring on the side of caution. We've recommended for most applications to set
?interpolateParams=trueand I've been doing this for many years for performance things without any issues.I'm not actually sure what query patterns Temporal is testing and if it's a side effect of an explicit test, or if it's really a bug we'll need to work around and figure out how to do a query differently.
I'm willing to offer up my help how I can, but not being a Temporal user puts me in a slightly challenging position here. This would definitely be mutually beneficial to avoid the overhead of 2 round trip per query if we can make it happen.
I saw we have a comment at https://github.com/temporalio/temporal/pull/3886/files#diff-068f89497865ae54e0047289cd7c911f3afc746c9a920bd57b4aae4c1bdabac6R32 about this behavior but I'm not sure why that's the case. I'll let you know if I figure anything out.
I'm assuming you haven't had anyone running mysql8 try this change yet, as it'd have broken otherwise
It looks like only our visibility code is failing which is interesting, hrm.
I'm assuming you haven't had anyone running mysql8 try this change yet, as it'd have broken otherwise
We run all mysql8, but we don't have a lot of experience with this yet. I'm just did some basic testing and we're working to solicit feedback from customers. The challenge is getting them to adopt this config when it's very hard to set with the default docker-compose setups and whatnot, it's not exposed as an option.
As for the ? stuff, so that in theory shouldn't even be sent over the wire. That's the point of the interpolateParams=true option, it converts the ? and escapes the values in place. So maybe in this case, you need to go back to ? instead of using %v to let the driver do the escaping properly. But ? shouldn't be going over the wire.
The client side interpolation is looking for exactly that: https://github.com/go-sql-driver/mysql/blob/v1.7.1/connection.go#L198-L295
I'm assuming you haven't had anyone running mysql8 try this change yet, as it'd have broken otherwise
We run all mysql8, but we don't have a lot of experience with this yet. I'm just did some basic testing and we're working to solicit feedback from customers. The challenge is getting them to adopt this config when it's very hard to set with the default docker-compose setups and whatnot, it's not exposed as an option.
As for the
?stuff, so that in theory shouldn't even be sent over the wire. That's the point of theinterpolateParams=trueoption, it converts the?and escapes the values in place. So maybe in this case, you need to go back to?instead of using%vto let the driver do the escaping properly. But?shouldn't be going over the wire.The client side interpolation is looking for exactly that: https://github.com/go-sql-driver/mysql/blob/v1.7.1/connection.go#L198-L295
Yep. I'll need to dig into our mysql8 visibility code to figure out which query is failing and trace it back
From my understanding, I know there are 2 databases in play here. I think from my understanding, the visibility database is much less volume than the main temporal database?
If that's correct, it might be a safer first step to apply this setting as a default only for temporal and not visibility as an intermediate step since that'd be arguably the biggest win.
Aha, it's https://github.com/go-sql-driver/mysql/issues/819. Our visibility schema embeds the memo as a []byte which can't be used with interpolateParams=true for a JSON field.
These queries are hard-coded and can be trivially prepared; I'll hop on that this afternoon; it shouldn't be a big change and will let us enable that option for you
Aha, yeah, doing an explicit prepare then for that sounds like a winner. :)
The simplest approach is actually to track the Memo as json.RawMessage since the mysql driver handles that directly. That's simpler than manually preparing all our statements while should still let folks enable this.
As for setting this as our default: I'm curious as to whether there are downsides to using interpolateParams=true. The biggest I can think of is if the driver's interpolation code isn't entirely correct and could let SQL injection sneak in.
As for setting this as our default: I'm curious as to whether there are downsides to using interpolateParams=true. The biggest I can think of is if the driver's interpolation code isn't entirely correct and could let SQL injection sneak in.
I think there's a risk with any client side interpolation, but every ORM and whatnot does it. Given that this mysql driver is the most popular in Go that I'm aware of, I would suspect most cases have been figured out by now, given nobody at large scale would want to run with their default behavior. We encourage setting this on for our customers, etc.
They have one use case:
This can not be used together with the multibyte encodings BIG5, CP932, GB2312, GBK or SJIS. These are rejected as they may introduce a SQL injection vulnerability!
And it sounds like they guard against it by just disallowing it for those encodings. I'd be curious if you had any other cases that you're aware of that would cause an issue that the driver doesn't outright reject.
One benefit of MySQL in this case is their types are quite a bit simpler and escaping is a bit easier than in Postgres.
We're on the same page there. I'm asking around our team just in case someone can think of something but I'm testing a patch that'll allow this to work at the same time
I put up a new PR at https://github.com/temporalio/temporal/pull/5428 that contains your change plus the one other change required to support it (and some cleanup while I was here). I'll keep iterating on it as tests finish, and will discuss with the team whether this is hotfix-worthy
tyty it makes sense that encoding JSON as binary would be incorrect in this context since JSON is explicitly UTF-8.
I really appreciate carrying this through. Good open source response. 🫶
tyty it makes sense that encoding JSON as binary would be incorrect in this context since JSON is explicitly UTF-8.
Yeah. I'm seeing reports in other MySQL drivers and forums about this happening with the utf8mb4 character set specifically (which our mysql8 plugin uses). It looks like there's still a few more issues I have to fix but I should have it all done early next week.
I really appreciate carrying this through. Good open source response. 🫶
Of course! I'm happy to deliver an easy speedup for customers using our MySQL backend 😄
Enabling interpolateParams is causing more problems than I expected; I'll keep poking away at this next week but we have a long weekend this weekend.
If you have customers running mysql8 with this set make sure they don't enable it for mysql8-based visibility stores while I figure out what's wrong
Do you have benchmarks that measure the performance improvement with this set to true, by the way?
Do you have benchmarks that measure the performance improvement with this set to true, by the way?
I don't, other than it's objectively doing less than half the work and half the requests against Vitess. I can't easily quantify what the tangible savings is, but that's a lot of round trip and extra CPU on both sides being spent.
I'm going to close this in favor of https://github.com/temporalio/temporal/pull/5441 and https://github.com/temporalio/temporal/pull/5428 as this PR will cause errors for folks using MySQL 8 visibility stores.
We can continue discussion here however
As I mentioned earlier, I think keeping this off entirely from visibility store is more than fine. From my limited understanding of Temporal, it seems a vast majority of the workload is in the main temporal database, not the visibility one. I might be misinterpreting that though.
I'm a customer of @mattrobenolt's and I wanted to add some voice here. Prepared statements are a scaling concern for my self-hosted Temporal v1.20.4 instance. Here are some quick measures:
- The cluster has a throughput of 300-350 workflows per second, roughly 1,500-1,800 activities per second
- The history service maintains a rate of 22-27k persistence requests per second, which translates to 150-190k queries per second as measured by the DB
I have been needing to scale my number of history services significantly recently, I believe in part due to the query multiplier that prepared statements adds. We have been seeing a lot of logs that indicate DB timeouts, but the DB is healthy, so we think the connection pool may be too small and that there's contention within the history service. The history service isn't under significant CPU or memory pressure. We have 32 instances each with 60 connections right now, which feels significant.
@tdeebswihart it looks like #5428 should be close to landing?
I'm a customer of @mattrobenolt's and I wanted to add some voice here. Prepared statements are a scaling concern for my self-hosted Temporal v1.20.4 instance. Here are some quick measures:
* The cluster has a throughput of 300-350 workflows per second, roughly 1,500-1,800 activities per second * The history service maintains a rate of 22-27k persistence requests per second, which translates to 150-190k queries per second as measured by the DBI have been needing to scale my number of history services significantly recently, I believe in part due to the query multiplier that prepared statements adds. We have been seeing a lot of logs that indicate DB timeouts, but the DB is healthy, so we think the connection pool may be too small and that there's contention within the history service. The history service isn't under significant CPU or memory pressure. We have 32 instances each with 60 connections right now, which feels significant.
@tdeebswihart it looks like #5428 should be close to landing?
It's as close as "I need to hit the merge button" and I just did, thank you for the reminder! I'd become caught up in other work, pardon the lateness.
@tdeebswihart that's the best kind of close! I appreciate the work that you and @mattrobenolt have done here, I'm hoping this is a massive win for self-hosted MySQL.
As far as releasing this goes: we're happy to backport this to the 1.23 release (and make 1.23.1) but don't have the bandwidth to test this functionality ourselves right now. Would y'all (@mattrobenolt) be comfortable with testing it?
I think @emmercm might be in a better position to test this with us. My knowledge of Temporal is only surface level. Reach out and let's see how it goes. :)
No need to necessarily backport, I'm on an older Temporal version because I haven't invested the time to do the v1.21 and v1.22 schema upgrades yet - not for any specific reason. I need to spend the time upgrading before I'll be able to effectively test this.
Alright, then this will be included as part of our next release