mysql icon indicating copy to clipboard operation
mysql copied to clipboard

[api design] Encoding values for LOAD DATA LOCAL INFILE

Open Jille opened this issue 1 year ago • 11 comments

Hi. I'm working on a feature for sqlc to do a bulk insert. sqlc generates a struct with the columns of the table, and I want to do a bulk insert of those.

Now my challenge is how to encode those values into a TSV as supported by https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-field-line-handling.

I wrote a small library for this (https://github.com/hexon/mysqltsv/blob/main/mysqltsv.go) but to correctly encode time.Time I need to know the Location from https://pkg.go.dev/github.com/go-sql-driver/mysql#Config.

What would be the cleanest way to support this?

  • Expose a Config() method on mysqlConn which people could access through https://pkg.go.dev/database/sql#Conn.Raw
  • Expose a Config(v any) function that takes the value from https://pkg.go.dev/database/sql#Conn.Raw (same as the previous, but as a function that takes an argument rather than a method's receiver)
  • RegisterStructsHandler that is responsible for converting structs to TSV and can be used instead of RegisterReaderHandler.
  • Pass the Config to the callback of RegisterReaderHandler (in a backwards compatible way)

I'd love to hear it if you any other suggestions.

Jille avatar Apr 22 '23 12:04 Jille

Most simple and quick way is just user pass the location to sqlc. User knows dsn or mysql.Config so that user knows the correct loc.

For future, my idea is:

// in mysql driver
type MySQLConn interface {
  private()  // Other module can not implement this interface. We can add more APIs in the future.

  Location() *time.Location  // Get Config.Location of this connection. (This may differ from `time_zone` connection variable.)
  LoadFromReader(r *io.Reader) error // Use this instead of RegisterReaderHandler.
}

// in user code.
var f io.Reader // some data.
err := conn.Raw(func (dc any) error {
  mc := dc.(MySQLConn)
  return mc.LoadFromReader(f)
})

methane avatar Apr 23 '23 15:04 methane

I feel that would be a burden on the user. It requires them to plumb through the connection settings to every place that needs it.

I like that interface. (Though I'm not entirely sure how LoadFromReader would work. Would that run a LOAD DATA INFILE query itself?) (And it wouldn't be perfect for sqlc, because it would require implementing the Raw method for any middleware.

How do you feel about #1419? That'd allow me to encode correctly.

Jille avatar Apr 25 '23 07:04 Jille

I find my old idea:

https://github.com/go-sql-driver/mysql/pull/1060#issuecomment-585574043

methane avatar Apr 25 '23 11:04 methane

My old idea doesn't require new interface.

methane avatar Apr 25 '23 12:04 methane

While I like the idea of avoiding a register+unregister and passing the data to the call, it does have a few downsides:

  • It breaks people having middleware in between this driver and database/sql. For example those doing query (error) logging.
  • It still needs to expose the *time.Location somehow.
  • It wouldn't work in sqlc, which requires an interface like https://github.com/kyleconroy/sqlc/blob/main/examples/authors/mysql/db.go -- which doesn't expose the Raw method (and adding it would be a problem for middleware).

Jille avatar Apr 25 '23 16:04 Jille

I feel that would be a burden on the user. It requires them to plumb through the connection settings to every place that needs it.

I feel it is far better than #1419. #1419 add ugly API only for getting Config.Loc.

  • Config.Clone() is bit heavy. Why need to clone all config only for a *Location?

  • Config contains password. Why need to add surface to leak password only for a *Location?

  • Many users (including me) just use only UTC. No need to pass location around.

  • Some users using Location just need to pass it to bulk insert. It is not big burden.

    • If it is burden, user can store it in Context.

methane avatar Apr 26 '23 06:04 methane

  • Config.Clone() is bit heavy. Why need to clone all config only for a *Location?

I don't mind changing that. With public libraries I tend to try to make them user-proof and make sure they can't break things by changing an internal value.

  • Config contains password. Why need to add surface to leak password only for a *Location?

Fair enough.

  • Many users (including me) just use only UTC. No need to pass location around.

Me too, sadly sqlc can't assume all users do that.

  • Some users using Location just need to pass it to bulk insert. It is not big burden.

    • If it is burden, user can store it in Context.

I'd totally do that in my own project, but it will break sqlc users that switch from simple INSERT INTOs to a LOAD DATA INFILE and suddenly correctly encoding time.Time's depends on a context variable.

I've updated #1419 with a new proposal, to only pass a struct with the *Location. (A struct so we can later extend it without breaking compatibility.) What do you think?

Alternatively, what do you think of adding an API that handles all the TSV encoding? That API would be really useful for other go-sql-driver/mysql users.

Jille avatar Apr 26 '23 08:04 Jille

I've updated #1419 with a new proposal, to only pass a struct with the *Location. (A struct so we can later extend it without breaking compatibility.) What do you think?

I still hate it. It increase public APIs only for your use case.

Alternatively, what do you think of adding an API that handles all the TSV encoding? That API would be really useful for other go-sql-driver/mysql users.

API compatible to pg or pgx is better than registry.

methane avatar Apr 26 '23 09:04 methane

Sorry, I'm not sure what you mean with your last sentence. (Particularly, what do you mean with "registry"?)

Jille avatar Apr 26 '23 09:04 Jille

I meant RegisterXxx() APIs. I don't like them and avoid adding more such APIs.

methane avatar Apr 26 '23 10:04 methane

Ok, I've thought of a way to make your proposed API work with sqlc, so I've sent a PR for that: https://github.com/go-sql-driver/mysql/pull/1454

Jille avatar Jun 26 '23 07:06 Jille