postgresql-simple icon indicating copy to clipboard operation
postgresql-simple copied to clipboard

Adjust `executeMany` and `execute` formatting behavior

Open Chratho opened this issue 12 years ago • 5 comments

ExecuteMany formatting has actually more limitations than documented, and diverges from what is possible within simple execute calls:

psql:
> create table test (x integer, y varchar(3));

ghci:
> execute c "insert into test (x,y) values (?,'abc')" (Only 1)
1
> executeMany c "insert into test (x,y) values (?,'abc')" [Only 1, Only 2]
*** Exception: FormatError {fmtMessage = "syntax error in multi-row template", fmtQuery = "insert into test (x,y) values (?,'abc')", fmtParams = []}

Originally, I came across this when trying to set timestamps ('now') within an executeMany, so I guess this might be a common use case.

Chratho avatar Jun 06 '13 09:06 Chratho

This issue was hinted at in #61, although not explicitly spelled out. This is also somewhat related to #65.

In order for your code to work as is, postgresql-simple would need to better understand the SQL syntax of the queries it's formatting, which would also be a possible solution to #65. And as we probably should someday move to protocol-level parameters, I am curious how to deal with these issues in that context.

As a workaround, there are a couple of possibilities: you can set the default value for a time column to NOW(), and not explicitly set it in your insert. Alternatively, this should work:

> executeMany c "insert into test (x,y) values (?,?)" [(1, "now"),(2, "now")]

lpsmith avatar Jun 06 '13 11:06 lpsmith

Ok, I think issue #105 provided another example of things that should perhaps work, namely something like:

executeMany c "insert into foo (x,y) values (?,row(?,?))" [(1,"hello","bar"),(2,"world","baz")]

My (very tentative) proposal would be to find the VALUES keyword as today, find the opening paren after that keyword, and then find the matching paren to find the repeating group. That fragment of syntax would be repeatedly parameter-expanded, with interspersed commas. And we should probably recognize the standard sql string syntax and possibly also the quoted identifier syntax to avoid matching ?, (, and ) inside of literals written into the template itself. (Occurrences of these characters inside of string/identifier parameters wouldn't be an issue, of course.)

Although, notably, the current parameter expansion scheme doesn't recognize string and identifier literals, so one problem is that you wouldn't necessarily get consistent results between executeMany c q [row] and execute c q row. Of course this would only happen in cases that today are not "correctly" handled by execute and not handled at all by executeMany, but I suppose there is something to be said for being consistently more wrong over inconsistently less wrong. Also, I suppose it's entirely possible that somebody's abusing the current parameter expansion mechanism to do things like insert integers into literal strings. (e.g. execute c "INSERT INTO foo VALUES ('there are ? widgets')" (Only 123)) That's something to at least consider, even if I don't think that sort of behavior should really be considered part of postgresql-simple's interface, and wouldn't feel too bad about breaking that sort of code.

(Hrm, actually even today you don't always get consistent results between executeMany c q [row] and execute c q row, so maybe that's not worth fretting over too much.)

Whether or not this proposal is much good isn't clear to me at the moment.

lpsmith avatar Mar 30 '14 17:03 lpsmith

I want to insert millions of gemetry points in a table. The proposed query from PostGIS is following: insert into tablename (id,point) VALUES (?,ST_GeomFromEWKT('POINT(? ? ?)'))

This query works with the function: execute, but it does not work for executeMany. I found an ugly, really slow performing workaround: let q = insert into position (pos_player_id,pos_ingame_id, pos_coord, pos_game_lfd,pos_network_time,pos_message,pos_player_run,pos_game_id) values (?,?,ST_GeomFromEWKT('POINT(? ? ?)'),?,?,?,?,?)"
sequence_ $ map (execute conn q) points

Is there any possibility to get this query working with executeMany ? I appreciate any comments or advices.

juergenhah avatar Jun 22 '15 08:06 juergenhah

Well, if you are inserting millions of rows, you almost certainly do not want to use a single call to executeMany, as this means postgresql-simple is going to construct a single multi-megabyte query and send it to the backend all at once. (Not to mention, that postgresql-simple's generation of extremely large queries is not as efficient as it should be.)

So, one thing you can do is to chunk it up into multiple calls to execute or executeMany. You may find it beneficial to insert multiple rows at once; though if this number is more than a few, you don't want to repeat the ST_GeomFromEWKT('POINT(...)') syntax for every row. Consider something like insert into tablename (id, point) (select id, point(x,y,z) from values (?,?,?,?) foo(id,x,y,z)), where point is a proper PostGIS function that will turn three numerical arguments into a geometry. This latter query also has the advantage that it will work with executeMany.

If you are repeatedly calling execute or executeMany, you do want to use transactions appropriately for performance and correctness issues on a bulk insert. Alternatively, you can use COPY FROM STDIN, see the Copy module for a starting point. The main advantages of the copy interface is that it's painless streaming, and potentially even faster than the transactions and repeated calls I described above. The main disadvantages is that you'll have to generate the appropriate data format yourself, as postgresql-simple doesn't offer any functionality along those lines, and that the connection cannot be used for anything else until the copy operation completes. (For a starting point of how the data should look, you can COPY TO STDOUT some of the data from your table into the copy format you wish to use. You can use the psql shell's \copy command to do exactly this.)

Finally, you really don't want to write 'POINT(? ? ?)', the parameterization routines really should not be substituting those ? characters at all. It's a definite flaw in postgresql-simple, and I'm liable to break such code at some point in the future.

lpsmith avatar Jun 23 '15 15:06 lpsmith

I'm having issues with this too. Is there solution for this?

Table:

  CREATE TABLE conflicts (
      old uuid NOT NULL,
      new uuid NOT NULL,
      type text NOT NULL,
  );

My insert function (which cannot be made with SQL since it involves moving data around several databases

data Conflict a = Conflict
  { oldId :: !a
  , newId :: !a
  } deriving (Eq, Generic)

instance FromField a => FromRow (Conflict a)
instance ToField a   => ToRow (Conflict a)


saveAllConflicts :: HasPostgres m => [Conflict UUID] -> m ()
saveAllConflicts list = void $ executeMany [sql|
  INSERT INTO conflicts (old, new, type)
  VALUES (?, ?, 'soft')
|] list

And I get this error:

app-exe: FormatError {fmtMessage = "syntax error in multi-row template", fmtQuery = "INSERT INTO conflicts (old, new, type) VALUES (?, ?, 'soft')", fmtParams = []}

juanpaucar avatar Oct 10 '17 14:10 juanpaucar