haskell-opaleye icon indicating copy to clipboard operation
haskell-opaleye copied to clipboard

Support for postgresql composite types

Open wraithm opened this issue 10 years ago • 34 comments
trafficstars

Postgresql supports composite types. Composite types have this special string-literal syntax for constructing them. Here's a little example:

CREATE TYPE point AS (
    x double precision,
    y double precision
);

CREATE TABLE points (p point);
INSERT INTO points VALUES('(2.0, 3.0)');

It would be awesome if opaleye supported these types.

Right now, I'm hacking this in with OtherLit and a manual FromField instance with a custom parser:

import Opaleye.Internal.HaskellDB.PrimQuery as HPQ
data PGPoint

pgPoint :: Point -> Column PGPoint
pgPoint (Point x y) = literalColumn . HPQ.OtherLit $
        "(" ++ show x ++ "," ++ show y ++ ")"

instance QueryRunnerColumnDefault PGPoint Point where
    queryRunnerColumnDefault = fieldQueryRunnerColumn

instance FromField Point where
     -- Custom parsing for the composite type goes here

I was thinking that an implementation of this could just add a CompositeLit constructor to the Literal type and a way of easily constructing FromField instances for composite types. For the CompositeLit part, I was thinking something like this:

data Literal = ... | CompositeLit1 Literal | CompositeLit2 (Literal, Literal) | ...

defaultSqlLiteral :: SqlGenerator -> Literal -> String
defaultSqlLiteral g (CompositeLit1 l) = defaultSqlLiteral g l
defaultSqlLiteral g (CompositeLit2 (l1, l2)) =
    "(" ++ defaultSqlLiteral g l1 ++ "," ++ defaultSqlLiteral g l2 ++ ")"
...

Another possible implementation would be just to provide functions for constructing OtherLit in the style of composite types and functions for FromField instances.

wraithm avatar Aug 14 '15 15:08 wraithm

CompositeLit [Literal] is probably fine, just like argument application to functions.

tomjaguarpaw avatar Aug 14 '15 15:08 tomjaguarpaw

Yeah, that was my first thought, but I guess I thought that explicitly making tuples would be more type-safe. Opaleye could just throw an error or something if the list is beyond a certain length or whatever could go wrong.

wraithm avatar Aug 14 '15 15:08 wraithm

You're right, it would be more typesafe. However I think it's quite hard to make an AST typesafe. The typesafety of the source syntax is enough, at least for now.

tomjaguarpaw avatar Aug 14 '15 15:08 tomjaguarpaw

Cool. Do you have any thoughts on how to make the FromField instances for composite types? Or is there another approach? Right now, my code has a bunch of custom parsers for each composite type. It'd be nice if there were some easy way to parse the composite types.

wraithm avatar Aug 14 '15 15:08 wraithm

How about you paste some of your FromField instances here and I'll see if anything strikes me.

tomjaguarpaw avatar Aug 14 '15 15:08 tomjaguarpaw

One of my actual use-cases is more like this (names are changed, but structurally similar to what I'm doing):

CREATE TYPE atype AS ENUM (
    'a', 
    'b'
);

CREATE TYPE a AS (
    a_type atype,
    a_hash numeric(49,0)
);

I use attoparsec for parsing.

data A = A Integer | B Integer

instance FromField A where
    fromField f Nothing = returnError ConversionFailed f "Nothing returned for A."
    fromField f (Just bs) = do
        tn <- typename f
        if tn == "a"
            then case parseOnly parseA bs of
                Left err -> returnError ConversionFailed f err
                Right a -> return a
            else returnError Incompatible f ("Wrong database type for A, saw: " ++ show tn)


aType :: Parser Bool
aType =
    (string "a" >> pure True) <|>
    (string "b" >> pure False)

parseA :: Parser A
parseA = do
    _ <- char '('
    ty <- aType
    _ <- char ','
    aHash <- decimal
    _ <- char ')'
    return $ if ty
        then A aHash
        else B aHash

wraithm avatar Aug 14 '15 16:08 wraithm

Hmm ... it would certainly be nice to get a uniform interface to these but I think it's more a question for Leon over at https://github.com/lpsmith/postgresql-simple/

tomjaguarpaw avatar Aug 14 '15 16:08 tomjaguarpaw

postgresql-simple totally supports composite types. Here's an example from some of my code from a few years ago:

CREATE TYPE zonedtime AS (
    "timestamp" timestamp with time zone,
    timezone timezone
);
instance FromRow TimeZone where
    fromRow = TimeZone <$> field <*> field <*> field

instance FromRow ZonedTime where
    fromRow = do
      utcTime <- field
      timeZone <- fromRow
      return . utcToZonedTime timeZone $ utcTime

Using the applicative instance of FromRow you can mix parsing field and fromRow.

lukehoersten avatar Aug 14 '15 16:08 lukehoersten

Yeah, so what's weird is that composite types use FromRow in postgresql-simple. You can nest a FromRow in another FromRow instance to get a composite type out. Opaleye only uses FromField.

wraithm avatar Aug 14 '15 16:08 wraithm

Right. Composite types are basically just "Row Types" and that's how postgresql-simple thinks of them.

lukehoersten avatar Aug 14 '15 16:08 lukehoersten

Ah, I see more clearly what the problem is now: it's easy to create FromRow instances but not FromField.

The reason that Opaleye requires FromField is that we need to be able to make any column type Nullable. I don't believe you can do that with a FromRow as it contains multiple fields in general. Does anyone know how nullability works with composite types? Does anyone know why it's not as easy to combine FromField instances as it is to combine FromRow instances?

tomjaguarpaw avatar Aug 14 '15 16:08 tomjaguarpaw

Composite type fields are all nullable and that can't be changed:

"The [CREATE TYPE] syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no constraints (such as NOT NULL) can presently be included."

See: http://www.postgresql.org/docs/9.4/static/rowtypes.html

lukehoersten avatar Aug 14 '15 16:08 lukehoersten

So the fields within a composite type are nullable, but the whole value cannot be made NULL?

In more familiar terminology we are in the realm of (Maybe a, Maybe b) as opposed to Maybe (Maybe a, Maybe b)?

tomjaguarpaw avatar Aug 14 '15 16:08 tomjaguarpaw

Actually they're always NOT NULL. My last comment was wrong. So the whole composite type is either null or not null together.

lukehoersten avatar Aug 14 '15 16:08 lukehoersten

So, all fields in a composite type are not nullable, but whether the composite type is nullable is up to the table.

CREATE TABLE points (p point NOT NULL);
INSERT INTO points VALUES(NULL);
ERROR:  null value in column "p" violates not-null constraint
DETAIL:  Failing row contains (null).

All fields in a type are not null:

# insert into points values (point(1.0,2.0));
INSERT 0 1
# insert into points values (point(1.0,NULL));
ERROR:  null value in column "p" violates not-null constraint
DETAIL:  Failing row contains (null).

wraithm avatar Aug 14 '15 16:08 wraithm

Hmm, so how is the NULL case parsed by postgresql-simple?

tomjaguarpaw avatar Aug 14 '15 16:08 tomjaguarpaw

postgresql-simple doesn't do anything special for NULLs. They're handled in the FromField and FromRow implementations and will usually return a Nothing or empty list. It's left up to the higher-level implementation to decide.

As far as our discussion has gone here it sounds like composite type (row type) handling will solely rest on Opaleye's ability to represent row types as fields. @lpsmith is pretty active on IRC. It might be worth bouncing ideas of him to see what he thinks.

lukehoersten avatar Aug 14 '15 18:08 lukehoersten

My prior understanding was that you couldn't make a FromRow nullable, only a FromField, but I'll have another look.

tomjaguarpaw avatar Aug 14 '15 18:08 tomjaguarpaw

@tomjaguarpaw you're correct. What I mean is the table containing the row type would handle nulling the field. For example:

instance FromRow ZonedTime where
    fromRow = do
      utcTime <- field
      timeZone <- fromRow
      return . utcToZonedTime timeZone $ utcTime

instance FromRow (Maybe ZonedTime) where
    fromRow = do
      utcTime <- field
      timeZone <- fromRow
      return $ utcToZonedTime <$> timeZone <*> utcTime

The first row type instance handles the non-nullable case and the second row type instance handles the case where the row type is nullable in the containing table. Here's what the containing table implementation looks like:

instance FromRow MyType where
  fromRow =
    MyType
    <$> _myId
    <*> _someField
    <*> _foo
    <*> _time1
    <*> _time2
    <*> _time3
    where
      _myId         = field
      _someField    = field
      _foo          = field
      _time1        = fromRow
      _time2        = fromRow
      _time3        = fromRow

The existence of (Maybe ZonedTime) vs. (ZonedTime) in the MyType handles the nullable vs. non-nullable case respectively.

lukehoersten avatar Aug 14 '15 19:08 lukehoersten

data QueryRunnerColumn pgType haskellType =
  QueryRunnerColumn (U.Unpackspec (Column pgType) ()) (FieldParser haskellType)

FieldParser is built into the column query type. Perhaps it needs to be either FieldParser or RowParser because composite types are row types, ie, a row that goes into a column.

wraithm avatar Aug 14 '15 19:08 wraithm

I'm setting up a test with this. Maybe it's way off, but it's a thought. This will break for nullable fields with a row-type (because of the implementation of queryRunnerColumnNullable).

data QueryRunnerColumn pgType haskellType 
  = QueryRunnerColumn (U.Unpackspec (Column pgType) ()) (FieldParser haskellType)
  | QueryRunnerComposite (U.Unpackspec (Column pgType) ()) (RowParser haskellType)

rowQueryRunnerColumn :: FromRow haskell => QueryRunnerColumn coltype haskell
rowQueryRunnerColumn =
  QueryRunnerComposite (P.rmap (const ()) U.unpackspecColumn) fromRow

queryRunner :: QueryRunnerColumn a b -> QueryRunner (Column a) b
queryRunner (QueryRunnerColumn u fp) = QueryRunner u (const (fieldWith fp)) (const True)
queryRunner (QueryRunnerComposite u rp) = QueryRunner u (const rp) (const True)

wraithm avatar Aug 14 '15 20:08 wraithm

@WraithM If you can get it to work with RowParser then just get rid of FieldParser entirely. I couldn't work out how to make RowParsers nullable though.

tomjaguarpaw avatar Aug 14 '15 21:08 tomjaguarpaw

I'm not sure exactly what @LukeHoersten is doing, as I can think of a couple of possibilities, but postgresql-simple does not fully support composite types out of box, though it would be nice to remedy this. There is support for obtaining the type structure of a composite type, but there are not yet any parsers for the syntax.

And, actually composite types have been a topic of two recent issues, lpsmith/postgresql-simple#153 and lpsmith/postgresql-simple#159, both of which uncovered shortcomings in postgresql-simple and resulted in patches now applied to the master branch.

lpsmith avatar Aug 15 '15 02:08 lpsmith

@lpsmith I'm definitely using composite types with PostgreSQL-simple. Queries don't use any special syntax beyond tuple notation. What are you thinking is not supported?

lukehoersten avatar Aug 15 '15 13:08 lukehoersten

I'm talking about retrieving composite types from postgresql, which is possible in postgresql-simple if you write your own FromField parser for your composite type. What I'm saying is that whatever you are doing, I don't think you have composite types being returned in a single column, like this:

lpsmith=> select id, name from fruits;
 id |   name    
----+-----------
  0 | Apple
  2 | Orange
  3 | Banana
  4 | Pineapple
  1 | Pear
(5 rows)

lpsmith=> select (id, name) from fruits;
      row      
---------------
 (0,Apple)
 (2,Orange)
 (3,Banana)
 (4,Pineapple)
 (1,Pear)
(5 rows)

Now, in some cases, like this one, you can have the backend flatten out the results for you:

lpsmith=> select (tbl.row).* from (select (id,name) :: fruits from fruits) tbl;
 id |   name    
----+-----------
  0 | Apple
  2 | Orange
  3 | Banana
  4 | Pineapple
  1 | Pear
(5 rows)

lpsmith avatar Aug 15 '15 18:08 lpsmith

@lpsmith: it looks like I'm using a variant of the "backend flattening method" you mentioned. Here's my SQL snippet:

      q = [sql| SELECT id, 
                (r.start_time).timestamp, (r.start_time).timezone.utc_offset_minutes, (r.start_time).timezone.is_dst, (r.start_time).timezone.name,
                (r.end_time).timestamp, (r.end_time).timezone.utc_offset_minutes, (r.end_time).timezone.is_dst, (r.end_time).timezone.name,
                FROM my_table
                WHERE id = ?
            |]

This leads me to two questions:

  1. @lpsmith "in some cases" implies this doesn't work in all cases. What cases does backend flattening not work?
  2. @tomjaguarpaw is backend flattening suitable for Opaleye?

Also, just to be clear, if we have a custom type it'll always be on the user to write the To/FromField or Row, correct? This won't be generated in any way by Opaleye?

lukehoersten avatar Aug 17 '15 13:08 lukehoersten

The backend flattening method would fit well into Opaleye. The notion of "backend flattening" can be encoded in a QueryRunner. It might take a bit of work to derive it automatically, but to code it by hand should be easy.

tomjaguarpaw avatar Aug 17 '15 13:08 tomjaguarpaw

@tomjaguarpaw great! I suppose we can make changes to use the "row in a single field" approach if/when that gets added to postgresql-simple in the future if applicable.

lukehoersten avatar Aug 17 '15 13:08 lukehoersten

if we have a custom type it'll always be on the user to write the To/FromField or Row, correct? This won't be generated in any way by Opaleye?

Yes, at the moment Opaleye doesn't generate any To/FromField or Row instances. It just uses the existing ones.

tomjaguarpaw avatar Aug 17 '15 13:08 tomjaguarpaw

@LukeHoersten , trying to avoid composite types in results altogether can get rather more ugly when you are dealing with arrays of composite types, which both of the issues I linked to are dealing with. You could turn one row into multiple rows, which duplicates the fields that are not in the array. Or perhaps you could get the backend to send you multiple columns of arrays of "scalars", instead of a single column of an array of composites.

Either way, this is often a bit more work for the backend, of the type that almost certainly doesn't benefit the backend. (And in the case of sending redundant data, almost certainly hurts the backend.) It's also potentially more work on the client, assuming the array of composites form is what's really desired.

lpsmith avatar Aug 17 '15 17:08 lpsmith