
Handling arrays of custom types

Open nmk opened this issue 10 years ago • 10 comments

I am interfacing with a database which is denormalized so that a column contains an array of foreign keys pointing to another table. Something like this:

create table baskets (
  id integer,
  contents integer[]
);

create table fruits (
  id integer,
  name text
);

The query to retrieve the baskets together with their contents:

select
  baskets.id,
  array_agg((fruits.id, fruits.name))
from
  baskets left join fruits on fruits.id = any(baskets.contents)
group by
  baskets.id;

This query returns rows in which the second column is an array of (integer, text) tuples (of the record type in PostgreSQL).

The expected representation on the Haskell side is:

data Basket = Basket { basketId :: Integer, basketContents :: [Fruit] }
data Fruit = Fruit { fruitId :: Integer, fruitName :: Text }

I have trouble coming up with an instance for Basket.

instance FromRow Basket where
    fromRow = Basket <$> field
                     <*> liftM ((map (uncurry Fruit)) . fromPGArray) field

fails as there is no instance for FromField (Integer, Text).

Can record types be handled generically in this case, possibly utilising a FromRow instance?

nmk avatar Jul 06 '15 09:07 nmk

There's only minimal support for record types in postgresql-simple proper at the moment; namely see the typeinfo module. And I'm not sure if adding an instance FromField (Integer, Text) is a good idea, given the FromRow instance. I'd be more comfortable with a newtype.

One thing I don't understand is that postgresql, as far as I know, only deals with types of kind *, and every such type needs a name and a type oid. So I am curious how this construct reports its type to the client, given that I'm assuming you never explicitly created an (Integer, Text) record type.

Then you have the parsing issue; this shouldn't be too difficult, but one thing I'm a little bit afraid of is that some values will have special quoting which the basic parsers won't understand. So you would either need a special parser for some types, such as text, or if you want to use the existing parsers, you'd have to construct a new unquoted bytestring at least in some cases, which probably isn't too efficient.
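To make the quoting concern concrete, here is roughly what the text-format result of `array_agg((fruits.id, fruits.name))` looks like on the wire (illustrative values; the exact rules are in the PostgreSQL documentation on array and composite-type I/O):

```haskell
-- Illustrative text-format output of array_agg((fruits.id, fruits.name)):
--
--   plain values:         {"(0,Apple)","(1,Peach)"}
--   a name with a comma:  {"(5,\"Fuji, Apple\")"}
--
-- The record layer quotes the name because it contains a comma, and the
-- array layer then escapes those inner quotes, so the component values sit
-- under two levels of quoting that must both be removed before the basic
-- FromField parsers can run.
```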

This is a feature that I would like to support, but I haven't yet had an overwhelming need for it myself. So any research and/or implementation effort you'd like to do on this issue would be most welcome.

lpsmith avatar Jul 10 '15 02:07 lpsmith

Unless I'm missing something, it looks like postgresql is relatively untyped in this regard. The reported type is _record, which doesn't appear to expose the substructure at all:

{-# LANGUAGE OverloadedStrings, QuasiQuotes #-}

import Data.ByteString
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.FromField
import Database.PostgreSQL.Simple.Types
import Database.PostgreSQL.Simple.SqlQQ

{--

create table baskets ( id integer,  contents integer[] );

create table fruits  ( id integer,  name text );

insert into fruits values 
  (0,'Apple'),
  (1,'Peach'),
  (2,'Orange'),
  (3,'Banana'),
  (4,'Pineapple');

insert into baskets values 
  (1,ARRAY[0,1,2]),
  (2,ARRAY[2,3,4]),
  (3,ARRAY[0,2,4]);

--}

instance FromField TypeInfo where
    fromField f _mval = typeInfo f

main = do
  c <- connectPostgreSQL ""
  xs <- query_ c [sql|
           select
             baskets.id,
             array_agg((fruits.id, fruits.name))
           from
             baskets left join fruits on fruits.id = any(baskets.contents)
           group by
             baskets.id
           limit 1
         |]
  print (xs :: [(TypeInfo, TypeInfo)])
  close c

The above code prints the following result:

[(Basic {typoid = Oid   23, typcategory = 'N', typdelim = ',', typname = "int4"},
  Basic {typoid = Oid 2287, typcategory = 'P', typdelim = ',', typname = "_record"})] 

This is a little disconcerting, as the second TypeInfo should probably report an Array, but the typeinfo system seems to be getting this wrong.

lpsmith avatar Jul 10 '15 03:07 lpsmith

Well, there are some other corner cases in the typeinfo system that perhaps aren't handled correctly either. Here's a query that lists types that need to be considered a bit more:

with types as (select oid,* from pg_type where typelem <> 0 and typcategory <> 'A') 
   (select * from types) union
   (select oid,* from pg_type where oid in (select typelem from types)) order by oid;

lpsmith avatar Jul 10 '15 03:07 lpsmith

Ok, again, the parsing issues should be doable, even if the easiest solution given the current interfaces would not be as efficient as I'd like. And as far as I can tell, we can't really do run-time type checking of the substructure in this specific case. Rather, any type error would likely be reported as a conversion error instead. (Also, this would allow e.g. postgresql's int8 to be read into haskell's Int32 if the value happens to be small enough.)

So maybe as a first sketch of an idea, we could have something like this:

newtype Record a = Record a

instance (FromField a, FromField b) => FromField (Record (a,b)) where ...
instance (FromField a, FromField b, FromField c) => FromField (Record (a,b,c)) where ...
...

We could also make this compatible with postgresql's composite types, in which case we can also do run-time type checking of the substructure.

What would be even better, though, is to modify the FromRow interface so that we have static access to the number and type of results the instance consumes. This would allow us to hoist run-time type checking to once per set of results, instead of on each and every row, and allow us to more easily write a more generic instance:

instance FromRow a => FromField (Record a) ...
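As a purely illustrative sketch of that idea (hypothetical class and method names, not an actual postgresql-simple API), static access to the shape of the results might look like:

```haskell
-- Hypothetical sketch only: a FromRow-style class that also exposes the
-- number of columns (and, where known, their expected type oids)
-- statically, so type checking can happen once per result set rather than
-- once per row.  None of these names exist in postgresql-simple.
import Data.Proxy (Proxy)
import Database.PostgreSQL.Simple.FromRow (RowParser)
import Database.PostgreSQL.Simple.Types (Oid)

class FromRowStatic a where
  columnCount :: Proxy a -> Int
  columnOids  :: Proxy a -> [Maybe Oid]  -- Nothing = accept any oid
  rowParser   :: RowParser a
```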

lpsmith avatar Jul 10 '15 15:07 lpsmith

Thanks for looking into this. I looked at the FromField instance of PGArray and I think I understand neither enough of the Haskell nor enough of the PostgreSQL part to contribute anything myself. It seems like someone else is looking for something similar in #156 though.

nmk avatar Jul 17 '15 16:07 nmk

I've been looking into writing these FromField instances, but I'm getting stuck writing the parser. It should be a matter of parsing an opening parenthesis, parsing the first value, parsing a comma, parsing the second value, then a closing parenthesis. But where do I get the parsers for the subexpressions? All I have are FieldParsers. Can anyone point me in the right direction?

hesselink avatar May 09 '16 09:05 hesselink

Well, if you are going to use the FromField parsers, you also have to remove a level of quoting. You might take a look at the FromField instance for arrays.

So a FieldParser is a type synonym for a function that takes a Field and a Maybe ByteString. The Field type is treated as abstract for most purposes, but is actually a (result, column number, type oid) triple; essentially, a pointer and two ints. You can access the Field data constructor through the Internal module, which I think you'll have to use if you want an instance that uses other FromField instances, as you'll have to reassign the type oid. Err, except postgresql doesn't report type oids to assign in the case of a record type, which is a significant conundrum.

So on further reflection, it would seem that while you could create a compositional record parser in the case of an explicitly created composite type, the current postgresql-simple conversion machinery really isn't suited to handling the "dynamic" record type. For the time being, you would have to bypass the current FromField infrastructure for parsing records: you could create an instance that parses a given record, but that parser wouldn't be using other FromField instances and could only parse some subset of records. (Though I suppose you could return the record as a list or vector of Maybe ByteStrings, which would apply to all records, leaving responsibility for further conversion to the user and/or a FromRow instance.)

(The FromRow/FromField regime desperately needs a careful rethinking anyways to solve quite a few different messes, but I haven't gotten very far with that redesign.)

But even with an explicitly created composite type, it's still a little disconcerting that parsing _records (arrays of records) means two unquoting passes before you get to parse the component values. That can't be very efficient.

lpsmith avatar May 09 '16 14:05 lpsmith

Ah, I've found the array parsing code, that will help. The documentation for composite type output looks pretty similar to the array output. That should be a lot of help. Regarding the inefficiency, I think postgres leaves off levels of quoting if they're not necessary, and I'm guessing in many cases query times will dwarf result parsing times anyway.

That all works only if we can get to the types of course, which you indicate we can't. I really don't see a way around it; I was thinking about getting it from the Haskell type, but that's a one to many mapping, so that doesn't work.

I've moved to using two arrays instead of an array of tuples for now. That means there's a hidden invariant that they are the same length and some post processing, but it works for now. Since there seems to be no way forward here, I'll leave it at that unless you have a suggestion on how to proceed here.
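For reference, that two-parallel-arrays workaround might look something like this (a sketch using the Basket and Fruit types from the original post; untested against a live server, and the same-length invariant is unchecked):

```haskell
-- Sketch of the two-parallel-arrays workaround, replacing the
-- array-of-records aggregate with one array per column:
--
--   select baskets.id, array_agg(fruits.id), array_agg(fruits.name)
--   from baskets left join fruits on fruits.id = any(baskets.contents)
--   group by baskets.id;

import Database.PostgreSQL.Simple.FromRow (FromRow (..), field)
import Database.PostgreSQL.Simple.Types (fromPGArray)
import Data.Text (Text)

data Fruit  = Fruit  { fruitId :: Integer, fruitName :: Text }
data Basket = Basket { basketId :: Integer, basketContents :: [Fruit] }

instance FromRow Basket where
  fromRow = do
    bid   <- field
    ids   <- fromPGArray <$> field
    names <- fromPGArray <$> field
    -- hidden invariant: the two arrays have the same length
    pure (Basket bid (zipWith Fruit ids names))
```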

hesselink avatar May 09 '16 15:05 hesselink

Ok, so in case the previous comment isn't very clear (I'm writing this as I'm figuring out some of the issues, and thus don't know exactly where I'm going to end up), here's what I think is possible at the present time:

newtype UntypedRecord = UntypedRecord [Maybe ByteString]

instance FromField UntypedRecord where
    fromField = {- something compatible with "record" and, optionally, all composite types -}

newtype DynRecord = DynRecord [(Oid, Maybe ByteString)]

instance FromField DynRecord where
    fromField = {- something compatible with composite types, but not "record" -}

If you'd like, DynRecord could also include the field names, which are reported as part of the TypeInfo... though UntypedRecord could insert default field names (which, IIRC, is basically what postgres does as well).

So far, this is pretty fundamental, and this observation should apply to all possible revisions to the current conversion regime.

However, of course, what you really want is something like:

newtype Record a = Record a

instance FromRow a => FromField (Record a)

So, at present, this isn't possible in the UntypedRecord case, though you could certainly write something like this that composes DynRecord with other FromField instances.

If we tweaked postgresql-simple's conversion API, then you could have this for both UntypedRecord and DynRecord, but of course you won't have full typechecking in the former case.

However, what would be possible at the present time is something like the following that would handle the specific case originally given:

data MyRecord = MyRecord Int Text

instance FromField MyRecord where
   fromField = {- possibly compatible with both a subset of "record" and any (Int, Text) composite type -}

This could even use the FromField instances for Int and Text: because we know what types we are expecting, we can simply set the type oids appropriately. Thus, if we use this on record, we'd never get a type error exception, but we might get a conversion exception if the types aren't compatible.
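For what it's worth, one level of the composite-type text syntax can be handled by a small pure parser along these lines (parseComposite is a hypothetical helper, not part of postgresql-simple, and only approximates the full I/O rules):

```haskell
-- A sketch of a parser for one level of composite-type text syntax such as
-- (1,"Fuji, Apple").  It handles unquoted fields, empty (NULL) fields, and
-- double-quoted fields with doubled-quote and backslash escapes; it is a
-- hypothetical helper, not part of postgresql-simple.

-- | Split "(a,b,c)" into raw fields; Nothing means SQL NULL.
parseComposite :: String -> Maybe [Maybe String]
parseComposite ('(':rest) = go rest
  where
    go s = case fieldP s of
      Just (f, ',':more) -> (f :) <$> go more
      Just (f, ")")      -> Just [f]
      _                  -> Nothing

    -- an empty field denotes NULL
    fieldP s@(c:_)
      | c == ',' || c == ')' = Just (Nothing, s)
    fieldP ('"':s) = quoted s ""
    fieldP s = case span (`notElem` ",)") s of
      (f, more) -> Just (Just f, more)

    quoted ('"':'"':s) acc = quoted s ('"':acc)  -- doubled quote escape
    quoted ('\\':c:s)  acc = quoted s (c:acc)    -- backslash escape
    quoted ('"':s)     acc = Just (Just (reverse acc), s)
    quoted (c:s)       acc = quoted s (c:acc)
    quoted []          _   = Nothing
parseComposite _ = Nothing
```

Running a parser like this once per array element (after the array layer has been unquoted) is exactly the two-pass inefficiency mentioned above, but it would at least produce the raw bytes to hand to further conversions.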

lpsmith avatar May 09 '16 15:05 lpsmith

Another possibility, which would be a bit of a band-aid fix to this particular mess, is that we could set the type oid to the type oid of unknown, and make all the default conversions compatible with unknown. That would be the most sensible way to turn a record into a DynRecord, and would allow the Record newtype to work on record. That's sorta what postgres would do, I think.

lpsmith avatar May 09 '16 16:05 lpsmith