postgresql-simple icon indicating copy to clipboard operation
postgresql-simple copied to clipboard

FromRow support for left outer joins

Open lpsmith opened this issue 12 years ago • 5 comments

Postgresql-simple can deal with joins in a reasonably nice way via the :. data type; e.g. if you have a website where users can post stories, you might have a schema that looks something like

CREATE TABLE users
   ( user_id   INT  NOT NULL
   , username  TEXT NOT NULL
   );

CREATE TABLE stories
   ( story_id  INT  NOT NULL
   , author    INT  NOT NULL REFERENCES users(user_id)
   , content   TEXT NOT NULL
   );

You could fetch all the (author, story) pairs via the following postgresql-simple code:

data Story = Story Int Int Text

instance FromRow Story where 
   fromRow = Story <$> field <*> field <*> field

data User = User Int Text

instance FromRow User where
   fromRow = User <$> field <*> field

getAuthors :: Connection -> IO [User :. Story]
getAuthors conn = do
    query_ conn [sql|  SELECT (u.*), (s.*) FROM 
                         users u JOIN stories s 
                           ON s.author = u.user_id |]

However, this will only return those users that have written at least one story. If you wanted to get all users and the stories they've written, regardless of whether or not that user has written a story, you could use a left outer join. The database will then fill the (normally non-nullable) story_id and content fields with nulls for that category of users.

Although left joins can be quite useful, they are an awkward construct to deal with in postgresql-simple. It would be nice if we could simply write:

getUsers :: Connection -> IO [User :. Maybe Story] 
getUsers conn = do
    query_ conn [sql| SELECT (u.*), (s.*) FROM
                         users u LEFT OUTER JOIN stories s 
                           ON s.author = u.user_id |]

However, this requires an instance FromRow a => FromRow (Maybe a) that would return Nothing if all the consumed fields are null. But it's impossible to write such an instance, as there's no way of knowing how many fields that a fromRow instance consumes. (And in fact it's possible to consume a variable number of fields, although the only practical example I know of is the existing instance FromField a => FromRow [a] that consumes all the remaining fields.)

One could implement a FromRow instance on specific Maybe types, e.g.

instance FromField (Maybe Story) where
    fromRow =  (null *> null *> null *> pure Nothing) <|> (Just <$> fromRow)
       where null = field :: RowParser Null

However, now you have an additional instance to keep in sync with the Story data type, not to mention that this requires the FlexibleInstances language extension, which isn't too problematic except that Haskell 98 instances support more polymorphism and therefore are preferable if they suffice.

So the question is, what can be done to make left outer joins easier to work with in postgresql-simple? How could we change and/or weaken the FromRow interface in order to support a generic, out-of-box instance FromRow a => FromRow (Maybe a)? Or is there another solution?

lpsmith avatar May 30 '13 20:05 lpsmith

What is the method for supporting left joins without using the FromRow typeclass? Also, the doc on hackage for 3.8 has not been generated as an fyi.

DiegoNolan avatar Oct 22 '13 23:10 DiegoNolan

You can drop Alternative instance from row parser, then it will know number of fields expected, so you can consume correct number of nulls. Personally I cant imagine when would I need variable length rows.

You can add something like FieldParser to wrap fromField, which can be Alternative instance. like:

    parseField :: (FromField a) => FieldParser a
    field :: (FromField a) => RowParser a -- as it is now
    field_ :: FieldParser a -> RowParser a -- lifts field parser to RowParser, needs better name
    data Foo = Foo (Either Int String) Whatnot
    instance FromRow User where
         fromRow = User <$>  field_ ((Left <$> parseField) <|> (Right <$> parseField))
                                  <*>  field

Edit: Hm.. what would parseField <*> parseField do here? One of parseField's will have to fail, which is awkward.

Another option is keeping max number of fields expected, and explicitly state in docs what if row is nullable, it will try to consume maximum number of nulls possible.

    data FooBar = Foo Int Int | Bar String
    fromRow = (Foo <$> field <*> field) <|> (Bar <$> field)

That would consume 2 fields of nulls if parsing Maybe FooBar.

sopvop avatar Oct 23 '13 07:10 sopvop

Just found out that this is 5 months old post, and I've misread what @DiegoNolan said. All of the above is for @lpsmith. Another thought - you may want to make newtype wrapper like Nullable a = Nullable { getNullable :: Maybe a} and provide 'max fields' FromRow for it. Then users are free to implement custom Maybe Foo FromRow instances.

sopvop avatar Oct 26 '13 18:10 sopvop

Ok, I meant to post something a couple of days ago, but apparently it didn't actually get posted.

A left outer join just produces a table like any other sql statement, and so you need to match types just like before. For example, this should work:

getUsers :: Connection -> IO [(Int, Text, Maybe Int, Maybe Int, Maybe Text)] 
getUsers conn = do
    query_ conn [sql| SELECT (u.*), (s.*) FROM
                         users u LEFT OUTER JOIN stories s 
                           ON s.author = u.user_id |]

Of course, this isn't a particularly compositional approach, and it's ugly that the columns corresponding to the story table are normally not null but might be null because of the left join. But because they are declared not null, every column will be either null or every column will have some data, which is information not captured by the return type.

Now, with the version of postgresql-simple I will release probably tonight or maybe tomorrow, you'll be able to write this instead:

getUsers :: Connection -> IO [(Int, Text) :. Maybe (Int, Int, Text)] 
getUsers conn = do
    query_ conn [sql| SELECT (u.*), (s.*) FROM
                         users u LEFT OUTER JOIN stories s 
                           ON s.author = u.user_id |]

This captures the information that either every column is null, or every column is not null. What this issue is complaining about is that it is impossible to write a instance FromRow a => FromRow (Maybe a) at the moment. I will have to change the FromRow interface in order to enable a generic instance.

lpsmith avatar Oct 27 '13 23:10 lpsmith

@sopvop, yeah the current interface is definitely too powerful. I do see some utility for being able to parse a variable number of rows, e.g. to convert a row into a dictionary from column names to values, not unlike PHP. But at the same time, I think it only makes sense that a row parser would treat each row the same. For example, it would make sense to parse the first 10 fields into a dictionary, and then the next 4 into some other structure. However it wouldn't then make sense (in my opinion) to do this on one row and then parse all 14 fields into a dictionary the next row.

However, ignoring more exotic use cases for the moment, I think one could implement the current interface as most people use it in practice, (as basically a positional system) as an indexed applicative functor with some type-level hackery, to be able to infer the number of rows consumed and the types expected. This would allow hoisting the type-checking portion to once per batch of rows instead of once per row, and allow a generic instance as I'd like.

lpsmith avatar Oct 27 '13 23:10 lpsmith