FromRow support for left outer joins
Postgresql-simple can deal with joins in a reasonably nice way via the :. data type; e.g. if you have a website where users can post stories, you might have a schema that looks something like
CREATE TABLE users
( user_id INT NOT NULL
, username TEXT NOT NULL
);
CREATE TABLE stories
( story_id INT NOT NULL
, author INT NOT NULL REFERENCES users(user_id)
, content TEXT NOT NULL
);
You could fetch all the (author, story) pairs via the following postgresql-simple code:
data Story = Story Int Int Text
instance FromRow Story where
fromRow = Story <$> field <*> field <*> field
data User = User Int Text
instance FromRow User where
fromRow = User <$> field <*> field
getAuthors :: Connection -> IO [User :. Story]
getAuthors conn = do
query_ conn [sql| SELECT (u.*), (s.*) FROM
users u JOIN stories s
ON s.author = u.user_id |]
However, this will only return those users that have written at least one story. If you wanted to get all users and the stories they've written, regardless of whether or not that user has written a story, you could use a left outer join. The database will then fill the (normally non-nullable) story_id and content fields with nulls for that category of users.
Although left joins can be quite useful, they are an awkward construct to deal with in postgresql-simple. It would be nice if we could simply write:
getUsers :: Connection -> IO [User :. Maybe Story]
getUsers conn = do
query_ conn [sql| SELECT (u.*), (s.*) FROM
users u LEFT OUTER JOIN stories s
ON s.author = u.user_id |]
However, this requires an instance FromRow a => FromRow (Maybe a) that would return Nothing if all the consumed fields are null. But it's impossible to write such an instance, as there's no way of knowing how many fields that a fromRow instance consumes. (And in fact it's possible to consume a variable number of fields, although the only practical example I know of is the existing instance FromField a => FromRow [a] that consumes all the remaining fields.)
One could implement a FromRow instance on specific Maybe types, e.g.
instance FromField (Maybe Story) where
fromRow = (null *> null *> null *> pure Nothing) <|> (Just <$> fromRow)
where null = field :: RowParser Null
However, now you have an additional instance to keep in sync with the Story data type, not to mention that this requires the FlexibleInstances language extension, which isn't too problematic except that Haskell 98 instances support more polymorphism and therefore are preferable if they suffice.
So the question is, what can be done to make left outer joins easier to work with in postgresql-simple? How could we change and/or weaken the FromRow interface in order to support a generic, out-of-box instance FromRow a => FromRow (Maybe a)? Or is there another solution?
What is the method for supporting left joins without using the FromRow typeclass? Also, the doc on hackage for 3.8 has not been generated as an fyi.
You can drop Alternative instance from row parser, then it will know number of fields expected, so you can consume correct number of nulls. Personally I cant imagine when would I need variable length rows.
You can add something like FieldParser to wrap fromField, which can be Alternative instance. like:
parseField :: (FromField a) => FieldParser a
field :: (FromField a) => RowParser a -- as it is now
field_ :: FieldParser a -> RowParser a -- lifts field parser to RowParser, needs better name
data Foo = Foo (Either Int String) Whatnot
instance FromRow User where
fromRow = User <$> field_ ((Left <$> parseField) <|> (Right <$> parseField))
<*> field
Edit: Hm.. what would parseField <*> parseField do here? One of parseField's will have to fail, which is awkward.
Another option is keeping max number of fields expected, and explicitly state in docs what if row is nullable, it will try to consume maximum number of nulls possible.
data FooBar = Foo Int Int | Bar String
fromRow = (Foo <$> field <*> field) <|> (Bar <$> field)
That would consume 2 fields of nulls if parsing Maybe FooBar.
Just found out that this is 5 months old post, and I've misread what @DiegoNolan said. All of the above is for @lpsmith.
Another thought - you may want to make newtype wrapper like Nullable a = Nullable { getNullable :: Maybe a} and provide 'max fields' FromRow for it. Then users are free to implement custom Maybe Foo FromRow instances.
Ok, I meant to post something a couple of days ago, but apparently it didn't actually get posted.
A left outer join just produces a table like any other sql statement, and so you need to match types just like before. For example, this should work:
getUsers :: Connection -> IO [(Int, Text, Maybe Int, Maybe Int, Maybe Text)]
getUsers conn = do
query_ conn [sql| SELECT (u.*), (s.*) FROM
users u LEFT OUTER JOIN stories s
ON s.author = u.user_id |]
Of course, this isn't a particularly compositional approach, and it's ugly that the columns corresponding to the story table are normally not null but might be null because of the left join. But because they are declared not null, every column will be either null or every column will have some data, which is information not captured by the return type.
Now, with the version of postgresql-simple I will release probably tonight or maybe tomorrow, you'll be able to write this instead:
getUsers :: Connection -> IO [(Int, Text) :. Maybe (Int, Int, Text)]
getUsers conn = do
query_ conn [sql| SELECT (u.*), (s.*) FROM
users u LEFT OUTER JOIN stories s
ON s.author = u.user_id |]
This captures the information that either every column is null, or every column is not null. What this issue is complaining about is that it is impossible to write a instance FromRow a => FromRow (Maybe a) at the moment. I will have to change the FromRow interface in order to enable a generic instance.
@sopvop, yeah the current interface is definitely too powerful. I do see some utility for being able to parse a variable number of rows, e.g. to convert a row into a dictionary from column names to values, not unlike PHP. But at the same time, I think it only makes sense that a row parser would treat each row the same. For example, it would make sense to parse the first 10 fields into a dictionary, and then the next 4 into some other structure. However it wouldn't then make sense (in my opinion) to do this on one row and then parse all 14 fields into a dictionary the next row.
However, ignoring more exotic use cases for the moment, I think one could implement the current interface as most people use it in practice, (as basically a positional system) as an indexed applicative functor with some type-level hackery, to be able to infer the number of rows consumed and the types expected. This would allow hoisting the type-checking portion to once per batch of rows instead of once per row, and allow a generic instance as I'd like.