Allow to retrieve column names from query results
Currently, I see no way of retrieving the names of the columns of the data that a query returns. This could be very useful at times. I would like something like this:
type Header = String
query :: (ToRow q, FromRow r) =>
Connection -> Query -> q -> IO ([Header], [r])
Or am I missing something? Is there an easy way to accomplish this?
This is already doable: use the name operator in the FromField module. Arguably, however, this isn't all that convenient as you have to write a custom FromField instance. (or, if you use fieldWith, you can fold it into a FromRow instance.)
I find myself doing this often enough to consider adding a datatype specifically for this task: e.g. this should almost work:
data FieldName a = FieldName !ByteString a
instance FromField a => FromField FieldName a where
fromField = FieldName <$> name <*> field
Yes, that works. If you want to access values from a row by fieldname, that looks like a perfect solution to me. Sometimes, however, I just want to have the headers apart from the data. For example to dump a query result to csv. In this case I find your solution neither elegant nor convenient (as you already pointed out).
Also, I wonder about performance. As I understand, your solution will work with negligable performance hit if the parsing of values (using 'fromField') is implemented carefully with this use in mind. I'm afraid that otherwise the construction of fieldnames will be done for every row.
Also, there is one usecase, that your solution does not solve: What if I want to get the fieldnames of a query that does return 0 rows?
All in all I would think some other way to get fieldnames of a query would be desirable. Would it be possible to have something like:
queryHeaders :: Connection -> Query -> [String]
? (IIRC the C-API of postgresql allowed to do this without actually executing the query. I might be wrong...)
Hmm, well it's not exactly ideal if you want to access fields by name either. But otherwise, you do make a lot of good points.
The issues with the current FromField/FromRow/ToField/ToRow and quotation interface seems to keep piling up, just look at the other open issues. Similarly, I'd like to be able to hoist type-checking to once per LibPQ.Result instead of once per row.
queryHeaders seems not generally useful enough for me to be enthusiastic about including it with postgresql-simple. You may be right that it's possible to get the column names without actually executing the query using libpq, but I don't know off the top of my head. It would, of course, still require a round-trip between the client and the postgres server.
In any case, you can certainly implement queryHeaders outside of postgresql-simple using the Simple.Internal module and postgresql-libpq. I'm not keen on the function itself, but I would like to see the implementation technique you come up with.
Also, if all you want to do is a straightforward dump of a query to CSV, the Copy module can do this, probably with less code than generating the CSV yourself. And it has the advantage of supporting streaming a bit more straighforwardly than e.g. using a cursor (which the fold function uses underneath.)
There are options to dump the column names as well. However if you want to tweak the results on the way out of the database in Haskell, then it's probably better to stick with the query interface.
Thanks for all your input.
You say the issues with the current design keep piling up: Do you plan to do a major revision of the API?
I'll look into implementing queryHeaders myself, if I find the time. Thanks for pointing to postgreql-libpq, I was not aware it existed.
COPY is not an option for us since we are using redshift (http://en.wikipedia.org/wiki/Amazon_Redshift) and it does not support COPY (or UNLOAD how they call it) for local files.
Yes, I hope to issue a major revision sometime, but I don't yet know how to tweak the interfaces yet to solve the various issues, so I really have no idea about a possible timetable here.
Actually, come to think of it, there is a way to efficiently get the column names of a table without dropping down to libpq. It's a little hacky, and not as efficient as it could be, but it should work pretty reliably:
First we need a getTypeOid function:
newtype ExtractTypeOid = ExtractTypeOid Oid
instance DB.FromField ExtractTypeOid where
fromField f _ = return $ ExtractTypeOid $ DB.typeOid f
-- | Retreive the 'Oid' corresponding to a type. You can subsequently
-- use the 'Oid' to call 'getTypeInfo' for more information on the
-- type.
getTypeOid :: DB.Connection -> B.ByteString -> IO Oid
getTypeOid c tname = do
[Only (ExtractTypeOid ti)] <- DB.query_ c $ Query $ "SELECT NULL :: " <> tname
return ti
Thanks to @alevy for this. Notably, this works for table names as well, as every table has an associated structure type and type oid. Of course, this is hideously vulnerable to sql injection as-is.
Then, simply call getTypeInfo to get the TypeInfo structure associated with the table, which includes the column names. The whole process takes two round trips, but the TypeInfo structure is cached for better and worse.
Unfortunately, this doesn't work for general queries as you asked.
Another possibility, which is also hacky but should work. I haven't verified it though.
First, issue your query with an extra WHERE FALSE clause. You could even write (select * from (query text) as foo where false) to make it easily compositional. You should get back a zero-row result with all the column names.
Your first proposal is interesting, but as you said doesn't work generally for queries but just for simple table queries.
But I fail to understand your last comment. Wouldn't a zero-row result just be the empty list? How can I retrieve the column names then?
Ahh, good point. The LibPQ.Result structure should still have all the column names in it, but there isn't any good way at getting at them from postgresql-simple without again mucking around with the Internal module.