prql
prql copied to clipboard
Cool things in Malloy
I've been spending some time looking at Malloy, a language from Looker that looks promising. I've been a fan of @lloydtabb's for a while, and briefly chatted to him about Malloy a couple of weeks ago.
As ever with these "better SQL" languages — our fates are positively correlated. So I hope it's collegial to explore the things Malloy does particularly well at.
Without a full review of the differences — Malloy seems more focused on a broader vision of data modeling with structure on top of existing tables — like explore
s — blocks that combines tables — and dimensions
& measures
(these are things I always liked in Looker). The language integrates an approach for defining charts. There's less emphasis on pipelines — ordering isn't semantically important — though there is a way to pipe queries into other queries. It's more typing to get started.
Some things that look compelling at first glance:
- Inline filters:
compiles tony_flights is count() { where: origin.state = 'NY' }
I'm not sure how general these can be; presumably they can't be fullCOUNT( CASE WHEN origin_0.state='CA' THEN 1 END) as ca_flights
where
clauses like Malloy's normalwhere
clauses. - Small one —
or
syntax:where: carrier: 'UA' | 'AA'
- Nesting — similar to the discussion in our readme, this is executed quite well, with a
nest
block giving an array. So:
...produces a nested array (SQL is mangled so showing the result)query: airports -> { group_by: state aggregate: airport_count nest: by_facility is { group_by: fac_type aggregate: airport_count } }
[ { "state": "TX", "airport_count": 1845, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 1389 }, { "fac_type": "HELIPORT", "airport_count": 435 }, ...
- Joins —
https://looker-open-source.github.io/malloy/documentation/language/join.html
I rather like that it limits SQL's complexity — most joins in practice are left joins using primary keys. So this is nice:
This works because the primary keys has been defined above (similar to what we discussed on discord around defining relationships separately, reducing verbosity for inline joins):explore: flights is table('malloy-data.faa.flights') { join_one: origin_airport is airports with origin join_one: destination_airport is airports with destination join_one: aircraft with tail_num }
It is possible to do less standard joins as an escape hatch.explore: airports is table('malloy-data.faa.airports') { primary_key: code }
- Timestamps — small one — they prepend timestamps with
@
, like{ where: dep_time: @2003 to @2005 }
, and have attribute access with functions (which seem to be both prefix and suffix?)dep_time.month
/day_of_year(some_date)
That's all for now. I think it'd be good to keep in touch with the Malloy team and would welcome any cross-pollination.
Malloy have continued to add some great features, check out https://looker-open-source.github.io/malloy/documentation/language/expressions.html
@max-sixty I've spent a lot of time writing Kusto queries for Azure Data Explorer, and love the ability to tack on | chart ....
to get a visualization in supported gui clients. I also love that the visualization information is kept separate as metadata returned from Azure Data Explorer servers, so each client can do their own rendering. This kind of eliminates the need for some amount of Jupyter notebooks because you can easily visualize results inline with your query using standardized syntax.
Are you consider some kind of results visualization for PRQL or is that outside the scope of this project, given the architecture?
Thanks for the comment @brycefisher ! I agree that's really useful for some workflows. As well as Malloy, Juttle does this well.
I'm definitely open to including something like this. The ideal structure at the moment might be to have some way of extending PRQL, and let people build on top of it — maybe with importable functions, or a view
transform that was fairly unopinionated within PRQL.
To set expectations — at the moment I don't have a clear vision for what something good would look like, so it probably won't come from me without lots of people saying that PRQL's success is dependent on it — but that doesn't mean I think it's a bad idea!
How do you think about this?
Agreed! I wouldn't take one random persons request as an automatically good roadmap item.
I think time series analysis and easy ways of manipulating time in particular benefit from easy visualization. It probably makes sense to consider visualization related features as you mature time functions and time series.
Check this out: https://docs.google.com/presentation/d/18KUl_rrz2K-hbsiKJYS3rtTcYxZMXKklyPllLmTtIYY/mobilepresent#slide=id.g1269816dcbe_0_179
Some really cool stuff — it's has some excellent features around exploring data — in particular:
- the indexing of dimension values, so browsing the dataset shows info like top values
- the interaction from the results to the query — clicking a row of values can add those values as a filter in the query
- having a View — e.g. chart / json — within the query (similar to @brycefisher 's suggestion). I'm sure people will be divided on this, but some sort of basic charting can be v helpful when exploring, even if you don't want to always couple these
- while I found the interface for nesting confusing (requires "duplicating" the item, in the video), Malloy does lean heavily on nesting, and it seems good. As discussed elsewhere, we can do this easily with an
as | array
orto | array
(or maaaybe anest
rather than agroup
, if we want to make the shape of the data obvious to people reading) - Malloy's SQL generation seems to have improved a lot
I'm admittedly much more of a fan of the pipeline concept from a language POV — here I get confused as to whether the order of the clauses matters. Indeed, I'm not sure it could be explained simply, and e.g. adding a filter after an aggregate doesn't seem possible (because it doesn't yet support HAVING
as a different language item!).
But the browsing & exploring here is really great.
That is really great for data exploration.
It would be awesome if we could join forces and create a language agnostic tool for such exploration. Relational/dataframe model has been around for years and we are just inventing new ways for querying it, so I don't see why we couldn't reuse the UI that works with relational models and uses any of the 3 query languages.
Very much agree @aljazerzen .
I don't want to fall into the mindset of looking at different ideas and saying "nah PRQL better" — but in this case, the benefit in constructing the query arguably come from arbitrarily adding transforms — and this is the thing that PRQL is great at given its pipeline, and Malloy is less strong at. e.g. the having
example above. (Though SQL is terrible at, because adding an arbitrary transform can require a new CTE and query restructure.)
My vision for PRQL is that we would probably not build these sorts of things ourselves, but because we're an open and compatible language, we'd be the best tool for others building these sorts of interfaces...
The thing that Malloy starts on that PRQL doesn't is a semantic model for data — how the information maps to reality — e.g. what's a Dimension and what's a Measure, how is data related, how are metrics defined, etc. And that allows some cool features that contribute to Composer's strengths, like indexing dimensions and easier joins.