prql icon indicating copy to clipboard operation
prql copied to clipboard

Mutation queries (DML)

Open aljazerzen opened this issue 2 years ago • 8 comments

Ref #968

Right now, PRQL (which stands for something something QUERY language) does not support any DML commands (DML stands for Data Manipulation Language). Do we want it to support INSERT, UPDATE & DELETE?

I'm opening this issue mostly as a tracking/planning issue, as we don't have the capacity to work on this in the near future.

Let's start with observation that there are use-cases that need DML even in analytical SQL. ETL where Transform happens in the database may need to store results in an actual table, using INSERT. Or preprocessing some dataset may include doing some kind of complex query that determines which rows are unusable and must be DELETEd.

So I do think that we need DML, but must be careful on how to integrate these statements into a pipelined language.

aljazerzen avatar Oct 28 '22 14:10 aljazerzen

As ever, I'm keen to prioritize whatever people need to make PRQL more helpful for folks. While my initial inclination is to downweigh DML, I'm open to ideas, and one voice among many.

I can see two broad needs for this:

  1. Writing the results of a query. I'm not sure how this should be split between PRQL vs. its tool (e.g. prql-query, dbt, etc); much of my experience is with dbt, which does this already, but possibly this is too narrow?

    • Either way, I can see the logic to having this, and to the extent we want a final transform like | write foo_table, that seems reasonable and unobtrusive
  2. More complex DML, such as creating temp tables, mutating data, etc, not just at the end of a query

    My impulse is that this is much more complex, loses some of the advantages of pipelines, such as referential transparency in the case of mutations.

    Let's start with observation that there are use-cases that need DML even in analytical SQL. ETL where Transform happens in the database may need to store results in an actual table, using INSERT. Or preprocessing some dataset may include doing some kind of complex query that determines which rows are unusable and must be DELETEd.

    Are there common cases of this in modern data warehouses? My experience — albeit partial at best — is that this is rare for most operations (I'm looking for common cases, not super-advanced cases)

    My sense is that this is an "extra credit" question — one way of observing how mature PRQL is to ask whether it can handle advanced workflows like this. But these can also be traps for our prioritization — we're not that good at the basic operations yet! — and so trying to do the advanced ones can leave us mediocre everywhere.

    So I would prioritize things like types and integrations above this.

But as above, open-minded, and let's anchor around the cases.

max-sixty avatar Nov 01 '22 03:11 max-sixty

I agree with @max-sixty .

Point 1. seems like a small addition which could be relatively straight forward and I think is worth adding.

Point 2. to me seems more complex and more at odds with the initial aims of PRQL and so I would downprioritise quite far down the list (possibly as far as never).

snth avatar Nov 03 '22 20:11 snth

my initial inclination is to downweigh DML

I'm on that bandwagon as well. I think that the lack of support for DML is a feature (and actually one of the points that make PRQL interesting for my use case). Now, assuming DML support is optional, e.g. as a language extension, I think it could actually be a good thing.

rolinh avatar Nov 16 '22 11:11 rolinh

Why do you think that lack of DML is a feature?

aljazerzen avatar Nov 16 '22 15:11 aljazerzen

For my use-case, I want to allow technical but non-db people to query data (e.g. security engineers). The only part of SQL that is allowed is read-only queries (ie SELECT). In this context, users have no need for DDL, DML or DCL and that means that receiving permission errors whenever an "unsupported" statement is used might be surprising to users who potentially expect for the whole PRQL language to be supported.

For a bit more context, at some point in my career I worked with security engineers for which the Kusto Query Language (KQL) was really easy to adopt and solved their use-cases. I think that PRQL in its current state is (much) better than Kusto and widely more portable.

rolinh avatar Nov 17 '22 14:11 rolinh

I see, it really is a feature. We would at least need to have an opt-in for writable tables and compile with "table not writable" by default. Or include DML via extension, as you suggested.

And great to hear about Kusto!

aljazerzen avatar Nov 17 '22 15:11 aljazerzen

I want to allow technical but non-db people to query data (e.g. security engineers). The only part of SQL that is allowed is read-only queries (ie SELECT)

💯 – like @rolinh said, limiting users to SELECT only statements is super handy because as a vendor it let's users interact with an underlying database without the fear of changing things, or worst, malicious activity.

I highly suggest that if DML is on the cards it should be supported in an opt in way. In fact I've shared a suggestion in https://github.com/PRQL/prql/issues/1340 for a granular control of PRQL features because this feels well within the scope of use cases in which PRQL can/should be used.

armanm avatar Dec 26 '22 08:12 armanm

Related thread in PRQL Discord

I do agree that lack of DML is a feature.

PRQL already has append, remove, intersect to perform set operations on resultsets. But the mutation/assignment operator (table1 := from table1 append resultset1) would bring a lot more complexity.

nileshtrivedi avatar Mar 22 '24 05:03 nileshtrivedi