delta icon indicating copy to clipboard operation
delta copied to clipboard

Delete and Update subquery support

Open tdas opened this issue 3 years ago • 12 comments

DELETE FROM myTable1 WHERE EXISTS (Select 1 from myTable2 where myTable2.key = myTable1.key)

This query will currently throw an error, "Subqueries are not supported in the DELETE" We want to support subqueries in Delete and Update.

tdas avatar Nov 10 '21 19:11 tdas

I have the same error with something like

UPDATE Table1 a SET Column1 = 1 WHERE a.Column2 IN (SELECT b.Column2 FROM Table2 b)

FidelCastillo avatar Nov 17 '21 15:11 FidelCastillo

Yeah. Its the same issue. We did not support it earlier because some earlier version of Spark did not have the right internal support for filter with subqueries (Delete/Update implementation needs filters with subqueries). I think this limiation has been resolved in later versions of Spark, so we should try removing this line and see whether it works - https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/PreprocessTableDelete.scala#L36

Anybody willing to contribute can test this out.

tdas avatar Nov 17 '21 16:11 tdas

no work with spark 3.1.2

FidelCastillo avatar Nov 17 '21 18:11 FidelCastillo

Hi,

I was attempting to perform something similar but I got the error that subqueries, as well as window and aggregate functions cannot be used in a DELETE statement.

It would be quite useful to have this kind of support, for use cases like deleting old versions of the same record:

Suppose I have an append-only delta table, it would be extremely useful to be able to do something like this in order to get rid of older record versions and keep the table tidy:

deltaTable.delete(row_number().over(Window.partitionBy(primary_key).orderBy(desc(timestamp_column))) === 1)

or a subquery:

with records_to_delete as (
SELECT key, timestamp
FROM deltatable
WHERE row_number() over (partition by key) order by timestamp desc = 1
)
DELETE FROM TABLE deltatable WHERE (key, timestamp) in (select key, timestamp from records_to_delete)

I was suggested to use the merge statement as a workaround, but I think using delete is a more straightforward approach and would be good to have these kinds of statement supported by delete commands.

mng1dev avatar Jan 25 '23 17:01 mng1dev

When this going to be fixed ?

mantovani avatar Oct 25 '23 11:10 mantovani

bump ^ - when will this be fixed?

akshaymahajans avatar Nov 20 '23 21:11 akshaymahajans

+1

wassimtrifi avatar Jan 25 '24 23:01 wassimtrifi

+1

GP202301 avatar Mar 07 '24 19:03 GP202301

Is this ever be fixed ?

mantovani avatar Apr 01 '24 15:04 mantovani

Up, we really want to have the subqueries in Delete and Update conditions.

HuyTCM avatar Apr 08 '24 07:04 HuyTCM