
Druid: KILL task doesn't clean up the datasource from deep storage, and the UI still shows it after the KILL operation

Open micomahesh1982 opened this issue 1 year ago • 7 comments

Affected Version: Druid 28.0.0

Description

The KILL task doesn't clean up the datasource from deep storage, and the UI still shows the datasource after the KILL operation.

I tried both approaches: in the Druid UI I ran "Mark as unused all segments" followed by "Delete unused segments (issue kill task)", and I also used the DELETE API. Neither deletes the datasource from deep storage; I can still see it in the UI.

I've verified that the UI calls and the API calls match; however, the datasources are still visible in the Druid console (with "Show unused" enabled). This looks like odd behavior in the Druid console. I also opened the console in an incognito window and still see them.

Motivation

We clean up all intermediate/temp datasources with the API calls below.

curl -X DELETE "http://:8888/druid/coordinator/v1/datasources/<dataSourceName>"
curl -X DELETE "http://:8888/druid/coordinator/v1/datasources/<dataSourceName>?kill=true&interval=1000-01-01/2999-12-31"
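
The two calls above can also be scripted. This is a minimal Python sketch using only the standard library; `ROUTER` is a hypothetical address (the URLs in the issue leave the host out), and the helper names are illustrative, not part of any Druid client. The first DELETE marks all segments of the datasource unused; the second, with `kill=true` and an interval, asks the coordinator to issue a kill task for that interval.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical router address; the issue leaves the host out of the URL.
ROUTER = "http://localhost:8888"


def datasource_url(datasource: str) -> str:
    """Coordinator endpoint for a datasource (DELETE marks its segments unused)."""
    return f"{ROUTER}/druid/coordinator/v1/datasources/{quote(datasource, safe='')}"


def kill_url(datasource: str, interval: str) -> str:
    """Same endpoint with kill=true, which issues a kill task for `interval`."""
    query = urlencode({"kill": "true", "interval": interval})
    return f"{datasource_url(datasource)}?{query}"


def send_delete(url: str) -> int:
    """Send an HTTP DELETE and return the status code (needs a reachable cluster)."""
    with urlopen(Request(url, method="DELETE")) as response:
        return response.status


if __name__ == "__main__":
    print(datasource_url("temp1"))
    print(kill_url("temp1", "1000-01-01/2999-12-31"))
```

The interval used here is the one from the issue; it only covers segments that fall between the years 1000 and 2999.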

Is there a way to perform the "delete datasource" operation as a SQL statement against the metadata database, in addition to the Druid DELETE API calls?

It's really bad that the datasource still shows in the UI: anyone can easily recover it by clicking "Mark as used all segments", which is not what we intend with this cleanup.

I'm not sure whether this is a Druid 28 UI bug or the KILL task really isn't running properly. The task succeeded; here is its payload:

{
  "type": "kill",
  "id": "api-issued_kill_temp1_pckekgja_1000-01-01T00:00:00.000Z_2999-12-31T00:00:00.000Z_2024-02-27T19:39:00.766Z",
  "dataSource": "temp1",
  "interval": "1000-01-01T00:00:00.000Z/2999-12-31T00:00:00.000Z",
  "context": {
    "forceTimeChunkLock": true,
    "useLineageBasedSegmentAllocation": true
  },
  "batchSize": 100,
  "limit": null,
  "groupId": "api-issued_kill_temp1_pckekgja_1000-01-01T00:00:00.000Z_2999-12-31T00:00:00.000Z_2024-02-27T19:39:00.766Z",
  "resource": {
    "availabilityGroup": "api-issued_kill_temp1_pckekgja_1000-01-01T00:00:00.000Z_2999-12-31T00:00:00.000Z_2024-02-27T19:39:00.766Z",
    "requiredCapacity": 1
  }
}

micomahesh1982 • Mar 02 '24 16:03

@micomahesh1982 - the datasource will exist as long as there's at least one segment. Is there active ingestion creating data for that datasource during or after the kill task? Or do you by chance have unused segments outside the kill interval 1000-01-01/2999-12-31 (you can adjust the interval when issuing the kill task)? If so, yes, the datasource will still exist and be queryable. You can verify this by querying the sys.segments table or by looking at the metadata store directly. Let us know if you have any follow-up questions.
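
To check sys.segments as suggested, a query can be sent to Druid's SQL endpoint (`POST /druid/v2/sql`). This is a minimal Python sketch, assuming a router at the hypothetical address below; the datasource name is bound as a dynamic parameter rather than pasted into the SQL string. Keep in mind that sys.segments only lists used segments, so an empty result does not rule out unused segments in the metadata store.

```python
import json
from urllib.request import Request, urlopen

ROUTER = "http://localhost:8888"  # hypothetical router address


def segments_query(datasource: str) -> dict:
    """Build a Druid SQL request listing the segments sys.segments knows about."""
    return {
        "query": (
            'SELECT segment_id, "start", "end", is_published, is_available '
            "FROM sys.segments WHERE datasource = ?"
        ),
        # Dynamic parameter bound to the "?" placeholder above.
        "parameters": [{"type": "VARCHAR", "value": datasource}],
    }


def run_sql(body: dict) -> list:
    """POST the request to /druid/v2/sql and return the rows (needs a live cluster)."""
    request = Request(
        f"{ROUTER}/druid/v2/sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read())
```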

abhishekrb19 • Mar 07 '24 12:03

@abhishekrb19 Thanks for the update; I have also posted all the details in the Slack channel.
(1) There is no active ingestion for this datasource. It's a custom datasource: once a run is done, nothing is ingested until we trigger it manually, so there is no ingestion while the KILL operation runs.
(2) The datasource we are trying to kill uses ALL segment granularity, so its interval is -146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z.
(3) Yes, I still see it on the Druid UI "Datasources" page when I toggle on "Show unused": the datasource on which the KILL operation was performed is still listed.
(4) I looked at sys.segments and information_schema.tables and no records show up there, so I assumed it was gone from the Druid metadata and segments.
(5) Lastly, because it still shows on the "Datasources" page with "Show unused" toggled on, anyone can easily recover the datasource with "Mark as used all segments", which brings it back as a live datasource. That isn't right, is it?

Please let us know if you still have any questions. Thanks!

micomahesh1982 • Mar 07 '24 18:03

Ah, I think I see what's happening - you have an ALL granularity segment that spans -146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z. In the kill task payload noted above, you have "interval": "1000-01-01T00:00:00.000Z/2999-12-31T00:00:00.000Z". The kill interval doesn't cover the ALL granularity eternity interval, so the kill task won't delete the unused eternity segment.
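
The explanation above comes down to interval containment: the eternity interval of an ALL-granularity segment starts before and ends after 1000-01-01/2999-12-31, so that kill interval does not cover it even though the two intervals overlap. The Python sketch below illustrates the check. It parses timestamps into integer tuples because Python's `datetime` only supports years 1 to 9999, while the eternity bounds use years such as -146136543. The parser is a simplification for the UTC formats that appear in this issue and is not Druid's own code.

```python
import re

# Accepts e.g. "1000-01-01" or "-146136543-09-08T08:23:32.096Z" (UTC only).
_TS = re.compile(
    r"^(-?\d+)-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?)?Z?$"
)


def parse_ts(text: str) -> tuple:
    """Turn an ISO-like UTC timestamp into a comparable integer tuple."""
    match = _TS.match(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text}")
    year, month, day, hour, minute, second, millis = match.groups()
    return (
        int(year), int(month), int(day),
        int(hour or 0), int(minute or 0), int(second or 0),
        int((millis or "0").ljust(3, "0")),
    )


def parse_interval(text: str) -> tuple:
    start, end = text.split("/")
    return parse_ts(start), parse_ts(end)


def contains(outer: str, inner: str) -> bool:
    """True if `inner` lies entirely within `outer`."""
    o_start, o_end = parse_interval(outer)
    i_start, i_end = parse_interval(inner)
    return o_start <= i_start and i_end <= o_end


def overlaps(a: str, b: str) -> bool:
    """True if the half-open intervals `a` and `b` share any instant."""
    a_start, a_end = parse_interval(a)
    b_start, b_end = parse_interval(b)
    return a_start < b_end and b_start < a_end


ETERNITY = "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
KILL = "1000-01-01T00:00:00.000Z/2999-12-31T00:00:00.000Z"

print(overlaps(KILL, ETERNITY))  # True: the intervals overlap...
print(contains(KILL, ETERNITY))  # False: ...but the kill interval does not cover the segment
```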

In the kill task modal, when you click "Delete unused segments (issue kill task)", you can provide the interval: -146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z, and that should delete the eternity segment. Then you should see that the datasource is completely gone.
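
Besides the web console modal, the same kill task can be submitted through the task API (`POST /druid/indexer/v1/task`). This is a minimal Python sketch, assuming, as with the curl calls in the issue, that the router at the hypothetical address below forwards the request. The spec sets only `type`, `dataSource`, and `interval` and leaves other settings (such as `batchSize`) at their defaults.

```python
import json
from urllib.request import Request, urlopen

ROUTER = "http://localhost:8888"  # hypothetical router address

# Eternity interval, as quoted in the comment above.
ETERNITY = "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"


def kill_task_spec(datasource: str, interval: str = ETERNITY) -> dict:
    """Kill task spec for the unused segments of `datasource` within `interval`."""
    return {"type": "kill", "dataSource": datasource, "interval": interval}


def submit_task(spec: dict) -> str:
    """Submit the spec and return the task ID (needs a live cluster)."""
    request = Request(
        f"{ROUTER}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read())["task"]
```

Only segments that are already marked unused are deleted by a kill task, so the segments still need to be marked unused first.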

Also, as a side note, the sys.segments table only shows used segments. Currently, the only way to view unused segments is directly in the metadata store: SELECT * FROM druid_segments WHERE datasource = 'temp1' AND used = false;.
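
The metadata-store query can be tried out locally. The sketch below uses an in-memory SQLite table that mimics only a few `druid_segments` columns (`id`, `dataSource`, `start`, `end`, `used`); the rows are hypothetical. In a real deployment the same query runs against the MySQL or PostgreSQL metadata store through that database's own driver.

```python
import sqlite3

# Hypothetical stand-in for the metadata store, with a subset of druid_segments columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE druid_segments (id TEXT, dataSource TEXT, "start" TEXT, "end" TEXT, used BOOLEAN)'
)
conn.executemany(
    "INSERT INTO druid_segments VALUES (?, ?, ?, ?, ?)",
    [
        ("temp1_eternity_v1", "temp1",
         "-146136543-09-08T08:23:32.096Z", "146140482-04-24T15:36:27.903Z", False),
        ("temp1_2000_v2", "temp1",
         "2000-01-01T00:00:00.000Z", "2001-01-01T00:00:00.000Z", True),
    ],
)

# The query suggested above, narrowed to a few columns and with a bound parameter.
rows = conn.execute(
    'SELECT id, "start", "end" FROM druid_segments WHERE datasource = ? AND used = false',
    ("temp1",),
).fetchall()
print(rows)
```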

Let us know if that helps @micomahesh1982. Thanks!

abhishekrb19 • Mar 08 '24 02:03

Thank you for your email; however, I’m not sure how to delete the ALL-granularity segments dynamically.

Thanks,

Regards, Mahesha S Cell:8015127140

micomahesh1982 • Mar 08 '24 03:03

One more thing: most of the time there is a single segment for ALL granularity, so I can hardcode the interval -146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z. However, in another scenario with multiple ALL-granularity segments (for example, 10 million rows split into 2 intervals), what interval values should I use?

micomahesh1982 • Mar 08 '24 05:03

@micomahesh1982, you can enable auto-kill, please see these docs - https://druid.apache.org/docs/latest/operations/clean-metadata-store/#configure-automated-metadata-cleanup and https://druid.apache.org/docs/latest/configuration/#coordinator-operation.

> I’m not sure how to dynamically delete the ALL granular segments.

Setting the druid.coordinator.kill.* properties in coordinator/runtime.properties should enable the coordinator duty to periodically delete all unused segments from the datasources based on your configuration values.

If I understand your second question above correctly: if you specify the interval -146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z in the API-issued kill task, the kill task will delete all unused segments whose intervals fall within this kill interval. Because the eternity interval covers every possible segment interval, that means all unused segments of the datasource. For example, an unused segment with the interval 2000-01-01/2001-01-01 will be deleted too, since it lies within -146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z.
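
On the multiple-segments question: with ALL granularity, every segment of a datasource shares the same eternity interval (the segments differ only in partition number), so the hardcoded interval still covers all of them. For datasources with finer granularity, one option is to compute the smallest single interval covering every unused segment, for example from the start and end values returned by the metadata query. The Python sketch below does that; `compute_kill_interval` is a hypothetical helper, and timestamps are parsed into integer tuples because plain string comparison misorders extreme years (for example, "146140482-..." sorts before "2999-...").

```python
import re

# Accepts e.g. "2000-01-01" or "146140482-04-24T15:36:27.903Z" (UTC only).
_TS = re.compile(
    r"^(-?\d+)-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?)?Z?$"
)


def ts_key(text: str) -> tuple:
    """Sort key for an ISO-like UTC timestamp, valid for very large or negative years."""
    match = _TS.match(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text}")
    year, month, day, hour, minute, second, millis = match.groups()
    return (
        int(year), int(month), int(day),
        int(hour or 0), int(minute or 0), int(second or 0),
        int((millis or "0").ljust(3, "0")),
    )


def compute_kill_interval(segment_intervals: list) -> str:
    """Smallest single "start/end" interval containing every interval in the list."""
    if not segment_intervals:
        raise ValueError("no segment intervals given")
    starts, ends = zip(*(interval.split("/") for interval in segment_intervals))
    return f"{min(starts, key=ts_key)}/{max(ends, key=ts_key)}"


ETERNITY = "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"

print(compute_kill_interval([ETERNITY, ETERNITY]))
print(compute_kill_interval(["2000-01-01/2001-01-01", "2003-01-01/2004-01-01"]))
```

A single covering interval also spans any gaps between segments, so other unused segments of the same datasource inside those gaps would be deleted as well.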

abhishekrb19 • Mar 11 '24 12:03

Thank you Abhishek.

micomahesh1982 • Mar 11 '24 15:03

Closing this issue as the behavior was clarified.

abhishekrb19 • Mar 28 '24 00:03

Sure, thank you so much.

micomahesh1982 • Mar 28 '24 00:03