elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

[test-framework] Clean up a repository only if there's no snapshots in progress

Open arteam opened this issue 2 years ago • 9 comments

Only delete the repository if there are no in progress snapshots to it. wipeSnapshots gets called in an assertBusy loop that checks that there's no in progress snapshot, so the repository will be eventually deleted.

Fixes #86890

arteam avatar Jun 28 '22 16:06 arteam

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine avatar Jun 30 '22 08:06 elasticmachine

I fail to understand how this fixes https://github.com/elastic/elasticsearch/issues/86890. The repository cannot be deleted since there's a searchable snapshot using it, and searchable snapshots should be deleted before in:

        // Clean up searchable snapshots indices before deleting snapshots and repositories
        if (hasXPack() && nodeVersions.first().onOrAfter(Version.V_7_8_0) && preserveSearchableSnapshotsIndicesUponCompletion() == false) {
            wipeSearchableSnapshotsIndices();
        }

Maybe there's some issue in wipeSearchableSnapshotsIndices()? @arteam

fcofdez avatar Jul 05 '22 10:07 fcofdez

Pinging @elastic/clients-team (Team:Clients)

elasticmachine avatar Jul 05 '22 15:07 elasticmachine

@fcofdez You're totally right! I misinterpreted the error message: I thought snapshots couldn't get deleted due snapshot indices, not searchable snapshot indices. I will take a closer look at wipeSearchableSnapshotsIndices

arteam avatar Jul 07 '22 17:07 arteam

@fcofdez

I've spent a lot of time on this task, and there are 2 separate issues. The test fails both because of searchable snapshots and regular snapshots. The fix in this PR does help, it resolves issues like this.

Jul 27 2022 at 10:47:04 PM 	FAILED	3.2 sec	elasticsearch › :x-pack:plugin:ilm:qa:multi-node:javaRestTest	elasticsearch-ci-immutable-windows-2012-r2-1658954370656474772	
WINDOWSX64elastic+elasticsearch+7.17+multijob+platform-support-windowsGRADLE_TASK=checkPart2,os=windows-2012-r2CI7.17
org.elasticsearch.client.ResponseException: method [DELETE], host [http://[::1]:59969], URI [_snapshot/fpfajosvNoBnc], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"illegal_state_exception","reason":"trying to modify or unregister repository [fpfajosvNoBnc] that is currently used (snapshot is in progress)"}],"type":"illegal_state_exception","reason":"trying to modify or unregister repository [fpfajosvNoBnc] that is currently used (snapshot is in progress)"},"status":500}

I can reproduce this error somewhat regularly on my machine. The error for searchable snapshots isn't reproducible for my at all. I kind of suspect that it fails only on Windows.

arteam avatar Aug 08 '22 13:08 arteam

@arteam I've been taking another look into this and I think we might be hitting a race condition here:

ESRestTestCase#wipeSearchableSnapshotsIndices deletes the mounted searchable snapshots and then we try to remove all the snapshots in order to delete the repository.

The problem seems to be that the index is rolled over after we've cleaned all the searchable snapshots. restored-.ds-logs-dhemcylega-2022.05.18-000002 is the index mounted as searchable snapshot preventing the repository removal, notice that it is the 2 generation of that index. I think that's the issue in this particular test.

fcofdez avatar Aug 08 '22 14:08 fcofdez

Thanks for the insight @fcofdez! That's a really good catch! I tried to fix the racing by adding an additional call to clear the searchable snapshot indices before removing the repository. WDYT?

arteam avatar Aug 09 '22 12:08 arteam

Thanks for looking into this @arteam, I wonder if we should look into the failing test and ensure that we don't rollover or that the rolled over index is not mounted as a searchable snapshot? I think that we should tackle the issue there instead of trying to change the cleaning logic, wdyt?

fcofdez avatar Aug 09 '22 12:08 fcofdez

@elasticmachine update branch

arteam avatar Sep 07 '22 12:09 arteam

@elasticmachine update branch

arteam avatar Sep 09 '22 13:09 arteam

The searchable snapshots bit has been addressed by https://github.com/elastic/elasticsearch/pull/90075, this PR is still relevant since we still have failures of the inability to remove repositories due to pending snapshots.

arteam avatar Sep 19 '22 13:09 arteam