
[Feature Request] Paginate snapshot indices status fetching

Open bugmakerrrrrr opened this issue 11 months ago • 17 comments

Is your feature request related to a problem? Please describe

Our customers depend on the snapshot status API to access information about snapshot indices, such as store size and number of docs. When the specified snapshot is not currently running, TransportSnapshotStatusAction uses a single Generic thread to retrieve the repository data, snapshot information, snapshot index metadata, and shard snapshot status. As a result, when the specified snapshot contains a large number of indices, the execution time for this action becomes significantly prolonged.

For one snapshot with 15,000+ shards, fetching the snapshot status took 8 minutes.

Describe the solution you'd like

Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices) to paginate snapshot indices status, as we did in #14258. The new API works only for indices belonging to a specific snapshot. Since the order of indices in SnapshotInfo is fixed, we can simply use from + size to paginate. If the specified snapshot is running, the pagination parameters have no effect.
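The from + size idea can be sketched in a few lines. This is only an illustration in Python, not the OpenSearch implementation (which would be Java); the function name `paginate_indices`, the default page size, and the sample index names are assumptions made for the example.

```python
from typing import List, Sequence


def paginate_indices(indices: Sequence[str], from_: int = 0, size: int = 10) -> List[str]:
    """Return one page of index names using offset (from) + size pagination."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    # Slicing past the end of the sequence yields an empty page, not an error.
    return list(indices[from_:from_ + size])


# Hypothetical ordered index list, standing in for the indices held in SnapshotInfo.
snapshot_indices = [f"logs-{i:03d}" for i in range(25)]

first_page = paginate_indices(snapshot_indices, from_=0, size=10)
last_page = paginate_indices(snapshot_indices, from_=20, size=10)
```

Because the order is stable, the same from and size always select the same indices, and only the indices on the requested page would need their metadata and shard status loaded.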

Related component

Storage:Snapshots

Describe alternatives you've considered

One alternative is to use the snapshot thread pool to parallelize fetching the status of snapshot indices. However, the snapshot thread pool might be blocked on long-running tasks. Moreover, the maximum number of threads in the snapshot thread pool is only 5, so the speedup may be limited.
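To put a rough number on that limit, here is a small Python sketch of the parallel alternative. It assumes the per-index fetches are independent and evenly sized, which is the best case; `fetch_index_status` is a hypothetical placeholder for the real repository reads, and the 8-minute figure is the example quoted above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

MAX_SNAPSHOT_THREADS = 5  # pool size mentioned in the issue


def fetch_index_status(index_name: str) -> Dict:
    # Hypothetical placeholder: the real work would read index metadata
    # and shard snapshot status from the repository.
    return {"index": index_name, "shards_total": 1}


def fetch_all(indices: List[str]) -> List[Dict]:
    # pool.map returns results in input order, so the index order is preserved.
    with ThreadPoolExecutor(max_workers=MAX_SNAPSHOT_THREADS) as pool:
        return list(pool.map(fetch_index_status, indices))


def best_case_seconds(serial_seconds: float, workers: int = MAX_SNAPSHOT_THREADS) -> float:
    # Ideal speedup is bounded by the number of workers.
    return serial_seconds / workers


statuses = fetch_all([f"logs-{i}" for i in range(12)])
estimate = best_case_seconds(8 * 60)  # 96.0 seconds for the 8-minute example
```

Even with perfect parallelism, the 8-minute case would still take more than a minute and a half, and that is before accounting for other tasks occupying the pool.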

Additional context

No response

bugmakerrrrrr avatar Jan 09 '25 13:01 bugmakerrrrrr

Attendees - 1 2 3 4

Thanks for filing this issue, please feel free to submit a pull request.

ashking94 avatar Jan 09 '25 15:01 ashking94

Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices)

I believe #14258 introduced a new top-level _list API concept, like _list/indices and _list/shards/{index}. We'd probably want to follow the same pattern here with something like _list/snapshots/{repository}/{snapshot}/.

andrross avatar Jan 11 '25 00:01 andrross

@andrross I was thinking of paging the indices section returned by the status API, and still returning the response in JSON format. The list API needs to return the response in CAT or JSON format, which doesn't seem like a good fit. I'm wondering if it's possible to have two APIs for paging, one for snapshot indices and one for snapshot shards.

The API for paging indices is _list/snapshot/{repository}/{snapshot}/indices. The response includes shard stats and snapshot file stats, and the default response fields are as follows:

  • index: index name
  • shards.total: total number of shards included in the snapshot
  • shards.done: number of shards that initialized, started, and finalized successfully
  • shards.failed: number of shards that failed to be included in the snapshot
  • file_count: total number of files that are referenced by the snapshot
  • size_in_bytes: total size of files that are referenced by the snapshot
  • start_time_in_millis: time (in milliseconds) when snapshot creation began
  • time_in_millis: total time (in milliseconds) that the snapshot took to complete
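One way such an index-level row could be derived from per-shard status is sketched below in Python. The aggregation rules are assumptions, not something settled in this thread: counts and sizes are summed, the start time is the earliest shard start, the duration runs to the latest shard finish, and the stage names `DONE` and `FAILURE` are assumed to follow the existing snapshot status API.

```python
from typing import Dict, List


def summarize_index(index: str, shards: List[Dict]) -> Dict:
    """Fold per-shard snapshot stats into one index-level row (needs >= 1 shard)."""
    start = min(s["start_time_in_millis"] for s in shards)
    end = max(s["start_time_in_millis"] + s["time_in_millis"] for s in shards)
    return {
        "index": index,
        "shards.total": len(shards),
        "shards.done": sum(1 for s in shards if s["stage"] == "DONE"),
        "shards.failed": sum(1 for s in shards if s["stage"] == "FAILURE"),
        "file_count": sum(s["file_count"] for s in shards),
        "size_in_bytes": sum(s["size_in_bytes"] for s in shards),
        "start_time_in_millis": start,
        "time_in_millis": end - start,
    }


# Made-up sample data for illustration only.
sample_shards = [
    {"stage": "DONE", "file_count": 4, "size_in_bytes": 1000,
     "start_time_in_millis": 1000, "time_in_millis": 500},
    {"stage": "FAILURE", "file_count": 0, "size_in_bytes": 0,
     "start_time_in_millis": 1200, "time_in_millis": 100},
]
row = summarize_index("logs-000", sample_shards)
```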

The API for paging shards is _list/snapshot/{repository}/{snapshot}/shards. The response includes the shards part of the index objects, and the default response fields are as follows:

  • index: index name
  • shard: shard number
  • stage: the current state of the shard in the snapshot
  • file_count: total number of files that are referenced by the snapshot
  • size_in_bytes: total size of files that are referenced by the snapshot
  • start_time_in_millis: time (in milliseconds) when snapshot creation began
  • time_in_millis: total time (in milliseconds) that the snapshot took to complete
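For the shard-level listing, the nested "index, then shards" structure would have to be flattened into rows with a stable order before it can be paged. The Python sketch below assumes ordering by index name and then shard number; that ordering, the function name `shard_rows`, and the sample data are illustrative choices rather than part of the proposal.

```python
from typing import Dict, List


def shard_rows(snapshot_status: Dict[str, Dict[int, Dict]]) -> List[Dict]:
    """Flatten {index: {shard: stats}} into rows ordered by (index, shard)."""
    rows = []
    for index in sorted(snapshot_status):
        for shard in sorted(snapshot_status[index]):
            rows.append({"index": index, "shard": shard, **snapshot_status[index][shard]})
    return rows


# Made-up sample data for illustration only.
status = {
    "logs-b": {0: {"stage": "DONE"}},
    "logs-a": {1: {"stage": "DONE"}, 0: {"stage": "FAILURE"}},
}
rows = shard_rows(status)
page = rows[1:3]  # equivalent to from=1, size=2
```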

What do you think?

bugmakerrrrrr avatar Jan 13 '25 08:01 bugmakerrrrrr

@bugmakerrrrrr Adding two new _list APIs makes sense to me. It seems better to create new APIs designed for pagination than to try to shim it into existing APIs. I think this was the basic reason the top-level _list construct was created.

andrross avatar Jan 13 '25 17:01 andrross

@andrross I am totally new to this and would love to contribute. I was thinking of doing it the way @bugmakerrrrrr specified. I would be grateful for any insights.

ankitbk07 avatar Jan 20 '25 10:01 ankitbk07

@bugmakerrrrrr, @andrross, @ankitbk07 Is this issue being worked on? If not, I will pick it up. Could you mark me as assignee?

NikolaiLong avatar Feb 04 '25 18:02 NikolaiLong

@NikolaiLong I'll mark you as the assignee.

@ankitbk07 I don't see any linked activity in the past two weeks, but please speak up if you're actively working on this issue.

andrross avatar Feb 04 '25 18:02 andrross

Thanks @andrross, I am finishing up project setup today and will begin development tomorrow, as this is my first story for OpenSearch.

NikolaiLong avatar Feb 05 '25 21:02 NikolaiLong

@andrross I did try to set things up, but my laptop (sadly, it was under maintenance) could not start the server even though I followed the developer guide exactly, and I felt kind of lost while going through the code base as well. Does the documentation cover this, or do you have any tips? They would be really helpful for future contributions.

ankitbk07 avatar Feb 10 '25 11:02 ankitbk07

@ankitbk07 Here is a YouTube playlist of helpful OS videos. These should help you get started: https://youtube.com/playlist?list=PLzgr9zSpws17sLsI378WwKxO1vdA_FmKK&si=WBNYJ9g82fE4cUeD

I'm working on this task now, but there are many others ready to be picked up.

NikolaiLong avatar Feb 13 '25 00:02 NikolaiLong

After many attempts at running OpenSearch locally, it is clear that my machine does not have adequate processing power to run the application and complete tasks for this project. Please remove me as assignee.

NikolaiLong avatar Feb 20 '25 02:02 NikolaiLong

Hey @NikolaiLong! I've removed you as an assignee. Sorry to hear that you were having trouble. Feel free to ping me on slack to discuss the specific issues you were running into to see if we can improve some things here and potentially unblock you from being able to contribute.

andrross avatar Feb 20 '25 18:02 andrross

@andrross @NikolaiLong Hi, I would love to contribute. Since this issue is marked as a good first issue, can I be assigned to it if it is not already being worked on?

priorigratia avatar Feb 23 '25 04:02 priorigratia

Hi @andrross, is someone actively working on this issue? If not, I would like to work on it.

omricohenn avatar Apr 19 '25 10:04 omricohenn

Hey @priorigratia, please chime in if you're still working on this. I don't see any activity since February, so I'm going to re-assign this to @omricohenn.

andrross avatar May 02 '25 14:05 andrross

Hi @andrross I noticed that there’s been no recent activity on this issue. I’d be happy to pick this up and start contributing. Could you please assign this issue to me if it’s available?

Thank you!

BeomSeogKim avatar Jun 12 '25 13:06 BeomSeogKim

Hi @BeomSeogKim, I am on it.

omricohenn avatar Jun 16 '25 07:06 omricohenn

I am interested in working on this, @andrross.

AlyHKafoury avatar Aug 26 '25 21:08 AlyHKafoury