[Feature Request] Paginate snapshot indices status fetching
Is your feature request related to a problem? Please describe
Our customers depend on the snapshot status API to access information about snapshot indices, like store size, number of docs, etc. The TransportSnapshotStatusAction utilizes a single Generic thread to retrieve repository data, snapshot information, snapshot index metadata, and shard snapshot status if the specified snapshot(s) is not currently running. However, when the specified snapshot contains a large number of indices, the execution time for this action becomes significantly prolonged.
For one snapshot with more than 15,000 shards, fetching the snapshot status took 8 minutes.
Describe the solution you'd like
Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices) to paginate snapshot indices status, like we did in #14258. The new API works only for indices belonging to a specific snapshot. Since the order of indices in SnapshotInfo is fixed, we can simply use from + size to paginate. If the specified snapshot is running, the pagination parameters will have no effect.
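A minimal sketch of the from + size idea, assuming the snapshot's index names are available as an ordered list. The function name `paginate_indices`, the parameter names `from_` and `size`, and the sample index names are illustrative only and are not part of any OpenSearch API.

```python
from typing import List


def paginate_indices(indices: List[str], from_: int = 0, size: int = 10) -> List[str]:
    """Return one page of index names using offset (from) + size pagination.

    Relies on the order of `indices` being stable between requests, as the
    proposal assumes for SnapshotInfo.
    """
    if from_ < 0 or size < 0:
        raise ValueError("from_ and size must be non-negative")
    # Slicing past the end yields an empty list, so a caller can stop
    # requesting pages once a page comes back empty.
    return indices[from_:from_ + size]


# Hypothetical index names, for illustration only.
all_indices = [f"index-{i}" for i in range(25)]
print(paginate_indices(all_indices, from_=20, size=10))
```

With 25 indices, a request for from=20 and size=10 returns only the remaining 5 names, and any offset at or beyond 25 returns an empty page.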
Related component
Storage:Snapshots
Describe alternatives you've considered
Using the snapshot thread pool to parallelize fetching the status of snapshot indices. However, the snapshot thread pool might be blocked by long-running tasks. Moreover, the maximum number of threads in the snapshot thread pool is only 5, so the speedup may be limited.
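As a rough back-of-the-envelope check on that limit: if the work split perfectly and all 5 snapshot threads were free, the best case would be a 5x speedup. The sketch below applies that to the 8-minute example from the problem description. Perfect parallelism is an optimistic assumption, so the real gain would likely be smaller.

```python
def best_case_seconds(serial_seconds: float, threads: int) -> float:
    """Ideal wall-clock time if the work splits evenly across `threads` workers."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return serial_seconds / threads


serial = 8 * 60  # the 8-minute status fetch reported above, in seconds
print(best_case_seconds(serial, threads=5))  # 96.0
```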
Additional context
No response
Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices)
I believe #14258 introduced a new top-level _list API concept, like _list/indices and _list/shards/{index}. We'd probably want to follow the same pattern here with something like _list/snapshots/{repository}/{snapshot}/.
@andrross I was thinking of paging the indices section returned by the status API, and still returning the response in JSON format. The list API needs to return the response in CAT or JSON format, which doesn't seem like a good fit. I'm wondering if it's possible to have two APIs for paging, one for snapshot indices and one for snapshot shards.
The API for paging indices is _list/snapshot/{repository}/{snapshot}/indices. The response includes shard stats and snapshot file stats, and its default fields are as follows:
index: index name
shards.total: total number of shards included in the snapshot
shards.done: number of shards that initialized, started, and finalized successfully
shards.failed: number of shards that failed to be included in the snapshot
file_count: total number of files that are referenced by the snapshot
size_in_bytes: total size of files that are referenced by the snapshot
start_time_in_millis: time (in milliseconds) when snapshot creation began
time_in_millis: total time (in milliseconds) that the snapshot took to complete
The API for paging shards is _list/snapshot/{repository}/{snapshot}/shards. The response includes the shards part of the index objects, and its default fields are as follows:
index: index name
shard: the shard number
stage: the current state of the shard in the snapshot
file_count: total number of files that are referenced by the snapshot
size_in_bytes: total size of files that are referenced by the snapshot
start_time_in_millis: time (in milliseconds) when snapshot creation began
time_in_millis: total time (in milliseconds) that the snapshot took to complete
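To make the two row shapes concrete, here is a rough sketch of how shard-level rows could roll up into index-level rows. The class names, the stage strings "DONE" and "FAILURE", and the summing logic are assumptions for illustration, not the actual implementation. The time fields are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShardRow:
    index: str
    shard: int
    stage: str  # assumed values such as "DONE" or "FAILURE"
    file_count: int
    size_in_bytes: int


@dataclass
class IndexRow:
    index: str
    shards_total: int = 0
    shards_done: int = 0
    shards_failed: int = 0
    file_count: int = 0
    size_in_bytes: int = 0


def roll_up(shards: List[ShardRow]) -> List[IndexRow]:
    """Aggregate shard rows into one row per index, keeping first-seen order."""
    rows: Dict[str, IndexRow] = {}
    for s in shards:
        row = rows.setdefault(s.index, IndexRow(index=s.index))
        row.shards_total += 1
        if s.stage == "DONE":
            row.shards_done += 1
        elif s.stage == "FAILURE":
            row.shards_failed += 1
        row.file_count += s.file_count
        row.size_in_bytes += s.size_in_bytes
    return list(rows.values())


# Hypothetical sample data, for illustration only.
sample = [
    ShardRow("logs", 0, "DONE", 3, 100),
    ShardRow("logs", 1, "FAILURE", 0, 0),
    ShardRow("metrics", 0, "DONE", 2, 50),
]
for row in roll_up(sample):
    print(row)
```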
What do you think?
@bugmakerrrrrr Adding two new _list APIs makes sense to me. It seems better to create new APIs designed for pagination than to try to shim pagination into existing APIs. I think this was the basic reason the top-level _list construct was created.
@andrross I am totally new to this and would love to contribute. I was thinking of doing it the same way as specified by @bugmakerrrrrr. I would be grateful for any insights.
@bugmakerrrrrr, @andrross, @ankitbk07 Is this issue being worked on? If not, I will pick it up. Could you mark me as assignee?
@NikolaiLong I'll mark you as the assignee.
@ankitbk07 I don't see any linked activity in the past two weeks, but please speak up if you're actively working on this issue.
Thanks @andrross, I am finishing up project setup today and will begin development tomorrow, as this is my first story for OpenSearch.
@andrross I did try to set up the project, but my laptop (sadly, it was under maintenance) could not start the server even though I followed the developer guide exactly, and I felt kind of lost while going through the code base as well. Does the documentation cover this, or do you have any tips? They would be really helpful for future contributions.
@ankitbk07 Here is a YouTube playlist of helpful OS videos. These should help you get started: https://youtube.com/playlist?list=PLzgr9zSpws17sLsI378WwKxO1vdA_FmKK&si=WBNYJ9g82fE4cUeD
I'm working on this task now, but there are many others ready to pick.
After many attempts at running OpenSearch locally, it is clear that my machine does not have adequate processing power to run this application and complete tasks for this project. Please remove me as assignee.
Hey @NikolaiLong! I've removed you as an assignee. Sorry to hear that you were having trouble. Feel free to ping me on slack to discuss the specific issues you were running into to see if we can improve some things here and potentially unblock you from being able to contribute.
@andrross @NikolaiLong Hi, I would love to contribute. Since this issue is marked as a good first issue, can I be assigned to it if it is not already being worked on?
Hi @andrross, is someone actively working on this issue? If not, I would like to work on it.
Hey @priorigratia, please chime in if you're still working on this. I don't see any activity since February so I'm going to re-assign this to @omricohenn
Hi @andrross I noticed that there’s been no recent activity on this issue. I’d be happy to pick this up and start contributing. Could you please assign this issue to me if it’s available?
Thank you!
Hi @BeomSeogKim, I am on it.
I am interested in working on this, @andrross.