OpenSearch
                                
                                 OpenSearch copied to clipboard
                                
                                    OpenSearch copied to clipboard
                            
                            
                            
                        Add capability to disable source recovery_source for an index
Description
This change will add the capability where a user can disable the recovery_source, just like user can disable source. Disabling of _recovery_source can be done per field by mentioning the field names too in includes and excludes.
Example of an Index Mappings where both source and recovery source is disabled:
PUT my-index
{
	"settings": {
		"index": {
			"number_of_shards": 3,
			"refresh_interval": "1s"
		}
	},
	"mappings": {
		"_source": {
			"enabled": false,
			"recovery_source_enabled": false
		},
			"properties": {
				"title": {
					"type": "text"
				}
			}
		}
}
Related Issues
Resolves #13490
Related Issue: https://github.com/opensearch-project/k-NN/issues/1599
Check List
- [X] New functionality includes testing.
- [X] All tests pass
 
- [X] New functionality has been documented.
- [X] New functionality has javadoc added
 
- [X] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- [X] Commits are signed per the DCO using --signoff
- [X] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
- [X] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
:x: Gradle check result for beb57cfe9bbe183126e0fa0dda6fd4780476d32a: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for a6dcdfbfe55ae58cfa8071cd1f9c66362ec5fb8b: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for fe198109739ba5a32ad86a4f9480e305652aa8e8: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for dd51e818ab90e449de8874a74e3f1f89c5ca642e: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@msfroh , @reta can I get a review on this PR?
@msfroh , @reta can I get a review on this PR?
Sorry about that @navneet1v, referring to the modification (in the issue)
Plus given that _recovery_source gets deleted after a certain point of time recovery source was never a full proof way to recover from crashes.
I don't really follow this statement: the ops recovery is used to recover the index when translog is written but node crashes before updating the index (please correct me if I am wrong here). I was also looking into original issue here [1] and the motivation behind introducing _recovery_source is pretty compelling (it has the purpose). Do we really understand the consequences of disabling it? (it looks to me not fully, it does look like not safe thing to do nonetheless is problematic for dense vectors as ES forums / issues show)
[1] https://github.com/elastic/elasticsearch/pull/85649
@msfroh , @reta can I get a review on this PR?
Sorry about that @navneet1v, referring to the modification (in the issue)
Plus given that _recovery_source gets deleted after a certain point of time recovery source was never a full proof way to recover from crashes.
I don't really follow this statement: the ops recovery is used to recover the index when translog is written but node crashes before updating the index (please correct me if I am wrong here). I was also looking into original issue here [1] and the motivation behind introducing
_recovery_sourceis pretty compelling (it has the purpose). Do we really understand the consequences of disabling it? (it looks to me not fully, it does look like not safe thing to do nonetheless is problematic for dense vectors as ES forums / issues show)
@reta what I was trying to say there was recovery source gets removed after a certain point in time if merges are happening. So, its not a guaranteed thing. I can look more closely on how this recovery_source is getting used.
Also with this PR, I am not removing the recovery_source I am adding the capability to remove it if user wants. Just like they can do for _source and has to live with disadvantages like update cannot happen.
@msfroh , @reta can I get a review on this PR?
Sorry about that @navneet1v, referring to the modification (in the issue)
Plus given that _recovery_source gets deleted after a certain point of time recovery source was never a full proof way to recover from crashes.
I don't really follow this statement: the ops recovery is used to recover the index when translog is written but node crashes before updating the index (please correct me if I am wrong here). I was also looking into original issue here [1] and the motivation behind introducing
_recovery_sourceis pretty compelling (it has the purpose). Do we really understand the consequences of disabling it? (it looks to me not fully, it does look like not safe thing to do nonetheless is problematic for dense vectors as ES forums / issues show) [1] elastic/elasticsearch#85649@reta what I was trying to say there was recovery source gets removed after a certain point in time if merges are happening. So, its not a guaranteed thing. I can look more closely on how this recovery_source is getting used.
Also with this PR, I am not removing the recovery_source I am adding the capability to remove it if user wants. Just like they can do for _source and has to live with disadvantages like update cannot happen.
This is the original PR that added the recovery_source: https://github.com/elastic/elasticsearch/pull/31106
As per this PR recovery_source was useful for CCR and soft deletes where we need to re-create the whole document back again. I can do some more testing where I can enable CCR and see what happens when we remove the _recovery_source
I can do some more testing where I can enable CCR and see what happens when we remove the _recovery_source
I think we should definitely understand more what recovery flow needs its. To me, I think it is definitely needed before the segments are flushed (and this is why it is removed later on, as you mentioned). It may be needed during replication when the ops are replicated (not docs).
@reta I did some deep-dive and checked with CCR team(@ankitkala) and got to know recovery_source is not getting used in CCR flow.
One flow which I am still confirming where recovery_source is getting used is Peer to Peer recovery using LuceneChangeSnapshot. What I have found is to do Peer to Peer recovery sometime opensearch queries a lucene shard fetches the _recovery_source or _source from the index and use this to do the recovery.
Below are the things that I am trying to confirm:
- If no _recovery_source is found will the recovery fail or recovery will use copying of segments features from 1 node to another node.
- Will segment replication with remote store uses this recovery mechanism.
@mgodwan do you have some thoughts on this?
elastic/elasticsearch#31106
CCR by default uses translog for fetching the operations and lucene is used only as a fallback mechanism. One exception to this is remote backed indices where the leader shard persists translogs on remote. For this case, we don't want leader to download the translogs and thus CCR relies on lucene for the operations.
:x: Gradle check result for 5b8dacdecbd47703877d7e90e5d2bdc61faf20d1: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@reta So I did some more deep-dive and validated few scenarios where whole source and recovery source is removed(with doc replication and seg replication).
- The recovery source/source is only used to replay the docs for un-commited segments. Once the segment is committed, the file based recovery is used. I added an Integration test to replicate this behavior.
In my opinion removing recovery source helps and there are implementation of Opensearch where segments are stored remotely and don't rely on Lucene docs for peer recovery. This can be very useful feature.
Please let me know if there are any further questions.
In my opinion removing recovery source helps and there are implementation of Opensearch where segments are stored remotely and don't rely on Lucene docs for peer recovery. This can be very useful there.
Thanks @navneet1v , in this case it should only be done for remote indices, correct?
In my opinion removing recovery source helps and there are implementation of Opensearch where segments are stored remotely and don't rely on Lucene docs for peer recovery. This can be very useful there.
Thanks @navneet1v , in this case we it should only be done for remote indices, correct?
Its upto the service and specific implementation to deicide. If as a user after every bulk request call I am calling flush then there will be no issues as recoveries will happen.
Its upto the service and specific implementation to deicide. If as a user after every bulk request call I am calling flush then there will be no issues as recoveries will happen.
This is too much to ask from our users I believe. We do have expert level settings though and unless we provide comprehensive and clear documentation explaining the impact of this setting (which to me seems difficult but I certainly could be mistaken) - we should not expose it.
Its upto the service and specific implementation to deicide. If as a user after every bulk request call I am calling flush then there will be no issues as recoveries will happen.
This is too much to ask from our users I believe. We do have expert level settings though and unless we provide comprehensive and clear documentation explaining the impact of this setting (which to me seems difficult but I certainly could be mistaken) - we should not expose it.
@reta something similar we already expose where if _source is disabled or fields are excluded in the _source then updates doesn't work. That is also an expert setting where user should store the whole document on their side. I don't see how this is different.
Plus if we see, the integration test that I have added this peer to peer recovery is happening when we add new nodes and primaries are moving to new node. That is where if flush is not done the recovery doesn't happen.
I don't see how this is different.
@navneet1v _source was and is user visible (mapping, search, etc), _recovery_source was never. Plus, I am only concerned in documenting it in a way users could understand clearly the consequences .
:x: Gradle check result for f8da58a757ae7a05d3e2ad69764e093b39521592: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
I don't see how this is different.
@navneet1v
_sourcewas and is user visible (mapping, search, etc),_recovery_sourcewas never.
It is not directly visible but in the docs it is mentioned. @reta this recovery is only for Peer to Peer with primary if nodes are getting adding before flush. A simple flow for users here would be if they are trying to add new nodes would be this:
- Index the documents.
- Do a flush.
- Add new nodes.
instead of:
- Index the documents.
- Add new nodes.
And flush generally happens in some time. So, I am not able to understand how this is a problem.
It is not directly visible but in the docs it is mentioned.
@navneet1v you asked the difference, I explained (I didn't know about _recovery_source fe), could you share the publicly available docs you are mentioning?
And flush generally happens in some time. So, I am not able to understand how this is a problem.
I am not claiming the problem, I am asking you to document these settings - the code at the moment has zero documentation, check fe IndexModule on how we document such expert settings.
It is not directly visible but in the docs it is mentioned.
@navneet1v you asked the difference, I explained (I didn't know about
_recovery_sourcefe), could you share the publicly available docs you are mentioning?
Sure her it is ref.
And flush generally happens in some time. So, I am not able to understand how this is a problem.
I am not claiming the problem, I am asking you to document these settings - the code at the moment has zero documentation, check fe
IndexModuleon how we document such expert settings.
Make sense. I will add the documentation in proper details on this. Thanks for clarification. What I was understanding is that we should not do this as this is very expert setting. I got your point and I will add docs for this in code and also in opensource documentation.
Sure her it is ref.
Thanks @navneet1v but I don't see any mentions of _recovery_source there [1], am I missing something? (Citing that you commented It is not directly visible but in the docs it is mentioned.)
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/mapping-source-field.html#:~:text=Think%20before%20disabling%20the%20_source%20field
I tried doing some Googling, but couldn't find anything that documents the existence of _recovery_source or explains what it does, other than the source code.
I think we would need to document it ourselves. (Which, as @reta called out, could be confusing.)
If we want to add this as an expert flag, we should make the risks extremely clear. Essentially a "don't flip this switch unless you understand and accept the risks" warning.
I previously worked on a system that used OS/ES for search and (because it's not a database) I treated any unflushed updates as non-durable.
Since I need to keep complaining about it, if updates are stored durably in an event stream and shards pull updates, this whole problem goes away.
@msfroh, @reta I have done some checking. Addition is recovery source is mentioned at few places and issues. Let me try to post them.
- Main PR which added recovery source: https://github.com/elastic/elasticsearch/pull/31106
- A github issue on ES saying a user removed _source but still disk size didn't reduce as _recovery_source got added: https://github.com/elastic/elasticsearch/issues/41628, exact comment: https://github.com/elastic/elasticsearch/issues/41628#issuecomment-488155381
As part of this capability getting added I am aligned to add specific documentation around when to use this feature and how to use this feature, in Opensearch docs and also in the code. GH issue: https://github.com/opensearch-project/documentation-website/issues/7282
But one thing I want to confirm here is, is there anything else required apart from docs before I can get a review on the PR.
@reta and @msfroh I have updated the code with comments and expert flag. Will do the same for Opensearch documentation.
:x: Gradle check result for 09ee2ebb184fe5f4b8c367887e2b4d04d5f22ffa: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 3cdba4593eb8edb6bddb09d352a5bfa243407b0e: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 3cdba45: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Tests failed.
org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction org.opensearch.gateway.remote.RemoteClusterStateCleanupManagerIT.testRemoteCleanupDeleteStale org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"false"}} org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"false"}} org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}}
Flaky test:
- org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction: https://github.com/opensearch-project/OpenSearch/issues/12197
- org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode: https://github.com/opensearch-project/OpenSearch/issues/13600
- For this test: org.opensearch.gateway.remote.RemoteClusterStateCleanupManagerIT.testRemoteCleanupDeleteStale not able to find any flaky test failure gh issue. any thoughts if this is a flaky test or not. On my local this test is successful.
But one thing I want to confirm here is, is there anything else required apart from docs before I can get a review on the PR.
@navneet1v I think there is nothing else:
- we (will) document the presence of _recovery_sourceso users (and most of us as well) will be aware of it
- we (will) document where and why _recovery_sourceis used so users (and most of us as well) will be aware of reasons
- we (will) provide the expert setting to alter _recovery_sourcebehaviour, explaining the risks