Don't force IOContext.READONCE for src file in RemoteDirectory's copyFrom

Open msfroh opened this issue 7 months ago • 9 comments

Description

The copyFrom method opens a source IndexInput, a destination IndexOutput, and calls copyBytes on the output, passing the input. For a RemoteDirectory, the output may be the remote store, which may try to parallelize the copyBytes implementation by doing a multipart upload across multiple threads.

Lucene's default copyFrom assumes that copyBytes always runs on the calling thread, which means it can safely use IOContext.READONCE when opening the source file. Since we can't guarantee that for RemoteDirectory, we must override the Lucene implementation to use the "true" IOContext.

It would arguably be a good idea to force the IndexInput context to be IOContext.DEFAULT, but there is an invariant that assumes the SegmentInfos file is always read with IOContext.READONCE. That should generally be fine: the SegmentInfos file is tiny, so it should never trigger a multipart upload.
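For reference, a minimal sketch of the override described here (not the exact change in this PR) could look like the following, keeping READONCE only for the segments file:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.util.IOUtils;

// Sketch only: a directory that overrides copyFrom so the source is
// opened with the caller's context instead of the READONCE that
// Lucene's default implementation forces. READONCE is kept just for
// the tiny segments file, preserving the SegmentInfos invariant.
class RemoteAwareDirectory extends FilterDirectory {
    RemoteAwareDirectory(Directory in) {
        super(in);
    }

    @Override
    public void copyFrom(Directory from, String src, String dest, IOContext context) throws IOException {
        IOContext readContext =
            src.startsWith(IndexFileNames.SEGMENTS) ? IOContext.READONCE : context;
        boolean success = false;
        try (IndexInput is = from.openInput(src, readContext);
             IndexOutput os = createOutput(dest, context)) {
            // For a remote-backed output, this copy may fan out to a
            // parallel multipart upload on other threads.
            os.copyBytes(is, is.length());
            success = true;
        } finally {
            if (!success) {
                IOUtils.deleteFilesIgnoringExceptions(this, dest);
            }
        }
    }
}
```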

Related Issues

Resolves https://github.com/opensearch-project/OpenSearch/issues/15902

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

msfroh avatar Jun 05 '25 16:06 msfroh

:x: Gradle check result for a0f175e3039a6aebef5b06dcaf5c2b81bc56c603: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 05 '25 16:06 github-actions[bot]

:x: Gradle check result for f4bc8ff647039e39d9bc3c34386fa00593fc5000: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 05 '25 17:06 github-actions[bot]

@sachinpkale -- I'd appreciate your eyes on this. I tweaked your fix from https://github.com/opensearch-project/OpenSearch/pull/17502 to go back to using READONCE for the segments file, which meant that I needed to disable async uploads for that file.

I'm not sure whether that has any performance implications for the repository-s3 implementation, since it will end up uploading that one file synchronously.
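Roughly the shape of that gating, as a sketch (uploadSynchronously and uploadAsync are hypothetical helper names, not the actual diff; blobContainer is assumed to be a field on the directory):

```java
// Hypothetical sketch of disabling async upload for the segments file.
private void uploadBlob(String fileName, IndexInput input, long length) throws IOException {
    boolean isSegmentsFile = fileName.startsWith(IndexFileNames.SEGMENTS);
    if (isSegmentsFile || !(blobContainer instanceof AsyncMultiStreamBlobContainer)) {
        // Synchronous single-stream upload: IO stays on this thread,
        // so a READONCE source input is safe to read.
        uploadSynchronously(fileName, input, length);
    } else {
        // An async multipart upload may read from several threads, so
        // the source must have been opened with a non-READONCE context.
        uploadAsync(fileName, input, length);
    }
}
```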

msfroh avatar Jun 05 '25 18:06 msfroh

:x: Gradle check result for e70cc015e1e9bb80cfa5fdc48bcc6bdca6c8a2f1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 05 '25 19:06 github-actions[bot]

:x: Gradle check result for 931edf90c3905802c1fddf8f300c1603584fe304: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 05 '25 20:06 github-actions[bot]

Thanks @vikasvb90! That makes a lot of sense.

It occurs to me that OCI must not actually be doing a parallel multipart upload either. As I was jumping back and forth between the OpenSearch code, the OCI repository code, and the OCI storage client code, I assumed we were going into this if branch: https://github.com/oracle/oci-java-sdk/blob/master/bmc-objectstorage/bmc-objectstorage-extensions/src/main/java/com/oracle/bmc/objectstorage/transfer/UploadManager.java#L187. But that condition must be false: even though the OCI repository client sets uploadConfiguration.isAllowParallelUploads(), chunkCreator.supportsParallelReads() returns false, precisely because the InputStream doesn't allow parallel reads from multiple positions.

So, the remaining cause for the WrongThreadException is that the OCI storage client still hands the single-threaded upload off to a new thread created here. If it used the current thread instead, I think we'd be fine.
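To illustrate the failure mode with generic Lucene usage (not the OCI client code; the directory path and file name below are made up):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// A READONCE input may be backed by a thread-confined memory segment,
// so reading it from any thread other than the opener fails.
static void demo() throws Exception {
    try (Directory dir = FSDirectory.open(Path.of("/tmp/index"));
         IndexInput in = dir.openInput("example.si", IOContext.READONCE)) {
        Thread worker = new Thread(() -> {
            byte[] buf = new byte[16];
            try {
                // On MMapDirectory with a confined Arena, this throws
                // WrongThreadException: the segment belongs to the
                // thread that opened the input.
                in.readBytes(buf, 0, buf.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        worker.start();
        worker.join();
    }
}
```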

msfroh avatar Jun 11 '25 19:06 msfroh

That said, @vikasvb90, @linuxpi, and @sachinpkale -- do you see any risks or significant downsides with using IOContext.DEFAULT for these copyFrom calls?

While some remote store implementations (S3, Azure, and GCS) happen to execute the PUT request on the same thread that invoked copyFrom, at least one implementation (OCI) doesn't. IOContext.READONCE may perform a little better when you're sure the remote store impl keeps IO on the current thread, but do we want to assume that's true for every remote store implementation?

Maybe we should add a method to BlobContainer so that the impl can say whether or not it promises to keep IO on the current thread? For the ones that do, we can use IOContext.READONCE, and for the ones that don't, IOContext.DEFAULT.
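Something like this, as a sketch (the method name is made up here, not an existing BlobContainer API):

```java
// Hypothetical addition to the BlobContainer interface.
public interface BlobContainer {
    /**
     * Whether this container promises to perform all write IO on the
     * thread that called it. When true, callers can open source files
     * with IOContext.READONCE; when false, they should fall back to
     * IOContext.DEFAULT.
     */
    default boolean writeIOStaysOnCallingThread() {
        return false; // conservative default: assume IO may hop threads
    }
}
```

At the call site, copyFrom could then pick the context with something like `IOContext ctx = container.writeIOStaysOnCallingThread() ? IOContext.READONCE : IOContext.DEFAULT;`.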

msfroh avatar Jun 11 '25 20:06 msfroh

> Maybe we should add a method to BlobContainer so that the impl can say whether or not it promises to keep IO on the current thread? For the ones that do, we can use IOContext.READONCE, and for the ones that don't, IOContext.DEFAULT.

Yes, it looks like this is the only option to make multipart upload work in this client with a minimal set of changes.

vikasvb90 avatar Jun 12 '25 03:06 vikasvb90

But I think a better approach for sequential reads would be to use the same thread. This is also how it was done in the legacy multipart code in the repository-s3 plugin here.
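For instance, a same-thread sequential multipart loop could look like this sketch (uploadPart and the part size are hypothetical; InputStreamIndexInput is OpenSearch's existing IndexInput-to-InputStream wrapper):

```java
import java.io.IOException;
import org.apache.lucene.store.IndexInput;
import org.opensearch.common.lucene.store.InputStreamIndexInput;

// Each part is read from the single source input and uploaded before
// the next one starts, all on the calling thread, so a READONCE source
// is never touched by another thread.
static void uploadSequentially(IndexInput input, long partSize) throws IOException {
    long remaining = input.length();
    int partNumber = 1;
    while (remaining > 0) {
        long len = Math.min(partSize, remaining);
        // InputStreamIndexInput exposes the next `len` bytes of the
        // IndexInput as a bounded InputStream.
        uploadPart(partNumber++, new InputStreamIndexInput(input, len), len); // hypothetical helper
        remaining -= len;
    }
}
```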

vikasvb90 avatar Jun 12 '25 03:06 vikasvb90

This PR is stalled because it has been open for 30 days with no activity.