beam icon indicating copy to clipboard operation
beam copied to clipboard

add shutdown and start mechanics to windmill streams

Open m-trieu opened this issue 1 year ago • 4 comments

Start and closing Windmill streams are currently via halfClose() and on stream creation. Implementations were previously created and returned in a "started" state usually after the stream has already sent the initial headers to open the connection to the backend servers.

Starting in the current state prevents us from being able to start the stream "lazily". And closing allows other blocking stream operations to prevent streams from being able to be closed (stalling at times up to 10-20 minutes).

  • Add start() flexibility to the WindmillStream API by allowing external callers to start the stream themselves.
  • Add shutdown() capability to allow the stream to receive a shutdown signal, that is idempotent and does not block (or is blocked by) other blocking stream operations.

This is especially important in direct path mode where the user worker manages the fan out to the backend.

in terms of implementation, similar to WindmillStream.shutdown(), WindmillStream.start()'s behavior will only execute once during the lifetime of the WindmillStream object. Subsequent calls to start() and shutdown() will do nothing.

R: @arunpandianp @scwhittle

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [ ] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [ ] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels Python tests Java tests Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

m-trieu avatar Oct 14 '24 21:10 m-trieu

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot] avatar Oct 14 '24 22:10 github-actions[bot]

assign set of reviewers

m-trieu avatar Oct 15 '24 00:10 m-trieu

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm added as fallback since no labels match configuration

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Oct 15 '24 00:10 github-actions[bot]

back to you @arunpandianp Thanks!

m-trieu avatar Oct 17 '24 13:10 m-trieu

back to you @arunpandianp thanks!

m-trieu avatar Oct 21 '24 03:10 m-trieu

back to you @arunpandianp @scwhittle thanks!

m-trieu avatar Oct 21 '24 23:10 m-trieu

Run Java Precommit

m-trieu avatar Oct 22 '24 20:10 m-trieu

failures seem to be unrelated

m-trieu avatar Oct 22 '24 21:10 m-trieu

Back to you! Thanks

@arunpandianp @scwhittle

m-trieu avatar Oct 23 '24 05:10 m-trieu

Like how there was a race between populating pending maps and shutdown, I think there is a race between mutating/accessing batches and shutdown in GrpcGetDataStream. Other pieces look okay to me.

arunpandianp avatar Oct 23 '24 09:10 arunpandianp

@arunpandianp @scwhittle back to you thanks!

m-trieu avatar Oct 24 '24 00:10 m-trieu

Done back to you @arunpandianp Thanks!

m-trieu avatar Oct 24 '24 02:10 m-trieu

Run Java Precommit

m-trieu avatar Oct 24 '24 22:10 m-trieu

back to you @scwhittle @arunpandianp thanks!

m-trieu avatar Oct 25 '24 05:10 m-trieu

back to you @arunpandianp @scwhittle thanks

m-trieu avatar Oct 25 '24 17:10 m-trieu

back to you thanks @scwhittle

m-trieu avatar Oct 29 '24 09:10 m-trieu

back to you @scwhittle thank you!

m-trieu avatar Nov 01 '24 01:11 m-trieu

still need to add some more tests to DirectStreamObserverTest addressed the other comments

thanks! @scwhittle

m-trieu avatar Nov 13 '24 10:11 m-trieu

back to you @scwhittle thanks!

m-trieu avatar Nov 15 '24 07:11 m-trieu

GrpcGetDataStreamTest.java fails due to a second test added to the test suite:

java.io.IOException: name already registered: Fake server for GrpcGetDataStreamTest
	at io.grpc.inprocess.InProcessServer.registerInstance(InProcessServer.java:89)
	at io.grpc.inprocess.InProcessServer.start(InProcessServer.java:80)
	at io.grpc.internal.ServerImpl.start(ServerImpl.java:185)
	at io.grpc.internal.ServerImpl.start(ServerImpl.java:94)
	at org.apache.beam.runners.dataflow.worker.windmill.client.grpc.GrpcGetDataStreamTest.setUp(GrpcGetDataStreamTest.java:80)

Abacn avatar Jan 29 '25 18:01 Abacn

even though the naming fix, it still failed in internal import

Abacn avatar Jan 29 '25 19:01 Abacn