pyflyte `run` & `register` asynchronously
Tracking issue
NA
Why are the changes needed?
To make registering tasks, workflows, and launch plans 2.5x to 4x faster in `pyflyte run` and `pyflyte register` by leveraging asyncio.
What changes were proposed in this pull request?
- https://github.com/flyteorg/flytekit/pull/2267: A neat workaround by @pingsutw for sync grpc clients.
- Serializing the entities is not enough on its own for the async approach.
- Entities to be registered must be split into "task" and "non-task" categories and registered in that order. This ensures that flyteadmin has returned a task registration response before the workflows that depend on that task are registered.
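The ordering constraint above can be sketched roughly as follows. This is an illustration only, not the code in this pull request: `Entity`, `register_entity`, and the simulated latency are hypothetical stand-ins for a blocking gRPC call to flyteadmin. Running the blocking call through `loop.run_in_executor` is one common way to drive a sync client from asyncio. Registering the remaining entities one at a time, in the order given, is an assumption of this sketch that keeps a launch plan behind its workflow.

```python
import asyncio
import functools
import time
from typing import List, NamedTuple


class Entity(NamedTuple):
    kind: str  # "task", "workflow", or "launch_plan" (hypothetical labels)
    name: str


# Records the order in which registrations completed.
registered: List[str] = []


def register_entity(entity: Entity, latency: float = 0.05) -> str:
    """Hypothetical blocking call standing in for a sync gRPC request."""
    time.sleep(latency)  # simulated network round trip
    registered.append(entity.name)
    return entity.name


async def register_all(entities: List[Entity]) -> List[str]:
    loop = asyncio.get_running_loop()
    tasks = [e for e in entities if e.kind == "task"]
    others = [e for e in entities if e.kind != "task"]

    # Phase 1: register every task concurrently on the default thread pool.
    futures = [
        loop.run_in_executor(None, functools.partial(register_entity, e))
        for e in tasks
    ]
    done = list(await asyncio.gather(*futures))

    # Phase 2: only after all task responses are back, register the
    # non-task entities one by one, preserving their given order.
    for entity in others:
        done.append(await loop.run_in_executor(None, register_entity, entity))
    return done


if __name__ == "__main__":
    demo = [Entity("task", f"t{i}") for i in range(9)]
    demo += [Entity("workflow", "wf"), Entity("launch_plan", "lp")]
    print(asyncio.run(register_all(demo)))
```

With nine tasks, the first phase costs roughly one or two simulated round trips rather than nine, which is where the bulk of the saving comes from in this sketch.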
How was this patch tested?
Setup process
With 1000 ms of latency added to every localhost request, network-IO-intensive operations against cloud services (e.g., object storage, a Kubernetes cluster, FlyteAdmin) can be simulated on your own desktop. This makes it possible to evaluate the performance improvement from asyncio without setting up a real external cluster. Without the added latency, the Flyte sandbox cluster running on macOS responds too quickly for the difference to be measured.
Steps to reproduce network condition:
- `sudo pfctl -f -`: load the packet filter rule `out proto tcp from any to any pipe 1`.
- `sudo dnctl pipe 1 config delay 1000`: add 1000 ms of latency to every dummynet request.
- `sudo pfctl -E`: enable the latency setup in the packet filter.
Don't forget to set the latency back to `0` to restore your original network speed.
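To check whether the delay is actually in effect before profiling, a small round-trip timer can help. The sketch below is an assumption-laden helper, not part of this pull request: `measure_round_trip` is a hypothetical name, and whether loopback traffic is delayed depends on the rules loaded on your machine. It starts a throwaway echo server on `127.0.0.1` and times a connect plus one echo.

```python
import socket
import threading
import time


def measure_round_trip(rounds: int = 3) -> float:
    """Return the mean seconds for a TCP connect plus one echo on localhost."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()
    port = server.getsockname()[1]

    def echo_clients() -> None:
        # Accept one connection per round and echo back what was received.
        for _ in range(rounds):
            conn, _addr = server.accept()
            with conn:
                conn.sendall(conn.recv(16))

    thread = threading.Thread(target=echo_clients, daemon=True)
    thread.start()

    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        # Generous timeout, since the injected delay may be a second or more.
        with socket.create_connection(("127.0.0.1", port), timeout=30) as client:
            client.sendall(b"ping")
            reply = client.recv(16)
        samples.append(time.perf_counter() - start)
        if reply != b"ping":
            raise RuntimeError("unexpected echo reply")

    thread.join(timeout=30)
    server.close()
    return sum(samples) / len(samples)


if __name__ == "__main__":
    print(f"mean localhost round trip: {measure_round_trip() * 1000:.1f} ms")
```

Running it once before and once after applying the latency setup gives a quick sanity check that the numbers being profiled reflect the intended delay.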
Screenshots
- The registration elapsed time dropped from `34.1s` to `16.1s` after introducing asynchronous registration.
- Registered 9 tasks, 1 workflow (wf), and 1 launch plan (lp).
1. Register in the sync manner: 21.1s
   `python3 -m cProfile -o sync_requests_500ms.prof ./flytekit/clis/sdk_in_container/pyflyte.py register ./workflow.py`
   `snakeviz ./sync_requests_500ms.prof`
2. Register in the async manner: 6.05s
   `python3 -m cProfile -o async_requests_500ms.prof ./flytekit/clis/sdk_in_container/pyflyte.py register ./workflow.py`
   `snakeviz ./async_requests_500ms.prof`
3. Run in the async manner: 5.95s
   `python3 -m cProfile -o async_requests_500ms_run.prof ./flytekit/clis/sdk_in_container/pyflyte.py run --remote ./workflow.py wf`
   `snakeviz ./async_requests_500ms_run.prof`
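The `.prof` files written by `cProfile` can also be inspected without snakeviz, using the standard-library `pstats` module. The following is a self-contained sketch: `slow_call` is a hypothetical stand-in for a network-bound request, and the profile is written to a temporary file rather than to the files named above.

```python
import cProfile
import os
import pstats
import tempfile
import time


def slow_call() -> None:
    """Hypothetical stand-in for a network-bound registration request."""
    time.sleep(0.01)


def profile_to_file(path: str) -> None:
    """Profile three slow calls and dump the stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(3):
        slow_call()
    profiler.disable()
    profiler.dump_stats(path)


def total_seconds(path: str) -> float:
    """Return the total time recorded in a .prof file."""
    return pstats.Stats(path).total_tt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prof = os.path.join(tmp, "demo.prof")
        profile_to_file(prof)
        print(f"total: {total_seconds(prof):.3f}s")
        # Show the five entries with the largest cumulative time.
        pstats.Stats(prof).sort_stats("cumulative").print_stats(5)
```

Pointing `total_seconds` at the sync and async profiles gives a quick way to compare the two runs from a terminal.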
Check all the applicable boxes
- [x] I updated the documentation accordingly.
- [x] All new and existing tests passed.
- [x] All commits are signed-off.
Related PRs
Docs link
Codecov Report
Attention: Patch coverage is 96.96970%, with 1 line in your changes missing coverage. Please review.
Project coverage is 83.57%. Comparing base (bf38b8e) to head (47185ee). Report is 5 commits behind head on master.
:exclamation: Current head 47185ee differs from pull request most recent head f0eab0a. Consider uploading reports for the commit f0eab0a to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| flytekit/remote/remote.py | 94.11% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #2276 +/- ##
==========================================
+ Coverage 83.04% 83.57% +0.52%
==========================================
Files 324 324
Lines 24861 24658 -203
Branches 3547 3510 -37
==========================================
- Hits 20645 20607 -38
+ Misses 3591 3421 -170
- Partials 625 630 +5