
pyflyte `run` & `register` asynchronously

Open austin362667 opened this issue 1 year ago • 1 comments

Tracking issue

NA

Why are the changes needed?

To make registering tasks, workflows, and launch plans 2.5x to 4x faster in pyflyte run and pyflyte register by leveraging asyncio.

What changes were proposed in this pull request?

  • https://github.com/flyteorg/flytekit/pull/2267: A neat workaround by @pingsutw for sync gRPC clients.
  • Serializing the entities alone is not enough for the async approach.
    • The entities to register must be split into "task" and "non-task" categories and registered in that order. This ensures flyteadmin has returned the task registration responses before the workflows that depend on those tasks are registered.
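The ordering constraint above can be sketched as follows. This is a minimal illustration, not the code from this PR: `register_blocking` is a hypothetical stand-in for a blocking gRPC registration call, the latency is simulated with `time.sleep`, and registering the non-task entities one at a time is a conservative assumption made for the sketch.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def register_blocking(name: str, latency_s: float = 0.05) -> str:
    """Hypothetical stand-in for a blocking (sync gRPC) registration call."""
    time.sleep(latency_s)  # simulated network round trip
    return f"registered:{name}"


async def register_all(task_names, non_task_names):
    """Register all tasks concurrently, then the non-task entities in order."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=16) as pool:
        # Phase 1: tasks do not depend on each other, so their blocking
        # calls can overlap in worker threads.
        task_results = await asyncio.gather(
            *(loop.run_in_executor(pool, register_blocking, n) for n in task_names)
        )
        # Phase 2: starts only after every task response has arrived.
        # Assumption: non-task entities are registered one by one, in order.
        non_task_results = []
        for name in non_task_names:
            result = await loop.run_in_executor(pool, register_blocking, name)
            non_task_results.append(result)
    return list(task_results) + non_task_results


if __name__ == "__main__":
    start = time.perf_counter()
    done = asyncio.run(register_all([f"task_{i}" for i in range(9)], ["wf", "lp"]))
    print(f"{len(done)} entities in {time.perf_counter() - start:.2f}s")
```

Running the blocking calls through `run_in_executor` keeps the existing sync client untouched while still letting the task round trips overlap; the second phase acts as a barrier so nothing that references a task is sent before that task is acknowledged.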

How was this patch tested?

Setup process

With 1000 ms of latency added to every localhost request, we can simulate network-IO-intensive calls to cloud services (e.g., object storage, a Kubernetes cluster, FlyteAdmin) on a desktop machine. This lets us evaluate the performance gain from asyncio without setting up a real remote cluster. Otherwise, the Flyte sandbox cluster running on macOS is too fast to measure any difference.

Steps to reproduce the network condition:

  1. sudo pfctl -f -: Load a Packet Filter rule (out proto tcp from any to any pipe 1) that sends outgoing TCP traffic through pipe 1.
  2. sudo dnctl pipe 1 config delay 1000: Add 1000 ms of latency to every request that goes through dummynet pipe 1.
  3. sudo pfctl -E: Enable Packet Filter so the latency setup takes effect.

Don't forget to set the latency back to 0 afterwards to restore your original network speed.
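To confirm that the delay is actually in effect (and later that it has been removed), one option is to time a plain TCP connection before and after applying the steps above. The helper below is a small sketch using only the standard library; the function name is made up for illustration, and the host and port you pass in depend on where your sandbox endpoint listens.

```python
import socket
import time


def tcp_connect_seconds(host: str, port: int, timeout: float = 10.0) -> float:
    """Return how long it takes to open (and immediately close) a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the connection is closed on leaving the block
    return time.perf_counter() - start
```

Call it with the address of any local TCP service, once with the pipe delay at 0 and once with the delay configured; the second measurement should be noticeably larger if the rule is being applied.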

Screenshot 2024-03-18 at 3 15 17 PM

Screenshots

  • The elapsed registration time dropped from 34.1s to 16.1s after introducing asynchronous registration.
  • Entities registered: 9 tasks, 1 workflow (wf), 1 launch plan (lp).

1. Register in the sync manner: 21.1s

  1. python3 -m cProfile -o sync_requests_500ms.prof ./flytekit/clis/sdk_in_container/pyflyte.py register ./workflow.py
  2. snakeviz ./sync_requests_500ms.prof
  • Screenshot 2024-03-23 at 7 43 15 AM

2. Register in the async manner: 6.05s

  1. python3 -m cProfile -o async_requests_500ms.prof ./flytekit/clis/sdk_in_container/pyflyte.py register ./workflow.py
  2. snakeviz ./async_requests_500ms.prof
  • Screenshot 2024-03-23 at 7 44 57 AM

3. Run in the async manner: 5.95s

  1. python3 -m cProfile -o async_requests_500ms_run.prof ./flytekit/clis/sdk_in_container/pyflyte.py run --remote ./workflow.py wf
  2. snakeviz ./async_requests_500ms_run.prof
  • Screenshot 2024-03-26 at 2 19 59 PM
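Besides snakeviz, the .prof files written by cProfile in the steps above can be inspected as text with the standard-library pstats module. The helper below is a sketch; the function name and the default limit are chosen here for illustration.

```python
import io
import pstats


def top_cumulative(prof_path: str, limit: int = 10) -> str:
    """Return the `limit` entries with the highest cumulative time as text."""
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    # strip_dirs() shortens file paths; "cumulative" sorts by time spent
    # in a function including everything it calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

For example, `print(top_cumulative("sync_requests_500ms.prof"))` and the same call on `async_requests_500ms.prof` give two listings that can be compared side by side to see where the waiting time goes.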

Check all the applicable boxes

  • [x] I updated the documentation accordingly.
  • [x] All new and existing tests passed.
  • [x] All commits are signed-off.

Related PRs

Docs link

austin362667, Mar 18 '24 08:03

Codecov Report

Attention: Patch coverage is 96.96970%, with 1 line in your changes missing coverage. Please review.

Project coverage is 83.57%. Comparing base (bf38b8e) to head (47185ee). Report is 5 commits behind head on master.

:exclamation: Current head 47185ee differs from pull request most recent head f0eab0a. Consider uploading reports for the commit f0eab0a to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| flytekit/remote/remote.py | 94.11% | 1 Missing :warning: |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2276      +/-   ##
==========================================
+ Coverage   83.04%   83.57%   +0.52%     
==========================================
  Files         324      324              
  Lines       24861    24658     -203     
  Branches     3547     3510      -37     
==========================================
- Hits        20645    20607      -38     
+ Misses       3591     3421     -170     
- Partials      625      630       +5     


codecov[bot], Mar 18 '24 09:03