determined icon indicating copy to clipboard operation
determined copied to clipboard

feat: reuse HTTP sessions

Open azhou-determined opened this issue 10 months ago • 2 comments

Description

Introduce the option of using persistent HTTP sessions in api.Session.

Previously, each API call was wrapped in its own requests.Session, which meant making a new underlying TCP connection/TLS handshake every time. Sometimes this is necessary because we can't know when to close the session, and we don't want to leave too many dangling HTTP connections open. But in many cases, we make lots of API calls in quick succession within known bounds, and in these cases using a persistent HTTP session significantly improves performance:

Reporting metrics in Core API/Detached mode:

def main():
    core_v2.init(
        defaults=core_v2.DefaultConfig(
            name="unmanaged-1-singleton",
        ),
    )

    for i in range(100):
        core_v2.train.report_training_metrics(
            steps_completed=i, metrics={"loss": random.random()}
        )

        if (i + 1) % 10 == 0:
            core_v2.train.report_validation_metrics(
                steps_completed=i, metrics={"loss": random.random()}
            )

    core_v2.close()

Not using persistent HTTP sessions: 13-14s With a persistent HTTP session: 6-7s

CLI calls that fetch multiple pages: i.e. list-experiments, list-checkpoints Before: 0.75s After: 0.3s

api.Session as a context manager

This PR exposes the option of re-using the same underlying HTTP session through api.Session as a context manager:

with api.Session(...) as sess:
  # All requests within the context share the same persistent HTTP connection.
  # Once the context exits, the session is closed.
  sess.post(...)
  sess.get(...)

When used directly, each HTTP call will create a new session every time. This is the current behavior, so existing code will continue to function the same way:

# When called directly outside of a context, each request uses a new HTTP session.
sess = api.Session(...)
sess.post(...)
sess.get(...)

This functionality is implemented in Core API, where all internal API calls within a core.Context (with core.init(...) as core_context:) will now share the same persistent HTTP connection which is closed when the core_context context exits.

CLI

In the CLI, api.Session as a context manager is implemented as a decorator, and when used, exposes a session that is persistent within the decorated function:

- def list_experiments(args: Namespace) -> None:
-     sess = cli.setup_session(args)

+ @cli.session
+ def list_experiments(args: Namespace, sess: api.Session) -> None:

I only changed a few select APIs/CLIs to use the new functionality, mostly as examples of how it would be used. There are many places that would see performance benefits from persistent sessions, but I imagine that will probably be a separate effort/PR.

Other api.Sessionrefactors

Some other small refactors made to api.Session:

  • Set auth/cert-related parameters on the session instead of passing them into each request
  • harness/determined/common/requests.py was deleted, its logic consolidated into harness/determined/common/api/_session.py

Test Plan

  • Run any experiment that reports a bunch of metrics (ideally Core API / detached mode) and enable debug logs.
  • Submit the experiment and the logs should show "urllib3.connectionpool: Starting new HTTP connection" only once at the start of training
  • Experiment should be noticeably faster

Commentary (optional)

Checklist

  • [ ] Changes have been manually QA'd
  • [ ] User-facing API changes need the "User-facing API Change" label.
  • [ ] Release notes should be added as a separate file under docs/release-notes/. See Release Note for details.
  • [ ] Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

azhou-determined avatar Apr 05 '24 16:04 azhou-determined

Deploy Preview for determined-ui canceled.

Name Link
Latest commit 3bda736f23de809ae5ea32e689dde50970eb86f0
Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/663124c264e9aa000803fea5

netlify[bot] avatar Apr 05 '24 16:04 netlify[bot]

Codecov Report

Attention: Patch coverage is 89.70588% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 44.61%. Comparing base (55b7fd9) to head (3bda736). Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9116      +/-   ##
==========================================
+ Coverage   44.59%   44.61%   +0.01%     
==========================================
  Files        1273     1273              
  Lines      155832   155888      +56     
  Branches     2439     2439              
==========================================
+ Hits        69501    69555      +54     
- Misses      86092    86094       +2     
  Partials      239      239              
Flag Coverage Δ
backend 41.74% <ø> (-0.01%) :arrow_down:
harness 64.09% <89.70%> (+0.04%) :arrow_up:
web 35.04% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
harness/determined/cli/__init__.py 100.00% <ø> (ø)
harness/determined/cli/experiment.py 18.10% <100.00%> (+0.21%) :arrow_up:
harness/determined/common/__init__.py 71.42% <100.00%> (ø)
harness/tests/common/api/test_session.py 100.00% <100.00%> (ø)
harness/tests/common/test_tls.py 100.00% <100.00%> (ø)
...rness/determined/common/experimental/experiment.py 71.95% <0.00%> (ø)
harness/determined/cli/_util.py 69.09% <71.42%> (+0.34%) :arrow_up:
harness/determined/core/_context.py 57.71% <60.00%> (+0.07%) :arrow_up:
harness/determined/common/api/_session.py 85.48% <88.15%> (+3.66%) :arrow_up:

... and 4 files with indirect coverage changes

codecov[bot] avatar Apr 19 '24 17:04 codecov[bot]