
[WIP] use direct async call for in-process rpc

Open chaokunyang opened this issue 3 years ago • 5 comments

What do these changes do?

This PR uses direct async function calls to speed up in-process actor calls, giving a speed-up of about 2.6 times.
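
To illustrate the idea, here is a minimal sketch in plain asyncio, not the Mars implementation. All names (`Counter`, `MailboxRef`, `DirectRef`, `run_both`) are hypothetical. `MailboxRef` routes each call through a queue and a future, the way a message-passing RPC layer might, while `DirectRef` simply awaits the target coroutine when caller and callee share a process.

```python
import asyncio


class Counter:
    """Hypothetical stand-in for an actor; not a Mars class."""

    def __init__(self, value):
        self.value = value

    async def inc(self, delta):
        self.value += delta
        return self.value


class MailboxRef:
    """Routes every call through a queue, as a message-passing layer might."""

    def __init__(self, target):
        self._target = target
        self._queue = asyncio.Queue()
        self._worker = None

    def start(self):
        # Must be called from inside a running event loop.
        self._worker = asyncio.ensure_future(self._serve())

    async def stop(self):
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass

    async def _serve(self):
        while True:
            method, args, future = await self._queue.get()
            try:
                future.set_result(await getattr(self._target, method)(*args))
            except Exception as exc:
                future.set_exception(exc)

    async def call(self, method, *args):
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((method, args, future))
        return await future


class DirectRef:
    """Awaits the target coroutine directly, skipping the queue round trip."""

    def __init__(self, target):
        self._target = target

    async def call(self, method, *args):
        return await getattr(self._target, method)(*args)


async def run_both(iternum=1000):
    mailbox = MailboxRef(Counter(0))
    mailbox.start()
    direct = DirectRef(Counter(0))
    via_mailbox = via_direct = None
    for _ in range(iternum):
        via_mailbox = await mailbox.call("inc", 2)
        via_direct = await direct.call("inc", 2)
    await mailbox.stop()
    return via_mailbox, via_direct


if __name__ == "__main__":
    print(asyncio.run(run_both()))
```

Both references return the same values. The direct path avoids the queue hop, the extra future, and a task switch per call, which is the kind of overhead this PR aims to remove for in-process calls.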

Benchmark code:

import asyncio
from contextlib import asynccontextmanager
import datetime
import os
import sys
import time

import mars.oscar as mo


class BenchmarkActor(mo.Actor):

    def __init__(self, value):
        super().__init__()
        self.value = value

    async def send(self, uid, method, iternum, *args):
        actor_ref = await mo.actor_ref(uid, address=self.address)
        value = None
        for _ in range(iternum):
            value = await getattr(actor_ref, method)(*args)
        return value

    async def inc(self, delta):
        self.value += delta
        return self.value

    def get_value(self):
        return self.value


@asynccontextmanager
async def actor_pool_context():
    start_method = (
        os.environ.get("POOL_START_METHOD", "forkserver")
        if sys.platform != "win32"
        else None
    )
    pool = await mo.create_actor_pool(
        "127.0.0.1", n_process=2, subprocess_start_method=start_method
    )
    await pool.start()
    try:
        yield pool
    finally:
        # Stop the pool even if the benchmark body raises.
        await pool.stop()


async def test_dummy_call_benchmark(pool):
    ref1 = await mo.create_actor(BenchmarkActor, 1, address=pool.external_address)
    ref2 = await mo.create_actor(BenchmarkActor, 2, address=pool.external_address)
    await ref1.send(ref2, "inc", 2, 2)
    iternum = 100000
    expect = await ref2.get_value() + iternum * 2
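    # perf_counter() is monotonic and better suited to timing than time.time().
    start = time.perf_counter()
    print(f"Start with iternum {iternum} at {datetime.datetime.now()}")
    # Run the timed call outside the assert so it still executes under `python -O`.
    result = await ref1.send(ref2, "inc", iternum, 2)
    elapsed = time.perf_counter() - start
    assert result == expect
    print(f"End with iternum {iternum} at {datetime.datetime.now()}, took {elapsed} seconds.")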


async def main():
    async with actor_pool_context() as ctx:
        await test_dummy_call_benchmark(ctx)


if __name__ == '__main__':
    asyncio.run(main())

Benchmark Env

  • 2.2 GHz 6-Core Intel Core i7
  • 16 GB 2400 MHz DDR4
  • OS: macOS BigSur 11.5.2 (20G95)

Benchmark result

  • Without this PR: 100000 actor calls in 21.88 seconds
  • With this PR: 100000 actor calls in 8.19 seconds
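
As a quick check of the figures above, the following sketch derives the speed-up and the average per-call latency from the two timings. The helper name `summarize` is made up for illustration; the inputs are the numbers reported in this PR.

```python
def summarize(calls, seconds_before, seconds_after):
    """Return (speed-up, per-call latency before, per-call latency after).

    Latencies are averages in microseconds.
    """
    speedup = seconds_before / seconds_after
    us_before = seconds_before / calls * 1e6
    us_after = seconds_after / calls * 1e6
    return speedup, us_before, us_after


speedup, us_before, us_after = summarize(100_000, 21.88, 8.19)
print(f"speed-up: {speedup:.2f}x")
print(f"per call: {us_before:.1f} us -> {us_after:.1f} us")
```

On these numbers the average cost per actor call drops from roughly 219 microseconds to roughly 82 microseconds, a ratio of about 2.67.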

Related issue number

Closes #2691

Check code requirements

  • [ ] tests added / passed (if needed)
  • [ ] Ensure all linting tests pass, see here for how to run them

chaokunyang avatar Feb 09 '22 05:02 chaokunyang

Please open a new issue to illustrate your changes. What's more, any benchmarks for that?

wjsi avatar Feb 09 '22 05:02 wjsi

Please open a new issue to illustrate your changes. What's more, any benchmarks for that?

@wjsi I will open an issue after I finish the benchmark.

chaokunyang avatar Feb 09 '22 07:02 chaokunyang

Removing the in-process actor would give a bigger performance boost, but that would change a lot of code.

fyrestone avatar Feb 09 '22 08:02 fyrestone

Removing the in-process actor would give a bigger performance boost, but that would change a lot of code.

Yes, removing the in-process actor would give a bigger performance boost. In fact, once in-process calls go through a direct call, the main cost is oscar itself. This is clear from the dumped flame graph: profile

chaokunyang avatar Feb 09 '22 08:02 chaokunyang

@chaokunyang any updates about this PR?

wjsi avatar Apr 06 '22 03:04 wjsi

chaokunyang avatar Feb 20 '23 09:02 chaokunyang