
Different execution times for python spin app on local machine and Fermyon Cloud

Open abdulmonum opened this issue 1 year ago • 6 comments

Hello,

I am testing a Python-based Spin serverless app for execution performance (https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_float/run_benchmark.py), with the source code edited to fit the Spin application format. I cannot understand why there is more than a 2x difference in execution time: timing the application locally with `time curl http://127.0.0.1:3000/float` gives around 0.84 s, compared to 0.41 s when deployed on Fermyon Cloud (including network latency). Moreover, if I execute the HTTP trigger from a Python program that simulates a Poisson workload (on average 2 req/s), many requests take as long as 1.5 s. I do not think there should be a 2x execution-time difference even on a local setup (64 GB RAM, 8-core VM) compared to the cloud platform.
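For reference, the Poisson workload described above can be reproduced by drawing exponential inter-arrival gaps between requests; a minimal sketch, assuming the local endpoint URL and the 2 req/s rate from the description (function names here are illustrative, not from the original test script):

```python
import random
import time
import urllib.request

RATE = 2.0  # average requests per second (assumed from the description)
URL = "http://127.0.0.1:3000/float"  # local Spin endpoint (assumed)

def interarrival_gaps(n, rate, seed=None):
    """Draw n exponential inter-arrival gaps: a Poisson arrival process
    with the given rate has gaps ~ Exp(rate), i.e. mean 1/rate seconds."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]

def run_workload(n_requests, rate=RATE):
    """Send n_requests spaced by Poisson inter-arrival gaps and return
    the observed per-request latencies in seconds."""
    latencies = []
    for gap in interarrival_gaps(n_requests, rate):
        time.sleep(gap)
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        latencies.append(time.perf_counter() - start)
    return latencies
```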

The spin.toml file is the following:

```toml
spin_manifest_version = 2

[application]
authors = ["Abdul Monum [email protected]"]
description = "float serverless function adapted from pyperformance"
name = "float"
version = "0.1.0"

[[trigger.http]]
route = "/float"
component = "float"

[component.float]
source = "app.wasm"

[component.float.build]
command = "componentize-py -w spin-http componentize app -o app.wasm"
watch = ["*.py", "requirements.txt"]
```

  • Spin version (spin --version) spin 2.4.2 (340378e 2024-04-03)
  • Wasmtime version wasmtime-cli 20.0.0 (9e1084ffa 2024-04-22)
  • Installed plugin versions (cloud 0.8.0 [installed] js2wasm 0.6.1 [installed] py2wasm 0.3.2 [installed])

abdulmonum avatar May 01 '24 19:05 abdulmonum

Hello, thanks for the report. I am not aware of any reason for a local execution to be that much slower than Fermyon Cloud. Would it be possible for you to publish the code required to reproduce?

lann avatar May 01 '24 21:05 lann

You can find the code in this repository to reproduce. https://github.com/abdulmonum/spin-python-app.git

abdulmonum avatar May 01 '24 23:05 abdulmonum

For comparison, on my Linux AMD 5900X desktop, time curl http://127.0.0.1:3000/float takes ~0.22s. Could you give more information about your local environment?

lann avatar May 02 '24 13:05 lann

Hello, I changed my local environment, and for single requests `time curl http://127.0.0.1:3000/float` now takes ~0.35 s, which makes sense. However, if I run a Poisson workload of ~2 req/s, many requests take around ~0.7 s. If I run bombardier (https://github.com/codesenberg/bombardier) with `./bombardier http://127.0.0.1:3000/float`, I get the following output:

```
Bombarding http://127.0.0.1:3000/float for 10s using 125 connection(s)
[==========================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec         2.19      19.23     250.16
  Latency         8.78s      2.99s     10.01s
  HTTP codes:
    1xx - 0, 2xx - 26, 3xx - 0, 4xx - 0, 5xx - 0
    others - 120
  Errors:
       timeout - 120
  Throughput:     3.61KB/s
```

Why is the Spin app in the local environment not able to handle many requests at the same time? If I understand correctly, for every HTTP request sent, a new WebAssembly instance is spawned, serves the request, and is torn down. Theoretically, that should mean consistent response times (at least when the arrival rate is as low as 2 req/s). I observe that on Fermyon Cloud, where I get an average response time of 0.41 s (including network latency), but shouldn't the response times also be consistent in the local environment? Is there some sort of queuing of requests? This does not seem to me to be a Wasm issue.

My current environment: Intel E3-1230 v3 @ 3.30 GHz, 16 GB RAM, Ubuntu 22.04

abdulmonum avatar May 09 '24 16:05 abdulmonum

@lann Any explanation for this?

abdulmonum avatar May 20 '24 19:05 abdulmonum

Sorry, missed your previous update.

Is there some sort of queuing of requests because this does not seem to me a Wasm issue.

Yes, there is implicit queuing of async tasks in the Tokio multi thread runtime.

Any explanation for this?

I ran a few tests at different concurrency levels (`bombardier -c N ...`):

  • -c 1: ~200ms avg, ~2ms SD
  • -c 10: ~260ms avg, ~38ms SD
  • -c 100: ~3000ms avg, ~2000ms SD

My host has 24 cores, though bombardier itself will cause some extra contention when testing entirely locally. This roughly makes sense to me for CPU-bound workloads: as request concurrency reaches multiples of the number of cores, you would expect average latency to scale similarly.

The CPU you mention appears to have 8 "cores" (hardware threads), so at a concurrency of 125 you would expect roughly 125 / 8 ≈ 15.6 "batches" × ~350 ms ≈ 5.4 s, which seems reasonably close to your 8.8 s average when accounting for various sources of overhead.
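The back-of-envelope estimate above can be written out as a tiny model: for a CPU-bound workload, requests beyond the number of hardware threads queue up, so average latency grows roughly with concurrency divided by thread count. A sketch using the numbers from this thread (this is an illustrative approximation, not how Spin actually schedules work):

```python
def estimated_latency(concurrency, hw_threads, service_time_s):
    """Rough model for a CPU-bound workload: with more in-flight requests
    than hardware threads, each request waits for ~(concurrency / threads)
    service times; below that point, latency is just the service time."""
    return max(1.0, concurrency / hw_threads) * service_time_s

# Numbers from the thread: 125 bombardier connections, 8 hardware threads,
# ~350 ms single-request service time on the reporter's machine.
print(round(estimated_latency(125, 8, 0.350), 2))  # ~5.47 s
```

The model predicts ~5.4 s versus the observed 8.8 s average; the gap is plausibly load-generator contention and other overhead, as noted above.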

lann avatar May 20 '24 19:05 lann