
Error running 411.image-recognition on AWS Lambda python3.8

Open nervermore2 opened this issue 2 years ago • 10 comments

Hi, I tried the recently added Python 3.8 version of the 411.image-recognition benchmark on AWS Lambda. It gives me this error:

[ERROR] ModuleNotFoundError: No module named 'torch'
Traceback (most recent call last):
  File "/var/task/handler.py", line 20, in handler
    from function import function
  File "/var/task/function/function.py", line 14, in <module>
    import torch

Thanks!

nervermore2 avatar Jul 11 '23 07:07 nervermore2

@nervermore2 Please try to refresh all of the locally cached Docker images - I had to rebuild and push new versions to support the new versions of some of the benchmarks; the old Docker images are missing some of the tools:

for lang in 3.7 3.8 3.9; do for platform in aws gcp; do docker pull spcleth/serverless-benchmarks:build.${platform}.python.${lang}; done; done

mcopik avatar Jul 12 '23 15:07 mcopik

Thanks! Will do later.

nervermore2 avatar Jul 12 '23 17:07 nervermore2

It's working, at least for Python 3.7. I'm not planning to test 3.8 until later next week. I'm wondering if you changed the packaging behavior of the 411 benchmark for Python 3.7? I just want a fair comparison among providers, rather than AWS and GCP zipping their code one way and Azure zipping its code another way.

nervermore2 avatar Jul 15 '23 00:07 nervermore2

@nervermore2 The packaging behavior should be the same for Python 3.7 - it starts to change from 3.8 due to code size limits. I will verify that, because I will soon be working on running the regression on Azure. I will ping you once I can confirm that everything works on Azure.

It depends on how you define fair :) Azure might work without additional packaging, but shouldn't their results be better because their platform is better suited to handle functions with large code packages? It would be unfair to punish Azure Functions for that.

mcopik avatar Jul 15 '23 15:07 mcopik

Thanks! I'm wondering what "fake resnet" means in the workload, and how users could train this ResNet model themselves?

nervermore2 avatar Jul 24 '23 08:07 nervermore2

@nervermore2 The ResNet model should be from the MLPerf inference benchmark. The "fake resnet" is the generator of input test images.

mcopik avatar Jul 24 '23 19:07 mcopik

One more question, regarding the curl tooling. I read about connection_time, which is the pre-transfer time reported by curl, but I'm still confused about its definition. Do we usually subtract that from the client_time so that we get a "noise-less" client-side time that excludes the network latency (from my understanding, connection_time is like a one-way network latency)? In addition, it seems like increasing the number of concurrent invocations increases the connection time as well. I'm not sure whether that's a curl problem or a problem on the API Gateway / Google Cloud Functions / Azure Functions side.

nervermore2 avatar Jul 27 '23 10:07 nervermore2

@nervermore2 We look into detailed statistics to exclude the "cost" of initializing the TCP and HTTPS connection from the measurement. We use the following statistic:

time_pretransfer The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.

The goal is always to include in the client measurement only the cost of network transport and serverless gateway processing.

There is no way to get "noise-less" measurements, as your data will always be impacted by the current network situation in the cloud. Latency can be minimized by placing your benchmarking VM in the same region as the Lambda functions, but removing it entirely is impossible. Increasing the concurrency will increase the latency, as a single benchmarking client now needs to initialize more requests and send more payload to the service.
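As an illustration, here is a minimal pycurl sketch of how these timings can be read per request (the URL is a hypothetical placeholder, and this is not SeBS's actual client code):

import pycurl
from io import BytesIO

# Hypothetical endpoint - replace with your function's trigger URL.
URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/invoke"

buffer = BytesIO()
handle = pycurl.Curl()
handle.setopt(pycurl.URL, URL)
handle.setopt(pycurl.WRITEDATA, buffer)
handle.perform()

# Timings reported by libcurl, in seconds from the start of the request.
connect = handle.getinfo(pycurl.CONNECT_TIME)          # TCP connection established
pretransfer = handle.getinfo(pycurl.PRETRANSFER_TIME)  # handshakes and pre-transfer negotiations done
total = handle.getinfo(pycurl.TOTAL_TIME)              # full request, including the response
handle.close()

# Subtracting the pre-transfer time removes the connection-setup cost,
# leaving the network transport, gateway, and function processing time.
client_time = total - pretransfer
print(f"connect={connect:.4f}s pretransfer={pretransfer:.4f}s client={client_time:.4f}s")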

mcopik avatar Aug 03 '23 20:08 mcopik

Thanks.

Increasing the concurrency will increase the latency, as a single benchmarking client needs now to initialize more requests and send more payload to the service.

I'm wondering how concurrency works in SeBS? I thought we have one client, which makes a configurable number of "parallel" invocations. So what is the most time-consuming part of that parallel process? If cloud providers handle concurrent requests (the TCP handshakes) nicely, we shouldn't see the pre_transfer time increase drastically. For example, if we run concurrent (let's say 10) curl commands against a single URL, we should see a slightly higher pre_transfer time than with concurrency 1, but the pre_transfer time shouldn't increase linearly as we go from 1 to 10 concurrent requests. So I think I might be missing something about how SeBS uses pycurl and misunderstanding SeBS's parallel programming model.

nervermore2 avatar Aug 03 '23 20:08 nervermore2

@nervermore2 Yes, there shouldn't be a linear increase unless there is a bug or some kind of bottleneck. Is this the behavior you're observing?
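If it helps, a rough way to check this outside of SeBS is to fire the same request at different concurrency levels from one client and compare the pre-transfer times - a hypothetical sketch (the URL is a placeholder, and this is not SeBS's benchmarking client):

import pycurl
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

# Hypothetical trigger URL - replace with your deployed function's endpoint.
URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/invoke"

def timed_request(_):
    # Each thread uses its own curl handle; pycurl handles are not thread-safe.
    buf = BytesIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, URL)
    handle.setopt(pycurl.WRITEDATA, buf)
    handle.perform()
    pretransfer = handle.getinfo(pycurl.PRETRANSFER_TIME)
    handle.close()
    return pretransfer

# Compare pre-transfer times at concurrency 1 vs. 10; if the client and the
# provider handle parallel handshakes well, the values should stay close
# rather than growing linearly with the number of concurrent requests.
for concurrency in (1, 10):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        times = list(pool.map(timed_request, range(concurrency)))
    print(f"concurrency={concurrency} max_pretransfer={max(times):.4f}s")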

mcopik avatar Aug 13 '23 12:08 mcopik

@nervermore2 Closing the issue due to inactivity. If the issue persists, please feel free to reopen it and share more details :)

mcopik avatar May 06 '24 12:05 mcopik