faas-netes icon indicating copy to clipboard operation
faas-netes copied to clipboard

Simplify faas-netes timeout configuration for the openfaas chart

Open welteki opened this issue 2 years ago • 1 comments

Description

Simplify faas-netes timeouts in the openfaas chart

Use the geteway readTimeout and writeTimeout values as default if faasnetes.readTimeout or faasnetes.writeTimeout is not set.

Motivation and Context

OpenFaaS has a many different timeouts to keep track of. In most cases timeouts for the gateway and faas-netes are set to the same value. To simplify configuration the values set for the gateway can be used as defaults for faas-netes.

How Has This Been Tested?

  1. Deployed OpenFaaS Pro with this updated chart and set some small timeouts for testing.
helm upgrade openfaas --install chart/openfaas \
    --namespace openfaas  \
    --set functionNamespace=openfaas-fn \
    --set generateBasicAuth=true \
    --set openfaasPro=true \
    --set autoscaler.enabled=true \
    --set gateway.readTimeout=5s \
    --set gateway.writeTimeout=5s \
    --set gateway.upstreamTimeout=4s

Verified the correct env values are set in the deployment by running

kubectl describe deploy/gateway -n openfaas
  1. Deployed the sleep function
faas-cli store deploy sleep
  1. Invoked the function using hey
hey -n 200 -c 4 -H "X-Sleep: 3900ms" http://127.0.0.1:8080/function/sleep

Summary:
  Total:        195.1964 secs
  Slowest:      3.9062 secs
  Fastest:      3.9034 secs
  Average:      3.9039 secs
  Requests/sec: 1.0246
  
  Total data:   22800 bytes
  Size/request: 114 bytes

Response time histogram:
  3.903 [1]     |
  3.904 [46]    |■■■■■■■■■■■■■■■■■■■
  3.904 [97]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  3.904 [42]    |■■■■■■■■■■■■■■■■■
  3.905 [10]    |■■■■
  3.905 [0]     |
  3.905 [0]     |
  3.905 [0]     |
  3.906 [0]     |
  3.906 [0]     |
  3.906 [4]     |■■


Latency distribution:
  10% in 3.9036 secs
  25% in 3.9037 secs
  50% in 3.9038 secs
  75% in 3.9040 secs
  90% in 3.9042 secs
  95% in 3.9044 secs
  99% in 3.9062 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0002 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0001 secs
  resp wait:    3.9038 secs, 3.9033 secs, 3.9060 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 200 responses

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [x] My code follows the code style of this project.
  • [x] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [x] I've read the CONTRIBUTION guide
  • [x] I have signed-off my commits with git commit -s
  • [ ] I have added tests to cover my changes.
  • [ ] All new and existing tests passed.

welteki avatar Aug 10 '22 09:08 welteki

@kevin-lindsay-1 WDYT?

alexellis avatar Aug 10 '22 16:08 alexellis

Sorry, been swamped.

I'm not in love with this approach, because I don't really like the gateway acting as global config in the first place. Aren't basically all timeouts "recommended" to be based off a base timeout that a user specifies? For example, most users probably leave read and write the same as their main timeout, and then simply take timeout - 5s (or a percentage) for exec_timeout.

Seems to me that you could condense all of this with 1 timeout and do a little math to figure out what to set it to under the hood, just like how the pod's graceTerminationPeriodSeconds is something like exec_timeout - 3s.

Specifically for the idea of a global config, I have almost no functions that I want a timeout inferred, so I'd use the gateway's "default" config exactly 0% of the time. I'd rather the gateway/k8s scream that I haven't specified a timeout, because I can't imagine a scenario where a default timeout assists me as the developer in knowing what's going on. I don't really like "environmental" config like this.

That said, I think it would be valuable to override these calculated settings for functions. I think if a specific timeout is not specified, it should be inferred from a "main" timeout value. Then, users can focus on setting only 1 timeout value, and if they need to make a specific timeout a particular value, they can override. I think this approach would also work fine on the gateway, as it would likely result in the same thing that already exists, but with less config/boilerplate.

If you care, I use https://github.com/creasty/defaults for this, and it works very well for my use cases, including more complex things like overrides.

kevin-lindsay-1 avatar Aug 19 '22 15:08 kevin-lindsay-1

Hi Kevin,

It doesn't sound like we've explained ourselves well here.

There is already a global timeout which is set at the gateway and has to be copied exactly to faas-netes.

They should not and can not be different and this PR simply reflects that, and removes the amount of places where it needs to be updated to work. It's purely a user-experience change, and not a functional change.

If you like to do extra typing, the change means that you can carry on copying the timeout from the gateway's values.yaml section to the faasnetes section.

Alex

alexellis avatar Aug 23 '22 09:08 alexellis