reporting unknown cloud takes very long

Open derekjc opened this issue 1 year ago • 6 comments

When I run cloud-detect on my laptop, it returns unknown only after a very long time. The same thing happens in my OpenStack environment. It appears that the metadata URL used by the Alibaba provider is a public address, so the request takes a while before it errors out. Adding a timeout to ClientSession fixes this.

time python3 -c 'from cloud_detect import provider; provider()'
python3 -c 'from cloud_detect import provider; provider()'  0.37s user 0.06s system 0% cpu 2:11.18 total
time curl -sq http://100.100.100.200/latest/meta-data/latest/meta-data/instance/virtualization-solution
curl -sq   0.01s user 0.02s system 0% cpu 2:10.17 total
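
For illustration only, here is a minimal sketch of the effect a timeout has on a hanging probe. It is not code from cloud-detect: `slow_probe` and `detect` are hypothetical names, and the metadata request is simulated with `asyncio.sleep` so the example needs only the standard library.

```python
import asyncio


async def slow_probe(delay: float) -> str:
    """Hypothetical stand-in for a metadata request that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return "alibaba"


async def detect(delay: float, timeout: float) -> str:
    """Return the probe's answer, or "unknown" if it takes longer than `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_probe(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # The probe was cancelled once the timeout elapsed.
        return "unknown"


# A probe that would hang for 5 seconds is cut off after 0.1 seconds.
slow_result = asyncio.run(detect(delay=5.0, timeout=0.1))
# A probe that answers quickly is unaffected by the timeout.
fast_result = asyncio.run(detect(delay=0.01, timeout=1.0))
print(slow_result, fast_result)
```

With a bound like this, the unreachable case returns after roughly the timeout instead of waiting for the operating system's connection timeout.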

derekjc avatar Jul 15 '23 12:07 derekjc

@derekjc You can pass a timeout parameter: https://github.com/dgzlopes/cloud-detect/blob/23fa390d74d7ee435801105f29f625a2ac4907bc/cloud_detect/init.py#L71

kshivakumar avatar Jul 16 '23 06:07 kshivakumar

@kshivakumar I could pass that, but IMHO a default timeout would be nice, especially considering that Alibaba uses a publicly routable IP.

derekjc avatar Jul 16 '23 12:07 derekjc

@derekjc It's difficult to choose an optimal default value for the timeout. Choose a low value and you risk not completing a request (it's rare, but possible); choose a high value and, in practical terms, it may be no different from having no default at all. That's why the decision is left to the client code. One thing that can be improved is to mention the timeout in the README so that users are aware of the option.

kshivakumar avatar Jul 17 '23 09:07 kshivakumar

@kshivakumar From my understanding, metadata is routed via the hypervisor host and is not a remote call, so it shouldn't need much time. Without a default timeout, cloud_detect takes more than 2 minutes to return unknown on unsupported clouds. A better value could be similar to what the battle-tested cloud-init uses (I think it is 50 seconds). Additionally, even if the request times out too early, won't the vendor file return the right response?

If I've not convinced you, please close this issue.

derekjc avatar Jul 17 '23 13:07 derekjc

@derekjc For all the cloud providers that support the "vendor file check", the file is checked first: https://github.com/dgzlopes/cloud-detect/blob/23fa390d74d7ee435801105f29f625a2ac4907bc/cloud_detect/providers/alibaba_provider.py#L25 The metadata_url is called only when the file check fails, so metadata_url is the last test to confirm the vendor.
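
As a rough sketch of that ordering (not the library's actual code), the check can be pictured as a cheap local file read followed by a fallback probe. The names `identify` and `fake_metadata_probe`, the `product_name` file name, and the "Alibaba Cloud" keyword are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def identify(vendor_file: Path, keyword: str, check_metadata: Callable[[], bool]) -> bool:
    """Try the local vendor file first; call the metadata probe only if that fails."""
    try:
        if keyword in vendor_file.read_text():
            return True
    except OSError:
        pass  # File missing or unreadable: fall through to the metadata probe.
    return check_metadata()


calls = []


def fake_metadata_probe() -> bool:
    """Hypothetical probe that records each call and reports no match."""
    calls.append("called")
    return False


with tempfile.TemporaryDirectory() as tmp:
    vendor = Path(tmp) / "product_name"
    vendor.write_text("Alibaba Cloud ECS\n")
    # The file matches, so the probe is never invoked.
    matched_by_file = identify(vendor, "Alibaba Cloud", fake_metadata_probe)
    calls_after_file_hit = len(calls)
    # The file is missing, so the probe is the deciding test.
    matched_without_file = identify(Path(tmp) / "missing", "Alibaba Cloud", fake_metadata_probe)
```

In this arrangement a slow metadata request only costs time on hosts where the file check did not already give an answer.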

Even though Alibaba's URL seems public, it's no different from the others in practice. Except for GCP's, all the other URLs take around 2 minutes before curl shows "Connection timed out".

When I worked on the asyncio changes I put a timeout of 5s (😄) in my first commit. After a lot of contemplation I decided to remove it, for two reasons (at that time):

  • This package has only a tiny number of users. We don't have enough data to decide on an optimal value based on users' experience.
  • Vendor identification is not something you do frequently in your application. It's usually a one-time thing (say, at the time of initialization). For such an infrequent operation, accuracy is more important than speed, and having a timeout carries a non-zero risk. Moreover, the library's purpose is to identify "a cloud vendor", not to determine whether the host is local or in some cloud.

I was not aware of "cloud-init", thanks for sharing. I skimmed through their code and found that there are different timeouts for different vendors; I saw 30s for one. I couldn't find the maximum. Also, while checking the commit history, I found they have updated some of the timeouts a couple of times. I am sure the current timeouts are going to be updated again in the future.

Let's come back to this discussion when we have more users or if more people ask for this feature.

kshivakumar avatar Jul 22 '23 07:07 kshivakumar

@kshivakumar sounds good to me!

derekjc avatar Jul 22 '23 08:07 derekjc