Underutilized Instances Recommendation
I’ve gone through the documentation, but I’m still unclear on the exact requirements. I’m encountering the same situation as mentioned in issue #445.
Question 1: What permissions are required for these service credentials to work correctly?
Question 2: My Azure subscriptions span multiple regions. 👉 Will OptScale fetch data from all regions, or is it limited to a single region? The documentation does not specify which role to assign in Azure; it only states:
- Pay attention to the service_credentials parameter, as OptScale uses it to retrieve cloud pricing data for recommendations calculation.
- Service credentials are required to fetch pricing information from different clouds.
- Recommendations will not work without this configuration.
- For recommendations to function, service_credentials must be set correctly.
Hello @nadeem-nasir
- You need to specify Reader permission.
- OptScale extracts data from all regions.
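For reference, the built-in Reader role can be assigned at subscription scope to the app registration used by OptScale with the Azure CLI, roughly like this (a sketch with placeholder IDs; repeat it for every subscription OptScale should analyze):

```bash
# Sketch: grant the Reader role to the OptScale service principal on one subscription
# <app-client-id> and <subscription-id> are placeholders for your own values
az role assignment create \
  --assignee "<app-client-id>" \
  --role "Reader" \
  --scope "/subscriptions/<subscription-id>"
```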
Hi @VR-Hystax
Thank you for the reply.
I followed these steps, and the overlay has been updated successfully along with the service credentials:
virtualenv -p python3 .venv
source .venv/bin/activate
source ~/.profile
nano overlay/user_template.yml # edited the YAML file
./runkube.py --no-pull -o overlay/user_template.yml -- nadeem-optscale-deployment latest
Deployment output:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 76359 0 76359 0 0 242k 0 --:--:-- --:--:-- --:--:-- 242k
09:39:20.039: Latest release tag: 2025102901-public
09:39:20.063: Connecting to ctd daemon 10.1.0.4:2376
09:39:20.063: Comparing local images for 10.1.0.4
09:39:25.911: Generating base overlay...
09:39:25.916: Connecting to ctd daemon 10.1.0.4:2376
09:39:27.905: Creating component_versions.yaml file to insert it into configmap
09:39:27.911: Deleting /configured key
09:39:28.005: Removing old job pre-configurator...
09:39:28.012: Waiting for job deletion...
09:39:28.418: Starting helm chart optscale with name nadeem-optscale-deployment on k8s cluster 10.1.0.4
Release "nadeem-optscale-deployment" has been upgraded. Happy Helming!
NAME: nadeem-optscale-deployment
LAST DEPLOYED: Mon Nov 10 09:39:28 2025
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
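For completeness, the rollout can be double-checked with something like the following (release name and namespace are taken from the output above; treat it as a sketch):

```bash
# Confirm the Helm release is in the "deployed" state
helm -n default status nadeem-optscale-deployment

# List any pods that are not Running or Completed
kubectl -n default get pods | grep -vE 'Running|Completed'
```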
I performed a force check and also ran kubectl rollout restart, but I still don’t see any Underutilized Instances recommendations.
(Screenshots: Resources Details and Underutilized Instances pages showing no recommendations.)
Are there any additional steps required? When I first deployed the cluster, I didn’t update the service credentials. They have now been updated and use the same configuration, with the Reader role assigned on the subscriptions, both for the data source and for the service credentials.
The Instances eligible for generation upgrade, Not attached Volumes, and Obsolete IPs recommendations are working.
Hello @nadeem-nasir Please try adjusting the settings in this recommendation and viewing the bumiworker logs.
@VR-Hystax Thank you for the help.
I modified the Rightsizing strategy, updated the cluster, triggered a force check, removed the data sources and added them again, but the issue still persists.
The logs for bumiworker are attached. I retrieved them using:
kubectl logs -n default bumiworker-6d9b7c6679-vvxhq
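A slightly more general way to pull these logs without hard-coding the pod hash looks roughly like this (the grep pattern is only a guess at the relevant keywords):

```bash
# Dump recent logs from every bumiworker pod and keep rightsizing-related lines
for pod in $(kubectl -n default get pods -o name | grep bumiworker); do
  kubectl -n default logs "$pod" --tail=500 | grep -iE 'rightsizing|flavor|error'
done
```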
Hello @nadeem-nasir We found this entry in the logs:
Rightsizing_instances statistics for 33bf7b4d-7ce0-4372-930b-db9af8a77ee6 (azure_cnr): {'no_recommended_flavor': 5, 'unable_to_get_current_flavor': 100}
This means that the insider couldn't find a price for 100 machines. Please check that service creds exist and look at the insider-worker logs. If everything is ok there, then let's use the API:
GET https://<optscale_ip>/insider/v2/flavors (it uses the cluster secret)
{
'cloud_type': 'azure_cnr',
'resource_type': 'instance',
'region':
After that, send us both the body of the request and the response from the insider.
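For example, such a request can be sent with curl along these lines; the Secret header name, the HTTP method, and the exact payload fields are assumptions based on this thread, so adjust them to your deployment:

```bash
# Hypothetical sketch: query the insider flavors endpoint using the cluster secret
# <optscale_ip>, <cluster_secret> and <region> are placeholders
curl -k -X POST "https://<optscale_ip>/insider/v2/flavors" \
  -H "Secret: <cluster_secret>" \
  -H "Content-Type: application/json" \
  -d '{"cloud_type": "azure_cnr", "resource_type": "instance", "region": "<region>"}'
```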
@VR-Hystax
I currently have about 140 VMs across my subscriptions. I verified the configuration using kubectl describe configmaps optscale-etcd and confirmed that the service credentials are present.
I also tested sending requests as suggested — it works with POST but not with GET. For certain family_specs I’m receiving empty results, while for others the data is returned correctly.
- Flavors returning results with data:
Standard_D2ads_v6
Standard_E4-2s_v5
Standard_F32als_v6
Standard_F32als_v6
Standard_D2ads_v6
Standard_D4ads_v5
Standard_E8d_v4
Standard_D4as_v4
Standard_E4s_v3
Standard_E4s_v3
Standard_E4s_v4
Standard_E16s_v3
Standard_E4s_v3
Standard_B1m
Standard_B2s
Standard_F4
Standard_F8
Standard_F8s
Standard_DS12_v2
Standard_F4s_v2
Standard_DS12_v2
Standard_B12ms
Standard_B4ms
Standard_F8s
Standard_B12ms
- insider-worker logs attached
I also found this entry in the same logs:
Rightsizing_instances statistics for 33bf7b4d-7ce0-4372-930b-db9af8a77ee6 (azure_cnr): {'no_recommended_cpu': 2, 'no_recommended_flavor': 3, 'unable_to_get_current_flavor': 100}
Note that the counts differ between the two entries: no_recommended_flavor is 3 here versus 5 in the earlier line.
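For reference, the insider-worker logs can be pulled the same way (assuming the pods are named insider-worker-*; the grep pattern is just a guess at useful keywords):

```bash
# Dump recent logs from every insider-worker pod and keep error/pricing-related lines
for pod in $(kubectl -n default get pods -o name | grep insider-worker); do
  kubectl -n default logs "$pod" --tail=500 | grep -iE 'traceback|error|flavor'
done
```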
Hi @nadeem-nasir We will investigate this issue. As soon as we have any conclusions, we will let you know immediately.
Hi @nadeem-nasir!
I’ve investigated the issue and can confirm that there are no problems on our side. In the GitHub issue, I noticed the attached traceback file, which clarifies the root cause:
_pymongo.errors.AutoReconnect: mongo-0.mongo-discovery.default.svc.cluster.local:27017: [Errno -3] Temporary failure in name resolution (configured timeouts: connectTimeoutMS: 20000.0ms)_
This indicates a deployment-level problem. The Insider worker container is unable to resolve the DNS name, which prevents it from communicating with the MongoDB service. As a result, Insider doesn’t have the necessary information to correctly display the recommendation.
Please check your cluster’s DNS resolution and service configuration to ensure the worker container can reach the MongoDB endpoint.
@ida-mn Thank you for the update. I see that recommendations for “Not attached Volumes” and “Obsolete IPs” are still being generated. Are these recommendations stored in the Mongo database? I usually keep the default settings and don’t change much during installation. I can either re-deploy or update the DNS settings. Could you explain how to check the cluster’s DNS resolution and service configuration? Since I don’t have much data, I could also delete the VMs and redeploy everything. Please let me know which option would be best.
Hello @nadeem-nasir I have forwarded your response to our engineering team. I will consult with the team and get back to you with recommendations as soon as possible.
Hello @nadeem-nasir Please provide the output of these commands:
kubectl -n kube-system get pods | grep weave
kubectl -n kube-system logs weave-*** | grep -i error
kubectl -n kube-system get pods | grep coredns
kubectl -n kube-system logs coredns-*** | grep -i error
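In addition to the commands above, a direct lookup from a throwaway pod can confirm whether the MongoDB service name resolves at all (the hostname is taken from the traceback earlier in this thread; the busybox image and pod name are arbitrary choices):

```bash
# Resolve the MongoDB service name from inside the cluster
kubectl -n default run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup mongo-0.mongo-discovery.default.svc.cluster.local

# Check that the cluster DNS pods themselves are healthy
kubectl -n kube-system get pods -l k8s-app=kube-dns
```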
@nadeem-nasir Thank you for the updates; here are the logs.
for pod in $(kubectl -n kube-system get pods -o name | grep weave); do kubectl -n kube-system logs $pod | grep -i error; done

Defaulted container "weave" out of: weave, weave-npc, weave-init (init)
INFO: 2025/11/27 11:17:51.393264 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=6.14.0-1014-azure&os=linux&signature=0TmZpkEyUuEMAg9YHZAmzQxzarUiW%2BnrR2PjYtwkyOI%3D&version=2.8.1": EOF
INFO: 2025/11/27 17:09:36.138043 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=6.14.0-1014-azure&os=linux&signature=0TmZpkEyUuEMAg9YHZAmzQxzarUiW%2BnrR2PjYtwkyOI%3D&version=2.8.1": EOF
INFO: 2025/11/28 00:29:26.987583 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=6.14.0-1014-azure&os=linux&signature=0TmZpkEyUuEMAg9YHZAmzQxzarUiW%2BnrR2PjYtwkyOI%3D&version=2.8.1": EOF
INFO: 2025/11/28 06:02:27.216499 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=6.14.0-1014-azure&os=linux&signature=0TmZpkEyUuEMAg9YHZAmzQxzarUiW%2BnrR2PjYtwkyOI%3D&version=2.8.1": EOF

for pod in $(kubectl -n kube-system get pods -o name | grep coredns); do kubectl -n kube-system logs $pod | grep -i error; done

[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: watch of *v1.Namespace ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
(the same message repeats several more times for the *v1.Namespace, *v1.Service and *v1.EndpointSlice watches)
[ERROR] plugin/errors: 2 checkpoint-api.weave.works. A: read udp 10.254.0.33:58290->168.63.129.16:53: i/o timeout
[ERROR] plugin/errors: 2 checkpoint-api.weave.works. A: read udp 10.254.0.33:40593->168.63.129.16:53: i/o timeout
Please note that I changed kubectl -n kube-system logs weave-*** | grep -i error and kubectl -n kube-system logs coredns-*** | grep -i error, because they returned the error "Error from server (NotFound): pods "coredns-***" not found in namespace "kube-system"", so I used:
for pod in $(kubectl -n kube-system get pods -o name | grep weave); do kubectl -n kube-system logs $pod | grep -i error; done
@nadeem-nasir
Thank you! Looks like your weave plugin is not working properly. Try restarting it and see if there are any errors again.
kubectl -n kube-system delete pod -l name=weave-net
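Once the pods have been recreated, they can be re-checked for errors, for example (the weave container name comes from the log output above):

```bash
# Confirm the replacement weave-net pods are Running
kubectl -n kube-system get pods -l name=weave-net

# Re-check the weave container logs for errors
kubectl -n kube-system logs -l name=weave-net -c weave --tail=100 | grep -i error
```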
@sd-hystax I deleted the pods as you suggested and collected the logs again with the same loop. The output still shows:
Defaulted container "weave" out of: weave, weave-npc, weave-init (init)
INFO: 2025/11/30 12:48:20.782229 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=6.14.0-1014-azure&os=linux&signature=0TmZpkEyUuEMAg9YHZAmzQxzarUiW%2BnrR2PjYtwkyOI%3D&version=2.8.1": EOF
The weave pod is still throwing this error.
Hello @nadeem-nasir We will investigate your issue. I'll let you know as soon as I get any conclusions.