
Billing import failed for Kubernetes datasource

Open dmalagoli-dylog opened this issue 9 months ago • 5 comments

Describe the bug: When viewing the Kubernetes datasource, costs are 0 and an error about "Billing import" is shown.

The details of the error are "Failed to find node Node aks-default-40452920-vmss00000m (provider azure, flavor Standard_D2s_v3, os type linux, region italynorth) price".
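The error message packs the fields used for the failed price lookup into its parentheses. As a debugging aid, here is a minimal sketch that extracts them so they can be compared with the pricing data OptScale has collected. The function name `parse_price_error` and the regex are hypothetical and are derived only from the message text quoted above, not from OptScale's code.

```python
import re

# Hypothetical pattern: mirrors only the error text quoted in this issue.
_PRICE_ERROR_RE = re.compile(
    r"Failed to find node Node (?P<node>\S+) "
    r"\(provider (?P<provider>[^,]+), flavor (?P<flavor>[^,]+), "
    r"os type (?P<os_type>[^,]+), region (?P<region>[^)]+)\) price"
)


def parse_price_error(message):
    """Return the node name and lookup fields as a dict, or None if no match."""
    match = _PRICE_ERROR_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    msg = ("Failed to find node Node aks-default-40452920-vmss00000m "
           "(provider azure, flavor Standard_D2s_v3, os type linux, "
           "region italynorth) price")
    print(parse_price_error(msg))
```

The `region` value (`italynorth` here) is worth keeping in mind, since the later replies in this thread focus on whether the list of Azure regions can be retrieved.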

My OptScale version is 2025032501-public. I searched the logs of diworker, insider-api, and insider-worker, but couldn't find any relevant error.

To Reproduce: I added an Azure-hosted cluster as a Kubernetes datasource, following the provided instructions.

Expected behavior: I expected the cluster's costs to be shown.

Screenshots: two images (not reproduced here).

Desktop

  • OS: Windows 11
  • Browser: Chrome
  • Version: 134.0.6998.179

dmalagoli-dylog, Apr 07 '25 12:04

Hi @dmalagoli-dylog, our team is investigating your problem. In the meantime, I suggest the following steps:

  1. Set the service credentials for the Azure cloud in your user_template.yml (https://github.com/hystax/optscale/blob/integration/optscale-deploy/overlay/user_template.yml#L12). We suspect that OptScale is not getting pricing from the cloud due to insufficient permissions.
  2. Update your cluster with the updated template using the command: ./runkube.py --with-elk -o overlay/user_template.yml -- <deployment name> <version>
  3. After the cluster starts, trigger the insider service job that fetches pricing from the cloud using the command: kubectl create job --from=cronjobs/insider-scheduler <your_job_name>
  4. Once pricing retrieval has completed (you can find its logs in Kibana, see https://github.com/hystax/optscale/blob/integration/documentation/kibana_logs.md, using the filter name: *insider*), wait until the next report import starts.

stanfra, Apr 08 '25 10:04
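Steps 2 and 3 above can be scripted. The following is a minimal sketch that only assembles the two quoted commands as argument lists for `subprocess.run` and, by default, just prints them (dry run). The deployment name, version, and job name are placeholders you must supply, and running it from the `optscale-deploy` folder (so that `overlay/user_template.yml` resolves) is an assumption; the helper names are hypothetical.

```python
import shlex
import subprocess


def runkube_command(deployment_name, version, template="overlay/user_template.yml"):
    """Step 2: redeploy using the updated user template (command as quoted above)."""
    return ["./runkube.py", "--with-elk", "-o", template, "--",
            deployment_name, version]


def insider_job_command(job_name):
    """Step 3: start a one-off Job from the insider-scheduler CronJob."""
    return ["kubectl", "create", "job", "--from=cronjobs/insider-scheduler", job_name]


def run(cmd, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: replace with your own deployment name, version and job name.
    run(runkube_command("DEPLOYMENT_NAME", "VERSION"))
    run(insider_job_command("insider-pricing-manual"))
```

Keeping the commands as lists (rather than one shell string) avoids quoting problems with placeholder values such as `<hash>`-style names.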

I did what you suggested, and the insider worker seems to be working fine.


The insider API also seems to be working fine.


Maybe it is a permission problem. I assigned the service principal the Reader role on the subscription; are there other permissions I need to set?

dmalagoli-dylog, Apr 09 '25 11:04

Hi @dmalagoli-dylog, it looks like your account doesn't have enough permissions to get the list of locations. To debug, please log in to the Azure CLI (https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) using your service credentials, run the command az account list-locations, and attach the error message.

stanfra, Apr 10 '25 05:04
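To check the result programmatically rather than by eye, the sketch below runs `az account list-locations` with JSON output, collects the `name` of each location, and checks whether the region from the original error (`italynorth`) is present. It assumes the Azure CLI is installed and already logged in with the service principal; the helper names are hypothetical.

```python
import json
import subprocess


def parse_location_names(raw_json):
    """Extract each location's 'name' field (e.g. 'italynorth') from the CLI JSON."""
    return {loc["name"] for loc in json.loads(raw_json)}


def fetch_location_names():
    """Run the Azure CLI; raises CalledProcessError on a non-zero exit code."""
    result = subprocess.run(
        ["az", "account", "list-locations", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_location_names(result.stdout)


if __name__ == "__main__":
    try:
        names = fetch_location_names()
    except subprocess.CalledProcessError as exc:
        # A rejected call (for example, missing permissions) is reported on stderr.
        print("az failed:", exc.stderr)
    else:
        print("italynorth listed:", "italynorth" in names, "| total:", len(names))
```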

That's the tricky part: listing locations seems to work. Both my account and the app are able to get the list of locations.

I logged in with the same service principal that is set in the overlay file, using the command

az login --service-principal -u <APP_ID> -p <CLIENT_SECRET> --tenant <TENANT_ID>

I verified that the login succeeded with

az account show

then ran

az account list-locations


and no error was shown.


dmalagoli-dylog, Apr 10 '25 07:04
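To confirm that a CLI session like the one in the previous comment is really authenticated as the service principal from the overlay file (and not as a personal account), the `az account show` output can be checked. This is a minimal sketch assuming the CLI reports `user.type == "servicePrincipal"` and, for such logins, `user.name` equal to the application (client) ID; the function name is hypothetical.

```python
import json
import subprocess


def logged_in_as_service_principal(account_json, app_id=None):
    """Check `az account show` JSON for a service-principal login.

    Assumption: the CLI reports user.type == "servicePrincipal" and, for
    such logins, user.name holds the application (client) ID.
    """
    user = json.loads(account_json).get("user", {})
    if user.get("type") != "servicePrincipal":
        return False
    return app_id is None or user.get("name") == app_id


if __name__ == "__main__":
    # Requires the Azure CLI on PATH and an active `az login` session.
    shown = subprocess.run(["az", "account", "show", "--output", "json"],
                           capture_output=True, text=True, check=True).stdout
    print(logged_in_as_service_principal(shown, app_id="<APP_ID>"))
```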

Let's add some logging to localize the problem. Please change https://github.com/hystax/optscale/blob/54804453a8b19b218a025a31b2429adf720d0997/tools/cloud_adapter/clouds/azure.py#L1325 in your repo to:

except Exception as exc:
    LOG.info('Cannot retrieve the list of regions for %s cloud account, error: %s',
             self.cloud_account_id, str(exc))
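The intent of this patch is that the log line now carries the exception text, so the cause becomes searchable in Kibana. The self-contained sketch below reproduces that pattern outside OptScale: `list_regions`, `fetch_regions`, the logger name, and the empty-list fallback are hypothetical stand-ins; only the message format is copied from the snippet above.

```python
import logging

LOG = logging.getLogger("insider-sketch")  # hypothetical logger name


def list_regions(cloud_account_id, fetch_regions):
    """Return fetch_regions(); on failure, log the reason and return [].

    The [] fallback is an assumption for this sketch, not OptScale's behavior.
    """
    try:
        return fetch_regions()
    except Exception as exc:
        # Same message format as the patch, so a Kibana search for it still matches.
        LOG.info('Cannot retrieve the list of regions for %s cloud account, error: %s',
                 cloud_account_id, str(exc))
        return []


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def failing_fetch():
        raise PermissionError("simulated authorization failure")

    print(list_regions("acc-123", failing_fetch))
```

Using `LOG.exception(...)` instead of `LOG.info(...)` inside the `except` block would additionally record the traceback, at ERROR level.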

After making that change, build the insider service images locally by running this command in the optscale folder:

./build.sh insider --use-nerdctl

Once the images are built, delete the insider-api pod using the command:

kubectl delete pod insider-api-<hash>

After the pod restarts, wait until the report import starts again, then find the insider log in Kibana containing the text "Cannot retrieve the list of regions". It should now include the error that describes the problem.

stanfra, Apr 10 '25 10:04
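When going through the Kibana hits for that message, it can help to pull out the account ID and the error text from each one. A minimal sketch, assuming the log lines have been exported as plain text; the regex is derived only from the format string in the patch above, and `extract_region_errors` is a hypothetical name.

```python
import re

# Built from the patch's format string:
# 'Cannot retrieve the list of regions for %s cloud account, error: %s'
_REGION_ERROR_RE = re.compile(
    r"Cannot retrieve the list of regions for (?P<account>\S+) cloud account, "
    r"error: (?P<error>.*)$"
)


def extract_region_errors(lines):
    """Yield (cloud_account_id, error_text) for each line that matches."""
    for line in lines:
        match = _REGION_ERROR_RE.search(line)
        if match:
            yield match.group("account"), match.group("error").strip()


if __name__ == "__main__":
    # Hypothetical sample lines, not real OptScale output.
    exported = [
        "INFO some unrelated message",
        "INFO Cannot retrieve the list of regions for 1234 cloud account, error: example error",
    ]
    for account, error in extract_region_errors(exported):
        print(account, "->", error)
```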