
Usage charging deleted volumes

Open matheusfontes opened this issue 2 years ago • 9 comments

ISSUE TYPE
  • Bug Report
COMPONENT NAME
usage
CLOUDSTACK VERSION
4.16 and 4.17
CONFIGURATION

N/A

OS / ENVIRONMENT

CentOS 7

SUMMARY

The usage server never marks a volume as deleted, and when a volume is resized it is charged twice (or more).

STEPS TO REPRODUCE
Scenario 1:
Create a volume and attach it to a VM.
Remove the volume.
It is charged forever.

Scenario 2:
Create a volume and attach it to a VM.
Resize the volume.
It is charged once for every time it has been resized.
EXPECTED RESULTS
Charging for the volume stops when it is removed.
ACTUAL RESULTS
The volume is still charged after it is removed, or it is charged for more than 24 hours in a single day.

# > list usagerecords domainid=8e1c4859-9cf4-4414-a218-160b05b9f157 accountid=3e64effb-12a6-4642-95f9-c217c804a3c4 type=6 startdate=2020-01-19 enddate=2020-01-19
    {
      "account": "21150",
      "accountid": "3e64effb-12a6-4642-95f9-c217c804a3c4",
      "description": "Volume usage for DATA-600 (40f8412b-65f5-4c2d-a6e3-00a4e0456279) with disk offering Disco Magnético (34573d38-2dce-4d8b-a0b4-80ad61b1cbc8) and size (1000.00 GB) 1073741824000",
      "domain": "skycloud.prv",
      "domainid": "8e1c4859-9cf4-4414-a218-160b05b9f157",
      "enddate": "2020-01-19'T'23:59:59-03:00",
      "offeringid": "34573d38-2dce-4d8b-a0b4-80ad61b1cbc8",
      "rawusage": "48",
      "size": 1073741824000,
      "startdate": "2020-01-19'T'00:00:00-03:00",
      "tags": [],
      "usage": "48 Hrs",
      "usageid": "40f8412b-65f5-4c2d-a6e3-00a4e0456279",
      "usagetype": 6,
      "zoneid": "74c526f8-52a2-4e2e-8d08-76eae35913a4"
    },
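The rawusage of "48" for a one-day window (2020-01-19) is easier to interpret with a small model. This is a minimal sketch under an assumption, not CloudStack's actual usage parser: each usage_volume row is treated as an interval from created to deleted (open if deleted is NULL), and the daily figure is the sum of each row's overlap with the day. The helper daily_hours is hypothetical. Under that model, a resize that opens a new row without closing the old one doubles the billed hours, which is one possible explanation for the record above.

```python
from datetime import datetime, timedelta

def daily_hours(rows, day_start):
    """Hypothetical helper: sum each row's overlap (in hours) with one 24-hour window.

    Each row is a (created, deleted) pair; deleted=None means the row is
    still open, so it is treated as covering the rest of the window.
    """
    day_end = day_start + timedelta(days=1)
    total = 0.0
    for created, deleted in rows:
        start = max(created, day_start)
        end = min(deleted or day_end, day_end)
        if end > start:
            total += (end - start).total_seconds() / 3600
    return total

day = datetime(2020, 1, 19)
# One volume resized once: the old row was closed when the new row was opened.
correct = [(datetime(2020, 1, 1), datetime(2020, 1, 10)),
           (datetime(2020, 1, 10), None)]
# Same volume, but the old row was never closed, so both rows stay open.
buggy = [(datetime(2020, 1, 1), None),
         (datetime(2020, 1, 10), None)]

print(daily_hours(correct, day))  # 24.0
print(daily_hours(buggy, day))    # 48.0
```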

matheusfontes, Sep 08 '22

@matheusfontes, in your scenario 1:

> Scenario 1:
> Create a volume and attach it to a VM.
> Remove the volume.
> It is charged forever.

The volume is supposed to be charged continually if it is only detached from the VM. Do you mean that it is deleted and expunged and then still charged for?

Scenario 2 is clearly wrong; I'll look into that.

DaanHoogland, Sep 12 '22

@DaanHoogland expunged volumes are always being charged. I saw some database queries that I think are wrong. When the usage job runs and processes a VOLUME.DELETE event, it searches for the volume in the usage_volume table with this query:

    SELECT usage_volume.id, usage_volume.zone_id, usage_volume.account_id, usage_volume.domain_id,
           usage_volume.volume_id, usage_volume.disk_offering_id, usage_volume.template_id,
           usage_volume.size, usage_volume.created, usage_volume.deleted
    FROM usage_volume
    WHERE usage_volume.account_id = 9 AND usage_volume.id = 110 AND usage_volume.deleted IS NULL

I think the usage_volume.id search criterion is wrong; it should filter on the usage_volume.volume_id field instead.
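Why filtering on usage_volume.id never matches can be shown with an in-memory SQLite table. This is a minimal sketch, assuming a trimmed-down usage_volume schema (only id, account_id, volume_id, created, deleted) and assuming that 110 in the query above is the volume id taken from the VOLUME.DELETE event. The helper mark_deleted is hypothetical and uses an UPDATE to show the effect; the real code path is Java in UsageManagerImpl and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE usage_volume (
    id INTEGER PRIMARY KEY,  -- surrogate key of the usage row
    account_id INTEGER,
    volume_id INTEGER,       -- id of the volume itself
    created TEXT,
    deleted TEXT)""")
# The usage row for volume 110 gets row id 1, so id and volume_id differ.
conn.execute("INSERT INTO usage_volume (account_id, volume_id, created) "
             "VALUES (9, 110, '2022-09-01')")

def mark_deleted(column, account_id, volume_id, when):
    """Hypothetical helper: close the open usage row, filtering on `column`."""
    assert column in ("id", "volume_id")  # column name is interpolated, so whitelist it
    cur = conn.execute(
        f"UPDATE usage_volume SET deleted = ? "
        f"WHERE account_id = ? AND {column} = ? AND deleted IS NULL",
        (when, account_id, volume_id))
    return cur.rowcount

print(mark_deleted("id", 9, 110, "2022-09-12"))         # 0: no row has id 110
print(mark_deleted("volume_id", 9, 110, "2022-09-12"))  # 1: the row is closed
```

With the id filter the row stays open (deleted IS NULL), which matches the "charged forever" symptom; with the volume_id filter it is closed.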

matheusfontes, Sep 12 '22

Apparently these two lines in UsageManagerImpl.java solve the volume usage problem. However, I think there is also a problem with firewall rule and load balancer usage. PS: tested on version 4.17.0.1.

UsageManagerImpl.txt

matheusfontes, Sep 13 '22

Can you submit a PR with those lines, @matheusfontes?

DaanHoogland, Sep 14 '22

Never mind, @matheusfontes, I found a few seconds to do it: #6737

DaanHoogland, Sep 14 '22

@DaanHoogland now I am sure the problem is bigger than just resized/deleted volumes. Usage is charging for the network transfer of deleted virtual routers, and for deleted firewall rules and load balancers. Do I need to open another issue? Is everyone experiencing these charging problems? The problem started in 4.16, so I think everyone who uses the usage server to bill clients is billing them incorrectly, and this needs an urgent fix, doesn't it?

matheusfontes, Sep 21 '22

@matheusfontes I think that if these are fixed by separate changes, they are separate issues. If we put the fixes in one patch, we can just rename the issue.

DaanHoogland, Sep 21 '22

@matheusfontes did you test #6737? And do you know where to look for additional issues?

DaanHoogland, Sep 22 '22

@matheusfontes I'm not a heavy user of the usage server or of billing in general, so I will need your help closing this issue. Can you test and approve #6737, and advise on any other issues there may be, please?

DaanHoogland, Sep 23 '22

@DaanHoogland I just stumbled across the same issue; it still persists in 4.18.1.0. According to the PR, the fix should have been merged in 4.18.0, correct?

Volumes that have been destroyed/expunged are still reported with active usage records forever, so we might need to take a second look at it.

StepBee, Jan 10 '24

Yes, it should, @StepBee. Can you add more info? Is this production, or is it reproducible in a test/lab environment?

DaanHoogland, Jan 10 '24

@DaanHoogland sure, this is the usage output for one day for usage type 6 (volumes) from a domain that has been using only one volume for a long time:

(cloudstack) 🐱 > list usagerecords domainid=16644085-7eff-46dd-8eb0-ef27b0b1621e type=6 startdate=2024-01-08 enddate=2024-01-08
{
  "count": 4,
  "usagerecord": [
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26854 (06ceea59-c8f8-4acd-902e-414d4bd60a5a) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "06ceea59-c8f8-4acd-902e-414d4bd60a5a",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    },
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26840 (f9f7daa6-3d98-496b-8fb2-698fae3b0c96) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "f9f7daa6-3d98-496b-8fb2-698fae3b0c96",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    },
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26842 (d8fb2834-f217-44dd-b5df-8a8573432668) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "d8fb2834-f217-44dd-b5df-8a8573432668",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    },
    {
      "account": "XXX_Admins",
      "accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
      "description": "Volume usage for ROOT-26856 (14265d5a-d8fb-4136-af73-8ad0945a10a8) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "enddate": "2024-01-08'T'23:59:59+00:00",
      "rawusage": "24",
      "size": 53687091200,
      "startdate": "2024-01-08'T'00:00:00+00:00",
      "tags": [],
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "usage": "24 Hrs",
      "usageid": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
      "usagetype": 6,
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
    }
  ]
}

The usage output lists usage for the following volumes:
  • ROOT-26854
  • ROOT-26840
  • ROOT-26842
  • ROOT-26856

Listing the volumes of the same domain returns only the last volume, ROOT-26856:

(cloudstack) 🐱 > list volumes domainid=16644085-7eff-46dd-8eb0-ef27b0b1621e listall=true
{
  "count": 1,
  "volume": [
    {
      "account": "XXX_Admins",
      "created": "2023-01-27T14:34:31+0000",
      "destroyed": false,
      "deviceid": 0,
      "diskioread": 17158,
      "diskiowrite": 4018086,
      "diskkbsread": 1198132,
      "diskkbswrite": 42926744,
      "displayvolume": true,
      "domain": "XXXXX",
      "domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
      "isextractable": false,
      "name": "ROOT-26856",
      "path": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
      "physicalsize": 53687091200,
      "provisioningtype": "thin",
      "quiescevm": false,
      "serviceofferingdisplaytext": "Custom Instance",
      "serviceofferingid": "64",
      "serviceofferingname": "Custom",
      "size": 53687091200,
      "state": "Ready",
      "storage": "ceph-performance",
      "storageid": "d658f775-1c36-3f7f-afbe-c77991fec3f1",
      "storagetype": "shared",
      "supportsstoragesnapshot": false,
      "tags": [],
      "templatedisplaytext": "Ubuntu Server 22.04 LTS Cloud-Init",
      "templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
      "templatename": "Ubuntu Server 22.04 LTS Cloud-Init",
      "type": "ROOT",
      "utilization": "100.0%",
      "virtualmachineid": "df71e1fd-e5c3-41b3-992e-20b9c067a4cd",
      "virtualsize": 53687091200,
      "vmdisplayname": "XXXXXX",
      "vmname": "XXXXXX",
      "vmstate": "Running",
      "vmtype": "User",
      "zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc",
      "zonename": "Berlin-01"
    }
  ]
}

The database shows that the other three volumes mentioned in the usage output were expunged a long time ago; only one volume (the last one) is Ready and should appear in the usage output:

id     name        uuid                                  created         updated         removed         state     size
26888  ROOT-26840  f9f7daa6-3d98-496b-8fb2-698fae3b0c96  27.01.23 13:06  27.01.23 13:09  27.01.23 13:09  Expunged  53687091200
26890  ROOT-26842  d8fb2834-f217-44dd-b5df-8a8573432668  27.01.23 13:12  27.01.23 14:27  27.01.23 14:27  Expunged  53687091200
26902  ROOT-26854  06ceea59-c8f8-4acd-902e-414d4bd60a5a  27.01.23 14:30  27.01.23 14:33  27.01.23 14:33  Expunged  53687091200
26904  ROOT-26856  14265d5a-d8fb-4136-af73-8ad0945a10a8  27.01.23 14:34  19.12.23 12:49  NULL            Ready     53687091200
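The comparison above (billed volumes versus volumes that still exist) can be automated for any domain. This is a minimal sketch, assuming both CloudMonkey outputs have been saved as JSON; the helper orphaned_usage_ids is hypothetical, and the inline data is a trimmed copy of the two outputs above with only the fields it needs. Note that a detached but not deleted volume still appears in `list volumes`, so it is correctly not flagged.

```python
import json

def orphaned_usage_ids(usage_output, volumes_output):
    """Hypothetical helper: usage ids that are billed but absent from the live volume list.

    usage_output:   parsed JSON of `list usagerecords ... type=6`
    volumes_output: parsed JSON of `list volumes ... listall=true`
    """
    # .get() guards against a response that omits the list key entirely.
    live_ids = {vol["id"] for vol in volumes_output.get("volume", [])}
    billed_ids = {rec["usageid"] for rec in usage_output.get("usagerecord", [])}
    return sorted(billed_ids - live_ids)

# Trimmed copies of the two outputs above.
usage_output = json.loads("""{"count": 4, "usagerecord": [
  {"usageid": "06ceea59-c8f8-4acd-902e-414d4bd60a5a"},
  {"usageid": "f9f7daa6-3d98-496b-8fb2-698fae3b0c96"},
  {"usageid": "d8fb2834-f217-44dd-b5df-8a8573432668"},
  {"usageid": "14265d5a-d8fb-4136-af73-8ad0945a10a8"}]}""")
volumes_output = json.loads("""{"count": 1, "volume": [
  {"id": "14265d5a-d8fb-4136-af73-8ad0945a10a8", "name": "ROOT-26856"}]}""")

for usage_id in orphaned_usage_ids(usage_output, volumes_output):
    print(usage_id)
```

On this data it prints the ids of ROOT-26854, ROOT-26842 and ROOT-26840, the three expunged volumes from the table.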

This is from our production environment and I am able to reproduce it in our test environment.

StepBee, Jan 10 '24

> This is from our production environment and I am able to reproduce it in our test environment.

Please describe the reproduction steps for a clean (new) environment and I'll try to fix it.

DaanHoogland, Jan 10 '24

I found a reference to what looks like the same issue on the CloudStack users mailing list from 2018: https://lists.apache.org/thread/vb9v6ys0p0tr0wnzgt0oxdbjjxykbtk2

Replicating the issue with new volumes is not as straightforward as I thought: unfortunately, simply creating a data disk, attaching it and deleting it does not reproduce the issue.

For some volumes expunged in the past I see never-ending usage data, for others I don't; I'm trying to understand the difference between the two behaviors.

Could someone else with a long-running environment and the usage service enabled pick a less-used domain (for a better overview), generate a usage report for type 6 (volumes), and check whether the report includes expunged disks as well?

list usagerecords domainid=<domain-uuid> type=6 startdate=<startdate> enddate=<same-as-startdate> filter=description,rawusage

StepBee, Jan 11 '24

OK @StepBee, keep us updated. cc @rajujith, didn't you deal with something similar recently? Do you know how to reproduce it?

DaanHoogland, Jan 11 '24

So far, I see that for all affected volumes one database field is NULL, which is not the case for unaffected volumes. Affected volumes:

cloud_usage.usage_volume.deleted = NULL

Unaffected volumes:

cloud_usage.usage_volume.deleted = <date-of-deletion>

Following the question raised in the 2018 thread and the usage aggregation logic, I see two events in the database table "cloud.usage_event" for both affected and unaffected volumes:

  • one entry with type "VOLUME.CREATE"
  • one entry with type "VOLUME.DELETE"

The entries in the "cloud.usage_event" table are as expected, and the same applies to the copy in the cloud_usage.usage_event table.

So it looks like the VOLUME.DELETE event is (sometimes) missed during the first aggregation into the helper table "cloud_usage.usage_volume".
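The observation above (a VOLUME.DELETE event exists, yet usage_volume.deleted is still NULL) translates directly into a diagnostic query. This is a minimal sketch against an in-memory SQLite copy, assuming a simplified schema in which usage_event.resource_id holds the volume's numeric id and usage_volume.volume_id refers to the same id; the real cloud_usage tables have more columns. The sample rows are illustrative, with ids and times taken from the table above and a made-up account_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage_event (id INTEGER PRIMARY KEY, type TEXT, account_id INTEGER,
                          resource_id INTEGER, created TEXT);
CREATE TABLE usage_volume (id INTEGER PRIMARY KEY, account_id INTEGER, volume_id INTEGER,
                           created TEXT, deleted TEXT);
-- Volume 26888: both events exist, but its usage row was never closed.
INSERT INTO usage_event (type, account_id, resource_id, created) VALUES
  ('VOLUME.CREATE', 2, 26888, '2023-01-27 13:06:00'),
  ('VOLUME.DELETE', 2, 26888, '2023-01-27 13:09:00'),
  ('VOLUME.CREATE', 2, 26904, '2023-01-27 14:34:00');
INSERT INTO usage_volume (account_id, volume_id, created, deleted) VALUES
  (2, 26888, '2023-01-27 13:06:00', NULL),
  (2, 26904, '2023-01-27 14:34:00', NULL);
""")

# Open usage rows whose volume already has a VOLUME.DELETE event.
AFFECTED = """
SELECT uv.volume_id, e.created AS delete_event_time
FROM usage_volume uv
JOIN usage_event e
  ON e.resource_id = uv.volume_id
 AND e.account_id = uv.account_id
 AND e.type = 'VOLUME.DELETE'
WHERE uv.deleted IS NULL
"""
print(conn.execute(AFFECTED).fetchall())  # [(26888, '2023-01-27 13:09:00')]
```

Volume 26904 has no delete event, so it is correctly left out even though its row is also open.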

I'm trying to think of a quick fix: is there a way to regenerate the usage data once I have updated all the cloud_usage.usage_volume.deleted columns with the correct date? The "generateUsageRecords" API only generates records for previously failed runs, which is not the case here.
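The first half of that quick fix, setting usage_volume.deleted from the matching VOLUME.DELETE event, could look like the sketch below. It does not answer the open question of how to regenerate the usage records afterwards. It is shown against an in-memory SQLite copy with the same simplified, assumed schema as before (resource_id holds the volume id; made-up account_id); the UPDATE is plain SQL intended to carry over to MySQL but is untested against a real cloud_usage database, so back that database up before trying anything similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage_event (id INTEGER PRIMARY KEY, type TEXT, account_id INTEGER,
                          resource_id INTEGER, created TEXT);
CREATE TABLE usage_volume (id INTEGER PRIMARY KEY, account_id INTEGER, volume_id INTEGER,
                           created TEXT, deleted TEXT);
INSERT INTO usage_event (type, account_id, resource_id, created) VALUES
  ('VOLUME.DELETE', 2, 26890, '2023-01-27 14:27:00');
INSERT INTO usage_volume (account_id, volume_id, created, deleted) VALUES
  (2, 26890, '2023-01-27 13:12:00', NULL),  -- affected: deleted never set
  (2, 26904, '2023-01-27 14:34:00', NULL);  -- still Ready: must stay open
""")

# Close only open rows that have a delete event; MIN() guards against duplicate events.
BACKFILL = """
UPDATE usage_volume
SET deleted = (SELECT MIN(e.created) FROM usage_event e
               WHERE e.type = 'VOLUME.DELETE'
                 AND e.resource_id = usage_volume.volume_id
                 AND e.account_id = usage_volume.account_id)
WHERE deleted IS NULL
  AND EXISTS (SELECT 1 FROM usage_event e
              WHERE e.type = 'VOLUME.DELETE'
                AND e.resource_id = usage_volume.volume_id
                AND e.account_id = usage_volume.account_id)
"""
with conn:  # commits if the UPDATE succeeds, rolls back on error
    updated = conn.execute(BACKFILL).rowcount

print(updated)  # 1
print(conn.execute("SELECT volume_id, deleted FROM usage_volume ORDER BY volume_id").fetchall())
# [(26890, '2023-01-27 14:27:00'), (26904, None)]
```

The EXISTS clause keeps rows without a delete event (such as the Ready volume) untouched instead of leaving them NULL by accident.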

StepBee, Jan 11 '24