cloudstack
Usage charging deleted volumes
ISSUE TYPE
- Bug Report
COMPONENT NAME
usage
CLOUDSTACK VERSION
4.16 and 4.17
CONFIGURATION
N/A
OS / ENVIRONMENT
centos 7
SUMMARY
The usage server never marks a volume as deleted, and when a volume is resized it is charged twice (or more).
STEPS TO REPRODUCE
Scenario 1:
Create a volume and attach it to a VM.
Remove the volume.
It will be charged forever.
Scenario 2:
Create a volume and attach it to a VM.
Resize the volume.
It will be charged as many times as you resize it.
EXPECTED RESULTS
Stop charging for a volume once it is removed.
ACTUAL RESULTS
The volume is still charged after it is removed, or is charged for more than 24 hours within a single day.
# > list usagerecords domainid=8e1c4859-9cf4-4414-a218-160b05b9f157 accountid=3e64effb-12a6-4642-95f9-c217c804a3c4 type=6 startdate=2020-01-19 enddate=2020-01-19
{
"account": "21150",
"accountid": "3e64effb-12a6-4642-95f9-c217c804a3c4",
"description": "Volume usage for DATA-600 (40f8412b-65f5-4c2d-a6e3-00a4e0456279) with disk offering Disco Magnético (34573d38-2dce-4d8b-a0b4-80ad61b1cbc8) and size (1000.00 GB) 1073741824000",
"domain": "skycloud.prv",
"domainid": "8e1c4859-9cf4-4414-a218-160b05b9f157",
"enddate": "2020-01-19'T'23:59:59-03:00",
"offeringid": "34573d38-2dce-4d8b-a0b4-80ad61b1cbc8",
"rawusage": "48",
"size": 1073741824000,
"startdate": "2020-01-19'T'00:00:00-03:00",
"tags": [],
"usage": "48 Hrs",
"usageid": "40f8412b-65f5-4c2d-a6e3-00a4e0456279",
"usagetype": 6,
"zoneid": "74c526f8-52a2-4e2e-8d08-76eae35913a4"
},
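The record above reports 48 hours of raw usage inside a window that spans a single day. A minimal sketch for spotting such records, assuming the JSON shape shown in this output (the helper names `parse_ts` and `overcharged` are hypothetical, not part of CloudStack or CloudMonkey):

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # The CLI prints timestamps such as 2020-01-19'T'23:59:59-03:00,
    # with literal quotes around the T; drop them before parsing.
    return datetime.fromisoformat(value.replace("'T'", "T"))


def overcharged(records):
    """Return (usageid, rawusage, window_hours) for every record whose
    raw usage exceeds the number of hours in its reporting window."""
    flagged = []
    for rec in records:
        start = parse_ts(rec["startdate"])
        end = parse_ts(rec["enddate"])
        # 00:00:00 to 23:59:59 is rounded up to a full 24 hours.
        window_hours = round((end - start).total_seconds() / 3600)
        raw = float(rec["rawusage"])
        if raw > window_hours:
            flagged.append((rec["usageid"], raw, window_hours))
    return flagged


sample = [{
    "usageid": "40f8412b-65f5-4c2d-a6e3-00a4e0456279",
    "rawusage": "48",
    "startdate": "2020-01-19'T'00:00:00-03:00",
    "enddate": "2020-01-19'T'23:59:59-03:00",
}]
flagged = overcharged(sample)
print(flagged)
```

A volume that exists once cannot accumulate more hours than the window contains, so any flagged record points at duplicated helper-table rows rather than at real consumption.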
@matheusfontes , in your scenario 1:
Scenario 1:
Create a volume and attach it to a VM.
Remove the volume.
It will be charged forever.
The volume is supposed to be charged continually if it is only removed from the VM. Do you mean it is deleted and expunged and then still charged for?
Scenario 2 seems obviously wrong, I'll look into that.
@DaanHoogland expunged volumes are always being charged. I saw some database queries that I think are wrong. When the usage job starts and processes a VOLUME.DELETE event, it looks up the volume in the usage_volume table with this query:
SELECT usage_volume.id, usage_volume.zone_id, usage_volume.account_id, usage_volume.domain_id, usage_volume.volume_id, usage_volume.disk_offering_id, usage_volume.template_id, usage_volume.size, usage_volume.created, usage_volume.deleted FROM usage_volume WHERE usage_volume.account_id = 9 AND usage_volume.id = 110 AND usage_volume.deleted IS NULL
I think the usage_volume.id search criterion is wrong; it should search on the usage_volume.volume_id field.
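To illustrate the suspected mismatch, here is a small sketch using an in-memory SQLite table. It assumes that usage_volume.id is a row key distinct from the volume's own id, as the comment above suggests; the table is reduced to four columns, and the row values (row id 7) and the helper `find_active` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_volume ("
    " id INTEGER PRIMARY KEY, account_id INTEGER,"
    " volume_id INTEGER, deleted TEXT)"
)
# Hypothetical data: helper-table row 7 tracks volume 110 for account 9.
conn.execute("INSERT INTO usage_volume VALUES (7, 9, 110, NULL)")


def find_active(column: str, value: int):
    """Look up the still-active usage row, filtering on the given column."""
    # Only two fixed column names are accepted, so interpolation is safe.
    if column not in ("id", "volume_id"):
        raise ValueError(column)
    sql = (
        "SELECT id, volume_id FROM usage_volume "
        f"WHERE account_id = ? AND {column} = ? AND deleted IS NULL"
    )
    return conn.execute(sql, (9, value)).fetchall()


by_id = find_active("id", 110)                 # lookup as in the logged query
by_volume_id = find_active("volume_id", 110)   # lookup as proposed above
print(by_id, by_volume_id)
```

Under that assumption the lookup by id finds nothing, so the delete event has no row to close, while the lookup by volume_id returns the row that should receive the deletion date.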
Apparently these 2 lines in UsageManagerImpl.java solve the usage volumes problem. But I think there is also a problem with firewall rules and LB usage. PS: Tested on version 4.17.0.1.
Can you submit a PR with those lines @matheusfontes ?
never mind @matheusfontes , found a few secs to do it: #6737
@DaanHoogland now I am sure that the problem is bigger than only resized/deleted volumes. Usage is also charging for the network transfer of deleted virtual routers, deleted firewall rules and load balancers. Do I need to open another issue? Is everyone experiencing these charging problems? The problem started in 4.16, so I think everyone who uses usage to bill clients is billing them wrong, and this needs an urgent fix, doesn't it?
@matheusfontes I think that if these are fixed by separate changes they are separate issues. If we put the fixes in one patch we can just rename the issue.
@matheusfontes did you test #6737? And do you know where to look for additional issues?
@matheusfontes I'm not a heavy user of the usage server or billing in general. I will need your help in closing this issue please. Can you test and approve #6737 and advise on any other issues there may be, please?
@DaanHoogland I just stumbled across the same issue, still persisting in 4.18.1.0. Following the PR, this should have been merged in 4.18.0, correct?
Volumes that have been destroyed/expunged are still reported with active usage records forever; we might need to have a second look at it.
yes it should, @StepBee . can you add more info? is this production or reproducible in a test/lab environment?
@DaanHoogland sure, this is the usage output of one day for usagetype 6 (volumes) from a domain that has been using only 1 volume for a long time:
(cloudstack) 🐱 > list usagerecords domainid=16644085-7eff-46dd-8eb0-ef27b0b1621e type=6 startdate=2024-01-08 enddate=2024-01-08
{
"count": 4,
"usagerecord": [
{
"account": "XXX_Admins",
"accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
"description": "Volume usage for ROOT-26854 (06ceea59-c8f8-4acd-902e-414d4bd60a5a) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
"domain": "XXXXX",
"domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
"enddate": "2024-01-08'T'23:59:59+00:00",
"rawusage": "24",
"size": 53687091200,
"startdate": "2024-01-08'T'00:00:00+00:00",
"tags": [],
"templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
"usage": "24 Hrs",
"usageid": "06ceea59-c8f8-4acd-902e-414d4bd60a5a",
"usagetype": 6,
"zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
},
{
"account": "XXX_Admins",
"accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
"description": "Volume usage for ROOT-26840 (f9f7daa6-3d98-496b-8fb2-698fae3b0c96) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
"domain": "XXXXX",
"domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
"enddate": "2024-01-08'T'23:59:59+00:00",
"rawusage": "24",
"size": 53687091200,
"startdate": "2024-01-08'T'00:00:00+00:00",
"tags": [],
"templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
"usage": "24 Hrs",
"usageid": "f9f7daa6-3d98-496b-8fb2-698fae3b0c96",
"usagetype": 6,
"zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
},
{
"account": "XXX_Admins",
"accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
"description": "Volume usage for ROOT-26842 (d8fb2834-f217-44dd-b5df-8a8573432668) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
"domain": "XXXXX",
"domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
"enddate": "2024-01-08'T'23:59:59+00:00",
"rawusage": "24",
"size": 53687091200,
"startdate": "2024-01-08'T'00:00:00+00:00",
"tags": [],
"templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
"usage": "24 Hrs",
"usageid": "d8fb2834-f217-44dd-b5df-8a8573432668",
"usagetype": 6,
"zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
},
{
"account": "XXX_Admins",
"accountid": "72c17fd1-da79-46cf-a45d-d3ba1b5ad6d7",
"description": "Volume usage for ROOT-26856 (14265d5a-d8fb-4136-af73-8ad0945a10a8) and template Ubuntu Server 22.04 LTS Cloud-Init (4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6) and size 53687091200",
"domain": "XXXXX",
"domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
"enddate": "2024-01-08'T'23:59:59+00:00",
"rawusage": "24",
"size": 53687091200,
"startdate": "2024-01-08'T'00:00:00+00:00",
"tags": [],
"templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
"usage": "24 Hrs",
"usageid": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
"usagetype": 6,
"zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc"
}
]
}
The usage output lists usage for the following volumes:
- Volume usage for ROOT-26854
- Volume usage for ROOT-26840
- Volume usage for ROOT-26842
- Volume usage for ROOT-26856
Listing the volumes of the same domain returns only the last volume, ROOT-26856:
(cloudstack) 🐱 > list volumes domainid=16644085-7eff-46dd-8eb0-ef27b0b1621e listall=true
{
"count": 1,
"volume": [
{
"account": "XXX_Admins",
"created": "2023-01-27T14:34:31+0000",
"destroyed": false,
"deviceid": 0,
"diskioread": 17158,
"diskiowrite": 4018086,
"diskkbsread": 1198132,
"diskkbswrite": 42926744,
"displayvolume": true,
"domain": "XXXXX",
"domainid": "16644085-7eff-46dd-8eb0-ef27b0b1621e",
"hasannotations": false,
"hypervisor": "KVM",
"id": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
"isextractable": false,
"name": "ROOT-26856",
"path": "14265d5a-d8fb-4136-af73-8ad0945a10a8",
"physicalsize": 53687091200,
"provisioningtype": "thin",
"quiescevm": false,
"serviceofferingdisplaytext": "Custom Instance",
"serviceofferingid": "64",
"serviceofferingname": "Custom",
"size": 53687091200,
"state": "Ready",
"storage": "ceph-performance",
"storageid": "d658f775-1c36-3f7f-afbe-c77991fec3f1",
"storagetype": "shared",
"supportsstoragesnapshot": false,
"tags": [],
"templatedisplaytext": "Ubuntu Server 22.04 LTS Cloud-Init",
"templateid": "4e67bcd1-6ecf-4d0d-99ff-ee3775cfc6b6",
"templatename": "Ubuntu Server 22.04 LTS Cloud-Init",
"type": "ROOT",
"utilization": "100.0%",
"virtualmachineid": "df71e1fd-e5c3-41b3-992e-20b9c067a4cd",
"virtualsize": 53687091200,
"vmdisplayname": "XXXXXX",
"vmname": "XXXXXX",
"vmstate": "Running",
"vmtype": "User",
"zoneid": "f2665eef-1073-4c50-8d3a-076d44036fcc",
"zonename": "Berlin-01"
}
]
}
Looking at the database reveals that the other three volumes mentioned in the usage output were expunged a long time ago; only one volume (the last one) is in Ready state and should be part of the usage output:
id     name        uuid                                  created         updated         removed         state     size
26888  ROOT-26840  f9f7daa6-3d98-496b-8fb2-698fae3b0c96  27.01.23 13:06  27.01.23 13:09  27.01.23 13:09  Expunged  53687091200
26890  ROOT-26842  d8fb2834-f217-44dd-b5df-8a8573432668  27.01.23 13:12  27.01.23 14:27  27.01.23 14:27  Expunged  53687091200
26902  ROOT-26854  06ceea59-c8f8-4acd-902e-414d4bd60a5a  27.01.23 14:30  27.01.23 14:33  27.01.23 14:33  Expunged  53687091200
26904  ROOT-26856  14265d5a-d8fb-4136-af73-8ad0945a10a8  27.01.23 14:34  19.12.23 12:49  \N              Ready     53687091200
This is from our production environment and i am able to reproduce it in our test environment.
please describe the reproduction steps in a clean (new) environment and I'll try and fix it.
I actually found a reference to what looks like the same issue on the CloudStack users mailing list from 2018: https://lists.apache.org/thread/vb9v6ys0p0tr0wnzgt0oxdbjjxykbtk2
Replicating the issue with new volumes is not as straightforward as I thought. Simply creating a data disk, attaching it and deleting it does not reproduce the issue.
For some volumes expunged in the past I see eternal usage data, for others not. I'm trying to understand the difference between the two behaviors.
Maybe someone else with a long-running environment and an active usage service can pick a less-used domain (for a better overview), generate a usage report for type 6 (volumes) and check whether the report includes expunged disks as well?
list usagerecords domainid=<domain-uuid> type=6 startdate=<startdate> enddate=<same-as-startdate> filter=description,rawusage
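The comparison can be scripted once both listings are saved as JSON. The following is a rough sketch, assuming the response shapes shown earlier in this thread and that the volume UUID appears in parentheses in the description field (which is the only identifying field left when filter=description,rawusage is used). The name `stale_usage` is hypothetical.

```python
import re

# Matches e.g. "Volume usage for ROOT-26854 (06ceea59-...) and template ..."
DESCRIPTION = re.compile(r"Volume usage for (.+?) \(([0-9a-f-]{36})\)")


def stale_usage(usage_response: dict, volumes_response: dict):
    """Return (name, uuid, rawusage) for usage records whose volume
    does not appear in the 'list volumes' output."""
    live_ids = {v["id"] for v in volumes_response.get("volume", [])}
    stale = []
    for rec in usage_response.get("usagerecord", []):
        match = DESCRIPTION.search(rec.get("description", ""))
        if match is None:
            continue  # description does not follow the expected pattern
        name, uuid = match.groups()
        if uuid not in live_ids:
            stale.append((name, uuid, rec.get("rawusage")))
    return stale


usage = {"usagerecord": [
    {"description": "Volume usage for ROOT-26840 "
                    "(f9f7daa6-3d98-496b-8fb2-698fae3b0c96) and size 53687091200",
     "rawusage": "24"},
    {"description": "Volume usage for ROOT-26856 "
                    "(14265d5a-d8fb-4136-af73-8ad0945a10a8) and size 53687091200",
     "rawusage": "24"},
]}
volumes = {"volume": [{"id": "14265d5a-d8fb-4136-af73-8ad0945a10a8"}]}
result = stale_usage(usage, volumes)
print(result)
```

A volume removed during the reporting day would also show up here with a legitimate partial charge, so a hit is only suspicious when the volume was removed well before the start date.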
ok @StepBee , keep us updated. cc @rajujith , didn't you deal with a similar thing recently? do you know how to reproduce?
For the moment, I see that for all affected volumes one field in the database is NULL, which is not the case for unaffected volumes. Affected volumes:
cloud_usage.usage_volume.deleted = NULL
Unaffected volumes:
cloud_usage.usage_volume.deleted = <date-of-deletion>
Regarding your question about the issue from 2018, and following the usage aggregation logic: for both types of volumes, affected and unaffected, I see two events in the database table "cloud.usage_event":
- one entry with type "VOLUME.CREATE"
- one entry with type "VOLUME.DELETE"
The entries in the "cloud.usage_event" table are as expected, and the same applies to the copy in the cloud_usage.usage_event table.
So it looks like something is (sometimes) missing the VOLUME.DELETE event during the first aggregation into the helper table "cloud_usage.usage_volume".
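A read-only check for rows in this state could look like the sketch below. It runs against two in-memory SQLite databases that stand in for the cloud and cloud_usage schemas, reduced to a few columns. It assumes that usage_volume.volume_id refers to cloud.volumes.id; the usage_volume row ids and the deleted date on the last row are hypothetical, while the volume ids, names and removal times are taken from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases stand in for the two MySQL schemas.
conn.execute("ATTACH DATABASE ':memory:' AS cloud")
conn.execute("ATTACH DATABASE ':memory:' AS cloud_usage")
conn.execute(
    "CREATE TABLE cloud.volumes (id INTEGER, name TEXT, removed TEXT, state TEXT)"
)
conn.execute(
    "CREATE TABLE cloud_usage.usage_volume (id INTEGER, volume_id INTEGER, deleted TEXT)"
)
conn.executemany("INSERT INTO cloud.volumes VALUES (?, ?, ?, ?)", [
    (26888, "ROOT-26840", "2023-01-27 13:09", "Expunged"),
    (26890, "ROOT-26842", "2023-01-27 14:27", "Expunged"),
    (26902, "ROOT-26854", "2023-01-27 14:33", "Expunged"),
    (26904, "ROOT-26856", None, "Ready"),
    (30000, "DATA-OK", "2023-02-01 10:00", "Expunged"),  # hypothetical, unaffected
])
conn.executemany("INSERT INTO cloud_usage.usage_volume VALUES (?, ?, ?)", [
    (1, 26888, None),
    (2, 26890, None),
    (3, 26902, None),
    (4, 26904, None),                 # live volume, NULL is correct here
    (5, 30000, "2023-02-01 10:00"),   # delete event was applied
])

# Usage rows still open although the volume itself is already removed.
QUERY = """
SELECT v.id, v.name, v.removed
FROM cloud_usage.usage_volume AS uv
JOIN cloud.volumes AS v ON v.id = uv.volume_id
WHERE uv.deleted IS NULL AND v.removed IS NOT NULL
ORDER BY v.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The same SELECT, adapted to the real column set, would be something to try on a copy of the database first; it only reads data and does not attempt the repair discussed in this thread.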
I'm trying to think of a quick fix: is there a way to regenerate the usage data once I have updated all the cloud_usage.usage_volume.deleted columns with the correct date? The "generateUsageRecords" API will only generate records for previously failed generations, which is not the case here.