
Ice totals off

Open ckelner opened this issue 10 years ago • 40 comments

Hello, our Ice total is off by a large amount when totaled over the month; for individual days the discrepancy is obviously much smaller (1/30th or thereabouts). I can't share the exact amount, but suffice it to say it is large enough that it is not easily missed.

[screenshot: 2015-02-03 1:07 PM]

I wrote a dumb py script (see https://gist.github.com/ckelner/962b9e52db11fc73cf68#file-aws_csv_totaler-py) to total our CSV by LineItem (our CSV is 17 million lines and about 7 GB, so it is hard to handle any other way). The value it spits out is equal to the value AWS shows in the billing console. The totals at the bottom of the CSV also line up with the AWS billing console and the py script.
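For anyone following along, here is a minimal sketch of the same idea (not the gist verbatim; just a streaming pass that sums UnBlendedCost for rows whose RecordType is "LineItem", assuming the standard DBR header row is present):

import csv
import sys
from decimal import Decimal

# Stream the detailed billing CSV and sum UnBlendedCost for "LineItem" rows.
# Assumes the file still has its header row; column names are the standard
# DBR ones (RecordType, UnBlendedCost, ...).
total = Decimal("0")
rows = 0
with open(sys.argv[1], newline="") as f:
    for row in csv.DictReader(f):
        rows += 1
        if row.get("RecordType") == "LineItem":
            total += Decimal(row.get("UnBlendedCost") or "0")

print("Rows processed:", rows)
print("LineItem UnBlendedCost total:", total)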

My colleague @AlexHawk31 has done some deep-dive comparisons on the numbers (she's really good at that), and it looks like charges for unused reserved instances aren't showing up for us. Every account that has unused RIs is off by some amount.

To provide some more context that may be helpful: we have 70+ AWS accounts under a single master account, we have several hundred RI purchases overall (not all made this month), and we added two new accounts to consolidated billing under the master account this month.

Additionally, we run more than one processor for redundancy, uptime, etc., as seen in https://github.com/Netflix/ice/issues/141. Not sure whether that matters (we're attempting to figure that out in issue #141).

I've seen a lot of fixes which involve "dumping local ice files and ice s3 files" -- I'd like to avoid doing that and instead find the root of the problem. I'm happy to make code changes, but I wanted to come here first and start a conversation about it.

ckelner avatar Feb 03 '15 18:02 ckelner

Based on feedback in issue #141, I believe this may be resolved by no longer having multiple processors access the same bucket. I am going to continue to monitor after the solution in #141 is implemented. Will re-open if necessary.

ckelner avatar Feb 03 '15 20:02 ckelner

For posterity's sake, this is where we saw the inconsistency: [screenshot: 2015-02-03 1:46 PM]

ckelner avatar Feb 03 '15 20:02 ckelner

Re-opening this. I set up a new Ice instance with a single processor, a single reader, and a completely new (empty) S3 bucket, and I am still seeing the issue described above.

ckelner avatar Feb 04 '15 19:02 ckelner

Is it possible for you to share your ice.properties file so that we know precisely under what configuration your Ice instances are running, in order to provide help?

nfonrose avatar Feb 05 '15 08:02 nfonrose

Hi @nfonrose, sure thing, here's the processor config; the reader is essentially the same, just with the flags flipped, as I'm sure you are aware. My only concern is with the "reservation owner accounts" settings, primarily because the documentation/readme isn't very clear. I've redacted bucket names, but other than that everything is as it exists in our deployments.

ice.processor=true
# whether or not to start reader/UI
ice.reader=false

# whether or not to start reservation capacity poller
ice.reservationCapacityPoller=true

# default reservation period, possible values are oneyear, threeyear
ice.reservationPeriod=threeyear
# default reservation utilization, possible values are LIGHT, MEDIUM, HEAVY. If you have both (LIGHT or MEDIUM) and HEAVY RIs, make sure you do not put HEAVY here.
ice.reservationUtilization=HEAVY

# the highstock url; host it somewhere else and change this if you need HTTPS
ice.highstockUrl=/ice/js/highstock.js

# url prefix, e.g. http://ice.netflix.com/. Will be used in alert emails.
# @ckelner: TODO
ice.urlPrefix=

# from email address
ice.fromEmail=

# ec2 ondemand hourly cost threshold to send alert email. The alert email will be sent at most once per day.
ice.ondemandCostAlertThreshold=250

# ec2 ondemand hourly cost alert emails, separated by ","
ice.ondemandCostAlertEmails=

# modify the following 5 properties according to your billing files configuration. if you have multiple payer accounts, you will need to specify multiple values for each property.
# s3 bucket name where the billing files are. multiple bucket names are delimited by ",". Ice must have read access to billing s3 bucket.
ice.billing_s3bucketname=<redacted>
# prefix of the billing files. multiple prefixes are delimited by ","
ice.billing_s3bucketprefix=
# specify your payer account id here if across-accounts IAM role access is used. multiple account ids are delimited by ",". "ice.billing_payerAccountId=,222222222222" means assumed role access is only used for the second bucket.
ice.billing_payerAccountId=<redacted>
# specify the assumed role name here if you use IAM role access to read from billing s3 bucket. multiple role names are delimited by ",". "ice.billing_accessRoleName=,ice" means assumed role access is only used for the second bucket.
ice.billing_accessRoleName=ice
# specify external id here if it is used. multiple external ids are delimited by ",". if you don't use external id, you can leave this property unset.
#ice.billing_accessExternalId=

# specify your custom tags here. Multiple tags are delimited by ",". If specified, BasicResourceService will be used to generate resource groups for you.
# PLEASE MAKE SURE you have limited number (e.g. < 100) of unique value combinations from your custom tags, otherwise Ice performance will be greatly affected.
#ice.customTags=user:Billing

# start date in millis from when you want to start processing the billing files
ice.startmillis=1388534400000

# you company name. it will be used by UI
ice.companyName=The Weather Company

# s3 bucket name where Ice can store output files. Ice must have read and write access to billing s3 bucket.
ice.work_s3bucketname=<redacted>
# prefix of Ice output files
ice.work_s3bucketprefix=

# local directory for Ice processor. the directory must exist.
ice.processor.localDir=/opt/ice_output/processor

# local directory for Ice reader. the directory must exist.
ice.reader.localDir=/opt/ice_output/reader

# monthly data cache size for Ice reader.
ice.monthlycachesize=24

# change the follow account settings
<redacted>

# set reservation owner accounts. "ice.owneraccount.account2=account3,account4" means reservations in account2 can be shared by account3 and account4
# if reservation capacity poller is enabled, the poller will try to poll reservation capacity through ec2 API (desribeReservedInstances) for each reservation owner account.
<redacted>
#ice.owneraccount.account2=account3,account4

# if reservation capacity poller needs to use IAM role to access ec2 API, set the assumed role here for each reservation owner account
ice.owneraccount.twc-master.role=ice
#ice.owneraccount.account2.role=

# if reservation capacity poller needs to use IAM role to access ec2 API and external id is used, set the external id here for each reservation owner account. otherwise you can leave it unset.
#ice.owneraccount.account1.externalId=
#ice.owneraccount.account2.externalId=

ckelner avatar Feb 05 '15 13:02 ckelner

My guess is that it has something to do with this; not sure how I missed it, but I'm looking into it now...

"If different accounts have different AZ mappings, you will also need to subclass BasicAccountService and override method getAccountMappedZone to provide correct AZ mapping."

ckelner avatar Feb 09 '15 21:02 ckelner

So I'm looking for getAccountMappedZone() in BasicAccountService and I'm coming up empty? Can anyone help me out here?

ckelner avatar Feb 09 '15 21:02 ckelner

Asked about documentation error in new issue, see #144.

ckelner avatar Feb 11 '15 14:02 ckelner

Looking at https://github.com/Netflix/ice/blob/master/src/java/sample.properties#L75-L79

# set reservation owner accounts. "ice.owneraccount.account2=account3,account4" means reservations in account2 can be shared by account3 and account4
# if reservation capacity poller is enabled, the poller will try to poll reservation capacity through ec2 API (desribeReservedInstances) for each reservation owner account.
ice.owneraccount.account1=
ice.owneraccount.account2=account3,account4
ice.owneraccount.account5=account6

it seems that we should have every account referencing every other account, which seems artificially arduous. I am going to attempt it and see what happens.

Then looking at https://github.com/Netflix/ice/blob/master/src/java/sample.properties#L81-L84

# if reservation capacity poller needs to use IAM role to access ec2 API, set the assumed role here for each reservation owner account
ice.owneraccount.account1.role=ice
ice.owneraccount.account2.role=ice
ice.owneraccount.account5.role=ice

it appears that we need a role in every account for Ice to assume? Again, that seems cumbersome...
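With 70+ linked accounts, maintaining those properties by hand is painful, so a throwaway generator along these lines could emit both blocks (the account names and role below are placeholders, not our real config):

# Hypothetical helper: given a list of linked accounts, emit the
# ice.owneraccount.* lines so every account's reservations can be shared by
# every other account, plus a per-account assumed-role line.
accounts = ["account1", "account2", "account3"]  # placeholders; 70+ in reality
role = "ice"

for owner in accounts:
    others = ",".join(a for a in accounts if a != owner)
    print("ice.owneraccount.%s=%s" % (owner, others))

for owner in accounts:
    print("ice.owneraccount.%s.role=%s" % (owner, role))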

ckelner avatar Feb 11 '15 14:02 ckelner

This could be related to the blended vs. unblended cost discrepancy. Are the totals in Ice lower than your AWS bill? Can you spot-check the usage numbers for the "ec2_instance" product?

vfilanovsky avatar Feb 11 '15 18:02 vfilanovsky

Hey @vfilanovsky,

Yes our totals in Ice are lower than our AWS bill.

The usage numbers across the majority of our 70+ accounts seem to be coming in lower than expected; in some cases much lower, in others only slightly, mostly depending on the size of the account (i.e., relative to the number of machines running).

ckelner avatar Feb 11 '15 18:02 ckelner

Regarding owner accounts, I've tried the following, but it doesn't seem to have changed anything for us. I've tried it both with account numbers and with the naming convention defined earlier in the config.

ckelner avatar Feb 11 '15 19:02 ckelner

@vfilanovsky to further clarify, @AlexHawk31 has done a deep dive on the numbers, and IIRC the hours are off when it comes to reserved instances.

ckelner avatar Feb 11 '15 19:02 ckelner

You gotta narrow it down. Pick an account, a region, and an instance type and try to reconcile the usage numbers.

vfilanovsky avatar Feb 11 '15 20:02 vfilanovsky

@vfilanovsky We have looked at individual accounts across the board; that's pretty straightforward. Of all the ones we've picked, the usage numbers are off. I'm not sure what more I can do from there? Are you looking for something specific that will help you?

ckelner avatar Feb 11 '15 20:02 ckelner

Chris, I am suggesting you try narrowing it down to a specific account, region, AZ, and instance type; then you can match up the reservations and AWS-reported usage to Ice-reported usage. Find the usage discrepancy first, then go after cost. I hope you understand that my options for troubleshooting your data issue are rather limited.

vfilanovsky avatar Feb 12 '15 02:02 vfilanovsky

@vfilanovsky Thanks for the clarification. I absolutely understand the limited nature of the situation :).

I didn't go after a specific instance type (yet), but I did drill down to a specific account, region, and AZ. I'll see if I can nail down a certain instance type and zone. I picked an account that I knew had reserved instances and was running in only a single zone; I'm not certain it is the best one to use as an example, though.

Here are my findings thus far:

In ice it reports usage as:

OndemandInstances   1,261.00 
ReservedInstancesHeavy  24,469.00

[screenshot: 2015-02-11 7:46 PM]

I'm not sure how far I can trust the AWS Billing Console, since I think the way it displays the data is different from the CSV. Since I can't manipulate the full CSV (17 million lines, ~6 GB in size), I had to trim it down to a single account with the following commands:

grep -E '123456' ~/kelner_temp/98765-aws-billing-detailed-line-items-with-resources-and-tags-2015-01.csv > account.csv

where 123456 is the account number we are after, then:

grep -E 'RunInstances' account.csv > account_instances.csv

Then I manually removed data transfer costs by sorting the data in the sheet. I found the following:

total usage reserved:  25618  
total usage on demand:  1261  

[screenshot: 2015-02-11 9:01 PM]

The formulas used for that are =SUMIFS(M2:M25744, I2:I25744, "Y") and =SUMIFS(M2:M25744, I2:I25744, "N").

I did notice some weird "no zone" entries in the sheet, though: [screenshots: 2015-02-11 9:02 PM]

And then this weird "RunInstance:0002" sort of thing under Operation? [screenshot: 2015-02-11 9:03 PM]

I wonder if either of those latter weird entries is causing problems?

I'll continue to dig down into each instance type.

ckelner avatar Feb 12 '15 02:02 ckelner

Additionally, there are about 5,000 lines that simply look like this: [screenshot: 2015-02-11 9:48 PM]

Normally the "Usage Type" column contains the machine type in question, but in these ~5,000 lines there is no machine type.

ckelner avatar Feb 12 '15 02:02 ckelner

I figured out what the "HeavyUsage" lines without a machine type are; it seems you have to look in the description column for those: [screenshot: 2015-02-11 10:46 PM]

The empty-AZ entries ([screenshot: 2015-02-11 10:54 PM]) I'm not quite so clear on, but there is more info in the description field.

Getting the unique "UsageType" values from the CSV (I used the advanced filter in Excel), I end up with:

BoxUsage:m1.large
BoxUsage:t2.micro
BoxUsage:t2.small
HeavyUsage:m1.small
HeavyUsage:c1.xlarge
HeavyUsage:m1.large
HeavyUsage:m1.xlarge
HeavyUsage:m2.2xlarge
HeavyUsage:r3.2xlarge
HeavyUsage:t2.micro

Totaling them up I get:

BoxUsage:m1.large           288
BoxUsage:t2.micro           539
BoxUsage:t2.small           434
HeavyUsage:m1.small         3720
HeavyUsage:c1.xlarge        744
HeavyUsage:m1.large         1944
HeavyUsage:m1.xlarge        13797
HeavyUsage:m2.2xlarge       4464
HeavyUsage:r3.2xlarge       744
HeavyUsage:t2.micro         205

Total                       26879
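For what it's worth, the same per-UsageType totals can be pulled straight out of the trimmed CSV; a rough sketch, assuming the header row was kept (e.g. via head -n 1, as I do later):

import csv
import sys
from collections import Counter

# Sum UsageQuantity per UsageType (BoxUsage:* / HeavyUsage:*) for instance
# rows, mirroring the Excel advanced filter + totals above.
usage = Counter()
with open(sys.argv[1], newline="") as f:
    for row in csv.DictReader(f):
        if row.get("Operation", "").startswith("RunInstances"):
            usage[row.get("UsageType", "")] += float(row.get("UsageQuantity") or 0)

for usage_type, hours in sorted(usage.items()):
    print("%-24s %10.0f" % (usage_type, hours))
print("%-24s %10.0f" % ("Total", sum(usage.values())))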

When compared to Ice ([screenshot: 2015-02-11 10:41 PM]):

aggregated          25,730.00
c1.xlarge.windows   744.00
m1.large            1,488.00    
m1.large.windows    744.00
m1.small            3,720.00    
m1.xlarge           13,392.00   
m2.2xlarge          3,720.00    
r3.2xlarge          744.00  
t2.micro            744.00  
t2.small            434.00

To save some time, I lumped the Windows variants in with the base instance types from Ice, and lumped heavy usage and on-demand together from AWS, for the time being just to make things easier (I know the costs aren't the same, but it lets me see how the numbers add up):

ice  :  c1.xlarge           744
aws  :  c1.xlarge           744
ice  :  m1.large            2,232
aws  :  m1.large            2,232
ice  :  m1.small            3,720
aws  :  m1.small            3,720
ice  :  m1.xlarge           13,392      --- different
aws  :  m1.xlarge           13,797       --- different
ice  :  m2.2xlarge          3,720      --- different
aws  :  m2.2xlarge          4,464      --- different
ice  :  r3.2xlarge          744 
aws  :  r3.2xlarge          744
ice  :  t2.micro            744
aws  :  t2.micro            744
ice  :  t2.small            434
aws  :  t2.small            434

ckelner avatar Feb 12 '15 04:02 ckelner

Given the data above, it looks like the usage difference is coming from those line items which are missing an availability zone. If I take those usage numbers and add them to the lower values in Ice, I get the usage numbers that we expect: [screenshot: 2015-02-12 11:19 AM]
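A quick way to quantify that is to total only the instance rows with an empty AvailabilityZone; another rough sketch under the same header-row assumption:

import csv
import sys

# Total UsageQuantity for RunInstances rows that have no AvailabilityZone --
# the monthly HeavyUsage summary rows that appear to be getting skipped.
no_az_hours = 0.0
with open(sys.argv[1], newline="") as f:
    for row in csv.DictReader(f):
        if (row.get("Operation", "").startswith("RunInstances")
                and not row.get("AvailabilityZone")):
            no_az_hours += float(row.get("UsageQuantity") or 0)

print("usage hours on rows with no AZ:", no_az_hours)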

ckelner avatar Feb 12 '15 16:02 ckelner

I see now that the empty zone case above gets skipped here: https://github.com/Netflix/ice/blob/master/src/java/com/netflix/ice/basic/BasicLineItemProcessor.java#L299-L300

I assume they get processed by the reservation code later; however, I'm having a hard time following that code. I am still working through it and will update if I figure anything further out.

ckelner avatar Feb 12 '15 18:02 ckelner

I'm closing this. My totals are now off by an even greater percentage (roughly 20%, whereas before they were ~5% off), but I have a tangible bug to follow, as seen here: https://github.com/Netflix/ice/issues/147

I got to this point by having each of the 70+ accounts reference the others (as they are all under consolidated billing) in the owneraccount section of the config. Then I went to each of the 70+ accounts, added an IAM cross-account access role for reservation polling, and added each account-to-role mapping in the role section of the config.

These seem to have been the missing pieces for us. I believe resolving #147 will account for the 20% difference; if not, I'll re-open this issue.

ckelner avatar Feb 16 '15 12:02 ckelner

Still looking into this (while chasing the RI issue), latest:

Some interesting line items that get skipped: they have no availability zone and, per the UsageType prefix, aren't in us-east-1 (the default):

2015-02-20 03:16:39,254 [com.netflix.ice.processor.BillingFileProcessor] ERROR basic.BasicLineItemProcessor  - 1c) ignoring item: [Estimated, 013328811177, 216803239211, LineItem, Amazon Elastic Compute Cloud, 3207448, , , EU-HeavyUsage:c3.2xlarge, RunInstances, , Y, USD 0.1349 hourly fee per Linux/UNIX (Amazon VPC), c3.2xlarge instance (3360.0 hours purchased, 2275.00000000 hours used), 2015-02-01 00:00:00, 2015-02-28 23:59:59, 1085.00000000, , 453.260000, , 146.362500000000]
2015-02-20 03:16:39,254 [com.netflix.ice.processor.BillingFileProcessor] ERROR basic.BasicLineItemProcessor  - 1c) ignoring item: [Estimated, 013328811177, 216803239211, LineItem, Amazon Elastic Compute Cloud, 3207450, , , EU-HeavyUsage:c3.2xlarge, RunInstances, , Y, USD 0.1349 hourly fee per Linux/UNIX (Amazon VPC), c3.2xlarge instance (4032.0 hours purchased, 2730.00000000 hours used), 2015-02-01 00:00:00, 2015-02-28 23:59:59, 1302.00000000, , 543.920000, , 175.643000000000]
2015-02-20 03:16:39,255 [com.netflix.ice.processor.BillingFileProcessor] ERROR basic.BasicLineItemProcessor  - 1c) ignoring item: [Estimated, 013328811177, 682134577783, LineItem, Amazon Elastic Compute Cloud, 3207466, , , EU-HeavyUsage:c3.xlarge, RunInstances, , Y, USD 0.06745 hourly fee per Linux/UNIX (Amazon VPC), c3.xlarge instance (1344.0 hours purchased, 651.00000000 hours used), 2015-02-01 00:00:00, 2015-02-28 23:59:59, 693.00000000, , 90.650000, , 46.7400500000000]
2015-02-20 03:16:39,259 [com.netflix.ice.processor.BillingFileProcessor] ERROR basic.BasicLineItemProcessor  - 1c) ignoring item: [Estimated, 013328811177, 682134577783, LineItem, Amazon Elastic Compute Cloud, 2263576, , , USW2-HeavyUsage:m3.2xlarge, RunInstances, , Y, USD 0.12 hourly fee per Linux/UNIX (Amazon VPC), m3.2xlarge instance (672.0 hours purchased, 455.00000000 hours used), 2015-02-01 00:00:00, 2015-02-28 23:59:59, 217.00000000, , 80.640000, , 26.0400000000]
2015-02-20 03:16:39,261 [com.netflix.ice.processor.BillingFileProcessor] ERROR basic.BasicLineItemProcessor  - 1c) ignoring item: [Estimated, 013328811177, 682134577783, LineItem, Amazon Elastic Compute Cloud, 2263576, , , USW2-HeavyUsage:m3.2xlarge, RunInstances, , Y, USD 0.12 hourly fee per Linux/UNIX (Amazon VPC), m3.2xlarge instance (672.0 hours purchased, 455.00000000 hours used), 2015-02-01 00:00:00, 2015-02-28 23:59:59, 217.00000000, , 80.640000, , 26.040000

Doesn't appear to get picked up later at all:

2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - cut hours to 671
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Processing reservations...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Utilization LIGHT size is 0, skipping...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Processing reservations...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Utilization MEDIUM size is 0, skipping...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Processing reservations...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Utilization HEAVY size is 0, skipping...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Processing reservations...
2015-02-20 03:55:14,488 [com.netflix.ice.processor.BillingFileProcessor] INFO  processor.BillingFileProcessor  - Utilization FIXED size is 0, skipping...
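For reference, the purchased/used hours embedded in those ItemDescription strings can be pulled out with a regex, which makes it easy to tally how many unused RI hours are sitting in the ignored lines. This is a hypothetical helper based only on the description format in the log above, not anything in Ice:

import re

# Parse "(3360.0 hours purchased, 2275.00000000 hours used)" out of an
# ItemDescription string; returns (purchased, used) or None if not present.
HOURS_RE = re.compile(r"\(([\d.]+) hours purchased, ([\d.]+) hours used\)")

def ri_hours(item_description):
    match = HOURS_RE.search(item_description)
    if not match:
        return None
    return float(match.group(1)), float(match.group(2))

desc = ("USD 0.1349 hourly fee per Linux/UNIX (Amazon VPC), c3.2xlarge "
        "instance (3360.0 hours purchased, 2275.00000000 hours used)")
purchased, used = ri_hours(desc)
print("purchased:", purchased, "used:", used, "unused:", purchased - used)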

ckelner avatar Feb 20 '15 13:02 ckelner

Lines with no AZ are the totals for the month. If you added them in, you'd be double-counting. That is why Ice ignores them.

vfilanovsky avatar Feb 20 '15 18:02 vfilanovsky

Hey @vfilanovsky, thanks for the reply and that bit of info. I'm just hunting, if you will, trying to make sense of everything. I can't figure out where $XX,000 is going with the RI poller off, and when I turn the RI poller on the total is off by $XXX,000, which is worse :(. I would have hoped that with the poller on it would be closer.

I dug back historically all the way to September last year, and the amount it is off by each month varies quite a lot. For example, September was off in the five figures, while November was only off by 0.64 cents, and December only in the four figures.

Anyways, I'm just dumping data here as I find it in hopes you can shine light on anything like you just did, but if that is problematic I can stop tracking my debugging attempts here.

ckelner avatar Feb 20 '15 19:02 ckelner

Hey @ckelner,

Anyways, I'm just dumping data here as I find it in hopes you can shine light on anything like you just did, but if that is problematic I can stop tracking my debugging attempts here.

I think your "data dumps" help other people help you find the issue (since we can't reproduce your problem locally without your billing files), so I would encourage you to keep posting your findings.

nfonrose avatar Feb 23 '15 10:02 nfonrose

More information: I've been in touch with AWS support regarding this, because it seems that our ECSV/DBR (as AWS calls it) is wrong, if this logic is to be trusted.

Here are my most recent communications with AWS, in chronological order; the last two are the most important, citing specific problems found with the ECSV/DBR. TL;DR: when I add up every line item outside of Ice (including the "utilization" summarizations we've talked about in this GitHub issue), the result equals our consolidated bill total. When those values are excluded, it falls short in the same way Ice does.

The Weather Channel Feb 20, 2015 09:15 AM -0500

So I've raised this flag previously, but it was due to line items being in the us-east-1a zone. Now I am seeing several line items which are not in us-east-1 (per the UsageType column).

For example:

InvoiceID,PayerAccountId,LinkedAccountId,RecordType,RecordId,ProductName,RateId,SubscriptionId,PricingPlanId,UsageType,Operation,AvailabilityZone,ReservedInstance,ItemDescription,UsageStartDate,UsageEndDate,UsageQuantity,BlendedRate,BlendedCost,UnBlendedRate,UnBlendedCost,ResourceId,user:Billing,user:Owner
"Estimated","redacted","redacted","LineItem","400000000223405011","Amazon Elastic Compute Cloud","3207476",,,"EU-HeavyUsage:r3.large","RunInstances",,"Y","USD 0.0323 hourly fee per Linux/UNIX (Amazon VPC), r3.large instance (1488.0 hours purchased, 1488.0 hours used)","2015-01-01 00:00:00","2015-01-31 23:59:59","0",,"48.060000",,"0",,,
"Estimated","redacted","redacted","LineItem","400000000223404865","Amazon Elastic Compute Cloud","2321982",,,"EU-HeavyUsage:r3.large","RunInstances",,"Y","USD 0.034 hourly fee per Linux/UNIX (Amazon VPC), r3.large instance (2232.0 hours purchased, 2232.0 hours used)","2015-01-01 00:00:00","2015-01-31 23:59:59","0",,"75.890000",,"0.00200000000",,,
"Estimated","redacted","redacted","LineItem","400000000223404943","Amazon Elastic Compute Cloud","3207476",,,"EU-HeavyUsage:r3.large","RunInstances",,"Y","USD 0.0323 hourly fee per Linux/UNIX (Amazon VPC), r3.large instance (1488.0 hours purchased, 1488.0 hours used)","2015-01-01 00:00:00","2015-01-31 23:59:59","0",,"48.060000",,"0",,,
"Estimated","redacted","redacted","LineItem","400000000223404871","Amazon Elastic Compute Cloud","3373535",,,"EU-HeavyUsage:r3.large","RunInstances",,"Y","USD 0.0323 hourly fee per Linux/UNIX (Amazon VPC), r3.large instance (1488.0 hours purchased, 715.00000000 hours used)","2015-01-01 00:00:00","2015-01-31 23:59:59","773.00000000",,"48.060000",,"24.965500000000",,,
"Estimated","redacted","redacted","LineItem","400000000223404866","Amazon Elastic Compute Cloud","3259269",,,"EU-HeavyUsage:r3.large","RunInstances",,"Y","USD 0.0323 hourly fee per Linux/UNIX (Amazon VPC), r3.large instance (1488.0 hours purchased, 1488.0 hours used)","2015-01-01 00:00:00","2015-01-31 23:59:59","0",,"48.060000",,"0",,,

Can you explain this to me please? We need to be able to reconcile these costs outside of the AWS Billing dashboard.

Amazon Web Services Feb 20, 2015 01:27 PM -0500

I'm going to submit a feature request to have the AZs reported in this section of the DBR, but I want to clarify a few things as well to help reconcile these costs outside of the AWS Billing dashboard. As I'm sure you may be aware, this section is used to determine whether or not a Reserved Instance has been fully utilized throughout the month and does not come with associated charges. It will give you a basic snapshot of how many hours these RIs are being utilized throughout a given month (which is a good reason why it might be useful to include the AZs as well). To allocate costs, there are a few options you can use, rather than relying on the section you indicated:

  1. You can allocate the costs to the RI owner
  2. You can allocate hourly usage by using the unblended rates in the individual line items that are using your RIs

I will send this feature request to our EC2 team to see if they can include the AZ for this section of your DBR. Thank you for your patience and please let us know if you have any further questions.

The Weather Channel Feb 20, 2015 01:59 PM -0500

To be certain I want to further clarify what you've stated.

To me it sounds like these lines are duplicated charges that also appear as LineItems elsewhere. It sounds like they serve as a means to express underutilized reservations. Is that correct?

If that is correct, then can you tell me if there is a line item elsewhere in the CSV that charged 100% of the cost for a reservation month rather than an hourly cost? The majority of what we have at TWC are HEAVY 3yr which we get charged for regardless of usage. I need to be able to reconcile full cost.

Amazon Web Services Feb 20, 2015 04:28 PM -0500

The items that you included in your first email are basically a summary, including the utilization and the total cost at the bottom of your Detailed Billing Report. It is also broken out by the hour. In other words, both the line items and the summary are two ways of expressing your RI utilization. You were not charged twice, but the Detailed Billing Report includes both views to give you both a summary and a more granular view. In the individual line items, you'll be able to see which specific RI was used, which instance it was applied to and which account owned that instance. The summary will express how much the total RI cost, as well as how many hours it was in use throughout the month.

Regarding your second question, you should find that information within the line items that you provided. In the example that you provided, for an RI that was 100% utilized, you will see something like "1488.0 hours purchased, 1488.0 hours used." For an RI that was not entirely used, you will see something like "1488.0 hours purchased, 715.00000000 hours used."

I hope that makes sense. If anything I said wasn't clear, or if you have any further questions, please let us know.

The Weather Channel Feb 23, 2015 10:33 AM -0500

So this stumps me a little bit... let me explain why.

I wrote a very simple Python script which takes each row of the CSV that includes "LineItem" in the "RecordType" column. This includes both the "hour-by-hour" charges per account and the aforementioned "utilization" rows that we discussed earlier in this ticket. So here's the rub: the total comes out to be the same total that is printed at the end of the bill.

You can find the py script here: https://gist.github.com/ckelner/962b9e52db11fc73cf68

Here's a run on our January bill:

$ python aws_csv_reader.py 013328811177-aws-billing-detailed-line-items-with-resources-and-tags-2015-01.csv
Rows processed: 17229196
LineItem UnBlendedCost total: XXXXXX.XX

And from the AWS billing console it reports: $XXX,XXX.XX

So I dug out the "utilization" lines from the CSV and created a smaller CSV so these could be examined. That CSV can be seen here: -- as you can see, the script I wrote will pick up those line items just fine and pull the unblended cost for those lines.

This smaller CSV was created using:

$ head -n 1 013328811177-aws-billing-detailed-line-items-with-resources-and-tags-2015-01.csv > jan_utilization.csv
$ grep -Eiw 'HeavyUsage.*purchased' ~/kelner_temp/013328811177-aws-billing-detailed-line-items-with-resources-and-tags-2015-01.csv >> jan_utilization.csv

Without those values, the total is not correct. I wrote a second script (seen here: https://gist.github.com/ckelner/878d653928d476d22fb9) that excludes those line items (as seen in the csv provided from dropbox) from the total. The total then comes out to:

$ python aws_csv_reader_v2.py 013328811177-aws-billing-detailed-line-items-with-resources-and-tags-2015-01.csv
Rows processed: 17229196
LineItem UnBlendedCost total: XXXXXX.XXX

So this value is $14,000 off from the correct total.

So can you help me understand why that is? As I understand it, counting those "utilization" lines would effectively have me counting resources twice, but given that the total is the same, I don't see how that is possible?

Amazon Web Services Feb 23, 2015 04:02 PM -0500

Thank you again for reaching out.

We do want to give a direct, clear and concise answer regarding this to make sure we can figure out the issue and come up with an accurate solution.

I do apologize for the length of time on this case, but I will coordinate with the first Concierge Agent to properly research and clarify what can be done. So you are aware, we are escalating to our internal Billing Platform service team to make sure we know exactly why this is occurring.

Any recommendations made by the team will be passed along to you as soon as we receive them.

We strongly appreciate your patience through this, and please let us know if there is anything further we can assist with at this time.

Will reach out again soon!

ckelner avatar Feb 23 '15 21:02 ckelner

I opened another ticket with AWS about their Cost Explorer, where I saw a similar inconsistency:

The Weather Channel Feb 23, 2015 02:30 PM -0500

You may also want to reference case #1341404481

It appears that when viewing the total for reserved instances in the AWS Cost Explorer, the total is correct, as seen in attachment "reserved_jan_2015_total.png", but the total is mismatched (under the actual cost) when viewing by availability zone, as seen in attachment "reserved_jan_2015_az.png", for each account under our consolidated bill.

This correlates to a mismatch we've also seen in our ECSV/DBR (as noted in the other case #1341404481).

Can you help us understand this?

"reserved_jan_2015_total.png" reserved_jan_2015_total "reserved_jan_2015_az.png" reserved_jan_2015_az

Amazon Web Services Feb 23, 2015 04:02 PM -0500

Thank you again for reaching out.

We do want to give a direct, clear and concise answer regarding this to make sure we can figure out the issue and come up with an accurate solution.

I do apologize for the length of time on this case, but I will coordinate with the first Concierge Agent to properly research and clarify what can be done. So you are aware, we are escalating to our internal Billing Platform service team to make sure we know exactly why this is occurring.

Any recommendations made by the team will be passed along to you as soon as we receive them.

We strongly appreciate your patience through this, and please let us know if there is anything further we can assist with at this time.

Will reach out again soon!

ckelner avatar Feb 23 '15 21:02 ckelner

I want to make clear that I am NOT using the reservation poller. When I enabled it (and seemingly got the configuration correct), my bill was off by a much larger factor (as mentioned earlier in this issue) -- off in the six digits (~20%) versus the low five digits (~3%) without it. I opened https://github.com/Netflix/ice/issues/147 to work on it, but I was unable to make any progress. Since 3% was easier to reconcile than 20%, I decided to continue working without the RI poller.

ckelner avatar Feb 23 '15 21:02 ckelner

By making these changes -- https://github.com/Netflix/ice/compare/master...TheWeatherCompany:fix-npe-on-bootstrap -- our bill is exact for months without credits from AWS.

Spot-checking individual accounts, all costs and utilization add up, except for regions that fall into the logic where "no zone" line items get shoved into the "a" zone of a given region. Other than that, everything looks perfect as far as we can tell.

ckelner avatar Feb 23 '15 21:02 ckelner