ccloud-chargeback-helper

CCLOUD_LOOKBACK_DAYS=200

Open davinder26 opened this issue 10 months ago • 13 comments

@waliaabhishek Hello

I set CCLOUD_LOOKBACK_DAYS=200. I am getting data for July, August, September, and October, but nothing after those months.

I want data from the current month back over the last 6 months. What CCLOUD_LOOKBACK_DAYS value do you suggest?
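To size the value for "the current month plus the previous six months", one can count the days back to the first day of the month six months earlier. This is a minimal sketch that assumes the tool starts its calculation at today minus CCLOUD_LOOKBACK_DAYS days; that start-date rule is an assumption, and `lookback_days_for_months` is a hypothetical helper, not part of ccloud-chargeback-helper.

```python
from datetime import date


def lookback_days_for_months(months_back: int, today: date) -> int:
    """Hypothetical helper: days from the first day of the month that is
    `months_back` months before today's month, up to today."""
    year, month = today.year, today.month - months_back
    # Roll back across year boundaries without a third-party date library.
    while month <= 0:
        month += 12
        year -= 1
    return (today - date(year, month, 1)).days
```

For example, on 24 Jan 2025, `lookback_days_for_months(6, date(2025, 1, 24))` gives 207, which reaches back to 1 Jul 2024. As the maintainer explains later in the thread, the value only matters on a fresh start.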

davinder26 · Jan 24 '25 14:01

@waliaabhishek Could you please suggest a value?

The prometheus_feeder logs:

Waiting for readiness probe to be True
Readiness probe is False
Waiting for readiness probe to be True
Readiness probe is False
Waiting for readiness probe to be True
Readiness probe is False
Waiting for readiness probe to be True

davinder26 · Jan 27 '25 05:01

I don't know of any recent changes to the Billing API. The above logs tell me that the ccloud_chargeback_handler exporter is not ready for some reason. Can you check the logs for the ccloud_chargeback_handler pod and share them? That will help me debug this better.
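The "Readiness probe is False / Waiting for readiness probe to be True" lines come from a polling loop: the feeder keeps asking the handler whether it is ready before scraping it. The handler logs in this thread show `GET /is_ready` requests on `ccloud_chargeback_handler:8000`, and the feeder logs show `wget`, so the real feeder appears to be a shell script. The sketch below re-expresses that kind of loop in Python purely as an illustration; `http_probe`, `wait_for_ready`, the attempt limit, and the interval are assumptions, not the project's code.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or an HTTP error such as 500.
        return False


def wait_for_ready(probe: Callable[[], bool], max_attempts: int = 60,
                   interval_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() until it returns True or max_attempts is used up."""
    for attempt in range(max_attempts):
        if probe():
            print("Readiness probe is True")
            return True
        print("Readiness probe is False")
        if attempt < max_attempts - 1:
            print("Waiting for readiness probe to be True")
            sleep(interval_s)
    return False
```

Usage would be `wait_for_ready(lambda: http_probe("http://ccloud_chargeback_handler:8000/is_ready"))`. If this never returns True, the handler itself is the component to inspect, which is why its logs are needed.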

waliaabhishek · Jan 27 '25 20:01

@waliaabhishek I have posted the logs for both ccloud_chargeback_handler and prometheus_feeder. Thanks!

davinder26 · Jan 28 '25 12:01

@waliaabhishek Logs for prometheus_feeder:

index.html 100% |********************************| 7971 0:00:00 ETA
'index.html' saved
BLOCK ULID                  MIN TIME       MAX TIME       DURATION  NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
01JJPDZHG364GWD1Q6A12RHVWZ  1737846000000  1737846000001  1ms       20           20          20          4713
Sleeping for 600 seconds
Readiness probe is True
Scraping Interval set to 600
Connecting to ccloud_chargeback_handler:8000 (172.18.0.3:8000)
saving to 'index.html'
index.html 100% |********************************| 7971 0:00:00 ETA
'index.html' saved
BLOCK ULID                  MIN TIME       MAX TIME       DURATION  NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
01JJPEHVGK5EYY5X31AZC75F0X  1737846000000  1737846000001  1ms       20           20          20          4713
Sleeping for 600 seconds
Readiness probe is True
Scraping Interval set to 600
Connecting to ccloud_chargeback_handler:8000 (172.18.0.3:8000)
saving to 'index.html'
index.html 100% |********************************| 7971 0:00:00 ETA
'index.html' saved
BLOCK ULID                  MIN TIME       MAX TIME       DURATION  NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
01JJPF45H7Z7R6355DQ1DZPV6K  1737846000000  1737846000001  1ms       20           20          20          4713
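The MIN TIME and MAX TIME columns in these block listings are Prometheus TSDB timestamps, which are Unix epoch milliseconds. Decoding them shows what the feeder is actually writing: each block covers a single 1 ms window at 2025-01-25 23:00 UTC, and all three blocks carry the same timestamp. That would fit the handler's "Next Fetch Date is same as the current fetch date" message, though that reading is an inference. `block_time_utc` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def block_time_utc(epoch_ms: int) -> datetime:
    """Convert a Prometheus TSDB timestamp (Unix epoch milliseconds) to UTC."""
    return EPOCH + timedelta(milliseconds=epoch_ms)


mint = block_time_utc(1737846000000)
maxt = block_time_utc(1737846000001)
print(mint.isoformat())  # 2025-01-25T23:00:00+00:00
print(maxt - mint)       # 0:00:00.001000
```

Using `timedelta(milliseconds=...)` rather than dividing by 1000 keeps the millisecond exact, which matters when the whole block is only 1 ms wide.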


Logs for ccloud_chargeback_handler:

2025-01-28 12:45:34,075 werkzeug INFO 172.18.0.5 - - [28/Jan/2025 12:45:34] "GET /is_ready HTTP/1.1" 200 -
2025-01-28 12:45:34,079 werkzeug INFO 172.18.0.5 - - [28/Jan/2025 12:45:34] "GET /current_timestamp HTTP/1.1" 200 -
2025-01-28 12:45:34,083 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,083 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,083 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,084 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 12:45:34,085 ccloud.org DEBUG Next Fetch Date is same as the current fetch date. Clearing out the stats to prevent republishing of the same data.
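The last DEBUG line describes a guard against republishing: when the next fetch date has not moved past the current one, the exported stats are cleared so the same values are not served again. The sketch below is a hypothetical illustration of that kind of guard; `FetchState`, `advance`, and `stats` are invented names, not the handler's actual code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class FetchState:
    """Hypothetical holder for the last fetch date and the stats exported for it."""

    current_fetch_date: date
    stats: Dict[str, float] = field(default_factory=dict)

    def advance(self, next_fetch_date: date) -> bool:
        """Move to next_fetch_date; return False (and clear stats) if it has not advanced."""
        if next_fetch_date == self.current_fetch_date:
            # Same date as last time: drop the stats so the exporter does not
            # serve the already-published values again.
            self.stats.clear()
            return False
        self.current_fetch_date = next_fetch_date
        return True
```

Seen this way, the message is expected once ingestion has caught up, rather than a sign of failure.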

davinder26 · Jan 28 '25 12:01

@waliaabhishek I changed CCLOUD_LOOKBACK_DAYS to 100, but Grafana still shows data only from July to October. I cannot see data for November, December, or January.

Changing CCLOUD_LOOKBACK_DAYS has no effect on the data.

davinder26 · Jan 28 '25 12:01

Can you open the Grafana dashboard and share a screenshot of the "Chargeback Data available until" graph over the last 6 months? The logs tell me that everything is normal: all the data was published and ingestion has caught up to the current date.

waliaabhishek · Jan 28 '25 19:01

@waliaabhishek Logs with CCLOUD_LOOKBACK_DAYS=100:

2025-01-28 20:15:37,876 ccloud.org DEBUG Next Fetch Date is same as the current fetch date. Clearing out the stats to prevent republishing of the same data.
2025-01-28 20:25:37,947 werkzeug INFO 172.18.0.5 - - [28/Jan/2025 20:25:37] "GET /is_ready HTTP/1.1" 200 -
2025-01-28 20:25:37,951 werkzeug INFO 172.18.0.5 - - [28/Jan/2025 20:25:37] "GET /current_timestamp HTTP/1.1" 200 -
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,956 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,957 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,957 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,957 prometheus_processing.custom_collector DEBUG Notifying observers
2025-01-28 20:25:37,957 ccloud.org DEBUG Next Fetch Date is same as the current fetch date. Clearing out the stats to prevent republishing of the same data.

davinder26 · Jan 28 '25 20:01

The code seems to have read and ingested all the data. This graph is what I use to check whether data is being ingested, and it appears to have caught up to the current time as well. What is the problem right now: is there no data at all in the Grafana dashboards? Can you check whether your API key has the correct access?
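One way to test the credentials outside the tool is to call the Confluent Cloud Billing API directly. This sketch rests on assumptions: it assumes the costs endpoint is `GET https://api.confluent.cloud/billing/v1/costs` with `start_date` and `end_date` query parameters and HTTP Basic auth using the Cloud API key and secret, so confirm those details in Confluent's API reference first. `build_costs_request` is a hypothetical helper; it only builds the request, so it can be inspected without network access.

```python
import base64
import urllib.parse
import urllib.request
from datetime import date

# Assumed endpoint; confirm against the Confluent Cloud API reference.
BILLING_COSTS_URL = "https://api.confluent.cloud/billing/v1/costs"


def build_costs_request(api_key: str, api_secret: str,
                        start: date, end: date) -> urllib.request.Request:
    """Build (but do not send) a Basic-auth GET request for billing costs."""
    query = urllib.parse.urlencode(
        {"start_date": start.isoformat(), "end_date": end.isoformat()}
    )
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        f"{BILLING_COSTS_URL}?{query}",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )
```

Sending the request with `urllib.request.urlopen(req)` and getting HTTP 200 back would suggest the key can read billing data. An HTTP 401 or 403, raised as `urllib.error.HTTPError`, would point at the credentials or their permissions.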

waliaabhishek · Jan 28 '25 21:01

Changing the lookback is not going to help you right now. It is only used when you start fresh and the code needs to know how far back to go to start the calculation. Once the data has been ingested, the lookback setting has no effect.
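The rule described above (lookback only matters when nothing has been ingested yet) can be sketched as a start-date decision. This is a hypothetical illustration, not the project's code: `resolve_start_date`, `last_ingested`, and resuming the day after the checkpoint are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def resolve_start_date(last_ingested: Optional[date], lookback_days: int,
                       today: date) -> date:
    """Hypothetical: pick the date to start computing chargeback from."""
    if last_ingested is None:
        # Fresh start: nothing stored yet, so the lookback decides how far back to go.
        return today - timedelta(days=lookback_days)
    # Data already present: resume after the checkpoint; the lookback is ignored.
    return last_ingested + timedelta(days=1)
```

Under this model, changing 200 to 100 on an existing installation changes nothing, which matches what was observed earlier in the thread.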

waliaabhishek · Jan 28 '25 21:01

@waliaabhishek I checked the log files and I am getting this error:

Connecting to ccloud_chargeback_handler:8000 (172.18.0.4:8000)
wget: server returned error: HTTP/1.0 500 Internal Server Error
tail: can't open 'index.html': No such file or directory
tail: no files
Sleeping for 1 seconds
Readiness probe is False
Readiness probe is False
Waiting for readiness probe to be True
Readiness probe is False
Waiting for readiness probe to be True
Readiness probe is False

After that, it stops reporting metrics.

davinder26 · Jan 31 '25 14:01

The HTTP 500 suggests to me that ccloud_chargeback_handler was marked as not ready or failed. Are there any errors in that pod? Also, I will not be available for the next few days; I will resume debugging with you when I am back.

waliaabhishek · Jan 31 '25 23:01

@waliaabhishek I tried with 60 days and I am getting data for January 2025. But when I start with 90 days, I get the error below:

Traceback (most recent call last):
  File "/app/main.py", line 23, in <module>
    execute_workflow(arg_flags)
  File "/app/helpers.py", line 35, in add_entry_exit_logs
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/app/workflow_runner.py", line 134, in execute_workflow
    ccloud_orgs = CCloudOrgList(
                  ^^^^^^^^^^^^^^
  File "<string>", line 4, in __init__
  File "/app/ccloud/org.py", line 225, in __post_init__
    temp = CCloudOrg(
           ^^^^^^^^^^
  File "<string>", line 4, in __init__
  File "/app/ccloud/org.py", line 112, in __post_init__
    self.metrics_handler = PrometheusMetricsDataHandler(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 9, in __init__
  File "/app/data_processing/data_handlers/prom_metrics_api_handler.py", line 66, in __post_init__
    self.read_all(start_date=self.start_date, end_date=end_date, query_type=item)
  File "/app/helpers.py", line 35, in add_entry_exit_logs
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/app/data_processing/data_handlers/prom_metrics_api_handler.py", line 96, in read_all
    temp_data = [
                ^
  File "/app/data_processing/data_handlers/prom_metrics_api_handler.py", line 101, in <listcomp>
    METRICS_API_COLUMNS.principal_id: item["metric"]["principal_id"],
                                      ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^

With 200 days it always stops working at 27 Oct 2024. I get an error while fetching data for November and December 2024.

davinder26 · Feb 03 '25 13:02

As confirmed during our debug session, it seems that data is missing from your Metrics API Prometheus store. I will try to improve the handling of that dataset, but that basically means skipping chargeback calculations for the periods where the metrics data in your datastore is corrupt.
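The traceback is cut off before the exception line, but the failing expression is the lookup `item["metric"]["principal_id"]` inside a list comprehension in `read_all`. If a series in the store lacks the `principal_id` label, that lookup would raise `KeyError`; this is an assumption, since the exception type is not shown. The sketch below illustrates the tolerant handling described above: skip and count such series instead of aborting. It assumes the standard Prometheus HTTP API range-query result shape (`{"metric": {...labels}, "values": [[ts, "value"], ...]}`); `rows_from_series` and the row keys are hypothetical, not the project's code.

```python
import logging
from typing import Any, Dict, Iterable, List, Tuple

log = logging.getLogger(__name__)


def rows_from_series(result: Iterable[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], int]:
    """Flatten Prometheus range-query series into rows, skipping series
    that lack a principal_id label instead of failing on them."""
    rows: List[Dict[str, Any]] = []
    skipped = 0
    for item in result:
        labels = item.get("metric", {})
        principal = labels.get("principal_id")
        if principal is None:
            # Corrupt or incomplete series: record it and move on.
            skipped += 1
            log.warning("Skipping series without principal_id: %s", labels)
            continue
        for ts, value in item.get("values", []):
            rows.append({"principal_id": principal,
                         "timestamp": float(ts),
                         "value": float(value)})
    return rows, skipped
```

The trade-off is the one stated in the thread: the run keeps going, but any cost that depended on a skipped series is not attributed for that period, so the skip count is worth surfacing.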

In its current state, the code does not skip over the corrupt dataset and fails to proceed. The easiest path forward for now is to reduce CCLOUD_LOOKBACK_DAYS so that the start date falls after the date where data is missing, which lets the code proceed (as confirmed by our testing).
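Choosing the reduced value is simple date arithmetic once the latest date with missing metrics is known. A minimal sketch, assuming the start date is computed as today minus CCLOUD_LOOKBACK_DAYS days; `max_lookback_after` is a hypothetical helper, not part of the tool.

```python
from datetime import date


def max_lookback_after(bad_date: date, today: date) -> int:
    """Largest lookback (in days) whose start date falls strictly after
    bad_date, assuming start = today - lookback_days."""
    days = (today - bad_date).days - 1
    if days < 1:
        raise ValueError("bad_date is too recent to skip with a lookback")
    return days
```

For example, as of 3 Feb 2025, `max_lookback_after(date(2024, 10, 27), date(2025, 2, 3))` returns 98, which puts the start date at 28 Oct 2024. In this thread, 90 days still failed while 60 days worked, so the missing data extends past 27 Oct 2024; the date to plug in is the latest one with missing metrics.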

waliaabhishek · Feb 20 '25 20:02