Austin McKinley comments

Results 14 comments of


Distributors using spectacular amount of memory

Example pprof (original SVG available on request): ![Screen Shot 2020-09-15 at 4 45 13 PM](https://user-images.githubusercontent.com/54160/93275863-f1021780-f772-11ea-8d86-d22788dc7620.png)

Distributors using spectacular amount of memory

> How many cpus do your kubernetes nodes have? did you configure cpu limits on the distributors?

c5.24xlarge EC2 instances, which have 96 vCPUs. We don't have CPU limits configured...

Distributors using spectacular amount of memory

Also, last night we tried passing `GOGC=30` to these containers, and that significantly slowed the growth of the heap. It now looks like the heap maximum size...
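A rough sketch of why a lower `GOGC` caps heap growth, assuming the standard Go GC pacer model in which the next collection is triggered once the heap reaches about live heap × (1 + GOGC/100). The `heapTarget` helper and the 10 GiB live-heap figure are hypothetical and for illustration only; newer Go runtimes also count GC roots such as goroutine stacks, which this simplified model ignores.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// heapTarget approximates the heap size at which the Go GC starts its next
// cycle: liveHeap * (1 + gogc/100). Simplified model that ignores GC roots.
// Hypothetical helper, not part of any Cortex or Go runtime API.
func heapTarget(liveHeapBytes uint64, gogc int) uint64 {
	return liveHeapBytes + liveHeapBytes*uint64(gogc)/100
}

func main() {
	const gib = 1 << 30
	live := uint64(10 * gib) // hypothetical live heap of 10 GiB

	fmt.Printf("GOGC=100: next GC at ~%d GiB\n", heapTarget(live, 100)/gib)
	fmt.Printf("GOGC=30:  next GC at ~%d GiB\n", heapTarget(live, 30)/gib)

	// Setting the GOGC=30 environment variable has the same effect as calling
	// debug.SetGCPercent(30) at startup; SetGCPercent returns the previous value.
	prev := debug.SetGCPercent(30)
	fmt.Println("previous GC percent:", prev)
}
```

Under this model, the same 10 GiB of live data lets the heap grow to roughly 20 GiB before a collection with the default `GOGC=100`, but only to about 13 GiB with `GOGC=30`. The trade-off is that the GC runs more often and uses more CPU.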

Distributors using spectacular amount of memory

> Can we get the heap profile data please? (not a screen dump or svg)

Sure, here you go: https://github.com/amckinley/cortex-heap

Distributors using spectacular amount of memory

@bboreham anything else I can provide on our side? Happy to provide more heap profiles or try any tuning suggestions you have.

Distributors using spectacular amount of memory

@bboreham sorry for the delay; I'm back to working on this now. [Here's](https://github.com/amckinley/cortex-heap/blob/main/heap2.out) another heap dump (~50GB, this time of the particular distributor that's at max for our cluster), and...

Distributors using spectacular amount of memory

@bboreham We have 8 clusters, each of which has 2 replicas. (Actually, we have one huge cluster, but in order to make `grafana-agent` work, we had to create 8 distinct...

Negative power usage and items that do not sum to 0

Just hit this today; [here's](https://gist.github.com/amckinley/2a7ffd867ebdf8fd7502041e33ed5b96) my example if it helps. ![image](https://github.com/ClaudeMetz/FactoryPlanner/assets/54160/228fa79f-36bb-4752-a698-0c22f5dade5a) All mods and the base game are up to date.

Bump gRPC max tx/rx to 100MB for ingester and distributor

Hi @pracucci, what is the purpose of these limits? It doesn't look like Cortex is capable of "chunking" any of the data it returns, so hitting these limits just causes...

Seeing metric gaps after upgrading to 2.19.2

> I am impressed that that prometheus works at all. Thats quite a scale!

It's amazing what you can do with an `i3en.12xlarge` instance on EC2 :P