Fix/add prometheus metrics/i329

Open beguene opened this issue 3 years ago • 6 comments

This fixes #329 and is building on top of https://github.com/hirosystems/explorer/pull/491 and https://github.com/hirosystems/stacks-blockchain-api/pull/412

The goal is to expose an endpoint that Prometheus can pull metrics from on the explorer server.

We now have a custom Next.js server that includes a Prometheus middleware and serves the metrics at the root path on port 9153: localhost:9153 in dev mode and https://explorer.stacks.co:9153 on live. However, @CharlieC3, we need to open this port on Vercel for staging and live.
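For readers unfamiliar with what such an endpoint returns, here is a minimal, dependency-free sketch of a cumulative histogram rendered in the Prometheus text format. It is not the middleware used in this PR; `createHistogram`, `observe`, and `render` are hypothetical names chosen for illustration, and serving the text over HTTP on port 9153 is left out.

```typescript
// Hypothetical sketch, not the middleware used in this PR.
type Labels = Record<string, string>;

interface Bucket {
  le: number; // upper bound in seconds
  count: number; // cumulative count of observations <= le
}

interface Histogram {
  name: string;
  buckets: Bucket[];
  sum: number; // total of all observed durations
  count: number; // number of observations
}

function createHistogram(name: string, upperBounds: number[]): Histogram {
  const buckets = [...upperBounds]
    .sort((a, b) => a - b)
    .map((le) => ({ le, count: 0 }));
  return { name, buckets, sum: 0, count: 0 };
}

// Buckets are cumulative: one observation increments every bucket
// whose upper bound is greater than or equal to the observed value.
function observe(h: Histogram, seconds: number): void {
  for (const bucket of h.buckets) {
    if (seconds <= bucket.le) bucket.count += 1;
  }
  h.sum += seconds;
  h.count += 1;
}

// Escape backslashes, double quotes, and newlines in label values.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

function formatLabels(labels: Labels): string {
  return Object.entries(labels)
    .map(([key, value]) => `${key}="${escapeLabelValue(value)}"`)
    .join(",");
}

// Render one labelled series as text lines: buckets, +Inf, sum, count.
function render(h: Histogram, labels: Labels): string {
  const rest = formatLabels(labels);
  const lines = h.buckets.map(
    (b) => `${h.name}_bucket{le="${b.le}",${rest}} ${b.count}`
  );
  lines.push(`${h.name}_bucket{le="+Inf",${rest}} ${h.count}`);
  lines.push(`${h.name}_sum{${rest}} ${h.sum}`);
  lines.push(`${h.name}_count{${rest}} ${h.count}`);
  return lines.join("\n");
}
```

A request handler would call `observe` once per finished request with the elapsed time in seconds, and the metrics endpoint would return the output of `render` for each label combination.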

When running the custom server, Next.js automatically type-checks the project, and since we had 18 type errors the build failed. I fixed all of those type errors in this PR.

Example of metrics captured

Before aggregation by bucket

http_request_duration_seconds_bucket{le="0.05",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.3",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.5",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.8",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="1",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="1.5",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="2",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="3",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="10",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_sum{method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 0.002088583
http_request_duration_seconds_count{method="head",path="/txid/SP3K8BC0PPEVCV7NZ6QSRWPQ2JE9E5B6N3PA0KBR9.auto-alex?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.05",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.3",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.5",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.8",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="1",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="1.5",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="2",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="3",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="10",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_sum{method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 0.003887333
http_request_duration_seconds_count{method="head",path="/address/SP1Q2CJBBNBYPG0XJB6996FEA0S4GMHQGP9K11NCC",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.05",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.3",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.5",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.8",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="1",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="1.5",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="2",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="3",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="10",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 1
http_request_duration_seconds_sum{method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838659?chain=mainnet",status_code="200"} 0.002520042
http_request_duration_seconds_count{method="head",path="/block/0xa033d3985acf8b1b2e2969d5a2d6698745c9cd1ab373afea706d3eccf1838

After

http_request_duration_seconds_bucket{le="0.05",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.3",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.5",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.8",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="1",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="1.5",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="2",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="3",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="10",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/",status_code="200"} 1
http_request_duration_seconds_sum{method="head",path="/",status_code="200"} 0.012418583
http_request_duration_seconds_count{method="head",path="/",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.05",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.3",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.5",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.8",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="1",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="1.5",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="2",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="3",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="10",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_sum{method="head",path="/txid/*",status_code="200"} 0.005761583
http_request_duration_seconds_count{method="head",path="/txid/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.05",method="head",path="/address/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="0.3",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="0.5",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="0.8",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="1",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="1.5",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="2",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="3",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="10",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_sum{method="head",path="/address/*",status_code="200"} 0.081425833
http_request_duration_seconds_count{method="head",path="/address/*",status_code="200"} 2
http_request_duration_seconds_bucket{le="0.05",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.1",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.3",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.5",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="0.8",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="1",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="1.5",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="2",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="3",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="10",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_bucket{le="+Inf",method="head",path="/block/*",status_code="200"} 1
http_request_duration_seconds_sum{method="head",path="/block/*",status_code="200"} 0.006419041
http_request_duration_seconds_count{method="head",path="/block/*",status_code="200"} 1
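The difference between the two outputs above comes down to collapsing dynamic path segments into a wildcard before the path is used as a label, which keeps the number of distinct series bounded. A minimal sketch of that idea follows; it is not taken from this PR. `normalizePath` and `DYNAMIC_PREFIXES` are hypothetical names, and dropping the query string is an assumption based on the sample output.

```typescript
// Hypothetical list of route prefixes whose next segment is an identifier.
// The real list would depend on the explorer's routes.
const DYNAMIC_PREFIXES = ["/txid", "/address", "/block"];

// Map a raw request URL to a low-cardinality path label.
function normalizePath(rawUrl: string): string {
  // Assumption: the query string and fragment are not part of the label.
  const pathname = rawUrl.split(/[?#]/)[0] ?? "";

  for (const prefix of DYNAMIC_PREFIXES) {
    // Only collapse when something actually follows "<prefix>/".
    if (
      pathname.startsWith(`${prefix}/`) &&
      pathname.length > prefix.length + 1
    ) {
      return `${prefix}/*`;
    }
  }

  // Anything else is kept as-is; an empty path is reported as "/".
  return pathname === "" ? "/" : pathname;
}
```

With this in place, two requests to different addresses are recorded under the single label `path="/address/*"`, which matches the count of 2 in the aggregated sample.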

@fbwoolf @He1DAr @kyranjamie @zone117x

beguene avatar May 27 '22 09:05 beguene

@CharlieC3 Is there a way to open port 9153 on the host so we can test it on staging? (It should also be open in production.)

beguene avatar Jun 06 '22 12:06 beguene

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
hiro-explorer ✅ Ready (Inspect) Visit Preview Jun 16, 2022 at 11:46AM (UTC)

vercel[bot] avatar Jun 16 '22 11:06 vercel[bot]

Maybe we should look into using something that doesn't require a custom Next.js server? Using a custom server is kinda discouraged and can add extra complexity. https://nextjs.org/docs/advanced-features/custom-server

He1DAr avatar Jun 21 '22 14:06 He1DAr

@He1DAr I would also prefer not using a custom server. Let's see which infra we are targeting and come back to this later. We might not even need prometheus.

beguene avatar Jun 22 '22 12:06 beguene

@CharlieC3 it doesn't seem like we'll be able to introduce Prometheus to the explorer. Is this ok or would you like us to meet and brainstorm options to get it done?

andresgalante avatar Aug 02 '22 16:08 andresgalante

> It doesn't seem like we'll be able to introduce Prometheus to the explorer.

@andresgalante Is this because of the concerns around it being implemented as a custom server? If so, are we sure the optimizations lost when using a prometheus custom server would affect the whole service's performance? Additionally, is it a matter of a couple milliseconds or something more significant?

> Is this ok or would you like us to meet and brainstorm options to get it done?

Having app metrics for the Explorer would be most valuable to the UX team, so I feel your team should decide if it's ok or not. DevOps doesn't necessarily need to have them implemented for the Explorer, but we highly suggest it as it may help UX discover bugs and measure changes over time like it has for Splat with the API. And as the Explorer gets more complex, it may be crucial to debug issues and restore availability for any UX team members that are on call.

CharlieC3 avatar Aug 02 '22 17:08 CharlieC3

I am closing this PR for now; if we decide to move forward with it, we can reopen it.

andresgalante avatar Jan 23 '23 14:01 andresgalante