Add builtins for Metrics

Open dzmitry-lahoda opened this issue 3 years ago • 24 comments

so i can track service metrics the way it works in k8s, where each metric is tagged by docker pod etc

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

need both store and query (start from subset of queries)

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

if ceramic can do the same or similar, then could we have ceramic built in?

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

allow expressing enums in aqua and force reduced cardinality on parameters, same as dht: peer id and service id can be anything, other parameters have a cardinality of 666 max.

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

nope, ceramic does not suit. need realtime, compressed, cheap histograms built in

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

then the info can be used for routing, circuit breakers, pricing.

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

not sure about security though. maybe make the read part of each function call available in aqua?

dzmitry-lahoda avatar Oct 16 '21 11:10 dzmitry-lahoda

image

dzmitry-lahoda avatar Oct 18 '21 20:10 dzmitry-lahoda

What kind of metrics do you foresee in such a built-in? Like, a number of currently open connections? What else?

How do you see this service being used? Maybe you have some use-cases in mind?

need both store and query

Who will store what? Applications will store their metrics?

need realtime compressed cheap histograms built in

Can you elaborate, what's your use-case for histograms?

not sure about security though. maybe make the read part of each function call available in aqua?

How would that make it more secure?

folex avatar Oct 19 '21 08:10 folex

What kind of metrics do you foresee in such a built-in? Like, a number of currently open connections? What else?

Custom metrics. Any PeerId can post any metrics into any key. I want to store metrics which are custom, like failed deliveries.

How do you see this service being used? Maybe you have some use-cases in mind?

Stop trying to notify a user (weighted average randomness) if it is too offline (metric). Stop pushing into a notification service if it is not responding. Avoid routing to bad nodes. Stop pulling from history services which are bad. Etc. Each node decides from its own data about other nodes. From Aqua.

Applications will store their metrics?

Yes. Via Aqua script. Same as for DHT.

Can you elaborate, what's your use-case for histograms?

Like QoS. "If the percentile of delivery with a DHT K factor of 10 is 95, while with K+2 it is 99 - use 99". Or if P1 has 95 under 100ms, and P2 has 99 under 100ms - route via P2.
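A minimal sketch of the second rule (route via the peer with the larger share of deliveries under 100 ms), written in Python rather than Aqua. The names `share_under` and `pick_route` and the raw in-memory sample lists are assumptions for illustration only; nothing here is an existing Fluence or Aqua API, and a real built-in would presumably keep compressed histograms instead of raw samples.

```python
from typing import Dict, List


def share_under(latencies_ms: List[float], threshold_ms: float) -> float:
    """Percentage (0..100) of samples that completed under threshold_ms."""
    if not latencies_ms:
        return 0.0  # no data: treat the peer as having no good deliveries
    hits = sum(1 for value in latencies_ms if value < threshold_ms)
    return 100.0 * hits / len(latencies_ms)


def pick_route(samples: Dict[str, List[float]], threshold_ms: float = 100.0) -> str:
    """Return the peer id with the highest share of deliveries under the threshold."""
    return max(samples, key=lambda peer: share_under(samples[peer], threshold_ms))


# Hypothetical data: P1 delivers 95% under 100 ms, P2 delivers 99%.
samples = {
    "P1": [50.0] * 95 + [200.0] * 5,
    "P2": [50.0] * 99 + [200.0] * 1,
}
chosen = pick_route(samples)
```

With this hypothetical data, `pick_route` selects `P2`, matching the rule stated in the comment.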

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

https://en.wikipedia.org/wiki/Percentile

https://queue.acm.org/detail.cfm?id=2903468

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

How would that make it more secure?

DHT-like security seems to suit. But anyway, some kind of stats is needed. I doubt app devs should build these. And Prometheus seems rather nice for a start. Metrics != Analytics. Metrics are more realtime, so it is fine to restart nodes and lose the info.

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

DDoS protection is to limit time range, precision, and cardinality.
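A minimal sketch of the cardinality limit only, in Python rather than Aqua. The class name `BoundedCounter` and its interface are hypothetical, not an existing Fluence API; the default cap of 666 is the figure mentioned earlier in this thread. Limits on time range and precision are not shown.

```python
from typing import Dict


class BoundedCounter:
    """Counter keyed by label value that refuses new labels beyond a cap."""

    def __init__(self, max_cardinality: int = 666) -> None:
        self.max_cardinality = max_cardinality
        self.counts: Dict[str, int] = {}

    def inc(self, label: str) -> bool:
        """Increment the counter for label; return False if the label is rejected."""
        is_new = label not in self.counts
        if is_new and len(self.counts) >= self.max_cardinality:
            return False  # cap reached: do not allocate state for a new label
        self.counts[label] = self.counts.get(label, 0) + 1
        return True
```

Rejecting only new label values keeps existing series writable while bounding the memory that any one client can force a node to allocate.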

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

I want to store metrics which are custom, like failed deliveries.

Makes sense, so you basically want integration with a time-series database like Prometheus or TimescaleDB. To me, that sounds like a service one could deploy to one's own nodes rather than a network-wide built-in.

folex avatar Oct 19 '21 08:10 folex

Histograms:

https://github.com/fluencelabs/fluence/issues/1183 - co success depends on TTL

https://github.com/fluencelabs/aqua/issues/329 - DHT search dependent on success to find

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

Stop trying to notify a user (weighted average randomness) if it is too offline (metric).

There's no way to have a metric that would tell if a user is online or offline. It can only be done with some keep-alive mechanic.

folex avatar Oct 19 '21 08:10 folex

Makes sense, so you basically want integration with a time-series database like Prometheus or TimescaleDB. To me, that sounds like a service one could deploy to one's own nodes rather than a network-wide built-in.

Fluence does not make sense without metrics for me. DHT + Metrics = Basic Built-Ins. So I do not consider such a deployment.

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

There's no way to have a metric that would tell if a user is online or offline. It can only be done with some keep-alive mechanic.

There is:)

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

There's no way to have a metric that would tell if a user is online or offline. It can only be done with some keep-alive mechanic.

There is:)

What is it?

folex avatar Oct 19 '21 08:10 folex

Overall I'd say this is a very important topic. We have plans for R&D on telemetry, metrics, and routing based on dynamic weights in TrustGraph.

Currently, it's still early days for TrustGraph, so understanding your vision & ideas helps a lot :)

So let's keep this discussion going!

folex avatar Oct 19 '21 08:10 folex

What is it?

So

  1. the user reports its location when it joins
  2. the notification service tries to post to the user via its location (relay)
  3. the notification fails via varying routes and particle TTLs
  4. if the statistics show that failed notifications about messages reach some threshold, the user is offline.
  5. the notification service switches to less frequent attempts to notify.

alternative:

  1. the user pings services or puts a timestamp into his location from time to time
  2. reasonable, but what if he pings, but cannot receive messages?
  3. or he pings a split-brain node?
  4. so offline is a statistic on failed attempts (offline is not 0 or 1, but a time-weighted average of notification attempts)
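
One way to read "time-weighted average of notification attempts" is an average in which each attempt's weight halves after a fixed interval. The Python sketch below assumes that reading; `OfflineScore`, `retry_delay_s`, the half-life, and the delay bounds are all hypothetical names and values, not part of Fluence or Aqua.

```python
from typing import Optional


class OfflineScore:
    """Time-weighted share of failed notification attempts, between 0.0 and 1.0."""

    def __init__(self, half_life_s: float = 300.0) -> None:
        self.half_life_s = half_life_s
        self.failed = 0.0  # decayed weight of failed attempts
        self.total = 0.0   # decayed weight of all attempts
        self.last_ts: Optional[float] = None

    def record(self, ts: float, delivered: bool) -> float:
        """Record one attempt at time ts (seconds) and return the updated score."""
        if self.last_ts is not None:
            elapsed = max(0.0, ts - self.last_ts)
            decay = 0.5 ** (elapsed / self.half_life_s)
            self.failed *= decay  # older attempts count less
            self.total *= decay
        self.last_ts = ts
        self.total += 1.0
        if not delivered:
            self.failed += 1.0
        return self.failed / self.total


def retry_delay_s(score: float, base_s: float = 5.0, max_s: float = 600.0) -> float:
    """Stretch the retry interval linearly as the offline score grows."""
    return base_s + (max_s - base_s) * score
```

A score near 0 keeps retries frequent, while a score near 1 backs off toward `max_s`, which corresponds to step 5 of the first list.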

dzmitry-lahoda avatar Oct 19 '21 08:10 dzmitry-lahoda

Overall I'd say this is a very important topic. We have plans for R&D on telemetry, metrics, and routing based on dynamic weights in TrustGraph.

Currently, it's still early days for TrustGraph, so understanding your vision & ideas helps a lot :)

That is why I should not deploy anything on some nodes. I think it should be built in. So I will just put Aqua comments where stats can be used, and replace them with stats later.

dzmitry-lahoda avatar Oct 19 '21 09:10 dzmitry-lahoda

That is why I should not deploy anything on some nodes.

Fluence Node is meant to be customized & extended with sidecar services and adapters. The intent is that different hosters will provide different software suites.

The Fluence protocol is meant to provide some software and solutions as fundamental building blocks, but at the end of the day, it's the hosters who will enrich the Fluence network with different solutions.

folex avatar Oct 19 '21 10:10 folex

there are some built-ins and some externals. it is hard to make externals (and there is no protocol), and it would be a waste to build as external what (in my thinking) must be built in. i do not see fluence as viable without metrics built in.

dzmitry-lahoda avatar Oct 19 '21 16:10 dzmitry-lahoda

not only built in, but also integrated into aqua, so these can be used during particle flow.

dzmitry-lahoda avatar Oct 19 '21 16:10 dzmitry-lahoda