nox
Add builtins for Metrics
So I can track service metrics the way it is done in k8s, where each metric is tagged by Docker pod, etc.
Need both store and query (start from a subset of queries).
If Ceramic can do the same or similar, then could we have Ceramic built in?
Allow expressing enums in Aqua and force reduced cardinality onto parameters, same as DHT: peer id and service id can be any, other parameters with a cardinality of 666 max.
Nope, Ceramic does not suit. Need real-time, compressed, cheap histograms built in.
Then the info can be used for routing, circuit breakers, pricing.
Not sure about security, though. Maybe make the read part of each function call available in Aqua?
What kind of metrics do you foresee in such a built-in? Like, a number of currently open connections? What else?
How do you see this service being used? Maybe you have some use-cases in mind?
Need both store and query
Who will store what? Applications will store their metrics?
Need real-time, compressed, cheap histograms built in.
Can you elaborate, what's your use-case for histograms?
Not sure about security, though. Maybe make the read part of each function call available in Aqua?
How would that make it more secure?
What kind of metrics do you foresee in such a built-in? Like, a number of currently open connections? What else?
Custom metrics. Any PeerId can post any metrics into any key. I want to store metrics which are custom, like failed deliveries.
How do you see this service being used? Maybe you have some use-cases in mind?
Stop trying to notify a user (weighted average randomness) if they are too offline (metric). Stop pushing into a notification service if it is not responding. Avoid routing to bad nodes, etc. Stop pulling from history services which are bad. Each node decides from its own data about other nodes, from Aqua.
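To make the "stop pushing into a service that is not responding" case concrete, here is a minimal circuit-breaker sketch in Python. It is only an illustration: the class name, the failure limit, and the cooldown are assumptions, not an existing Fluence or Aqua API.

```python
class CircuitBreaker:
    """Hypothetical per-service breaker driven by locally observed call results."""

    def __init__(self, max_failures=3, cooldown_s=30.0):
        self.max_failures = max_failures  # consecutive failures before opening (assumed value)
        self.cooldown_s = cooldown_s      # how long to stop calling the service (assumed value)
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def allow(self, now):
        """Return True if a call to the service may be attempted at time `now`."""
        if self.opened_at is None:
            return True
        # After the cooldown, let a probe call through (half-open state).
        return now - self.opened_at >= self.cooldown_s

    def record(self, now, ok):
        """Feed the outcome of a call back into the breaker."""
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now
```

Each node would keep one such breaker per remote service and consult `allow` before pushing, which matches the idea that every node decides from its own data.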
Applications will store their metrics?
Yes. Via Aqua script. Same as for DHT.
Can you elaborate, what's your use-case for histograms?
Like QoS. "If the percentile of delivery with a DHT K factor of 10 is 95, while with K+2 it is 99, use the one with 99". Or if P1 has 95 under 100ms, and P2 has 99 under 100ms, route via P2.
https://en.wikipedia.org/wiki/Percentile
https://queue.acm.org/detail.cfm?id=2903468
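A rough sketch of the P1 vs P2 decision, assuming a fixed-bucket latency histogram per route. The bucket bounds, class name, and function names are made up for illustration; a real built-in might use a different (compressed) histogram representation.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds (an assumption, not from the thread).
BUCKETS_MS = [10, 25, 50, 100, 250, 500, 1000, float("inf")]


class LatencyHistogram:
    """Fixed-bucket histogram: one counter per bucket, cheap to store and update."""

    def __init__(self):
        self.counts = [0] * len(BUCKETS_MS)

    def observe(self, latency_ms):
        # The sample goes into the first bucket whose upper bound is >= latency_ms.
        self.counts[bisect_left(BUCKETS_MS, latency_ms)] += 1

    def share_under(self, bound_ms):
        """Percentage of samples in buckets whose upper bound is <= bound_ms."""
        total = sum(self.counts)
        if total == 0:
            return 0.0
        under = sum(c for b, c in zip(BUCKETS_MS, self.counts) if b <= bound_ms)
        return 100.0 * under / total


def pick_route(histograms, bound_ms=100):
    """Return the route name with the highest share of deliveries under bound_ms."""
    return max(histograms, key=lambda name: histograms[name].share_under(bound_ms))
```

With 95% of P1 deliveries and 99% of P2 deliveries landing under 100ms, `pick_route` returns P2. Precision is limited to the bucket bounds, which is the trade-off that keeps the histogram cheap.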
How would that make it more secure?
DHT-like security seems to suit. But anyway, some kind of stats is needed. I doubt app devs should build these. And Prometheus seems rather nice for a start. Metrics != Analytics. Metrics are more real-time, so it is fine to restart nodes and lose the info.
DDoS protection is to limit time range, precision, and cardinality.
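The cardinality part of that limit could look roughly like the sketch below. It only covers cardinality (not time range or precision), and the class and method names are hypothetical; 666 is the cap mentioned at the top of this issue.

```python
MAX_CARDINALITY = 666  # cap per parameter, as suggested earlier in the thread


class BoundedLabels:
    """Hypothetical guard: each label may take at most max_cardinality distinct values."""

    def __init__(self, max_cardinality=MAX_CARDINALITY):
        self.max_cardinality = max_cardinality
        self.seen = {}  # label name -> set of values accepted so far

    def admit(self, label, value):
        """Return True if a metric tagged label=value may be stored."""
        values = self.seen.setdefault(label, set())
        if value in values:
            return True   # already known value, no growth
        if len(values) >= self.max_cardinality:
            return False  # a new value would exceed the cap, reject the write
        values.add(value)
        return True
```

Rejected writes cost nothing to store, so an attacker cannot grow a node's memory by inventing new label values.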
I want to store metrics which are custom, like failed deliveries.
Makes sense, so you basically want integration with a time-series database like Prometheus or TimescaleDB. To me, that sounds like a service one could deploy to one's own nodes rather than a network-wide built-in.
Histograms:
https://github.com/fluencelabs/fluence/issues/1183 - co success depends on TTL
https://github.com/fluencelabs/aqua/issues/329 - DHT search depends on success to find
Stop trying to notify a user (weighted average randomness) if they are too offline (metric).
There's no way to have a metric that would tell if a user is online or offline. It can only be done with some keep-alive mechanic.
Makes sense, so you basically want integration with a time-series database like Prometheus or TimescaleDB. To me, that sounds like a service one could deploy to one's own nodes rather than a network-wide built-in.
Fluence does not make sense without metrics for me. DHT + Metrics = Basic Built-Ins. So I do not consider such a deployment.
There's no way to have a metric that would tell if a user is online or offline. It can only be done with some keep-alive mechanic.
There is:)
There's no way to have a metric that would tell if a user is online or offline. It can only be done with some keep-alive mechanic.
There is:)
What is it?
Overall I'd say this is a very important topic. We have plans for R&D on telemetry, metrics, and routing based on dynamic weights in TrustGraph.
Currently, it's still early days for TrustGraph, so understanding your vision & ideas helps a lot :)
So let's keep this discussion going!
What is it?
So:
- user reports its location when it joins
- notification tries to post to the user via its location (relay)
- notification fails via varying routes and particle TTLs
- if statistics tell that failed notifications about messages reach some threshold, the user is offline
- notification switches to less frequent attempts to notify
alternative:
- user pings services or puts a timestamp into his location from time to time
- reasonable, but what if he pings but cannot receive messages?
- or he pings a split-brain node?
- so offline is a statistic on failed attempts (offline is not 0 or 1, but a time-weighted average of notification attempts)
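One possible reading of "time-weighted average of notification attempts" is an exponentially decayed failure score, sketched below. The half-life, threshold, and retry intervals are assumed values, and all names are hypothetical.

```python
class OfflineScore:
    """Time-decayed average of failed notification attempts, between 0.0 and 1.0."""

    def __init__(self, half_life_s=600.0):
        self.half_life_s = half_life_s  # assumed: old evidence loses half its weight in 10 min
        self.score = 0.0
        self.last_ts = None

    def record(self, ts, delivered):
        """Fold one attempt at time ts (seconds) into the score and return it."""
        outcome = 0.0 if delivered else 1.0
        if self.last_ts is None:
            self.score = outcome
        else:
            elapsed = max(0.0, ts - self.last_ts)  # ignore out-of-order timestamps
            # The old score keeps weight 0.5 per elapsed half-life; the rest goes to the new outcome.
            keep = 0.5 ** (elapsed / self.half_life_s)
            self.score = keep * self.score + (1.0 - keep) * outcome
        self.last_ts = ts
        return self.score


def retry_interval_s(score, threshold=0.8, normal_s=10.0, slow_s=600.0):
    """Switch to less frequent attempts once the user looks offline (assumed values)."""
    return slow_s if score >= threshold else normal_s
```

The score never flips to a hard 0 or 1 on a single attempt after the first one; it drifts with the evidence, and the notifier only slows down once it crosses the threshold.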
Overall I'd say this is a very important topic. We have plans for R&D on telemetry, metrics, and routing based on dynamic weights in TrustGraph.
Currently, it's still early days for TrustGraph, so understanding your vision & ideas helps a lot :)
That is why I should not deploy anything on some nodes. I think it should be built in. So I will just put Aqua comments where stats can be used, and replace them with stats later.
That is why I should not deploy anything on some nodes.
Fluence Node is meant to be customized & extended with sidecar services and adapters. The intent is that different hosters will provide different software suites.
The Fluence protocol is to provide some software and solutions as fundamental building blocks, but at the end of the day, it's the hosters who will enrich the Fluence network with different solutions.
There are some built-ins and some externals. It is hard to make externals (and there is no protocol), and it would be a waste to do it as an external; in my thinking it must be built in. I do not see Fluence as viable without metrics built in.
Not only built in, but also integrated into Aqua, so that these are used during particle flow.