go-guerrilla
Analytics dashboard
@jordanschalm has started working on the Analytics package: https://github.com/flashmob/go-guerrilla/tree/dashboard We have agreed to drop the TLS and password protection requirement, since we do not have Let's Encrypt automation yet and self-signed certs can be MITM'd. So the dashboard will bind to localhost for now. It may be implemented via the existing logging facility: a goroutine will take samples periodically and write them to the log, and another goroutine can tail the log and ready the data for presentation. (Note that in the future, the log could be closed and reopened upon a SIGHUP signal.) Presentation data will be rendered using vanilla JS and a JS charting library.
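A minimal sketch of the sampler/tailer idea, assuming samples are written as single tagged log lines. The `Sample` type, the `dashboard-sample` marker, and the function names are hypothetical and not taken from the dashboard branch; an `io.Pipe` stands in for the log file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"time"
)

// Sample is a hypothetical data point; the fields are illustrative only.
type Sample struct {
	Unix    int64 // time of the measurement, seconds since the epoch
	Clients int   // number of connected clients at that time
}

// sampleMarker tags the log lines that carry dashboard samples (assumed format).
const sampleMarker = "dashboard-sample "

// FormatSample renders a sample as a single log line.
func FormatSample(s Sample) string {
	return fmt.Sprintf("%sts=%d clients=%d", sampleMarker, s.Unix, s.Clients)
}

// ParseSample recovers a sample from a log line and skips unrelated lines.
// Anything the logger puts before the marker is ignored.
func ParseSample(line string) (Sample, bool) {
	var s Sample
	i := strings.Index(line, sampleMarker)
	if i < 0 {
		return s, false
	}
	_, err := fmt.Sscanf(line[i+len(sampleMarker):], "ts=%d clients=%d", &s.Unix, &s.Clients)
	return s, err == nil
}

// RunSampler writes one sample per tick to w until stop is closed.
func RunSampler(w io.Writer, interval time.Duration, read func() int, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case t := <-ticker.C:
			fmt.Fprintln(w, FormatSample(Sample{Unix: t.Unix(), Clients: read()}))
		case <-stop:
			return
		}
	}
}

// Tail scans r line by line and forwards every parsed sample to out.
// It closes out when r reaches EOF.
func Tail(r io.Reader, out chan<- Sample) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if s, ok := ParseSample(sc.Text()); ok {
			out <- s
		}
	}
	close(out)
}

func main() {
	pr, pw := io.Pipe() // stands in for the log file
	stop := make(chan struct{})
	out := make(chan Sample)
	go func() {
		RunSampler(pw, 10*time.Millisecond, func() int { return 3 }, stop)
		pw.Close() // EOF ends the tailer's scan loop
	}()
	go Tail(pr, out)
	n := 0
	for s := range out {
		fmt.Println(s.Unix, s.Clients)
		if n++; n == 3 {
			close(stop) // ask the sampler to stop after three samples
		}
	}
}
```

Tailing a real log file would additionally need to handle the close-and-reopen on SIGHUP mentioned above, which this sketch leaves out.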
This is roughly the layout and data I'm currently planning on displaying on the dashboard page. @flashmob do you have any suggestions for change/improvement?
Thanks for the update. That looks good. Edit: At this point, the most minimal and basic features are best, and this would be great to start with. Question: is it possible to query your module to ask ad-hoc questions, e.g. get the top connected clients in the last hour? Perhaps it may be useful in other areas, such as temporarily banning a client that is connecting too often. The memory usage stats could be used to adjust the max email size, etc. (Just asking if it is possible, not a requirement.)
In the analytics store we have access to basically the same info as in the mockup. So you could ask for the top client by helo/domain/IP in the past 24 hours pretty easily. The data is grouped into 6-hour intervals, so you can also get the top client (by those same metrics) from the past 6 hours. This could be changed to group the data into 1-hour intervals, but it would use quite a bit more memory and each tick (where we "measure" and send a new data frame to the frontend) would take longer to aggregate.
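To make the interval grouping concrete, here is a rough sketch of how such a query could work. The `Ranking` type and its methods are hypothetical, not the actual analytics store: each bucket covers one interval (6 hours in the current setup), and a query merges the newest buckets.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket counts connections per key (an IP, domain, or HELO) for one interval.
type bucket map[string]int

// Ranking keeps the most recent buckets, oldest first (hypothetical type).
type Ranking struct {
	buckets []bucket
	max     int // number of buckets retained, e.g. 4 x 6 hours = 24 hours
}

func NewRanking(max int) *Ranking {
	return &Ranking{buckets: []bucket{{}}, max: max}
}

// Add records one connection in the current (newest) bucket.
func (r *Ranking) Add(key string) {
	r.buckets[len(r.buckets)-1][key]++
}

// Rollover starts a new bucket and drops the oldest one past the limit.
func (r *Ranking) Rollover() {
	r.buckets = append(r.buckets, bucket{})
	if len(r.buckets) > r.max {
		r.buckets = r.buckets[1:]
	}
}

// Top merges the newest `last` buckets and returns the n busiest keys,
// breaking ties alphabetically so the result is deterministic.
func (r *Ranking) Top(last, n int) []string {
	if last < 1 || n < 1 {
		return nil
	}
	if last > len(r.buckets) {
		last = len(r.buckets)
	}
	total := bucket{}
	for _, b := range r.buckets[len(r.buckets)-last:] {
		for k, c := range b {
			total[k] += c
		}
	}
	keys := make([]string, 0, len(total))
	for k := range total {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if total[keys[i]] != total[keys[j]] {
			return total[keys[i]] > total[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if n > len(keys) {
		n = len(keys)
	}
	return keys[:n]
}

func main() {
	r := NewRanking(4) // four 6-hour buckets cover 24 hours
	r.Add("10.0.0.1")
	r.Add("10.0.0.1")
	r.Add("10.0.0.2")
	r.Rollover()
	r.Add("10.0.0.2")
	r.Add("10.0.0.2")
	fmt.Println(r.Top(1, 1)) // newest interval only
	fmt.Println(r.Top(4, 2)) // whole retained window
}
```

Switching to 1-hour intervals in this model means retaining 24 buckets instead of 4, which is where the extra memory and the longer per-tick merge come from.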
In general, there are a couple of constants we can tweak that change how much memory the analytics uses, and how much data the front-end shows and how accurate it is.
maxWindow
- maximum time shown on the dashboard, currently set to 24 hours
tickInterval
- how frequently we measure and send a new data frame to the front-end. Relates to memory usage, because each tick is stored for the maxWindow interval, and to computation time, because each tick requires a potentially costly aggregation of the top N client rankings. Currently 5 seconds, so a maximum of ~17,000 points (24 hours / 5 seconds = 17,280).
rankingRefreshRate
- how frequently we roll over the top N client rankings. A more frequent refresh makes the data more granular (i.e. top client in the last hour rather than the last 6 hours), but uses more memory and makes the aggregation step on each tick run longer
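As a back-of-the-envelope check on how these constants interact, the sketch below computes the number of stored ticks and ranking buckets. The constant names and values come from the discussion (24 hours, 5 seconds, 6 hours); the helper functions are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Values as quoted in the thread; the real package may define them differently.
const (
	maxWindow          = 24 * time.Hour
	tickInterval       = 5 * time.Second
	rankingRefreshRate = 6 * time.Hour
)

// pointsStored is the number of ticks kept for one full window.
func pointsStored(window, tick time.Duration) int {
	return int(window / tick)
}

// rankingBuckets is the number of top-N rankings merged on each tick.
func rankingBuckets(window, refresh time.Duration) int {
	return int(window / refresh)
}

func main() {
	fmt.Println("points per window:", pointsStored(maxWindow, tickInterval))
	fmt.Println("ranking buckets at 6h:", rankingBuckets(maxWindow, rankingRefreshRate))
	fmt.Println("ranking buckets at 1h:", rankingBuckets(maxWindow, time.Hour))
}
```

Going from a 6-hour to a 1-hour refresh multiplies the number of rankings to keep and merge by six, which matches the memory and aggregation cost described above.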
Here's what it looks like currently:
Thanks for the update. That looks great & looking forward to testing it. Also, good idea to make these as config settings. 6 hours seems like a good trade-off as the default for the aggregations.
Also, maybe the HELO could be limited to no more than 16 characters? Another concern: would it be possible for someone to spam HELO by repeatedly making new connections in order to exhaust the memory? If so, a check would need to be made to stop collecting data after a certain limit...
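Something along these lines might cover both points. This is only a sketch: the 16-character limit is the one suggested above, while the `HeloCounter` type and its hard cap on distinct values are assumptions made for illustration.

```go
package main

import "fmt"

// maxHeloLen follows the 16-character limit suggested above.
const maxHeloLen = 16

// HeloCounter counts connections per HELO but tracks at most `limit`
// distinct values, so memory use stays bounded (hypothetical type).
type HeloCounter struct {
	counts map[string]int
	limit  int
}

func NewHeloCounter(limit int) *HeloCounter {
	return &HeloCounter{counts: make(map[string]int), limit: limit}
}

// TruncateHelo cuts a HELO to maxHeloLen bytes. It slices bytes, not runes,
// on the assumption that HELO arguments are ASCII host names or literals.
func TruncateHelo(h string) string {
	if len(h) > maxHeloLen {
		return h[:maxHeloLen]
	}
	return h
}

// Record counts one connection and reports whether it was stored.
// New HELO values are dropped once the cap on distinct values is reached;
// values that are already tracked keep counting.
func (c *HeloCounter) Record(helo string) bool {
	key := TruncateHelo(helo)
	if _, seen := c.counts[key]; !seen && len(c.counts) >= c.limit {
		return false
	}
	c.counts[key]++
	return true
}

func main() {
	c := NewHeloCounter(2) // tiny cap, just to show the behaviour
	for _, h := range []string{"mail.example.com.extra", "a.test", "b.test", "a.test"} {
		fmt.Println(TruncateHelo(h), c.Record(h))
	}
}
```

With the cap set to 2, the third distinct HELO is rejected while repeat connections from an already tracked HELO are still counted.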
Yeah good call, I'll add those to the Dashboard Config.
Someone could spam HELO to use up memory. Limiting the size of HELOs will help mitigate that for sure. Maybe we can stop collecting data if we notice too many unique HELOs compared to the number of connections for a given period. Do you have any idea what that ratio would look like for normal workloads? I assume the number of unique HELOs should be pretty small.
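Roughly what I have in mind for the ratio check, as a sketch only. The `HeloGuard` type, the ratio, and the minimum connection count are placeholders, since we do not know yet what normal workloads look like.

```go
package main

import "fmt"

// HeloGuard stops HELO collection for the current period once the share of
// unique HELOs among connections exceeds maxRatio (hypothetical type).
type HeloGuard struct {
	conns    int
	unique   map[string]struct{}
	maxRatio float64 // allowed unique HELOs per connection, placeholder value
	minConns int     // do not judge the ratio on fewer connections than this
	tripped  bool
}

func NewHeloGuard(maxRatio float64, minConns int) *HeloGuard {
	return &HeloGuard{unique: make(map[string]struct{}), maxRatio: maxRatio, minConns: minConns}
}

// Observe registers one connection and reports whether its HELO should
// still be collected. Once tripped, nothing new is stored until Reset.
func (g *HeloGuard) Observe(helo string) bool {
	g.conns++
	if g.tripped {
		return false
	}
	g.unique[helo] = struct{}{}
	if g.conns >= g.minConns && float64(len(g.unique)) > g.maxRatio*float64(g.conns) {
		g.tripped = true
	}
	return !g.tripped
}

// Reset starts a new measurement period.
func (g *HeloGuard) Reset() {
	g.conns = 0
	g.unique = make(map[string]struct{})
	g.tripped = false
}

func main() {
	g := NewHeloGuard(0.5, 4)
	for _, h := range []string{"a", "a", "a", "b", "c", "d"} {
		fmt.Println(h, g.Observe(h))
	}
}
```

The `minConns` floor avoids tripping on the first few connections of a period, when the ratio is always high.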
Yes, stopping collection after a ratio threshold would be a good idea. Not sure about the normal workloads; perhaps these limits could be made into config options and we can adjust them as needed.