
General re-write of dashboard analytics

Open nilslice opened this issue 6 years ago • 21 comments

After numerous Ponzu projects in production over months of usage, I've noticed inconsistencies in the data / API usage analytics presented in the admin dashboard.

I think this deserves a rewrite. If anyone has interest in taking a look, please let me know and I'll be glad to help.

nilslice avatar Sep 07 '17 18:09 nilslice

I'll have a look Steve. Bit rusty after the summer though :)

olliephillips avatar Sep 07 '17 19:09 olliephillips

@olliephillips - great! Just the end of summer project to get back in action :)

nilslice avatar Sep 07 '17 20:09 nilslice

Might get chance to look at this next week. What specifically are the inconsistencies? And re "rewrite" are you looking for issue fix, or enhancement of the analytics in some way - or both?

olliephillips avatar Sep 09 '17 12:09 olliephillips

Awesome - let me put together a brief summary of the things I've noticed. I've got family visiting in town for a few more hours but will add notes here after they've left.

nilslice avatar Sep 10 '17 14:09 nilslice

@olliephillips -

[image: download-dashboard]

The image above illustrates the main issue - a sawtooth-like pattern is consistent over time. Unfortunately I haven't had time to revisit this portion of the codebase with much attention.

I have a fairly large analytics.db which demonstrates this behavior which I'll send you via ~~twitter~~ Slack DM.. since there could be some semi-sensitive data in it.. URLs and IP addresses of visitors (that's all, but not my place to expose these logs to everyone)

In my experience debugging / inspecting BoltDBs, the Thunder tool is super useful: https://github.com/muesli/thunder. Fire it up and look at the db file as if it were a filesystem: `ls` to list buckets / keys, `cd` to enter a bucket, `cd ..` to go up a level, `get $key` to see the value for a key, etc.

There are two top-level buckets in the analytics.db default database, __requests (Requests) and __metrics (Metrics). Requests is flushed after some time (you'll need to look at the code to determine duration), and Metrics is kept around forever (?) as a simple count of requests per day. I believe Metrics has two keys inside each bucket (named dd/mm, or 09/14), one for total requests and one for unique requests (based on the IP address).

An issue I have faced in the past is that the Thunder tool cannot cd into one of these Metrics buckets, which may be due to the way they are named? To get around this, I've also used BoltGUI https://github.com/nikitabokovoy/BoltGUI as an inspection tool. However, on larger db files, it freezes up since the data is loaded into a browser..

I think you could get pretty far in debugging the main issue just by going through the code though. It's a bit of a mess, and aside from the timing portion where the db is pruned, it's fairly sequential.

Thanks for taking a look and if you get into it, let me know if you have any questions!

nilslice avatar Sep 14 '17 18:09 nilslice

Thanks @nilslice. I'm not on Twitter anymore (it was too much of a distraction for me) nor Slack. But if I need the analytics.db file, I will shout. Looks interesting!

olliephillips avatar Sep 14 '17 18:09 olliephillips

@olliephillips - totally know the feeling... you're a stronger person than I to drop them cold-turkey!!

Sounds good -- just let me know here if there's anything holding you up. :+1:

nilslice avatar Sep 14 '17 19:09 nilslice

I've got an instance set up and will just hit it and reconcile the data for the next 7 days. Will come back to you with an update then. Hopefully it'll be dead simple to find :)

olliephillips avatar Sep 19 '17 13:09 olliephillips

That's perfect -- thank you so much for taking a look at this.

nilslice avatar Sep 19 '17 20:09 nilslice

I'm not seeing this. Today is the day I'd have expected to see the issue (based on your 6 then 1 sawtooth pattern above) - I don't.

So far the metrics reconcile perfectly with the requests. The only slight issue I'm seeing is classification of unique clients - I'm running on localhost - but this is to be expected since socket is part of the request ip address.
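On the unique-client point: if uniqueness is keyed on the full remote address (IP plus port), every new connection from the same machine looks like a new client. A minimal sketch of stripping the port first, assuming the address arrives in the usual `host:port` form; `clientIP` is a hypothetical helper, not something taken from Ponzu:

```go
package main

import (
	"fmt"
	"net"
)

// clientIP strips the port from an address such as "127.0.0.1:52311",
// so that several connections from one host count as a single client.
// Hypothetical helper for illustration only.
func clientIP(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		// No port present (or not parsable): fall back to the raw value.
		return remoteAddr
	}
	return host
}

func main() {
	seen := map[string]struct{}{}
	for _, addr := range []string{"127.0.0.1:52311", "127.0.0.1:52312", "[::1]:8080"} {
		seen[clientIP(addr)] = struct{}{}
	}
	fmt.Println(len(seen)) // prints 2: 127.0.0.1 and ::1
}
```

Behind a proxy the remote address would be the proxy's, so a real deployment would probably need to look at forwarding headers as well.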

I'll continue for another week.

For info it looks like the __metrics bucket is appended forever and __requests is flushed daily. It only ever contains the current day stats as far as I can see. Then the data is summed/consolidated in __metrics and flushed from __requests

I'm not hitting the instance with any volume, so maybe this issue could be a factor of volume of requests?

I suppose there's no chance that in the sawtooth data above that could actually reflect the site daily usage pattern? I infer not as most sites analytics I see dip at weekend - those dips seem to be Tuesdays?

olliephillips avatar Sep 25 '17 08:09 olliephillips

> I'm not seeing this. Today is the day I'd have expected to see the issue (based on your 6 then 1 sawtooth pattern above) - I don't.
>
> So far the metrics reconcile perfectly with the requests. The only slight issue I'm seeing is classification of unique clients - I'm running on localhost - but this is to be expected since socket is part of the request ip address.
>
> I'll continue for another week.
>
> For info it looks like the __metrics bucket is appended forever and __requests is flushed daily. It only ever contains the current day stats as far as I can see. Then the data is summed/consolidated in __metrics and flushed from __requests

This sounds right - the idea was to prune __requests so as not to fill up disk space, and keep __metrics around as needed. However, I believe keys will be overwritten annually, since the naming is only DD/MM.. oops. I think the next release will be a 0.10.0 to indicate a fairly large change, so we'd be ok to make an analytics change that could break prior versions - I'll likely take advantage of this opportunity to fix my lack of foresight :grimacing: haha

> I'm not hitting the instance with any volume, so maybe this issue could be a factor of volume of requests?

It could be, but that would likely be a more difficult bug to find -- so I hope not. What level of volume are you sending it and could you increase it without much trouble? The analytics I sent are from a server that handles 10k - 1MM requests per day. It should also be noted that these systems that have been running in production are on older versions of Ponzu... and have customizations that aren't upstream. I don't think the custom additional code would impact analytics, but stranger things have happened!

> I suppose there's no chance that in the sawtooth data above that could actually reflect the site daily usage pattern? I infer not as most sites analytics I see dip at weekend - those dips seem to be Tuesdays?

I checked w/ the marketing team of the site whose data is shown above, and the Google Analytics do not show as significant of peaks/valleys - however, those analytics do not measure the same thing. Requests vs. Site visits. Though they'd likely have some correlation. The Tuesday dip is present in their analytics, but not as significant nor as regular. Bit hard to draw any conclusions unfortunately.

If it is possible to increase the load you send to measure / test, I think that would be helpful. The other thing to add to your test would be some variance day-to-day. One other bug I have noticed in the past (though not in this data set), is that data seemed to repeat from a day prior to the current day.. this is not consistent though..

Thank you again for testing this out.. really appreciate a second set of eyes on it!!

nilslice avatar Sep 25 '17 15:09 nilslice

To date I've just been manually hitting it and reconciling - so request volume is really low. I'll set something up to automate and vary a larger number of requests - maybe Apache Bench or similar

Re DD/MM key names - yes I noticed that. No big deal but storing forever would offer a year on year comparator which might be interesting.

olliephillips avatar Sep 25 '17 16:09 olliephillips

Might be interesting to also understand the GET/POST traffic split. Batch POST traffic supporting new, update and delete, could put Ponzu under more strain in given period than regular GET from users of the website. Wonder if there are any housekeeping/batch jobs running week to week?

olliephillips avatar Sep 25 '17 18:09 olliephillips

Hey @nilslice. I have a bug - but not the one I was looking for. The metrics reconcile with hits perfectly - I can't get any mismatch between requests I send and what appears in the analytics even at volume (15k+ sequential requests).

However, I'd neglected to test over the weekend, and when I came back to it today I found it reporting Friday's stats in both the Saturday and Sunday slots, for both requests and uniques. I think you hinted at this second bug in a comment above?

I'll attempt to reproduce.

olliephillips avatar Oct 02 '17 08:10 olliephillips

This behaviour (duplicated data) occurs on application restart, not while the application is running: it fills in the blank days with the last data available. Not just in the chart, the data is persisted to the analytics.db. I'll have a look for this next week and come back

Still cannot reproduce your sawtooth in the data so unfortunately I've had to give up on that particular bug. Maybe someone else can give it a shot?

olliephillips avatar Oct 10 '17 08:10 olliephillips

> This behaviour (duplicated data) occurs on application restart, not while the application is running: it fills in the blank days with the last data available. Not just in the chart, the data is persisted to the analytics.db. I'll have a look for this next week and come back

Interesting... based on these bugs popping up branched from the initial investigation, it might be worth doing a full re-write. Instead of including this in a 0.9.4 release, we could tag a 0.10.0 to ensure no one anticipates compatibility with old analytics.db databases...

> Still cannot reproduce your sawtooth in the data so unfortunately I've had to give up on that particular bug. Maybe someone else can give it a shot?

This also seems to support the above... what do you think? Also, now would be a good time to consider other analytics-based features, what else we may want to capture, or other ways we could make analytics extensible...

Thanks for taking lead on this and investigating.

nilslice avatar Oct 11 '17 18:10 nilslice

Not sure. I couldn't contribute much on a full rewrite at the moment - work just got busy - and the absence of a bug, doesn't support that - in my mind at least (I'm the fix it when broke guy)

So I will fix the bug I've been able to reproduce :)

On that note, what about being able to ponzu upgrade with the "--dev" flag too? Is it me or would the flag make development testing much easier. Or is my workflow wrong?

Edit: Think just me. Please ignore unless there is some utility in that suggestion

olliephillips avatar Oct 12 '17 20:10 olliephillips

> Not sure. I couldn't contribute much on a full rewrite at the moment - work just got busy - and the absence of a bug, doesn't support that - in my mind at least (I'm the fix it when broke guy)

I'd have no issue doing the re-write. It's fairly hefty at current state, so I wouldn't expect to get a PR for it. Just wanted to know if you had any thoughts on what else would be good to track / what users might expect to be able to do with the analytics piece & if I am being too conservative with storage or not conservative enough.

> So I will fix the bug I've been able to reproduce :)

I won't be able to start a re-write for 2-3 weeks, so take your time! Thanks for the help thus far.

> On that note, what about being able to ponzu upgrade with the "--dev" flag too? Is it me or would the flag make development testing much easier. Or is my workflow wrong? Edit: Think just me. Please ignore unless there is some utility in that suggestion

Definitely not just you -- I've also found that working on a fix and incorporating it into an existing Ponzu project to test is a bit clumsy. I think a --dev flag is a good addition to the upgrade command. Would be happy to take a PR for it, otherwise will definitely put on my to-do list!

nilslice avatar Oct 13 '17 15:10 nilslice

Will have a think - I guess an obvious suggestion is segmentation with a request type filter. Re workflow yes it is a little clunky but it's the nature of the application. For me another boon would be if `new` and `upgrade --dev` didn't git clone but copied the source under development. This would avoid creating a messy commit history when experimenting with a feature? Do you consider this a problem?

olliephillips avatar Oct 13 '17 17:10 olliephillips

I don't think I was correct about the data being duplicated on restart of Ponzu - at least it's not consistent. I have certainly observed that behaviour, but I could not replicate it last night when trying to debug this.

olliephillips avatar Oct 24 '17 08:10 olliephillips

Leaving this as a note for now, but I'd like to look more closely into https://github.com/census-instrumentation/opencensus-go as a global (and possibly hookable) tracing / observability system for Ponzu. With this, we could potentially replace the dashboard analytics with a view from OpenCensus, and then use the analytics.db BoltDB as an export target / collector.

I haven't yet done enough research to determine if it's well suited for this or not, but the work so far looks great.

cc/ @ferhatelmas, whom I noticed has made some contributions and may have some hints for us.

nilslice avatar Jan 08 '18 16:01 nilslice