analytics icon indicating copy to clipboard operation
analytics copied to clipboard

Statistics relative to previous period are misleading

Open RealOrangeOne opened this issue 3 years ago • 9 comments

Past Issues Searched

  • [X] I have searched open and closed issues to make sure that the bug has not yet been reported

Issue is a Bug Report

  • [X] This is a bug report and not a feature request, nor asking for self-hosted support

Describe the bug

On the dashboard, under "Unique visitors", is a percentage showing how today's numbers compare to yesterday's.

Unfortunately, it seems these are based on the entire stats of the previous day, rather than just those at the current time. At least, I think that's the case.

Expected behavior

These "previous period" statistics take into account how far through the current period we are.

For example, when looking at the "Day" statistics, and it's 4pm, it should only consider stats for yesterday up to 4pm, so the data is actually comparable.

Otherwise, the values aren't exceptionally useful unless going through extreme growth / decline in traffic.

Screenshots

image

Environment

N/A

RealOrangeOne avatar Jan 30 '22 17:01 RealOrangeOne

thanks for reporting @RealOrangeOne! yes, this is correct for the "today" view. there's a thread on this here https://github.com/plausible/analytics/discussions/344

metmarkosaric avatar Jan 31 '22 06:01 metmarkosaric

Hey ho 👋

I just had a quick conversation with @ukutaht about contributing to Plausible, and I feel that this is an excellent first issue for me to get my feet wet.

Therefore, I'll start tackling this (probably this weekend) unless somebody else has already started with it.

BTW feel free to assign it to me if you want.

Cheers, Chris

ccmaris avatar Feb 09 '22 15:02 ccmaris

oh that's nice to hear @ccmaris! would be a great improvement to Plausible to standardize comparisons so they're compared to exact same amount of time. thanks for your contribution!

metmarkosaric avatar Feb 09 '22 17:02 metmarkosaric

Hey there 👋 This would be tricky to tackle because the date range passed to the query doesn't specify the exact hour/time. Therefore, I can see two possible solutions:

  1. Expand the query.ex date_range attribute in a backward-compatible way that allows it to also include hours (big but proper change).
  2. Add a catch after the query is dispatched in the part of the codebase that satisfies the query so that it adapts in case the date_range is a single day (hacky and error-prone).

Which solution would you guys prefer? And do you have any further suggestions here that I might have missed?

ccmaris avatar Mar 27 '22 12:03 ccmaris

You're right, it's quite a bit trickier than I initially thought. I'm leaning towards option 2 because:

  1. Changing the query.ex is probably a bigger rabbit hole than it might seem at first :)
  2. We already do override some dates when actually querying the database to have more fine-grained control for realtime stats. It wouldn't be completely new.

I think adding another clause here that limits the query to the current hour of the day (when period=day) would be the most minimal change. As long as there's a test case for it I'm not worried about the error-proneness.

We may need to rethink the Query struct in the future but I would also look at the other exceptions we have made in that case and approach it holistically.

ukutaht avatar Mar 28 '22 08:03 ukutaht

I had a look and think that this is tackled now ?! I also didn't have as much free time as I thought :/

ccmaris avatar Jun 05 '22 19:06 ccmaris

@ukutaht Is something like this what you had in mind?

In lib/plausible/stats/base.ex:

  def utc_boundaries(%Query{period: "day"}, _timezone) do
    last_datetime = NaiveDateTime.utc_now() |> Timex.shift(seconds: 5)
    first_datetime = NaiveDateTime.utc_now() |> Timex.shift(days: -1)

    {first_datetime, last_datetime}
  end

djanda97 avatar Jan 14 '23 01:01 djanda97

Sorry no, I don't think this will work.

I think we need to add another case to to the shift_back function in query.ex: https://github.com/plausible/analytics/blob/2a4061de8c6fcdfbbdd998981e43c73328e8dbfb/lib/plausible/stats/query.ex#L36

We already adjust the previous period for month and year ranges to make sure they only include a comparable time period from the previous one. Another clause needs to be added for the day period to make sure we include the same number of hours as are shown for the current query period.

ukutaht avatar Jan 17 '23 14:01 ukutaht

Ah I see, that makes sense. The shift_back function looks like it handles the month and year periods similarly, so I'll try adding another case for the day period.

djanda97 avatar Jan 28 '23 04:01 djanda97