Duplicated date on timeseries

Open Ziedelth opened this issue 1 year ago • 35 comments

Hello, me again...

I noticed that when I retrieve data via Postman, some dates appear more than once, each time with different data... I don't know how this happens; maybe it's the timezone, but I'm not sure.

Query: https://vince.shikkanime.fr/api/v1/stats/timeseries?site_id=shikkanime.fr&period=6mo&metrics=visitors,visits,pageviews,views_per_visit,bounce_rate,visit_duration

Data returned:

{
  "results":  [
    {
      "timestamp":  "2024-03-17T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  9,
        "views_per_visit":  1.8,
        "visit_duration":  75.4816,
        "visitors":  5,
        "visits":  5
      }
    },
    {
      "timestamp":  "2024-03-16T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  38,
        "views_per_visit":  2.235294117647059,
        "visit_duration":  248.4300588235294,
        "visitors":  6,
        "visits":  17
      }
    },
    {
      "timestamp":  "2024-03-15T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  23,
        "views_per_visit":  2.090909090909091,
        "visit_duration":  287.7332727272727,
        "visitors":  7,
        "visits":  11
      }
    },
    {
      "timestamp":  "2024-03-14T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  23,
        "views_per_visit":  2.090909090909091,
        "visit_duration":  165.05236363636362,
        "visitors":  9,
        "visits":  11
      }
    },
    {
      "timestamp":  "2024-03-13T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  45,
        "views_per_visit":  2.6470588235294117,
        "visit_duration":  138.2324705882353,
        "visitors":  7,
        "visits":  17
      }
    },
    {
      "timestamp":  "2024-03-14T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  1,
        "views_per_visit":  1,
        "visit_duration":  0,
        "visitors":  1,
        "visits":  1
      }
    },
    {
      "timestamp":  "2024-03-16T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  5,
        "views_per_visit":  1.6666666666666667,
        "visit_duration":  272.537,
        "visitors":  2,
        "visits":  3
      }
    },
    {
      "timestamp":  "2024-03-17T00:00:00Z",
      "values":  {
        "bounce_rate":  1,
        "pageviews":  1,
        "views_per_visit":  1,
        "visit_duration":  0,
        "visitors":  1,
        "visits":  1
      }
    }
  ]
}

Ziedelth avatar Mar 17 '24 10:03 Ziedelth
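
For anyone who wants to check their own instance for the same symptom, here is a minimal Go sketch that lists the timestamps appearing more than once in a timeseries response. The struct layout is inferred from the JSON above; the names `result`, `response` and `duplicateTimestamps` are made up for this example and are not part of vince.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result mirrors one entry of the "results" array shown above.
// The field layout is inferred from the response, not from vince's source.
type result struct {
	Timestamp string             `json:"timestamp"`
	Values    map[string]float64 `json:"values"`
}

type response struct {
	Results []result `json:"results"`
}

// duplicateTimestamps returns every timestamp that occurs more than once,
// in the order in which the second occurrence is seen.
func duplicateTimestamps(body []byte) ([]string, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	seen := map[string]int{}
	var dups []string
	for _, e := range r.Results {
		seen[e.Timestamp]++
		if seen[e.Timestamp] == 2 {
			dups = append(dups, e.Timestamp)
		}
	}
	return dups, nil
}

func main() {
	// A shortened version of the payload above.
	body := []byte(`{"results":[
		{"timestamp":"2024-03-17T00:00:00Z","values":{"pageviews":9}},
		{"timestamp":"2024-03-16T00:00:00Z","values":{"pageviews":38}},
		{"timestamp":"2024-03-16T00:00:00Z","values":{"pageviews":5}},
		{"timestamp":"2024-03-17T00:00:00Z","values":{"pageviews":1}}
	]}`)
	dups, err := duplicateTimestamps(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(dups) // [2024-03-16T00:00:00Z 2024-03-17T00:00:00Z]
}
```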

Thanks for reporting, this is definitely a bug, I will investigate and fix tomorrow.

gernest avatar Mar 17 '24 11:03 gernest

@Ziedelth sorry I couldn't look into this today, I'm not feeling well.

gernest avatar Mar 18 '24 16:03 gernest

No problem, take care.

Ziedelth avatar Mar 18 '24 16:03 Ziedelth

Hi, so I looked into this. You are right: events are stored with timestamps in the local timezone, but we process computed times as UTC.

We have two options:

  • Use UTC everywhere. This is the safer and more correct way. It is a breaking change, though, and all the data you have collected so far will be lost.
  • Convert each time to UTC before doing any computation. This will be expensive and slow, and potentially error-prone.

What do you think?

gernest avatar Mar 19 '24 15:03 gernest
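
To illustrate the mismatch described above, here is a small Go sketch showing how one event can land in two different day buckets depending on whether the day is computed in the event's local offset or in UTC. The `+01:00` offset stands in for French winter time, and `localAndUTCDay` is a hypothetical helper, not a function from vince.

```go
package main

import (
	"fmt"
	"time"
)

// localAndUTCDay returns the calendar day of an RFC 3339 timestamp twice:
// once in the offset carried by the timestamp itself, and once in UTC.
func localAndUTCDay(rfc3339 string) (local, utc string, err error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return "", "", err
	}
	const day = "2006-01-02"
	return t.Format(day), t.UTC().Format(day), nil
}

func main() {
	// An event recorded half an hour after local midnight at UTC+1.
	local, utc, err := localAndUTCDay("2024-03-17T00:30:00+01:00")
	if err != nil {
		panic(err)
	}
	fmt.Println("local day:", local) // local day: 2024-03-17
	fmt.Println("UTC day:  ", utc)   // UTC day:   2024-03-16
}
```

If one part of a pipeline keys buckets by the local day and another by the UTC day, the same event can be counted under two different dates, which is one plausible way to end up with split buckets.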

offtopic: Solo Leveling is fantastic, I am putting it up there, almost close to Jujutsu Kaisen

gernest avatar Mar 19 '24 16:03 gernest

xD

Ziedelth avatar Mar 19 '24 16:03 Ziedelth

Can't you convert non-UTC dates at server startup to a standard format without data loss?

It's true that it would be better to use UTC dates to conform to all countries...

Ziedelth avatar Mar 19 '24 16:03 Ziedelth
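
A short Go sketch of why such a conversion need not lose data: as long as the stored timestamp carries its offset, re-expressing it in UTC changes only the representation, not the instant. Whether vince's stored events actually retain the offset is an assumption here, and `toUTC` and `sameInstant` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// toUTC re-expresses an RFC 3339 timestamp in UTC. Only the representation
// changes; the instant it names stays the same.
func toUTC(rfc3339 string) (string, error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(time.RFC3339), nil
}

// sameInstant reports whether two RFC 3339 timestamps name the same moment.
func sameInstant(a, b string) (bool, error) {
	ta, err := time.Parse(time.RFC3339, a)
	if err != nil {
		return false, err
	}
	tb, err := time.Parse(time.RFC3339, b)
	if err != nil {
		return false, err
	}
	return ta.Equal(tb), nil
}

func main() {
	local := "2024-03-17T00:30:00+01:00"
	utc, err := toUTC(local)
	if err != nil {
		panic(err)
	}
	same, err := sameInstant(local, utc)
	if err != nil {
		panic(err)
	}
	fmt.Println(utc, same) // 2024-03-16T23:30:00Z true
}
```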

We can do that. Let me think of something.

Are the other API endpoints working as intended? I want to be sure that only the timeseries endpoint is affected.

gernest avatar Mar 19 '24 16:03 gernest

I think so; it's the only one affected. I don't have any problems with sources or pages.

Ziedelth avatar Mar 19 '24 16:03 Ziedelth

Awesome, I will think of something tomorrow.

gernest avatar Mar 19 '24 16:03 gernest

I don't know if this helps, but even though it's the same day in both zones (10am in France, 8am UTC), I still get the date duplication... Maybe it's not only the timezone?

{
    "results": [
        {
            "timestamp": "2024-03-20T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 14,
                "views_per_visit": 1.5555555555555556,
                "visit_duration": 174.63133333333332,
                "visitors": 7,
                "visits": 9
            }
        },
        {
            "timestamp": "2024-03-19T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 15,
                "views_per_visit": 1.5,
                "visit_duration": 345.8795,
                "visitors": 4,
                "visits": 10
            }
        },
        {
            "timestamp": "2024-03-18T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 25,
                "views_per_visit": 1.7857142857142858,
                "visit_duration": 65.617,
                "visitors": 11,
                "visits": 14
            }
        },
        {
            "timestamp": "2024-03-17T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 39,
                "views_per_visit": 1.8571428571428572,
                "visit_duration": 157.9424285714286,
                "visitors": 15,
                "visits": 21
            }
        },
        {
            "timestamp": "2024-03-16T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 38,
                "views_per_visit": 1.9,
                "visit_duration": 211.16555,
                "visitors": 6,
                "visits": 20
            }
        },
        {
            "timestamp": "2024-03-15T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 23,
                "views_per_visit": 1.6428571428571428,
                "visit_duration": 226.07614285714286,
                "visitors": 7,
                "visits": 14
            }
        },
        {
            "timestamp": "2024-03-14T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 23,
                "views_per_visit": 1.6428571428571428,
                "visit_duration": 129.684,
                "visitors": 9,
                "visits": 14
            }
        },
        {
            "timestamp": "2024-03-13T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 45,
                "views_per_visit": 1.8,
                "visit_duration": 93.99807999999999,
                "visitors": 7,
                "visits": 25
            }
        },
        {
            "timestamp": "2024-03-14T00:00:00Z",
            "values": {
                "bounce_rate": 0,
                "pageviews": 1,
                "views_per_visit": 0,
                "visit_duration": 0,
                "visitors": 1,
                "visits": 0
            }
        },
        {
            "timestamp": "2024-03-16T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 5,
                "views_per_visit": 5,
                "visit_duration": 817.611,
                "visitors": 2,
                "visits": 1
            }
        },
        {
            "timestamp": "2024-03-17T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 12,
                "views_per_visit": 1.7142857142857142,
                "visit_duration": 9.038,
                "visitors": 4,
                "visits": 7
            }
        },
        {
            "timestamp": "2024-03-18T00:00:00Z",
            "values": {
                "bounce_rate": 0,
                "pageviews": 2,
                "views_per_visit": 0,
                "visit_duration": 0,
                "visitors": 2,
                "visits": 0
            }
        },
        {
            "timestamp": "2024-03-19T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 4,
                "views_per_visit": 4,
                "visit_duration": 236.214,
                "visitors": 2,
                "visits": 1
            }
        },
        {
            "timestamp": "2024-03-20T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 9,
                "views_per_visit": 1.8,
                "visit_duration": 4.5288,
                "visitors": 2,
                "visits": 5
            }
        }
    ]
}

Ziedelth avatar Mar 20 '24 08:03 Ziedelth

Interesting, this is helpful indeed. It looks like our bucketing implementation is buggy. This narrows down where I should focus.

gernest avatar Mar 20 '24 09:03 gernest

Hi, I updated vince to use local time everywhere. I'm not sure if it solves this problem, but it is a step in the right direction for how we handle time.

Please upgrade your container and give it a try.

gernest avatar Mar 21 '24 09:03 gernest

It looks good for now

Ziedelth avatar Mar 21 '24 09:03 Ziedelth

after the upgrade?

gernest avatar Mar 21 '24 09:03 gernest

Yes, I will wait until tomorrow to test across the day boundary between the timezones.

Ziedelth avatar Mar 21 '24 09:03 Ziedelth

The problem seems to have happened again. My old data is fine, with a single entry per date, but it's still happening for today's date.

{
    "results": [
        {
            "timestamp": "2024-03-21T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 4,
                "views_per_visit": 4,
                "visit_duration": 59.323,
                "visitors": 1,
                "visits": 1
            }
        },
        {
            "timestamp": "2024-03-13T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 45,
                "views_per_visit": 1.0714285714285714,
                "visit_duration": 0,
                "visitors": 7,
                "visits": 42
            }
        },
        {
            "timestamp": "2024-03-14T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 24,
                "views_per_visit": 1.1428571428571428,
                "visit_duration": 0,
                "visitors": 9,
                "visits": 21
            }
        },
        {
            "timestamp": "2024-03-15T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 23,
                "views_per_visit": 1.15,
                "visit_duration": 9678.446899999999,
                "visitors": 7,
                "visits": 20
            }
        },
        {
            "timestamp": "2024-03-16T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 43,
                "views_per_visit": 1.075,
                "visit_duration": 2300.2100999999993,
                "visitors": 7,
                "visits": 40
            }
        },
        {
            "timestamp": "2024-03-17T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 51,
                "views_per_visit": 1.0625,
                "visit_duration": 4612.806375,
                "visitors": 17,
                "visits": 48
            }
        },
        {
            "timestamp": "2024-03-18T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 27,
                "views_per_visit": 1.125,
                "visit_duration": 18510.507708333334,
                "visitors": 12,
                "visits": 24
            }
        },
        {
            "timestamp": "2024-03-19T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 19,
                "views_per_visit": 1.1875,
                "visit_duration": 11121.47675,
                "visitors": 4,
                "visits": 16
            }
        },
        {
            "timestamp": "2024-03-20T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 46,
                "views_per_visit": 1.069767441860465,
                "visit_duration": 6586.031023255815,
                "visitors": 9,
                "visits": 43
            }
        },
        {
            "timestamp": "2024-03-21T00:00:00Z",
            "values": {
                "bounce_rate": 1,
                "pageviews": 6,
                "views_per_visit": 2,
                "visit_duration": 377.60966666666667,
                "visitors": 2,
                "visits": 3
            }
        }
    ]
}

Ziedelth avatar Mar 21 '24 13:03 Ziedelth
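
Until the server side is fixed, a client could fold the split entries back together. The Go sketch below is only a workaround under a stated assumption: that the duplicate entries for a day are partial counts of the same day, so the additive counters (pageviews, visits) can be summed. Unique visitors, bounce rate and average durations are not additive and are deliberately left out. The `bucket` type and `mergeByTimestamp` are made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket holds only the additive counters of one timeseries entry.
type bucket struct {
	Timestamp string
	Pageviews int
	Visits    int
}

// mergeByTimestamp folds entries that share a timestamp into one bucket by
// summing their counters, and returns the buckets sorted by timestamp.
func mergeByTimestamp(in []bucket) []bucket {
	byTS := map[string]*bucket{}
	for _, b := range in {
		m, ok := byTS[b.Timestamp]
		if !ok {
			c := b // copy so the map does not alias the loop variable
			byTS[b.Timestamp] = &c
			continue
		}
		m.Pageviews += b.Pageviews
		m.Visits += b.Visits
	}
	out := make([]bucket, 0, len(byTS))
	for _, b := range byTS {
		out = append(out, *b)
	}
	// RFC 3339 timestamps in UTC sort chronologically as plain strings.
	sort.Slice(out, func(i, j int) bool { return out[i].Timestamp < out[j].Timestamp })
	return out
}

func main() {
	// Both 2024-03-21 entries and the 2024-03-20 entry from the response above.
	in := []bucket{
		{"2024-03-21T00:00:00Z", 4, 1},
		{"2024-03-20T00:00:00Z", 46, 43},
		{"2024-03-21T00:00:00Z", 6, 3},
	}
	for _, b := range mergeByTimestamp(in) {
		fmt.Printf("%s pageviews=%d visits=%d\n", b.Timestamp, b.Pageviews, b.Visits)
	}
}
```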

By the way, is it normal for the JSON returned by the endpoints to be "beautified"? Minifying it could improve performance and speed.

Ziedelth avatar Mar 21 '24 13:03 Ziedelth

Can you open an issue to ask for this? It is beautified by accident (I forgot to remove this); an issue helps with tracking changes.

gernest avatar Mar 21 '24 13:03 gernest
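
To give a rough idea of the difference, here is a Go sketch comparing the encoded size of the same value with `json.Marshal` (compact) and `json.MarshalIndent` (indented). It uses only the standard library; how vince actually encodes its responses is not shown in this thread, and `encodedSizes` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodedSizes returns the byte length of v encoded compactly and encoded
// with four-space indentation.
func encodedSizes(v any) (compact, indented int, err error) {
	c, err := json.Marshal(v)
	if err != nil {
		return 0, 0, err
	}
	p, err := json.MarshalIndent(v, "", "    ")
	if err != nil {
		return 0, 0, err
	}
	return len(c), len(p), nil
}

func main() {
	// One entry shaped like the timeseries results in this thread.
	entry := map[string]any{
		"timestamp": "2024-03-21T00:00:00Z",
		"values": map[string]any{
			"pageviews": 4,
			"visits":    1,
		},
	}
	c, p, err := encodedSizes(entry)
	if err != nil {
		panic(err)
	}
	fmt.Printf("compact=%d bytes, indented=%d bytes\n", c, p)
}
```

The saving grows with nesting depth and entry count, although HTTP compression, where enabled, narrows the gap on the wire.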

No problem

Ziedelth avatar Mar 21 '24 13:03 Ziedelth

Any news on the problem?

Ziedelth avatar Mar 25 '24 07:03 Ziedelth

Hi, I can't really reproduce this locally, so it is hard for me to solve. I will try again this week to find ways to reproduce it.

gernest avatar Mar 25 '24 09:03 gernest

@Ziedelth please be patient with me, I will get back to you as soon as I have something. Meanwhile, don't be discouraged from opening an issue for anything you encounter.

gernest avatar Mar 25 '24 16:03 gernest

No problem, take your time, I know what it's like to work alone on a project.

Ziedelth avatar Mar 25 '24 16:03 Ziedelth

Hi, any news on the problem?

Ziedelth avatar Apr 26 '24 05:04 Ziedelth

Hi, sorry for the silence.

I have been busy looking for work and at the same time working on a better index storage.

So, please be a bit patient. I plan to use the new storage, https://github.com/gernest/rbf, in vince.

Basically:

  • I will remove the distributed stuff from vince: it makes the code complex and makes bugs tough to find and debug.
  • Migrate to roaring bitmap indexes (saves a lot of space and compute).

Since we store raw events in vince, don't worry about losing data: I will provide a simple command to migrate existing data to the new store.

As a user, everything will still work without interruption, and even better, the timestamp-related issues will go away, since the storage uses quantum indexes.

I'm doing this to help me with maintenance. I am using the new storage with https://github.com/gernest/requiemdb; when I'm happy with it, I will move it to vince.

Rest assured you will be the first to be notified when it is ready.

gernest avatar Apr 26 '24 13:04 gernest

Hi @Ziedelth, just wanted to let you know I'm now focusing on the roadmap that will address this.

Can I ask how long you have had your vince instance running? It will help me plan the migration path.

gernest avatar May 02 '24 18:05 gernest

Greetings! So, I have an instance that has been running for 1 week without rebooting. When I restart the instance, the data is correctly added together.

But if it's the data you're interested in, the oldest dates back to March 13.

Ziedelth avatar May 02 '24 18:05 Ziedelth

Thanks, this is what I was interested in.

gernest avatar May 02 '24 18:05 gernest

I am considering grouping events into buckets by year, month, day and hour. All API calls would then operate on the expectation that a computation reflects not the exact time an event occurred, but the time bucket in which it happened.

@Ziedelth what do you think?

gernest avatar May 03 '24 15:05 gernest
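
As a rough illustration of that idea, the Go sketch below derives the year, month, day and hour bucket keys for a single event timestamp. Normalizing to UTC before truncating is an assumption made for this example, as are the key formats and the names `buckets` and `bucketsFor`; the thread does not say how vince will implement this.

```go
package main

import (
	"fmt"
	"time"
)

// buckets holds the keys of the four time buckets an event falls into.
type buckets struct {
	Year, Month, Day, Hour string
}

// bucketsFor parses an RFC 3339 timestamp, normalizes it to UTC, and returns
// the bucket keys obtained by dropping the finer-grained parts of the time.
func bucketsFor(rfc3339 string) (buckets, error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return buckets{}, err
	}
	t = t.UTC()
	return buckets{
		Year:  t.Format("2006"),
		Month: t.Format("2006-01"),
		Day:   t.Format("2006-01-02"),
		Hour:  t.Format("2006-01-02T15"),
	}, nil
}

func main() {
	b, err := bucketsFor("2024-05-03T15:42:07Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Year, b.Month, b.Day, b.Hour) // 2024 2024-05 2024-05-03 2024-05-03T15
}
```

Because every event in the same hour maps to exactly one key per granularity, a query for a day or month only ever sees one bucket per period, at the cost of losing the exact event time.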