Duplicated date on timeseries
Hello, me again...
I noticed that when I retrieve data via Postman, some dates are duplicated, and each copy carries different values... I don't know how this happens; maybe it's the timezone, but I'm not sure.
Query:
https://vince.shikkanime.fr/api/v1/stats/timeseries?site_id=shikkanime.fr&period=6mo&metrics=visitors,visits,pageviews,views_per_visit,bounce_rate,visit_duration
Data returned:
{
"results": [
{
"timestamp": "2024-03-17T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 9,
"views_per_visit": 1.8,
"visit_duration": 75.4816,
"visitors": 5,
"visits": 5
}
},
{
"timestamp": "2024-03-16T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 38,
"views_per_visit": 2.235294117647059,
"visit_duration": 248.4300588235294,
"visitors": 6,
"visits": 17
}
},
{
"timestamp": "2024-03-15T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 23,
"views_per_visit": 2.090909090909091,
"visit_duration": 287.7332727272727,
"visitors": 7,
"visits": 11
}
},
{
"timestamp": "2024-03-14T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 23,
"views_per_visit": 2.090909090909091,
"visit_duration": 165.05236363636362,
"visitors": 9,
"visits": 11
}
},
{
"timestamp": "2024-03-13T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 45,
"views_per_visit": 2.6470588235294117,
"visit_duration": 138.2324705882353,
"visitors": 7,
"visits": 17
}
},
{
"timestamp": "2024-03-14T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 1,
"views_per_visit": 1,
"visit_duration": 0,
"visitors": 1,
"visits": 1
}
},
{
"timestamp": "2024-03-16T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 5,
"views_per_visit": 1.6666666666666667,
"visit_duration": 272.537,
"visitors": 2,
"visits": 3
}
},
{
"timestamp": "2024-03-17T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 1,
"views_per_visit": 1,
"visit_duration": 0,
"visitors": 1,
"visits": 1
}
}
]
}
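For anyone who wants to check a saved response for this symptom, here is a minimal sketch that counts repeated timestamps. The helper name `duplicated_timestamps` and the trimmed sample are illustrative assumptions, not part of vince; only the `results` / `timestamp` shape is taken from the response above.

```python
import json
from collections import Counter


def duplicated_timestamps(payload: dict) -> dict[str, int]:
    """Return each timestamp that appears more than once, with its count."""
    counts = Counter(row["timestamp"] for row in payload.get("results", []))
    return {ts: n for ts, n in counts.items() if n > 1}


# Trimmed sample shaped like the response above (only pageviews kept).
sample = json.loads("""
{
  "results": [
    {"timestamp": "2024-03-17T00:00:00Z", "values": {"pageviews": 9}},
    {"timestamp": "2024-03-16T00:00:00Z", "values": {"pageviews": 38}},
    {"timestamp": "2024-03-15T00:00:00Z", "values": {"pageviews": 23}},
    {"timestamp": "2024-03-16T00:00:00Z", "values": {"pageviews": 5}},
    {"timestamp": "2024-03-17T00:00:00Z", "values": {"pageviews": 1}}
  ]
}
""")

if __name__ == "__main__":
    # Lists the timestamps that occur more than once.
    print(duplicated_timestamps(sample))
```

An empty result would mean every bucket appears exactly once.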
Thanks for reporting. This is definitely a bug; I will investigate and fix it tomorrow.
@Ziedelth sorry I couldn't look into this today, I'm not feeling well.
No problem, take care.
Hi, so I looked into this. You are right: events are stored with timestamps in the local timezone, but we process computed time as UTC.
We have two options:
- Use `UTC` everywhere. This is the safer and more correct way. It is a breaking change, though, and all the data you have collected so far will be lost.
- Convert each time to UTC before doing any computation. This will be expensive and slow, and potentially error-prone.
What do you think?
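To illustrate the mismatch described above, here is a small sketch of how one event can land in two different day buckets depending on whether the date is read in local time or in UTC. The fixed UTC+01:00 offset and both function names are assumptions for illustration; this is not vince's actual code.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the server's local zone is a fixed UTC+01:00 offset.
LOCAL = timezone(timedelta(hours=1))


def day_bucket_local(ts: datetime) -> str:
    """Day bucket taken from the wall-clock date the timestamp carries."""
    return ts.date().isoformat()


def day_bucket_utc(ts: datetime) -> str:
    """Day bucket taken after normalising the timestamp to UTC."""
    return ts.astimezone(timezone.utc).date().isoformat()


# An event recorded at 00:30 local time on 17 March.
event = datetime(2024, 3, 17, 0, 30, tzinfo=LOCAL)

print(day_bucket_local(event))  # 2024-03-17
print(day_bucket_utc(event))    # 2024-03-16
```

If one code path uses the first rule and another uses the second, the same day can end up split across two rows.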
Off-topic: Solo Leveling is fantastic; I'm putting it up there, close to Jujutsu Kaisen.
xD
Can't you convert non-UTC dates at server startup to a standard format without data loss?
It's true that it would be better to use UTC dates to conform to all countries...
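As a rough sketch of why such a conversion need not lose data: if each stored timestamp still carries its offset, re-expressing it in UTC keeps the same instant. The function name `to_utc` and the sample values are hypothetical, and how vince actually stores timestamps is not shown in this thread.

```python
from datetime import datetime, timezone


def to_utc(stored: str) -> str:
    """Re-express an offset-carrying ISO 8601 timestamp in UTC.

    Note: the output format below drops sub-second precision.
    """
    ts = datetime.fromisoformat(stored)
    if ts.tzinfo is None:
        # Without an offset the original zone must be known from elsewhere.
        raise ValueError("timestamp has no offset; the original zone must be known")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical stored values with a +01:00 offset.
stored = ["2024-03-17T00:30:00+01:00", "2024-03-17T12:00:00+01:00"]
migrated = [to_utc(s) for s in stored]
print(migrated)  # ['2024-03-16T23:30:00Z', '2024-03-17T11:00:00Z']
```

Timestamps saved without any offset are the hard case, since the zone they were written in has to be assumed.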
We can do that. Let me think of something.
Are the other API endpoints working as intended? I want to be sure that only the timeseries endpoint is affected.
I think so, it's the only one, I don't have any problems with sources or pages
Awesome, I will think of something tomorrow.
I don't know if this helps, but even when local time and UTC fall on the same day (10am in France, 8am UTC), I still get the duplicated dates... Maybe it's not just the timezone?
{
"results": [
{
"timestamp": "2024-03-20T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 14,
"views_per_visit": 1.5555555555555556,
"visit_duration": 174.63133333333332,
"visitors": 7,
"visits": 9
}
},
{
"timestamp": "2024-03-19T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 15,
"views_per_visit": 1.5,
"visit_duration": 345.8795,
"visitors": 4,
"visits": 10
}
},
{
"timestamp": "2024-03-18T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 25,
"views_per_visit": 1.7857142857142858,
"visit_duration": 65.617,
"visitors": 11,
"visits": 14
}
},
{
"timestamp": "2024-03-17T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 39,
"views_per_visit": 1.8571428571428572,
"visit_duration": 157.9424285714286,
"visitors": 15,
"visits": 21
}
},
{
"timestamp": "2024-03-16T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 38,
"views_per_visit": 1.9,
"visit_duration": 211.16555,
"visitors": 6,
"visits": 20
}
},
{
"timestamp": "2024-03-15T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 23,
"views_per_visit": 1.6428571428571428,
"visit_duration": 226.07614285714286,
"visitors": 7,
"visits": 14
}
},
{
"timestamp": "2024-03-14T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 23,
"views_per_visit": 1.6428571428571428,
"visit_duration": 129.684,
"visitors": 9,
"visits": 14
}
},
{
"timestamp": "2024-03-13T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 45,
"views_per_visit": 1.8,
"visit_duration": 93.99807999999999,
"visitors": 7,
"visits": 25
}
},
{
"timestamp": "2024-03-14T00:00:00Z",
"values": {
"bounce_rate": 0,
"pageviews": 1,
"views_per_visit": 0,
"visit_duration": 0,
"visitors": 1,
"visits": 0
}
},
{
"timestamp": "2024-03-16T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 5,
"views_per_visit": 5,
"visit_duration": 817.611,
"visitors": 2,
"visits": 1
}
},
{
"timestamp": "2024-03-17T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 12,
"views_per_visit": 1.7142857142857142,
"visit_duration": 9.038,
"visitors": 4,
"visits": 7
}
},
{
"timestamp": "2024-03-18T00:00:00Z",
"values": {
"bounce_rate": 0,
"pageviews": 2,
"views_per_visit": 0,
"visit_duration": 0,
"visitors": 2,
"visits": 0
}
},
{
"timestamp": "2024-03-19T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 4,
"views_per_visit": 4,
"visit_duration": 236.214,
"visitors": 2,
"visits": 1
}
},
{
"timestamp": "2024-03-20T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 9,
"views_per_visit": 1.8,
"visit_duration": 4.5288,
"visitors": 2,
"visits": 5
}
}
]
}
Interesting, this is helpful indeed. It looks like our bucketing implementation is buggy. This narrows down where I should focus.
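Until the bucketing is fixed, a client-side workaround could be to merge rows that share a timestamp. The sketch below rests on assumptions that are not confirmed in this thread: that `views_per_visit` is `pageviews / visits` (which matches the numbers posted above), and that `bounce_rate` and `visit_duration` are per-visit averages that can be weighted by `visits`. Summing `visitors` may double count people who appear in both rows. All names are illustrative.

```python
from collections import defaultdict


def _visit_weighted(rows: list[dict], key: str, visits: int) -> float:
    """Average a per-visit metric across rows, weighting each row by its visits."""
    if visits == 0:
        return 0.0
    return sum(r[key] * r["visits"] for r in rows) / visits


def merge_duplicate_buckets(results: list[dict]) -> list[dict]:
    """Combine rows that share a timestamp into one row per timestamp."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in results:
        grouped[row["timestamp"]].append(row["values"])

    merged = []
    for ts in sorted(grouped):
        rows = grouped[ts]
        visits = sum(r["visits"] for r in rows)
        pageviews = sum(r["pageviews"] for r in rows)
        merged.append({
            "timestamp": ts,
            "values": {
                "pageviews": pageviews,
                "visits": visits,
                # Assumption: summing unique visitors gives an upper bound only.
                "visitors": sum(r["visitors"] for r in rows),
                "views_per_visit": pageviews / visits if visits else 0.0,
                "bounce_rate": _visit_weighted(rows, "bounce_rate", visits),
                "visit_duration": _visit_weighted(rows, "visit_duration", visits),
            },
        })
    return merged


# The two 2024-03-16 rows from the first response in this thread.
rows = [
    {"timestamp": "2024-03-16T00:00:00Z",
     "values": {"bounce_rate": 1, "pageviews": 38, "views_per_visit": 2.235294117647059,
                "visit_duration": 248.4300588235294, "visitors": 6, "visits": 17}},
    {"timestamp": "2024-03-16T00:00:00Z",
     "values": {"bounce_rate": 1, "pageviews": 5, "views_per_visit": 1.6666666666666667,
                "visit_duration": 272.537, "visitors": 2, "visits": 3}},
]

if __name__ == "__main__":
    print(merge_duplicate_buckets(rows))
```

This only hides the symptom on the client; the counts are still only as good as the server-side buckets.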
Hi, I updated vince to use local time everywhere. I'm not sure it solves this problem, but it is a step in the right direction for how we handle time.
Please upgrade your container and give it a try.
It looks good for now
after the upgrade?
Yes. I will wait until tomorrow to test across the day change between timezones.
The problem seems to have happened again. My older data now shows a single entry per date, but the duplication still happens for today's date.
{
"results": [
{
"timestamp": "2024-03-21T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 4,
"views_per_visit": 4,
"visit_duration": 59.323,
"visitors": 1,
"visits": 1
}
},
{
"timestamp": "2024-03-13T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 45,
"views_per_visit": 1.0714285714285714,
"visit_duration": 0,
"visitors": 7,
"visits": 42
}
},
{
"timestamp": "2024-03-14T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 24,
"views_per_visit": 1.1428571428571428,
"visit_duration": 0,
"visitors": 9,
"visits": 21
}
},
{
"timestamp": "2024-03-15T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 23,
"views_per_visit": 1.15,
"visit_duration": 9678.446899999999,
"visitors": 7,
"visits": 20
}
},
{
"timestamp": "2024-03-16T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 43,
"views_per_visit": 1.075,
"visit_duration": 2300.2100999999993,
"visitors": 7,
"visits": 40
}
},
{
"timestamp": "2024-03-17T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 51,
"views_per_visit": 1.0625,
"visit_duration": 4612.806375,
"visitors": 17,
"visits": 48
}
},
{
"timestamp": "2024-03-18T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 27,
"views_per_visit": 1.125,
"visit_duration": 18510.507708333334,
"visitors": 12,
"visits": 24
}
},
{
"timestamp": "2024-03-19T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 19,
"views_per_visit": 1.1875,
"visit_duration": 11121.47675,
"visitors": 4,
"visits": 16
}
},
{
"timestamp": "2024-03-20T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 46,
"views_per_visit": 1.069767441860465,
"visit_duration": 6586.031023255815,
"visitors": 9,
"visits": 43
}
},
{
"timestamp": "2024-03-21T00:00:00Z",
"values": {
"bounce_rate": 1,
"pageviews": 6,
"views_per_visit": 2,
"visit_duration": 377.60966666666667,
"visitors": 2,
"visits": 3
}
}
]
}
By the way, is it intended that the JSON returned by the endpoints is pretty-printed? Minifying it would shrink the responses and could improve speed.
Can you open an issue to ask for this? It is pretty-printed by accident (I forgot to remove this). An issue helps with tracking changes.
No problem
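On the minification point, a quick sketch of the size difference between pretty-printed and compact JSON, using Python's standard `json` module. The payload repeats one row from this thread ten times purely for illustration; actual savings depend on the response and on whether the server compresses it.

```python
import json

# One row copied from the response above, repeated to mimic a longer series.
row = {
    "timestamp": "2024-03-21T00:00:00Z",
    "values": {"bounce_rate": 1, "pageviews": 4, "views_per_visit": 4,
               "visit_duration": 59.323, "visitors": 1, "visits": 1},
}
payload = {"results": [row] * 10}

pretty = json.dumps(payload, indent=2)                # indented, one key per line
compact = json.dumps(payload, separators=(",", ":"))  # no optional whitespace

# Both strings decode to the same data; only the whitespace differs.
print(len(pretty), len(compact))
```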
Any news on the problem?
Hi, I can't really reproduce this locally, so it is hard for me to solve. I will try again this week to find a way to reproduce it.
@Ziedelth please be patient with me; I will get back to you as soon as I have something. Meanwhile, don't hesitate to open an issue for anything you encounter.
No problem, take your time, I know what it's like to work alone on a project.
Hi, any news on the problem?
Hi, sorry for the silence.
I have been busy looking for work and at the same time working on a better index storage.
So please be a bit patient. I plan to use the new storage, https://github.com/gernest/rbf, in vince.
Basically:
- I will remove the distributed parts from vince: they make the code complex and make bugs tough to find and debug.
- I will migrate to roaring bitmap indexes (saves a lot of space and compute).
Since we store raw events in vince, don't worry about losing data: I will provide a simple command to migrate existing data to the new store.
As a user, everything will still work without interruption and, better still, the timestamp-related issues will go away since the storage uses quantum indexes.
I'm doing this to help me with maintenance. I am using the new storage with https://github.com/gernest/requiemdb; when I'm happy with it, I will move it to vince.
Rest assured you will be the first to be notified when it is ready.
Hi @Ziedelth, just wanted to let you know I'm now focusing on the roadmap that will address this.
Can I ask how long you have had your vince instance running? It will help me plan the migration path.
Greetings! So, I have an instance that has been running for 1 week without rebooting. When I restart the instance, the data is correctly added together.
But if it's the data you're interested in, the oldest dates back to March 13.
Thanks, this is what I was interested in.
I am considering grouping events into year, month, day and hour buckets. All API calls would then operate with the expectation that a computation reflects not the exact time an event occurred but the time bucket in which it happened.
@Ziedelth what do you think?
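To make the bucket idea concrete, here is a minimal sketch of truncating a timestamp to the start of its year, month, day or hour. The function name `bucket_start` and the unit strings are assumptions for illustration; the new storage may implement this quite differently.

```python
from datetime import datetime, timezone


def bucket_start(ts: datetime, unit: str) -> datetime:
    """Truncate a timestamp to the start of its year, month, day or hour."""
    if unit == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown bucket unit: {unit}")


# A sample event time; every event in the same hour maps to the same hour bucket.
event = datetime(2024, 3, 21, 10, 47, 12, tzinfo=timezone.utc)
for unit in ("year", "month", "day", "hour"):
    print(unit, bucket_start(event, unit).isoformat())
```

With a single truncation rule like this applied in one fixed zone, each period should map to exactly one key per bucket size.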