matomo icon indicating copy to clipboard operation
matomo copied to clipboard

[Bug] Unexpected behaviour for "core:archive --force-date-range=yesterday,yesterday"

Open mneudert opened this issue 7 months ago • 0 comments

What happened?

tldr: Running core:archive --force-date-range=yesterday,yesterday may not process the expected day, if archiving is running before midnight UTC and the website's local time is past midnight.


Let's assume a crontab for Matomo is set up with two entries:

30 11 * * * /srv/matomo/console core:archive --force-idsites=1 --skip-segments-today
30 23 * * * /srv/matomo/console core:archive --force-idsites=1 --force-date-range=yesterday,yesterday

The server is running in UTC, the website is configured as UTC+1 (e.g. Europe/Berlin during CET).

Times in the crontab were chosen with the intent to archive the current day fast (without complex segments). The segments should be processed right after midnight in the website's timezone (23:30 UTC is 00:30 UTC+1), including a reprocessing of the previous day's regular archives (that were initially created at 11:30 UTC). We do not want to process any invalidations, except those for the recently finished day.

Though archiving logs show the nightly archiving is not working as expected:

INFO      [2024-06-24 23:30:39] 8635  Start processing archives for site 1.
DEBUG     [2024-06-24 23:30:39] 8635  Checking for queued invalidations...
DEBUG     [2024-06-24 23:30:39] 8635    Today archive can be skipped due to no visits for idSite = 1, skipping invalidation...
INFO      [2024-06-24 23:30:39] 8635    Will invalidate archived reports for yesterday in site ID = 1's timezone (2024-06-24 00:00:00).
DEBUG     [2024-06-24 23:30:39] 8635    Found usable archive for [idSite = 1, period = day 2024-06-24,2024-06-24, segment = , plugin = , report = ], skipping invalidation.
DEBUG     [2024-06-24 23:30:39] 8635  process_new_segments_from set to beginning_of_time or cannot recognize value
DEBUG     [2024-06-24 23:30:39] 8635  Done invalidating
DEBUG     [2024-06-24 23:30:39] 8635  Skipping invalidated archive 172, archive date range (2024-06-24,2024-06-24) is not within --force-date-range: [idinvalidation = 172, idsite = 1, period = day(2024-06-24 - 2024-06-24), name = done7d2af5b32be3cc649b35daaa6e947151, segment = countryName==United%2520States]
DEBUG     [2024-06-24 23:30:39] 8635  Skipping invalidated archive 176, archive date range (2024-06-24,2024-06-30) is not within --force-date-range: [idinvalidation = 176, idsite = 1, period = week(2024-06-24 - 2024-06-30), name = done7d2af5b32be3cc649b35daaa6e947151, segment = countryName==United%2520States]
DEBUG     [2024-06-24 23:30:40] 8635  No usable archive exists (ts_archived of existing = 2024-06-24 11:30:00, now = 2024-06-24 23:30:00).
DEBUG     [2024-06-24 23:30:40] 8635  process_new_segments_from set to beginning_of_time or cannot recognize value
DEBUG     [2024-06-24 23:30:40] 8635  Processing invalidation: [idinvalidation = 177, idsite = 1, period = month(2024-06-01 - 2024-06-30), name = done7d2af5b32be3cc649b35daaa6e947151, segment = countryName==United%2520States].
DEBUG     [2024-06-24 23:30:40] 8635  Found archive with intersecting period with others in concurrent batch, skipping until next batch: [idinvalidation = 179, idsite = 1, period = year(2024-01-01 - 2024-12-31), name = done7d2af5b32be3cc649b35daaa6e947151, segment = countryName==United%2520States]
DEBUG     [2024-06-24 23:30:40] 8635  No next invalidated archive.
DEBUG     [2024-06-24 23:30:40] 8635  process_new_segments_from set to beginning_of_time or cannot recognize value
DEBUG     [2024-06-24 23:30:40] 8635  Starting archiving for ?module=API&method=CoreAdminHome.archiveReports&idSite=101&period=month&date=2024-06-01&format=json&segment=countryName%3D%3DUnited%252520States&trigger=archivephp
DEBUG     [2024-06-24 23:30:40] 8635  Running command: /usr/bin/php7.2 -q  /srv/matomo/console climulti:request -q --matomo-domain='' --superuser 'module=API&method=CoreAdminHome.archiveReports&idSite=101&period=month&date=2024-06-01&format=json&segment=countryName%3D%3DUnited%252520States&trigger=archivephp' 2>&1
INFO      [2024-06-24 23:30:41] 8635  Archived website id 1, period = month, date = 2024-06-01, segment = 'countryName==United%20States', 1 visits found. Time elapsed: 1.459s
DEBUG     [2024-06-24 23:30:41] 8635  No usable archive exists (ts_archived of existing = 2024-06-24 11:30:00, now = 2024-06-24 23:30:00).
DEBUG     [2024-06-24 23:30:41] 8635  process_new_segments_from set to beginning_of_time or cannot recognize value
DEBUG     [2024-06-24 23:30:41] 8635  Processing invalidation: [idinvalidation = 179, idsite = 1, period = year(2024-01-01 - 2024-12-31), name = done7d2af5b32be3cc649b35daaa6e947151, segment = countryName==United%2520States].
DEBUG     [2024-06-24 23:30:41] 8635  No next invalidated archive.
DEBUG     [2024-06-24 23:30:41] 8635  process_new_segments_from set to beginning_of_time or cannot recognize value
DEBUG     [2024-06-24 23:30:41] 8635  Starting archiving for ?module=API&method=CoreAdminHome.archiveReports&idSite=101&period=year&date=2024-01-01&format=json&segment=countryName%3D%3DUnited%252520States&trigger=archivephp
DEBUG     [2024-06-24 23:30:41] 8635  Running command: /usr/bin/php7.2 -q  /srv/matomo/console climulti:request -q --matomo-domain='' --superuser 'module=API&method=CoreAdminHome.archiveReports&idSite=101&period=year&date=2024-01-01&format=json&segment=countryName%3D%3DUnited%252520States&trigger=archivephp' 2>&1
INFO      [2024-06-24 23:30:43] 8635  Archived website id 1, period = year, date = 2024-01-01, segment = 'countryName==United%20States', 1 visits found. Time elapsed: 1.713s
DEBUG     [2024-06-24 23:30:43] 8635  No next invalidated archive.
INFO      [2024-06-24 23:30:43] 8635  Finished archiving for site 1, 2 API requests, Time elapsed: 3.675s [1 / 1 done]
DEBUG     [2024-06-24 23:30:43] 8635  No more sites left to archive, stopping.
INFO      [2024-06-24 23:30:43] 8635  Done archiving!

Several odd things are happening here.

  1. Missing archiving of the empty segment (separate issue)

In above logs, there is no output for the empty segment. This happens due to an issue with time comparisons, the specific time this archiving process was executed.

This problem will be fixed in #22299.

  1. Archive date range is not in --force-date-range (this issue)

A message was printed that "2024-06-24,2024-06-24" is not within the requested date range "2024-06-24,2024-06-24".

This happens in the archive filter, not taking the timezone of the website into account:

https://github.com/matomo-org/matomo/blob/9a3ef94df61682e7d543f3f8b438c368f6eb71a6/core/CronArchive/ArchiveFilter.php#L101-L102

Passing "yesterday" to the date range argument is interpreted as "yesterday relative to 23:30 UTC", not relative to the website being processed. With a positive UTC offset this results in different days being used, and the odd log message.

  1. Segments still being processed but also invalidated (maybe this issue)

Running archiving with --skip-segments-today will create invalidation entries for the segments that are not processed. The archive filter in the current special "yesterday" format will ignore those due to the timezone problem.

Afterwards the regular archiving will detect that the segments have not been archived yet, and process them.

However, the invalidations skipped in the archive filter seem to NOT be deleted. They will be picked up by the next archiving run, even though the archive itself should already be complete and not require processing anymore.

  1. Archive date range is not in --force-date-range (this issue, after applying #22299)

After applying mentioned fix, the timezone issue in the archive filter will be logged for the empty segment:

DEBUG [2024-06-24 23:31:13] 9198 Skipping invalidated archive 259, archive date range (2024-06-24,2024-06-24) is not within --force-date-range: [idinvalidation = 259, idsite = 1, period = day(2024-06-24 - 2024-06-24), name = done, segment = ]

But, as the issue of "yesterday may not be reprocessed" is fixed, and the day in the website's timezone changed, we still get correct empty segment archives for the day.

Matching point 3, some segment invalidation entries seem to NOT be deleted after being skipped in the archive filter.

What should happen?

Passing relative dates, like "today" or "yesterday", to core:archive --force-date-range should be handled in relation to the website being processed.

How can this be reproduced?

  • Create a website with a positive UTC offset
  • Run archiving during the day to create "today" archives
  • Run archiving with --force-date-range=yesterday,yesterday during the period affected by the timezone problems, e.g. 23:30 UTC for a website in UTC+1

Maybe:

  • Repeat steps after applying #22299

Matomo version

5.1.0-alpha

PHP version

No response

Server operating system

No response

What browsers are you seeing the problem on?

No response

Computer operating system

No response

Relevant log output

No response

Validations

mneudert avatar Jun 26 '24 17:06 mneudert