incubator-ponymail

Bug: emails allocated to wrong month

Open sebbASF opened this issue 9 years ago • 7 comments

Emails which are sent within a few hours of the end of a month may be incorrectly allocated to the subsequent month.

For example, the earliest emails in the following mbox should be in the May 2015 chunk.

https://lists.apache.org/[email protected]:2015-6

The displayed timestamp shows the date 2015-05-31 (assuming that the local TZ is no further east than GMT+1).

These mails were uploaded from a file, if that makes a difference.

It looks like the mails were allocated to a month based on a local timezone at least 2 hours ahead of GMT. It does not make sense to use the local timezone for this. The database should work in UTC only. If necessary, the display can show times using the local timezone, but the underlying data should only be stored in UTC.

I found one place where localtime is used in the backend code:

https://github.com/apache/incubator-ponymail/blob/master/tools/archiver.py#L274

This would probably cause the upload issue.
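To illustrate why local time matters here, the following is a minimal sketch (not the archiver's actual code) of how the same instant lands in different month buckets depending on the timezone used. The helper `month_key`, the `YYYY-MM` key format and the sample timestamp are assumptions made for the example only.

```python
from datetime import datetime, timedelta, timezone


def month_key(dt, tz):
    """Return a 'YYYY-MM' bucket for an aware datetime, evaluated in tz."""
    return dt.astimezone(tz).strftime("%Y-%m")


# Hypothetical mail sent one hour before the end of May, expressed in UTC.
sent = datetime(2015, 5, 31, 23, 0, 0, tzinfo=timezone.utc)

# A fixed UTC+2 offset stands in for the local timezone of the importing box.
utc_plus_2 = timezone(timedelta(hours=2))

print(month_key(sent, timezone.utc))  # bucket when working in UTC
print(month_key(sent, utc_plus_2))    # bucket when working in local time
```

With UTC the mail stays in May; with the UTC+2 offset it moves to June, which is the symptom described above.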

See also https://issues.apache.org/jira/browse/INFRA-12079

sebbASF, Jun 11 '16 14:06

I also found a non-uploaded mail that has been allocated to the wrong month:

https://lists.apache.org/thread.html/79e2e6a0df70efc206e8e0124bd52d0302c52b50775d5aaa2cff108d@1464733997@%3Cuser.commons.apache.org%3E

The date in the email is

Date: Tue, 31 May 2016 22:33:17 +0000
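As a quick check, the header can be parsed with Python's standard library and reduced to a UTC month. This is only a sketch; the `YYYY-MM` key format is an assumption, not necessarily how the archive names its monthly chunks.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

header = "Tue, 31 May 2016 22:33:17 +0000"

# parsedate_to_datetime returns an aware datetime when the header carries
# a numeric offset such as +0000.
parsed = parsedate_to_datetime(header)
as_utc = parsed.astimezone(timezone.utc)

epoch = int(as_utc.timestamp())
month = as_utc.strftime("%Y-%m")

print(epoch)
print(month)
```

The resulting epoch is the same number that appears in the permalink above (1464733997), and the UTC month is May 2016, not June.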

sebbASF, Jun 11 '16 14:06

The archiver code now uses UTC.

sebbASF, Nov 09 '16 23:11

Example of an early imported mail.

The source [1] has the following date:

Date: Sun, 31 May 2015 22:19:41 -0000

The Permalink page [2] has the following info:

Date: 2015-05-31 23:19 (-0000)

This is clearly wrong, but may be a GUI-only issue. [Later: yes, the problem was that the GUI was converting the time to a local time; this has been fixed.]

The summary info [3] shows the following:

"mid": "fba1fa838d345c3b30b3db543425419a85ffde5f89ed2278063cf0c6@1433110781@<notifications.commons.apache.org>", "date": "2015/06/01 00:19:41", "epoch": 1433110781,

The epoch value corresponds to 2015-05-31 22:19:41 UTC.

So the epoch agrees with the source mail. The date in the mbox record is two hours adrift, which is why the message appears in the wrong month. [Later: this implies that the local TZ on the importing box was 2 hours ahead of UTC at the time.]

[1] https://lists.apache.org/api/source.lua/fba1fa838d345c3b30b3db543425419a85ffde5f89ed2278063cf0c6@1433110781@%3Cnotifications.commons.apache.org%3E

[2] https://lists.apache.org/thread.html/fba1fa838d345c3b30b3db543425419a85ffde5f89ed2278063cf0c6@1433110781@%3Cnotifications.commons.apache.org%3E

[3] https://lists.apache.org/api/thread.lua?id=fba1fa838d345c3b30b3db543425419a85ffde5f89ed2278063cf0c6@1433110781@%3Cnotifications.commons.apache.org%3E
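The two-hour drift between the 'date' and 'epoch' values quoted above can be checked with a few lines of Python. This is a sketch only: it assumes the 'date' string uses the `YYYY/MM/DD HH:MM:SS` layout shown in the summary info and deliberately reads it as if it were UTC in order to expose the offset.

```python
from datetime import datetime, timezone

epoch = 1433110781              # 'epoch' value from the summary info
stored = "2015/06/01 00:19:41"  # 'date' value from the summary info

# The instant the epoch actually denotes, in UTC.
utc_from_epoch = datetime.fromtimestamp(epoch, tz=timezone.utc)

# The stored string carries no timezone, so interpret it as UTC on purpose.
stored_as_utc = datetime.strptime(stored, "%Y/%m/%d %H:%M:%S").replace(
    tzinfo=timezone.utc
)

drift = stored_as_utc - utc_from_epoch

print(utc_from_epoch.strftime("%Y-%m-%d %H:%M:%S"))
print(drift)
```

The difference comes out as exactly two hours, consistent with the record having been written in a UTC+2 local time.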

sebbASF, Nov 25 '16 16:11

AFAIK, you can tell ES a timezone offset to correct this when querying for email.

Humbedooh, Nov 25 '16 16:11

That won't help, because the TZ which was used for the date field is not included in the string (if it were, this would be a non-issue). Also, the TZ used to load the original mails is not the same as the TZ which is used now. Unless one knows the TZ, one cannot tell ES what offset to use.

I think the code should use the epoch instead. Hopefully that always used UTC, but that needs to be checked.

However, there remains the issue that the date fields in the mbox records use different TZs depending on when they were created. One solution might be to ignore them completely.
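One way to picture "ignoring" the stored date is to regenerate it from the epoch. The sketch below is hypothetical and is not part of the Pony Mail code: the function names are invented for illustration, records are modelled as plain dicts with the 'date' and 'epoch' fields seen in the summary info, and it assumes the epoch was always computed in UTC, which still needs to be checked.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y/%m/%d %H:%M:%S"  # layout of the 'date' field seen above


def date_from_epoch(epoch):
    """Format an epoch value as a UTC date string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(DATE_FORMAT)


def repair_record(record):
    """Return a copy of record whose 'date' is derived from its 'epoch'."""
    fixed = dict(record)
    fixed["date"] = date_from_epoch(record["epoch"])
    return fixed


# Record with the drifted date from the early imported mail.
old = {"date": "2015/06/01 00:19:41", "epoch": 1433110781}
new = repair_record(old)

print(new["date"])
```

Records whose 'date' already matches the epoch come out unchanged, so such a pass would only alter entries written with a non-UTC local time.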

sebbASF, Nov 25 '16 16:11

It looks as though the problem is fixed in the current code, because importing the same message generates the correct UTC date, i.e. "2015/05/31 22:19:41".

sebbASF, Dec 17 '16 13:12

Note: it's not easy to use the 'epoch' field instead of the 'date' field, because the code makes extensive use of the relative date syntax supported by ES, e.g. +1m, -100d etc. This would be hard to match exactly in Lua. Also, the problem now exists only for database entries that were created before the code was fixed.
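For illustration, here is roughly what the two styles of range filter look like when built as Python dicts. This is a sketch under assumptions: the function names are invented, the query bodies are generic Elasticsearch range filters rather than the ones Pony Mail actually sends, and only a whole-day offset is shown.

```python
import time

SECONDS_PER_DAY = 86400


def range_on_date(days):
    """Range filter on 'date' using Elasticsearch date math (now-Nd)."""
    return {"range": {"date": {"gte": "now-%dd" % days}}}


def range_on_epoch(days, now=None):
    """Equivalent filter on 'epoch', with the arithmetic done by the caller."""
    now = int(time.time()) if now is None else int(now)
    return {"range": {"epoch": {"gte": now - days * SECONDS_PER_DAY}}}


print(range_on_date(100))
print(range_on_epoch(100, now=1484737260))
```

A fixed number of days is simple to reproduce by hand; calendar-based units and rounding, which ES date math also supports, are where an exact match becomes harder.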

sebbASF, Jan 18 '17 11:01