Priority inbox doesn't do what it's supposed to do

Open ChristophWurst opened this issue 4 years ago • 18 comments

Expected behavior

PI should help users organize their email into the important ones and the rest. This algorithm is based on ML, so it's a bit of a black box and performs differently depending on the input. Some people say it does not work but that doesn't give us any input on how to iron out the issues.

For those users the PI does more harm than good.

Actual behavior

PI should deliver acceptable results for almost everyone. It's not supposed to be perfect. But it shouldn't be terrible.

Mail app

v1.4+

TODO

  • [ ] Refactor persistence -> replace DB + file system with a versioned memory cache
  • [ ] Finalize my classification work based on a TF-IDF transformer and a KNN classifier
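
A minimal sketch of what a TF-IDF transformer feeding a KNN classifier does, written in plain Python for illustration. This is not the Mail app's code (the thread indicates the app builds on Rubix); the tokenizer, the smoothed IDF formula, the cosine similarity measure, `k=3`, and the sample subjects are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real pipelines do more cleanup).
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector; tokens unseen during fitting are ignored."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}


def cosine(a, b):
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine(query, train_vectors[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: subject lines with hand-assigned labels.
subjects = [
    "invoice due tomorrow",
    "meeting agenda for monday",
    "contract needs your signature",
    "weekly newsletter and offers",
    "big sale this weekend",
    "new offers in our newsletter",
]
labels = ["important", "important", "important", "other", "other", "other"]

idf = fit_idf(subjects)
vectors = [tfidf(s, idf) for s in subjects]
prediction = knn_predict(tfidf("newsletter with weekend offers", idf), vectors, labels)
print(prediction)
```

Because KNN only compares a new message against stored examples, its quality depends heavily on how many labeled messages of each class are available.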

Future work

  • [ ] Add a feature to the classifier to be able to distinguish between user interaction and automated tagging, e.g. user manually assigned or unassigned importance to an email
  • [ ] Investigate if importance flag changes from external clients could be picked up -> check our sync logic
  • [ ] Investigate if training could be done "online" -> depends on Rubix

Context: https://nextcloud.com/blog/nextcloud-mail-introduces-machine-learning-for-priority-inbox/ and https://github.com/nextcloud/mail/issues/3265

ChristophWurst avatar Nov 03 '20 19:11 ChristophWurst

The great debugging

1 – Debug training

Okay, let's look at what might be going wrong on the affected instances/accounts. Let's start with the account training command on the CLI, which prints details about the training process:

$ php -f occ mail:account:train 1393
[debug] found 10 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 737 are important
[debug] data set split into 900 training and 100 validation sets
[debug] classification report: {"recall":0.810126582278481,"precision":0.90140845070422537,"f1Score":0.85333333333333339}
[debug] classifier validated: recall(important)=0.81012658227848, precision(important)=0.90140845070423 f1(important)=0.85333333333333
[debug] classifier 3067 persisted
14MB of memory used
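
For readers unfamiliar with the classification report: the F1 score is the harmonic mean of precision and recall. A small sketch, using the values from the log above, shows how the three numbers relate. The function name `f1_score` is only illustrative and not part of the app.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the classification report in the log above.
recall = 0.810126582278481
precision = 0.90140845070422537

print(round(f1_score(precision, recall), 6))  # 0.853333
```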

@guzzisti @jasond2020 @dcrobertson01 @Karamelmar @umrath @Ornias1993 @LukaPitamic please help shed some light on this by running occ mail:account:train for your account and post the results here. The output doesn't contain anything sensitive as you can see above.

Let's see if there is a pattern, then I'll suggest the next step.

Thanks everyone :v:

ChristophWurst avatar Nov 03 '20 19:11 ChristophWurst

@guzzisti @jasond2020 @dcrobertson01 @Karamelmar @umrath @Ornias1993 @LukaPitamic did you have time to try this? Do you need any help?

ChristophWurst avatar Nov 09 '20 18:11 ChristophWurst

I get the feeling my instance won't learn ... there are many, many messages but no 'automatic' training, just the few manually flagged "important" and nothing else. Here is the CLI output:

[debug] found 6 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 0 are important
[warning] not enough messages to train a classifier
34MB of memory used

maybe something wrong with accessing the data(-base)? But with this output there is no question why the priority box is not working for me... because it does not do anything

jasond2020 avatar Nov 10 '20 12:11 jasond2020

Thanks a lot for your help, @jasond2020. The output looks fine.

just the few manually flagged "important"

Are you sure you have some? The output says there are 0 important messages. We take the 1000 most recent emails. Are your important ones perhaps a bit older?

maybe something wrong with accessing the data(-base)? But with this output there is no question why the priority box is not working for me... because it does not do anything

You're right. On the other hand if the numbers printed are correct, the app does what it's supposed to do. No important message -> nothing to learn. So let's find out if there should be important messages.

ChristophWurst avatar Nov 12 '20 08:11 ChristophWurst

Yes, they may be older. Whether they are older than the last 1000 I don't know, I have not counted. I stopped marking them important for two reasons: too much effort for no noticeable effect (no mails that I would think have the same 'pattern' as the ones I marked manually as important have been marked important automatically); no (otherwise triggered) automated process started to mark (other) mails important. But OK, I'll try to make the effort again and start marking mails as important over the next days and post the output here afterwards.

jasond2020 avatar Nov 12 '20 10:11 jasond2020

no (otherwise triggered) automated process started to mark (other) mails important.

Good point actually. There should be a fallback logic with a rule-based importance classification for just this case. You don't have to do all the manual work. I'll see if I can find out why it wouldn't do this for your emails …

But OK, I'll try to make the effort again and start marking mails as important over the next days and post the output here afterwards.

Hold up until we know why the rules don't apply.

ChristophWurst avatar Nov 12 '20 10:11 ChristophWurst

I use 7 mailboxes. Here my data from occ mail:account:train

#15 - my large imap mailbox

[debug] found 26 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 37 are important
[debug] data set split into 900 training and 100 validation sets
[debug] classification report: {"recall":0,"precision":0,"f1Score":0}
[debug] classifier validated: recall(important)=0, precision(important)=0 f1(important)=0
[debug] classifier 7690 persisted
24MB of memory used

#3

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 63 messages of which 11 are important
[info] not enough messages to train a classifier
20MB of memory used

#2

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 2 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used

#18

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 15 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used

#19

[debug] found 7 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 322 messages of which 36 are important
[debug] data set split into 290 training and 32 validation sets
[debug] classification report: {"recall":1,"precision":1,"f1Score":1}
[debug] classifier validated: recall(important)=1, precision(important)=1 f1(important)=1
[debug] classifier 7691 persisted
22MB of memory used

#20

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 298 messages of which 86 are important
[debug] data set split into 269 training and 29 validation sets
[debug] classification report: {"recall":1,"precision":0.9655172413793104,"f1Score":0.9824561403508771}
[debug] classifier validated: recall(important)=1, precision(important)=0.96551724137931 f1(important)=0.98245614035088
[debug] classifier 7692 persisted
24MB of memory used

#7

[debug] found 5 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 0 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used

JensKillermann avatar Nov 17 '20 09:11 JensKillermann

Thanks a lot @JensKillermann. For the accounts with not enough messages to train a classifier there isn't much we can do. The ML needs a good amount of data to work reliably, hence there is a minimum threshold of 20 messages required for the ML training. Accounts with fewer than that will get some generic rules applied to detect some importance.

Account 15 seems to run into overfitting. As in, the classifier learns strong patterns on the input data. This can explain bad performance when that classifier is used to classify new messages.

Account 20 is also close to overfitting but I assume it works a tad more reliably.

The ratio of messages total to important messages might play a role.
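
To illustrate why that ratio can matter, here is a rough sketch of how an all-zero report can arise on a heavily imbalanced data set. The confusion counts are hypothetical (a validation set of 100 messages with 4 important ones, roughly the 37-in-1000 ratio of account 15), and the `report` helper is not the app's code. A classifier that never predicts "important" is mostly right by accuracy, yet scores zero on every metric for the important class.

```python
def report(tp, fp, fn, tn):
    """Metrics for the 'important' class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1Score": f1, "accuracy": accuracy}


# Hypothetical: 100 validation messages, 4 important, none predicted important.
always_other = report(tp=0, fp=0, fn=4, tn=96)
print(always_other)
```

This is only one possible explanation for a zero report; the logs in this thread do not show the confusion counts, so they cannot confirm it.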

ChristophWurst avatar Nov 26 '20 09:11 ChristophWurst

I checked a few other production instances and it turns out that many accounts are in fact overfitting these days. Whoopsie and totally my bad, I should have paid more attention to how this develops. I'll see if I can reproduce it with an account on my development instance because debugging in production is anything but easy.

ChristophWurst avatar Nov 26 '20 10:11 ChristophWurst

Hi, I ended up here while searching for a way to disable Priority Inbox. It has never had any content, and I just have a spinner above "Other" that goes on forever. For other users with fewer folders and less historical mail, the spinner disappears, but there is still only mail in the "Other" category. I have attached some examples of the requested debug log. Although I don't know how to map the mailbox number to a user, based on the number of inboxes I guess mine is one of the first shown. train.txt

Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.

MrManor avatar Apr 11 '21 19:04 MrManor

Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.

You can disable the annoyance with v1.10. Have a good one :v:

ChristophWurst avatar Jul 05 '21 08:07 ChristophWurst

@ChristophWurst : how do I find the account id ? I tried with the account inbox box id, but that didn't work

mat-m avatar Jul 08 '21 20:07 mat-m

I went to PI, browsed the Other category, and marked some mails as important. I marked some as unread so they could be in Important and unread.
They do appear in I&U in a new private window. They do not in the current tab, even after a full reload (Ctrl+F5).

If it's another issue, I can open a new one.

mat-m avatar Jul 08 '21 21:07 mat-m

how do I find the account id ?

https://github.com/nextcloud/mail/blob/master/doc/admin.md#get-account-ids

ChristophWurst avatar Jul 09 '21 07:07 ChristophWurst

Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.

You can disable the annoyance with v1.10. Have a good one ✌️

I'm trying to find the disable option but can't seem to find it. I'm on 1.10.5

pbanj avatar Nov 29 '21 01:11 pbanj

Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.

You can disable the annoyance with v1.10. Have a good one :v:

I'm trying to find the disable option but can't seem to find it. I'm on 1.10.5

Settings in the bottom left > Automatically classify importance of new email

miaulalala avatar Nov 29 '21 08:11 miaulalala

NC 24.0.1 / Mail 1.13.2

[debug] found 39 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 9 are important
[info] not enough messages to train a classifier
17MB of memory used

"Automatically classify importance of new email" is unchecked, but half of the new mails are still randomly tagged as important!

How do I disable automatic classification?

olivn avatar Jun 08 '22 08:06 olivn