mail
Priority inbox doesn't do what it's supposed to do
Expected behavior
PI should help users organize their email into the important ones and the rest. This algorithm is based on ML, so it's a bit of a black box and performs differently depending on the input. Some people say it does not work but that doesn't give us any input on how to iron out the issues.
For those users, PI does more harm than good.
Actual behavior
PI should deliver acceptable results for almost everyone. It's not supposed to be perfect. But it shouldn't be terrible.
Mail app
v1.4+
TODO
- [ ] Refactor persistence -> replace DB + file system with a versioned memory cache
- [ ] Finalize my classification work based on a TF-IDF transformer and a KNN classifier
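For readers unfamiliar with the approach named in the last TODO item, here is a rough, self-contained Python sketch of TF-IDF features fed into a k-nearest-neighbours vote. It is not the app's code. The toy subject lines, the whitespace tokenizer, cosine similarity as the distance, and k=3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for a toy example.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(text, train_vecs, labels, idf, k=3):
    """Majority vote among the k training samples most similar to text."""
    query = tfidf(text, idf)
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: subject lines with a manual importance label.
train_texts = [
    "invoice due payment reminder",
    "contract signature required today",
    "payment overdue final reminder",
    "weekly newsletter top stories",
    "sale discount newsletter offer",
    "new offer discount just for you",
]
train_labels = ["important", "important", "important", "other", "other", "other"]

idf = fit_idf(train_texts)
train_vecs = [tfidf(t, idf) for t in train_texts]
prediction = knn_predict("payment reminder for your invoice", train_vecs, train_labels, idf)
print(prediction)
```

A real classifier would use more features than the subject line; the sketch only shows how the two pieces fit together.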
Future work
- [ ] Add a feature to the classifier to be able to distinguish between user interaction and automated tagging, e.g. user manually assigned or unassigned importance to an email
- [ ] Investigate if importance flag changes from external clients could be picked up -> check our sync logic
- [ ] Investigate if training could be done "online" -> depends on Rubix
Context: https://nextcloud.com/blog/nextcloud-mail-introduces-machine-learning-for-priority-inbox/ and https://github.com/nextcloud/mail/issues/3265
The great debugging
1 – Debug training
Okay, let's look at what might be going wrong on the affected instances/accounts. Let's start by training an account from the CLI, which prints details about the training process:
$ php -f occ mail:account:train 1393
[debug] found 10 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 737 are important
[debug] data set split into 900 training and 100 validation sets
[debug] classification report: {"recall":0.810126582278481,"precision":0.90140845070422537,"f1Score":0.85333333333333339}
[debug] classifier validated: recall(important)=0.81012658227848, precision(important)=0.90140845070423 f1(important)=0.85333333333333
[debug] classifier 3067 persisted
14MB of memory used
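To help read the classification report: recall, precision and F1 for the "important" class follow from three counts on the validation set. The Python sketch below shows the arithmetic. The counts (64 true positives, 7 false positives, 15 false negatives) are not printed by the command; they are inferred here because they reproduce the reported values, so treat them as an assumption.

```python
def classification_report(tp, fp, fn):
    """Precision, recall and F1 for one class from confusion counts.

    tp: important messages predicted important
    fp: unimportant messages predicted important
    fn: important messages predicted unimportant
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1Score": f1}


# Inferred counts for the run above (assumption, see lead-in).
report = classification_report(tp=64, fp=7, fn=15)
print(report)
```

Read this way, the run above would have missed 15 of 79 important messages in the validation set and wrongly flagged 7 others.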
@guzzisti @jasond2020 @dcrobertson01 @Karamelmar @umrath @Ornias1993 @LukaPitamic please help shed some light on this by running occ mail:account:train for your account and posting the results here. The output doesn't contain anything sensitive, as you can see above.
Let's see if there is a pattern, then I'll suggest the next step.
Thanks everyone :v:
@guzzisti @jasond2020 @dcrobertson01 @Karamelmar @umrath @Ornias1993 @LukaPitamic did you have time to try this? Do you need any help?
I get the feeling my instance won't learn ... there are many, many messages but no 'automatic' training, just the few manually flagged "important" and nothing else. Here is the CLI output:
[debug] found 6 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 0 are important
[warning] not enough messages to train a classifier
34MB of memory used
maybe something wrong with accessing the data(-base)? But with this output there is no question why the priority box is not working for me... because it does not do anything
Thanks a lot for your help, @jasond2020. The output looks fine.
just the few manually flagged "important"
Are you sure you have some? The output says there are 0 important messages. We take the 1000 most recent emails. Are your important ones perhaps a bit older?
maybe something wrong with accessing the data(-base)? But with this output there is no question why the priority box is not working for me... because it does not do anything
You're right. On the other hand if the numbers printed are correct, the app does what it's supposed to do. No important message -> nothing to learn. So let's find out if there should be important messages.
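The situation discussed here can be pictured with a small Python sketch: if only the most recent 1000 messages are considered and all flagged messages are older than that, the training window contains zero important messages. The cap of 1000 comes from the reply above; the Message shape and the helper names are hypothetical and not taken from the app.

```python
from dataclasses import dataclass

MAX_TRAINING_SET_SIZE = 1000  # "the 1000 most recent emails", per the reply above


@dataclass
class Message:
    """Hypothetical message record, for illustration only."""
    uid: int
    received_at: int  # Unix timestamp
    important: bool


def training_window(messages, limit=MAX_TRAINING_SET_SIZE):
    """Return the `limit` most recent messages, newest first."""
    return sorted(messages, key=lambda m: m.received_at, reverse=True)[:limit]


def count_important(messages):
    return sum(1 for m in messages if m.important)


# 1200 messages where only the 50 oldest were flagged as important.
mailbox = [Message(uid=i, received_at=i, important=(i < 50)) for i in range(1200)]
window = training_window(mailbox)
print(len(window), count_important(window))
```

In this made-up mailbox all 50 flagged messages fall outside the window, so there is nothing to learn from even though important messages exist.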
Yes, they may be older. Older than the last 1000? I don't know, I have not counted. I stopped marking them important for two reasons: too much effort for no noticeable effect (no mails that I would think have the same 'pattern' as the ones I marked manually as important have been marked important automatically); and no (otherwise triggered) automated process started to mark (other) mails important. But OK, I'll make the effort again, start marking mails as important over the next days, and post the output here afterwards.
no (otherwise triggered) automated process started to mark (other) mails important.
Good point, actually. There should be fallback logic with a rule-based importance classification for just this case. You don't have to do all the manual work. I'll see if I can find out why it wouldn't do this for your emails …
But OK, I'll make the effort again, start marking mails as important over the next days, and post the output here afterwards.
Hold up until we know why the rules don't apply.
I use 7 mailboxes. Here is my data from occ mail:account:train:
#15 - my large imap mailbox
[debug] found 26 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 37 are important
[debug] data set split into 900 training and 100 validation sets
[debug] classification report: {"recall":0,"precision":0,"f1Score":0}
[debug] classifier validated: recall(important)=0, precision(important)=0 f1(important)=0
[debug] classifier 7690 persisted
24MB of memory used
#3
[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 63 messages of which 11 are important
[info] not enough messages to train a classifier
20MB of memory used
#2
[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 2 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used
#18
[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 15 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used
#19
[debug] found 7 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 322 messages of which 36 are important
[debug] data set split into 290 training and 32 validation sets
[debug] classification report: {"recall":1,"precision":1,"f1Score":1}
[debug] classifier validated: recall(important)=1, precision(important)=1 f1(important)=1
[debug] classifier 7691 persisted
22MB of memory used
#20
[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 298 messages of which 86 are important
[debug] data set split into 269 training and 29 validation sets
[debug] classification report: {"recall":1,"precision":0.9655172413793104,"f1Score":0.9824561403508771}
[debug] classifier validated: recall(important)=1, precision(important)=0.96551724137931 f1(important)=0.98245614035088
[debug] classifier 7692 persisted
24MB of memory used
#7
[debug] found 5 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 0 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used
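A note on the "data set split" lines: the sizes reported in this thread (900/100, 290/32, 269/29) are consistent with holding out 10% of the messages, rounded down, for validation. That ratio is inferred from the logs and is an assumption, not something read from the source code. A short Python sketch of the arithmetic:

```python
VALIDATION_PERCENT = 10  # assumption inferred from the logged split sizes


def split_sizes(total, validation_percent=VALIDATION_PERCENT):
    """Return (training, validation) sizes, rounding the validation part down."""
    validation = total * validation_percent // 100
    return total - validation, validation


for total in (1000, 322, 298):
    training, validation = split_sizes(total)
    print(f"{total} messages -> {training} training, {validation} validation")
```

With only around 30 validation messages, as for accounts 19 and 20, a single misclassified message moves the scores by several percentage points.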
Thanks a lot @JensKillermann. For the accounts with "not enough messages to train a classifier" there isn't much we can do. The ML needs a good amount of data to work reliably, hence there is a minimum threshold of 20 messages required for the ML training. Accounts with fewer than that will get some generic rules applied to detect importance.
Account 15 seems to run into overfitting. That is, the classifier learns patterns that are too specific to the input data. This can explain bad performance when that classifier is used to classify new messages.
Account 20 is also close to overfitting, but I assume it works a tad more reliably.
The ratio of messages total to important messages might play a role.
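One way to see why that ratio could matter, as an illustration and not a diagnosis: account 15 has 37 important messages out of 1000. The Python sketch below scores a hypothetical classifier that never predicts "important" on a set with that ratio. It is right most of the time, yet recall and precision for the important class are both 0 (precision is reported as 0 by convention when nothing is predicted positive). Whether this is what actually happened on that account is not established in this thread.

```python
def evaluate(labels, predictions):
    """Accuracy plus precision/recall for the positive (important) class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Same class ratio as account 15: 37 important out of 1000.
labels = [True] * 37 + [False] * 963
# Hypothetical degenerate classifier: always answers "not important".
predictions = [False] * len(labels)

accuracy, precision, recall = evaluate(labels, predictions)
print(f"accuracy={accuracy:.3f} precision={precision} recall={recall}")
```

This is why the report focuses on recall and precision for the important class instead of plain accuracy.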
I checked a few other production instances and it turns out that many accounts are in fact overfitting these days. Whoopsie and totally my bad, I should have paid more attention to how this develops. I'll see if I can reproduce it with an account on my development instance because debugging in production is anything but easy.
Hi, I ended up here in search of a way to disable Priority Inbox. It has never had any content and I just have a spinner just above "Other" going on forever. For other users with fewer folders and less historical mail the spinner disappears, but there is still only mail in the "Other" category. I have attached some examples of the requested debug log, although I don't know how to map the mailbox number to a user; based on the number of inboxes I guess mine is one of the first shown. train.txt
Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.
Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.
You can disable the annoyance with v1.10. Have a good one :v:
@ChristophWurst : how do I find the account id ? I tried with the account inbox box id, but that didn't work
I went to PI, browsed the Other category, and marked some mails as important.
I marked some as unread so they could be in Important and unread.
They do appear in I&U in a new private window.
They do not in the current tab, even after a full reload (Ctrl+F5).
If it's another issue, I can open a new one.
how do I find the account id ?
https://github.com/nextcloud/mail/blob/master/doc/admin.md#get-account-ids
Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.
You can disable the annoyance with v1.10. Have a good one ✌️
I'm trying to find the disable option but can't seem to find it. I'm on 1.10.5.
Anyway, we would like to just be allowed to disable and/or hide this AI annoyance. I guess we are just old school here too.
You can disable the annoyance with v1.10. Have a good one ✌️
I'm trying to find the disable option but can't seem to find it. I'm on 1.10.5.
Settings in the bottom left > Automatically classify importance of new email
NC 24.0.1 / Mail 1.13.2
[debug] found 39 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 9 are important
[info] not enough messages to train a classifier
17MB of memory used
"Automatically classify importance of new email" is unchecked, but half of the new mails are still randomly tagged as important!
How do I disable automatic classification?