exchangelib
Unable to access in-place archive
Hi Erik, love the library and thank you for making your work openly available.
I'm having problems accessing the In-Place Archive on some accounts.
I'm trying to access several "Online Archive" enabled O365 accounts. The archive shows up in Outlook as "In-Place Archive" but may be different, I'm not exactly sure. We're programmatically moving large numbers of emails into the main folder of each account, and have an archiving policy that marks them for archiving after 24 hours. We're unable to reach the mail in the archive from Graph API, so we've used win32com + MAPI before and are now exploring EWS via exchangelib for manipulating messages within the archives.
We've followed the instructions in the documentation to set up an azure app with impersonation rights, and I'm able to connect to the account and access the folders from the root, but don't see or can't access any folders in the archive root. Oddly, when I check the number of children the archive root has, it shows 6, but walking or iterating over children yields no folders.
Here is my code for connecting to the account with app credentials:
from exchangelib import OAuth2Credentials, Account, Configuration
from exchangelib.version import Version, EXCHANGE_O365

# Placeholders: substitute the values from the Azure app registration.
credentials = OAuth2Credentials(
    client_id="<client_id>",
    client_secret="<client_secret>",
    tenant_id="<tenant_id>",
)
config = Configuration(
    credentials=credentials,
    auth_type="OAuth 2.0",
    version=Version(build=EXCHANGE_O365),
)
account = Account("<account@domain>", config=config, autodiscover=True)
I'm able to successfully refresh the root and archive_msg_folder_root.
child_folder_count is 62 for account.root, and 6 for account.archive_msg_folder_root.
print(account.root.tree()) shows everything expected, but print(account.archive_msg_folder_root.tree()) shows just "Top of Information Store". In Outlook there is an inbox with large numbers of nested subfolders within the "In-Place Archive".
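One way to pin down the mismatch described above is to compare the count the server reports with what iteration actually yields. This is a minimal sketch, assuming the folder object exposes `child_folder_count` and `children` as exchangelib folders do; `compare_child_counts` is a hypothetical helper name, not part of the library.

```python
def compare_child_counts(folder):
    """Return (reported, iterated) counts of immediate child folders."""
    # Count reported by the server for this folder.
    reported = folder.child_folder_count
    # Number of child folders that iteration actually produces.
    iterated = sum(1 for _ in folder.children)
    return reported, iterated
```

Called as `compare_child_counts(account.archive_msg_folder_root)`, a result where the two numbers differ would match the symptom reported here.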
Any help or a direction to investigate is much appreciated. Please let me know if you need logs or more information.
Wes
Hey, sorry for the late reply.
Just to rule that out, did you try accessing the 6 archive folders with an earlier version of exchangelib? It could be related to https://github.com/ecederstrand/exchangelib/issues/1222. Alternatively, you can see exactly what exchangelib is requesting and receiving if you enable debug logging.
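For reference, debug logging can be switched on roughly like this. It is a sketch, assuming exchangelib's loggers live under the "exchangelib" logger name and that `PrettyXmlHandler` is importable from `exchangelib.util`; the function name is hypothetical, and a plain `StreamHandler` is used as a fallback if the import fails.

```python
import logging

try:
    # Pretty-prints the XML requests and responses in the log output.
    from exchangelib.util import PrettyXmlHandler
    handler = PrettyXmlHandler()
except ImportError:
    # Fallback so the sketch still runs without exchangelib installed.
    handler = logging.StreamHandler()


def enable_exchangelib_debug_logging():
    """Attach a DEBUG-level handler to the 'exchangelib' logger."""
    logger = logging.getLogger("exchangelib")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger
```

Call `enable_exchangelib_debug_logging()` before creating the `Account` so the autodiscover and folder requests are captured as well.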
I'm not very familiar with in-place archives, but it could be that the archive folders are placed in a different account than your primary account. In that case, Account.ad_response may contain information on where to access the archive folders.
@rwludwig Did you get a chance to look at this again? exchangelib 5.2.0 contains a fix for #1222 so you could try that first.
Hi Erik,
Thank you for your response and the reminder!
I'll have a chance to try out your suggestions this week, and will report back here.
Much obliged, Wes
Hi Erik,
Updating to 5.2.0 has not resolved the issue, but logging and further exploration have revealed something interesting.
I'm still not able to find the folder structure I'm looking for from our in-place archive (notably, "Inbox" is missing from the archive_root tree), but I've found that all(?) the emails are stored in account.archive_root / "AllItems".
The messages in AllItems may only represent the first 100 GB before the online archive splits into another mailbox.
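A rough way to check how many messages that folder holds is sketched below. It assumes the `/` child lookup shown above and the `refresh()` method and `total_count` attribute found on exchangelib folders; `archive_all_items_count` is a hypothetical helper name.

```python
def archive_all_items_count(account, folder_name="AllItems"):
    """Return the item count of a named folder under the archive root."""
    # The "/" operator looks up a child folder by name.
    folder = account.archive_root / folder_name
    # Re-fetch the folder so the counter reflects the server's current state.
    folder.refresh()
    return folder.total_count
```

The number only describes this one mailbox; it says nothing about any additional mailboxes created by an auto-expanding archive.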
Do you have a suggestion of how I might discover what other mailboxes might be associated with this account via the auto-expanding archive? I couldn't make anything like that out from account.ad_response. I'll ask our IT admin to see if he can find the associated mailboxes and I'll comment here again if I learn anything.
Thank you for your time! Wes
Thanks for the update. I don't have any suggestions, unfortunately. I think your best bet is to work with your Exchange admin to work out where the remaining messages are located.
It seems the problem is that account.archive_msg_folder_root is instantiated with root=Root instead of root=ArchiveRoot. I haven't quite worked out what's happening, but it appears to be in the folder resolution processing called from RootOfHierarchy.get_distinguished. At some point in the folder resolution process, ArchiveRoot is swapped out for Root. Funny enough, I see that a few days ago you switched get_distinguished to only take account instead of root because "just pass account here. We weren't using root anyway". Maybe that's part of the issue? I haven't tested to see if this affects other Archive well-known folders.
Anyway, if we set account.archive_msg_folder_root._root = account.archive_root before operating on the ArchiveMsgFolderRoot, then we get a proper-looking tree with all the children instead of only Top of Information Store as @rwludwig mentions in the initial report:
> print(account.archive_msg_folder_root.root.name)
root
> print(account.archive_msg_folder_root.tree())
Top of Information Store
> account.archive_msg_folder_root._root = account.archive_root
> print(account.archive_msg_folder_root.root.name)
archiveroot
> print(account.archive_msg_folder_root.tree())
Top of Information Store
├── Calendar
│   └── United States holidays
├── ExternalContacts
├── Files
├── Inbox
│   ├── AFolderOnlyInTheArchive
│   └── test folder
├── Outbox
├── PersonMetadata
├── RSS Subscriptions
├── Sent Items
├── Sync Issues
└── Tasks
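The manual step in the session above could be wrapped in a small helper, sketched here. It relies on the private `_root` attribute exactly as the workaround does, so it is a stopgap for affected versions rather than a supported API; `patch_archive_msg_folder_root` is a hypothetical name.

```python
def patch_archive_msg_folder_root(account):
    """Point archive_msg_folder_root at the archive root if it is not already."""
    folder = account.archive_msg_folder_root
    archive_root = account.archive_root
    # Only reassign when the folder is attached to a different root.
    if folder._root is not archive_root:
        folder._root = archive_root
    return folder
```

After `folder = patch_archive_msg_folder_root(account)`, `print(folder.tree())` should show the archive's children as in the output above.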
Great find @geoffblack, thank you so much!
Manually setting account.archive_msg_folder_root._root = account.archive_root shows me all the folders I expect to see in the archive inbox.
Can you try out the referenced commit? I believe it may fix the error.
@rwludwig I just tested https://github.com/ecederstrand/exchangelib/commit/d9035d03960797f9277381845e4ec98f837798ea on an account with a public archive and it seems to fix the error. Can you try it out?
That's great! I'll take some time this week and test it, thank you so much!
Changes are out in v5.3.0, BTW
Assuming this is fixed. Feel free to update here if not.