Standardized Access to Common Email and Calendar Formats

Open ByteMeFree opened this issue 1 year ago • 6 comments

Requested feature

I propose adding support for common email formats (such as .msg and .eml) and calendar files (such as .ics), so that users can access and manage their email and meeting data in a standardized format. For emails, the output should clearly display the essential information: sender, recipient, CC, BCC, date, body, signature, attachments, and mail history. For calendar files, it should include the equivalent details: meeting participants, date, time, agenda, and any attachments related to the meeting.

Alternatives

I have considered the following:

Manual export: Users can manually export emails and calendar events, but this process is time-consuming and error-prone.

ByteMeFree avatar Nov 13 '24 09:11 ByteMeFree

@ByteMeFree This would entail adding dedicated backends to the library. Do you know of any good libraries for parsing these formats?

PeterStaar-IBM avatar Nov 13 '24 10:11 PeterStaar-IBM

import extract_msg
msg = extract_msg.Message(file_path)
https://github.com/TeamMsgExtractor/msg-extractor?tab=readme-ov-file

from ics import Calendar
https://github.com/ics-py/ics-py

ByteMeFree avatar Nov 13 '24 12:11 ByteMeFree

@ByteMeFree We do not accept any libraries with viral licensing (e.g. GPL), in order to maintain our MIT license.

PeterStaar-IBM avatar Nov 13 '24 12:11 PeterStaar-IBM

Ah, I see... Makes sense.

ByteMeFree avatar Nov 13 '24 12:11 ByteMeFree

I think this should be possible, right?
from ics import Calendar
https://github.com/ics-py/ics-py

ByteMeFree avatar Nov 14 '24 14:11 ByteMeFree
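
Whichever library ends up being used, the calendar details listed in the request (participants, date, time) correspond to properties of a VEVENT block in the .ics text. The following is a minimal sketch using only the Python standard library, to illustrate the shape of the data. It is not the ics-py API and not a complete RFC 5545 parser: the function name `read_ics_events` is hypothetical, property parameters such as TZID are discarded, and nested components (for example VALARM) and quoted parameter values containing colons are not handled.

```python
def read_ics_events(text: str) -> list[dict]:
    """Return one dict per VEVENT, mapping property name to a list of raw values."""
    # Unfold continuation lines: a line starting with a space or tab
    # continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)

    events = []
    current = None  # dict for the VEVENT being read, or None outside one
    for line in lines:
        name, _, value = line.partition(":")
        name = name.split(";", 1)[0].upper()  # drop parameters such as TZID=...
        if name == "BEGIN" and value.upper() == "VEVENT":
            current = {}
        elif name == "END" and value.upper() == "VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None:
            # Properties such as ATTENDEE may repeat, so keep every value.
            current.setdefault(name, []).append(value)
    return events
```

Under these assumptions, meeting participants would appear under the `ATTENDEE` key, the start under `DTSTART`, and the title under `SUMMARY`, all as unconverted strings.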

Would this be usable for EML formats?

https://pypi.org/project/emlmailreader

acseven avatar Nov 05 '25 22:11 acseven
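
For comparison, .eml files can also be read with the `email` package in the Python standard library, which adds no third-party dependency. The sketch below is an illustration of how the fields named in the request could be collected. The function name `read_eml` and the dict layout are hypothetical, and signature and mail history are not separated out because they are only free text inside the body.

```python
from email import policy
from email.parser import BytesParser


def read_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes into a plain dict of commonly needed fields."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    def header(name):
        # Missing headers come back as None; present ones are converted to str.
        value = msg[name]
        return str(value) if value is not None else None

    # Prefer the plain-text body, falling back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    return {
        "sender": header("From"),
        "recipient": header("To"),
        "cc": header("Cc"),
        "bcc": header("Bcc"),
        "date": header("Date"),
        "subject": header("Subject"),
        "body": body_part.get_content() if body_part is not None else "",
        # File names only; the attachment bytes are available via part.get_content().
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

A file on disk could be passed in with `read_eml(pathlib.Path(path).read_bytes())`. Note that Bcc is usually absent from received messages, so that field will often be `None`.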