Import emails from an inbox properly
I would like to try building a knowledge graph from my email inbox. All raw emails are stored in AWS WorkMail and also, in raw form, in an S3 bucket. The problem is that these emails are not plain text files ready to be ingested into a knowledge graph: many of them have attached documents that should be processed as well. I could split the email bodies from their attachments and store the attachments in a separate bucket, but that would lose the information about which file was attached to which email.
It would be great if this tool could import emails directly from an inbox, either over POP/IMAP or from an S3 bucket that is expected to contain raw email messages.
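One way to keep the email-to-attachment link is to parse each raw message (the RFC 822/MIME bytes stored in S3 or returned over IMAP) and emit relationship triples before any document reaches the ingestion pipeline. Below is a minimal sketch using only Python's standard `email` package. The `ParsedEmail`/`Attachment` classes, the `parse_raw_email` and `to_triples` helpers, the relation names (`SENT_BY`, `SENT_TO`, `HAS_ATTACHMENT`) and the `message_id/filename` attachment ID scheme are all hypothetical illustrations, not part of any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


@dataclass
class Attachment:
    filename: str
    content_type: str
    data: bytes


@dataclass
class ParsedEmail:
    message_id: str
    subject: str
    sender: str
    recipients: list[str]
    body: str
    attachments: list[Attachment] = field(default_factory=list)


def parse_raw_email(raw: bytes) -> ParsedEmail:
    # policy.default gives us the modern EmailMessage API (get_body, iter_attachments).
    msg: EmailMessage = BytesParser(policy=policy.default).parsebytes(raw)

    # Prefer the plain-text body, fall back to HTML; may be None for odd messages.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # Address headers parsed with policy.default expose structured .addresses.
    sender = msg["From"].addresses[0].addr_spec if msg["From"] else ""
    recipients = [a.addr_spec for a in msg["To"].addresses] if msg["To"] else []

    # iter_attachments() yields the non-body sub-parts; decode=True undoes base64/QP.
    attachments = [
        Attachment(
            filename=part.get_filename() or "unnamed",
            content_type=part.get_content_type(),
            data=part.get_payload(decode=True) or b"",
        )
        for part in msg.iter_attachments()
    ]
    return ParsedEmail(
        message_id=str(msg["Message-ID"] or ""),
        subject=str(msg["Subject"] or ""),
        sender=sender,
        recipients=recipients,
        body=body,
        attachments=attachments,
    )


def to_triples(mail: ParsedEmail) -> list[tuple[str, str, str]]:
    # Hypothetical relation names; each attachment gets an ID scoped to its email,
    # so the "which file came with which email" link survives ingestion.
    triples = [(mail.message_id, "SENT_BY", mail.sender)]
    triples += [(mail.message_id, "SENT_TO", r) for r in mail.recipients]
    for att in mail.attachments:
        triples.append((mail.message_id, "HAS_ATTACHMENT", f"{mail.message_id}/{att.filename}"))
    return triples
```

The raw bytes could come from boto3 (`s3.get_object(Bucket=..., Key=...)["Body"].read()`) or from `imaplib` (`IMAP4_SSL.fetch(num, "(RFC822)")`). Each attachment's `data` can then be handed to whatever text extractor suits its `content_type`, keyed by the attachment ID so the graph keeps the link back to the email.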
I think this problem is conceptually similar to ingesting Zip files that contain multiple related documents. Naively unzipping them before ingestion would lose those relations, which is exactly what we want to avoid.
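The Zip case can be handled the same way: instead of unzipping into a flat folder, record a parent/child edge for every member and recurse into nested archives. A minimal sketch with the standard `zipfile` module; `expand_zip` and the `archive_id/member_path` ID scheme are hypothetical choices for illustration.

```python
import io
import zipfile


def expand_zip(archive_id: str, data: bytes) -> list[tuple[str, str, bytes]]:
    """Return (parent_id, member_id, member_bytes) for every file in the archive.

    Nested .zip members are expanded recursively, with the nested archive's
    own member_id used as the parent of its contents.
    """
    edges: list[tuple[str, str, bytes]] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content of their own
            member_id = f"{archive_id}/{info.filename}"
            content = zf.read(info)
            edges.append((archive_id, member_id, content))
            if info.filename.lower().endswith(".zip"):
                edges.extend(expand_zip(member_id, content))
    return edges
```

Because every member keeps its parent ID, an email attachment that happens to be a Zip file can be expanded with the attachment ID as `archive_id`, giving an unbroken chain from email to archive to document.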
Does anyone have pointers to reading material about building a knowledge graph from an email inbox?
@jexp