
Investigate open data sets

Open pghosh opened this issue 7 years ago • 3 comments

This task is to investigate open data sets, looking for existing models or data to train models that can identify action details from a given text.

pghosh avatar Mar 08 '17 05:03 pghosh

Hello! Are there particular datasets we should be starting with? I'd like to help if I can.

restrellado avatar Mar 09 '17 03:03 restrellado

tl;dr: No, this task is to see whether an open data set or model exists.

Details: We have only a few emails (around 100 to 200) to start working with. Clearly that is not enough data to train any model. The goal of this task is to see whether trained models are available, or to find a data set that can help train models that identify event details from a paragraph. The end goal is in line with what an intelligent calendar does when you add a meeting, reminder, or event from a simple English sentence.

I have a separate task to experiment with spaCy on the emails we have. This task is for finding an open model or data set that can be leveraged to solve the problem. You can also ping me in Slack @pg if there are more questions!
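To make the target concrete, here is a rough sketch of what "event details from a simple English sentence" could look like as output. It is only a naive regex baseline, not a trained model and not the spaCy experiment; the names `EventDetails`, `extract_event_details`, and both patterns are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: a handful of day words and clock times like "3pm" or "10:30 AM".
DAY_PATTERN = re.compile(
    r"\b(today|tomorrow|monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.IGNORECASE,
)
TIME_PATTERN = re.compile(r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", re.IGNORECASE)


@dataclass
class EventDetails:
    """Illustrative container for the fields a calendar would need."""
    day: Optional[str]
    time: Optional[str]


def extract_event_details(sentence: str) -> EventDetails:
    """Return the first day word and first clock time found, lowercased, or None."""
    day = DAY_PATTERN.search(sentence)
    time = TIME_PATTERN.search(sentence)
    return EventDetails(
        day=day.group(1).lower() if day else None,
        time=time.group(1).lower() if time else None,
    )


if __name__ == "__main__":
    print(extract_event_details("Let's meet Tuesday at 3pm to review the draft"))
```

A baseline like this misses most real phrasing ("next week", "end of day"), which is why an open data set or pretrained model is worth looking for.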

pghosh avatar Mar 09 '17 17:03 pghosh

Have you considered using the Enron email dataset? https://www.cs.cmu.edu/~./enron/

Taking a look at it, I'm finding snippets such as:

PS: Colleen is setting up a meeting tomorrow to discuss the direction for transport. Hopefully we'll know much better where that part stands at that
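One way to surface snippets like this from email bodies is a simple keyword filter over sentences. This is a minimal sketch, assuming the message body has already been read into a string; the keyword list and the names `MEETING_KEYWORDS` and `find_event_sentences` are my own choices, not anything that ships with the dataset.

```python
import re
from typing import List

# Hypothetical keyword list; extend it after looking at real emails.
MEETING_KEYWORDS = re.compile(
    r"\b(meeting|call|appointment|conference)\b", re.IGNORECASE
)
# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def find_event_sentences(body: str) -> List[str]:
    """Return the sentences in an email body that mention a meeting keyword."""
    sentences = SENTENCE_SPLIT.split(body.strip())
    return [s for s in sentences if MEETING_KEYWORDS.search(s)]


if __name__ == "__main__":
    body = (
        "Thanks for the note. Colleen is setting up a meeting tomorrow "
        "to discuss the direction for transport. See you then."
    )
    for sentence in find_event_sentences(body):
        print(sentence)
```

Sentences found this way would still need hand labeling before they could be used to train anything, but it might be a cheap way to build a candidate set.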

davidmudrauskas avatar Oct 16 '17 01:10 davidmudrauskas