Add documentation and unit tests to output from #6
The functions that @philip-schrodt wrote to extract verb and actor phrases from coded sentences need documentation and unit tests.
Something like this:
```python
# Assumes PETRARCH2 is installed and its config and dictionaries
# have already been loaded (not shown here).
from petrarch2 import petrarch2, utilities


def get_phrases(text, parse):
    """Code one sentence and return the noun and verb phrases found.

    text:  the raw sentence
    parse: its constituency parse as a single bracketed string
    """
    parsed = utilities._format_parsed_str(parse)
    # Wrap the sentence in the {doc_id: {'sents': ..., 'meta': ...}} input format.
    ddict = {u'test123':
             {u'sents': {u'0': {u'content': text, u'parsed': parsed}},
              u'meta': {u'date': u'20010101'}}}
    return_dict = petrarch2.do_coding(ddict, None)
    verb_meta = return_dict[u'test123'][u'meta'][u'verbs']
    nouns = verb_meta[u'nouns']
    # Every key except 'nouns' is a coded verb; its first element is the phrase.
    # An empty list (not "") is returned when no verbs were coded.
    verbs = [verb_meta[k][0] for k in verb_meta if k != u'nouns']
    return {"nouns": nouns, "verbs": verbs}


t = "Airstrikes and artillery killed more than 60 people in the past 24 hours in Aleppo, including dozens at a hospital in a rebel-held neighborhood, as Syria's largest city was turned once again into a major battleground in the civil war, officials said Thursday."
p = "(ROOT (S (S (NP (NP (NNP Airstrikes)) (CC and) (NP (NN artillery))) (VP (VBD killed) (NP (QP (JJR more) (IN than) (CD 60)) (NNS people)) (PP (IN in) (NP (DT the) (JJ past) (CD 24) (NNS hours))) (PP (IN in) (NP (NNP Aleppo))) (, ,) (PP (VBG including) (NP (NP (NNS dozens)) (PP (IN at) (NP (NP (DT a) (NN hospital)) (PP (IN in) (NP (DT a) (JJ rebel-held) (NN neighborhood))))))) (, ,) (SBAR (IN as) (S (NP (NP (NNP Syria) (POS 's)) (JJS largest) (NN city)) (VP (VBD was) (VP (VBN turned) (ADVP (RB once) (RB again)) (PP (IN into) (NP (NP (DT a) (JJ major) (NN battleground)) (PP (IN in) (NP (DT the) (JJ civil) (NN war))))))))))) (, ,) (NP (NNS officials)) (VP (VBD said) (NP-TMP (NNP Thursday))) (. .)))"
get_phrases(t, p)
```
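Because get_phrases needs a fully configured PETRARCH2 install, one way to start on unit tests is to split the dictionary-walking step into a pure function. That function can then be tested against a hand-built stand-in for the do_coding output. This is a minimal sketch only. The names extract_phrases and ExtractPhrasesTest and the 'verb_0' key are hypothetical, and the layout of meta['verbs'] is inferred from get_phrases rather than taken from PETRARCH2's documentation.

```python
import unittest


def extract_phrases(return_dict, doc_id):
    """Pull noun and verb phrases out of a do_coding() result.

    Assumed layout (inferred from get_phrases, not from PETRARCH2 docs):
        return_dict[doc_id]['meta']['verbs'] = {
            'nouns': [...],          # noun/actor phrases
            <verb key>: [verb, ...], # one entry per coded verb
        }

    Returns {'nouns': [...], 'verbs': [...]}; 'verbs' is an empty list
    when no verbs were coded.
    """
    verb_meta = return_dict[doc_id]['meta']['verbs']
    nouns = verb_meta.get('nouns', [])
    # Every key other than 'nouns' is a verb entry; keep its first element.
    verbs = [value[0] for key, value in verb_meta.items() if key != 'nouns']
    return {'nouns': nouns, 'verbs': verbs}


class ExtractPhrasesTest(unittest.TestCase):
    # Hypothetical fixtures: hand-built stand-ins for do_coding() output.

    def test_nouns_and_verbs(self):
        fake = {'test123': {'meta': {'verbs': {
            'nouns': ['Airstrikes', 'artillery'],
            'verb_0': ['killed', 'extra detail'],
        }}}}
        result = extract_phrases(fake, 'test123')
        self.assertEqual(result['nouns'], ['Airstrikes', 'artillery'])
        self.assertEqual(result['verbs'], ['killed'])

    def test_no_verbs_gives_empty_list(self):
        fake = {'test123': {'meta': {'verbs': {'nouns': []}}}}
        self.assertEqual(extract_phrases(fake, 'test123'),
                         {'nouns': [], 'verbs': []})
```

Run the tests with python -m unittest followed by the module name. Using .get('nouns', []) means a verbs dictionary without a 'nouns' key yields an empty list instead of raising KeyError. Switch to direct indexing if a missing 'nouns' key should count as a bug.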
Sorry, I haven't been tracking the internals of the code so far, but is get_phrases a function you're planning to add, or one that already exists?
get_phrases is just something I threw together to provide a one-stop location for your verb and noun extraction needs, since I need them this week. It pulls them out of the meta dictionary that do_coding updates in @philip-schrodt's new version of the code. I'm not sure whether to sink a lot of time into this, since @PTB-OEDA's people are working on a more robust approach.
For now, it's only being used for our human coding tool and will live inside a container being built here: https://github.com/ahalterman/night_ridir
Following today's meeting of the @PTB-OEDA people, we will have a CS Ph.D. student working on this. Expect to see some serious NLP student time and expertise going into these issues. The work will also include some foreign-language and Universal Dependencies additions.
(In reply to Andy Halterman's comment of Thu, May 5, 2016, 2:17 PM.)
Patrick T. Brandt
Professor, Political Science
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Personal site: http://www.utdallas.edu/~pbrandt
MSBVAR site: http://yule.utdallas.edu