UniversalPetrarch
debugging UP through dictionaries
I have been working on debugging issues with UniversalPetrarch, mainly the issue of matching the dictionaries to the extracted patterns. @ahalterman (and @philip-schrodt) suggested tracking how UP produces events by outputting the dictionary verbs and verb patterns it matched. This method was used in debugging Petrarch2, and here is the relevant code snippet that does it (by @ahalterman):
Here's the code block I added to PETR2 (this is the version of PETR2 with a file date of 28 June 2016):
t1 = time.time()
sentence = PETRtree.Sentence(treestr, SentenceText, Date)
print(sentence.txt)
coded_events, meta = sentence.get_events()  # this is the entry point into the processing in PETRtree
# =========== new code starts here =======
for k1, v1 in meta.items():
    if k1 != 'nouns' and k1 != 'conv_code':
        fwmp.write("\n" + str(k1) + '\n')
        fwmp.write(SentenceID + '\n')
        try:
            fwmp.write(sentence.txt + '\n')
        except:
            fwmp.write("Sentence error\n")
        for lst in v1:
            # -- fwmp.write("++ " + str(lst))
            if "~" in lst:
                fwmp.write("-- " + lst)
            elif len(lst) > 1:
                if "[" in lst[1]:
                    fwmp.write("-- " + lst[0] + ": " + lst[1][:lst[1].find("[")].strip() + '\n')
                else:
                    fwmp.write("-- " + lst[0] + ": " + str(lst[1:]) + '\n')
            else:
                if lst[0]: fwmp.write("-- " + lst[0] + '\n')
"""if "conv_code" in meta:
    fwmp.write(meta["conv_code"])"""  # used to figure out convert_code, which seems to be pretty innocuous
if "comb_code" in sentence.metadata:
    fwmp.write(sentence.metadata["comb_code"])
# ===== new code ends here =========
code_time = time.time()-t1
if PETRglobals.NullVerbs or PETRglobals.NullActors:
    event_dict[key]['meta'] = meta
    event_dict[key]['text'] = sentence.txt
"fwmp" is the file the patterns are written to, so it is opened and closed elsewhere in the code
This code block is in "petrarch2.py"
I am having trouble fitting this code to UP, since it uses PETRgraph and does not return a meta object. I would appreciate any help on how to tackle this.
The sentence object in PETRgraph.py has an entry, triplets, that can be used for debugging.
Here is an example sentence: The Syrian Observatory for Human Rights, a UK-based group that tracks the war, said eight people were killed in an air strike by government forces in a separate, rebel-held part of the city.
{'-#18#20#4':    --> triplet ID
    {'transfermation': '~ a (b . ATTACK) SAY = a b 112\n',    --> transformation pattern matched, if any
     'meaning': 'KILL,KILL',    --> block meaning
     'verbcode': '190',
     'triple': ('-', <PETRgraph.NounPhrase instance at 0x7f47fd9dc128>, <PETRgraph.VerbPhrase instance at 0x7f47fd9dacb0>),
     'before_transfer': ([u'SYR'], ([u'---MIL'], [u'---PPL'], '190'), '010'),    --> events involved in transformation
     'after_transfer': [([u'SYR'], [u'---MIL'], u'112')],    --> event after transformation
     'event': ([u'---MIL'], [u'---PPL'], '190'),    --> event, or event before transformation
     'matched_txt': u'KILL'},    --> matched verb pattern, or block meaning if only the verb is matched
}
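To get output similar to the PETR2 block, the triplets could be written to the same kind of file. Below is a rough sketch, not tested against UP itself. It assumes only what the example above shows: that triplets is a dict keyed by triplet ID, and that each value is a dict with the keys listed. The function name write_triplet_debug, the FIELDS tuple, and the demo sentence ID and text are made up for illustration.

```python
import io

# Fields copied to the debug file, in order. The key names (including the
# 'transfermation' spelling) are taken from the example output in this thread.
FIELDS = ("matched_txt", "meaning", "verbcode", "event",
          "transfermation", "before_transfer", "after_transfer")


def write_triplet_debug(fwmp, sentence_id, sentence_text, triplets):
    """Write what each triplet matched, one "-- key: value" line per field.

    Hypothetical helper: `triplets` is assumed to have the dict-of-dicts
    shape shown in the example, nothing more.
    """
    fwmp.write("\n{}\n{}\n".format(sentence_id, sentence_text))
    for triplet_id in sorted(triplets):
        info = triplets[triplet_id]
        fwmp.write("triplet {}\n".format(triplet_id))
        for field in FIELDS:
            value = info.get(field)
            if value:  # skip fields that are missing or empty
                if isinstance(value, str):
                    value = value.strip()  # patterns end with "\n"
                fwmp.write("-- {}: {}\n".format(field, value))


# Demo with the values from the example; the 'triple' objects are left out
# because they are PETRgraph instances that cannot be rebuilt here.
example = {
    "-#18#20#4": {
        "transfermation": "~ a (b . ATTACK) SAY = a b 112\n",
        "meaning": "KILL,KILL",
        "verbcode": "190",
        "before_transfer": (["SYR"], (["---MIL"], ["---PPL"], "190"), "010"),
        "after_transfer": [(["SYR"], ["---MIL"], "112")],
        "event": (["---MIL"], ["---PPL"], "190"),
        "matched_txt": "KILL",
    }
}
out = io.StringIO()  # stands in for the fwmp file handle
write_triplet_debug(out, "demo-sentence-1", "example sentence text", example)
print(out.getvalue())
```

In UP this would presumably be called with sentence.triplets once the sentence has been coded, with fwmp opened and closed elsewhere as in the PETR2 version. At which point in PETRgraph the triplets are fully populated is not covered here, so that part needs checking against the code.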