phoenix_pipeline
Turning news into events since 2014.
Right now, the geocoding functions call back to Mongo to get the full text of a story for geocoding. This makes them very difficult to test. Consider splitting out the...
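One possible reading of this suggestion, sketched under assumptions: keep the geocoding logic in a function that only sees text, and pass the database lookup in as a parameter so a test can substitute a stub. The names `geocode_text`, `geocode_story`, `fetch_text`, and `geocoder` are hypothetical and do not come from the pipeline.

```python
def geocode_text(text, geocoder):
    """Geocode a piece of text; no database access happens here."""
    return geocoder(text)


def geocode_story(story_id, fetch_text, geocoder):
    """Look up the story text via the injected fetch_text callable, then geocode it."""
    return geocode_text(fetch_text(story_id), geocoder)


# In a test, both the text lookup and the geocoder can be simple stubs.
fake_store = {"abc123": "Talks began in Beirut today."}


def fake_geocoder(text):
    # Stand-in for a real geocoding call; only recognizes one place name.
    return {"place": "Beirut"} if "Beirut" in text else {}


result = geocode_story("abc123", fake_store.get, fake_geocoder)
```

With this split, the Mongo-backed lookup would only be needed in the production caller, not in unit tests.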
Mordecai returns both the raw place name extracted from the text and the gazetteer entry it matches that place name to. Right now, the pipeline only has a...
Think about adding a pre-pipeline coding step that geocodes complete articles (rather than sentences) to the country. This would be useful for two things: 1. Associating actors that don't have...
The variable 'StateName' in the placeinfo dict is being used without being assigned, which apparently causes the exception and leaves placeinfo empty.
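A minimal sketch of one defensive pattern for this kind of bug, assuming placeinfo is a plain dict: give 'StateName' a default value before any lookup fills it in, so later reads cannot fail on a missing key. The helper name `make_placeinfo` is hypothetical and not part of the pipeline.

```python
def make_placeinfo(**found):
    """Build a placeinfo dict with 'StateName' always present."""
    placeinfo = {"StateName": ""}  # default, so the key is never unassigned
    placeinfo.update(found)        # overwrite with whatever was actually found
    return placeinfo


empty = make_placeinfo()
filled = make_placeinfo(StateName="Texas")
```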
David-Laxers-MacBook-Pro:eldiablo davidlaxer$ vagrant up
/Applications/Vagrant/bin/../embedded/gems/gems/vagrant-1.6.3/lib/vagrant/pre-rubygems.rb:31: warning: Insecure world writable dir /Developer/NVIDIA/CUDA-6.5/bin in PATH, mode 040777
/Applications/Vagrant/embedded/gems/gems/bundler-1.6.2/lib/bundler/runtime.rb:222: warning: Insecure world writable dir /Developer/NVIDIA/CUDA-6.5/bin in PATH, mode 040777
Bringing machine 'default' up...
There seems to be a bug that manifests in two recent CSV files, producing extremely long lines by repeating fields over and over. It seems all affected lines have...
To cut down on noise in the geolocation, we could consider only geolocating material conflict events (or in any case not geolocating statements and verbal cooperation). ¯_(ツ)_/¯
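A rough sketch of such a filter, under two assumptions: each event is a dict carrying a `quad_class` field (a hypothetical field name), and material conflict is quad class 4, following the usual CAMEO quad-class numbering. Neither is confirmed by the pipeline code.

```python
MATERIAL_CONFLICT = 4  # assumed CAMEO quad class for material conflict


def should_geolocate(event, allowed=frozenset({MATERIAL_CONFLICT})):
    """Return True only for events whose quad class is in the allowed set."""
    return event.get("quad_class") in allowed


events = [
    {"id": 1, "quad_class": 1},  # verbal cooperation: skipped
    {"id": 2, "quad_class": 4},  # material conflict: geolocated
]
to_geolocate = [e for e in events if should_geolocate(e)]
```

Passing a different `allowed` set would cover the looser variant that excludes only statements and verbal cooperation.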
Occasionally something like "USAGOVMILGOV" will get coded. We want only one of each code. This can be done pretty easily in postprocess.py. A more sophisticated (and less pressing) improvement would...
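The deduplication could look roughly like this, assuming actor codes are concatenations of three-letter chunks and that the first occurrence of each chunk should be kept in order. The function name `dedupe_actor_code` is hypothetical and does not come from postprocess.py.

```python
def dedupe_actor_code(code, width=3):
    """Drop repeated fixed-width chunks, keeping first occurrences in order."""
    chunks = [code[i:i + width] for i in range(0, len(code), width)]
    seen = set()
    unique = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    return "".join(unique)


print(dedupe_actor_code("USAGOVMILGOV"))  # USAGOVMIL
```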
Going through the Levant data I've seen a lot of e.g. "Talks begin _here_ today regarding....". The "here" is easy if you can see the dateline, but that's been lost...