value-phrase parsing support
Take this string, for example, pulled straight from https://en.wikipedia.org/wiki/Government_spending#Government_spending_per_country
In 2010 national governments spent an average of $2,376 per person, while the average for the world's 20 largest economies (in terms of GDP) was $16,110 per person. Norway and Sweden expended the most at $40,908 and $26,760 per capita respectively. The federal government of the United States spent $11,041 per person. Other large economy country spending figures include South Korea ($4,557), Brazil ($2,813), Russia ($2,458), China ($1,010), and India ($226).[8] The figures below, indicate 42% of GDP spending and a GDP per capita of $54,629, which suggests and total per person government spending of $22,726 in the U.S. c
nlp(paragraph).values()
It would be useful to have some context attached to each value, based not on fixed patterns but dynamically on the descriptors around it.
For example, 42% of what? 42% of GDP spending.
$2,376 is an average, and is per person. $16,110 is an average and also per person.
nlp(paragraph).sentences()
this is a sentence fragment
In 2010 nationa... per person,
this is another part of a sentence
while the average... per person.
this would also be useful information.
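To make the fragment idea concrete, here is a rough sketch in plain JavaScript. It is not compromise's API: valuesByFragment is a hypothetical helper, and splitting on commas is only a stand-in for real clause detection.

```javascript
// Hypothetical helper, not part of any library: split a sentence into
// rough fragments and list the money/percent values found in each one.
function valuesByFragment(sentence) {
  // Split on a comma followed by whitespace. Requiring the whitespace keeps
  // "$2,376" intact, because the comma inside a number is followed by a digit.
  const fragments = sentence.split(/,\s+/);
  return fragments.map((text) => ({
    text: text.trim(),
    // Dollar amounts like "$16,110" or percentages like "42%".
    values: text.match(/\$\d+(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?%/g) || [],
  }));
}

const sentence =
  'In 2010 national governments spent an average of $2,376 per person, ' +
  "while the average for the world's 20 largest economies (in terms of GDP) " +
  'was $16,110 per person.';

for (const fragment of valuesByFragment(sentence)) {
  console.log(fragment.values, '<=', fragment.text);
}
```

Each value stays attached to the fragment it came from, so descriptors like "average" and "per person" can be looked up in the same fragment instead of the whole paragraph.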
grand scheme
context trees, or some other kind of contextual analysis.
hey Jacob, yeah neat idea. the values one, what sort of api/result format were you thinking of?
"9 books" "9 litres of water" "9 degrees in the red car"
basically that as the string, but it would be nice to have it tagged like {value: 9, describes: 'book'} or {value: 9, describes: 'water', units: 'litres'}. 9 degrees in the car would be quite odd without a context tree, though.
value: 9, units: 'degrees' (possibly with the implicit descriptor of temperature) [in] subject: 'car'.
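A very rough sketch of that result format, just to pin down the shape. Everything here is hypothetical: describeValue, singular, and the UNITS list are made-up names for illustration, not anything in compromise, and the head-noun guess is deliberately naive.

```javascript
// Hypothetical sketch: a tiny hand-written unit list, for illustration only.
const UNITS = new Set(['litre', 'litres', 'liter', 'liters', 'degree', 'degrees']);

// Naive singularization: drop a trailing "s" unless the word ends in "ss".
function singular(word) {
  return /[^s]s$/.test(word) ? word.slice(0, -1) : word;
}

function describeValue(phrase) {
  const words = phrase.trim().split(/\s+/);
  // Only handles a leading plain number such as "9" or "9.5".
  if (!/^\d+(\.\d+)?$/.test(words[0])) return null;
  const result = { value: Number(words[0]) };
  let rest = words.slice(1);

  // A known unit directly after the number becomes `units`.
  if (rest.length && UNITS.has(rest[0].toLowerCase())) {
    result.units = rest[0].toLowerCase();
    rest = rest.slice(1);
  }
  if (rest[0] === 'in') {
    // "9 degrees in the red car": guess the last word as the subject.
    result.subject = rest[rest.length - 1];
  } else {
    if (rest[0] === 'of') rest = rest.slice(1);
    // Guess the last remaining word as the thing being counted or measured.
    if (rest.length) result.describes = singular(rest[rest.length - 1]);
  }
  return result;
}

console.log(describeValue('9 books'));
console.log(describeValue('9 litres of water'));
console.log(describeValue('9 degrees in the red car'));
```

The "in the red car" branch is the weak spot: without a context tree there is no way to know that degrees plus car implies temperature, so the sketch only records the subject and leaves the implicit descriptor out.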
Sorry I've been reading a lot of documents on language processing in other languages, and am really excited about all of the fun possibilities (y)
So I was a bit worried that this wasn't possible, but it turns out the Stanford parser does pretty much what I was looking for, with a bit of finessing and data massaging. It definitely provides the required data.
nice! I like that result format, is the Stanford one similar? if you poke around the match-syntax stuff you may find all the parts are there, if you wanted to start pulling out 'describes' and 'units'. I'd merge a PR, even a rough one, that started doing that.
no, not at all. the Stanford one gives you a parse tree that you have to walk by hand, but it has dependencies and relationships attached.
but yeah the match stuff is ehhh