
value-phrase parsing support

Open Announcement opened this issue 8 years ago • 5 comments

Take this string, for example, pulled straight from https://en.wikipedia.org/wiki/Government_spending#Government_spending_per_country:

In 2010 national governments spent an average of $2,376 per person, while the average for the world's 20 largest economies (in terms of GDP) was $16,110 per person. Norway and Sweden expended the most at $40,908 and $26,760 per capita respectively. The federal government of the United States spent $11,041 per person. Other large economy country spending figures include South Korea ($4,557), Brazil ($2,813), Russia ($2,458), China ($1,010), and India ($226).[8] The figures below, indicate 42% of GDP spending and a GDP per capita of $54,629, which suggests and total per person government spending of $22,726 in the U.S. c

nlp(paragraph).values()

It would be useful to have context attached to each value somehow, derived dynamically from the surrounding descriptors rather than from fixed patterns.

For example, 42% what? 42% GDP Spending.

$2,376 is an average, and is per person. $16,110 is an average and also per person.
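To make the desired output concrete, here is a minimal sketch of the kind of annotation described above. It is plain JavaScript and does not use compromise; `annotateValues` and the field names `average`, `per`, and `of` are hypothetical, and the regex matching is a deliberately naive stand-in that only illustrates the result shape, not the dynamic, descriptor-based analysis this issue is asking for.

```javascript
// Hypothetical helper, not part of compromise: attach nearby descriptors
// to each dollar amount or percentage found in a sentence.
function annotateValues(sentence) {
  // Matches "$2,376" style amounts or "42%" style percentages.
  const valuePattern = /\$(\d+(?:,\d{3})*)|(\d+(?:\.\d+)?)%/g;
  const results = [];
  let lastEnd = 0;
  let m;
  while ((m = valuePattern.exec(sentence)) !== null) {
    // Only look back as far as the previous value, to limit context leaking.
    const before = sentence.slice(lastEnd, m.index);
    const after = sentence.slice(m.index + m[0].length);
    const isMoney = m[1] !== undefined;
    const entry = {
      text: m[0],
      value: Number((isMoney ? m[1] : m[2]).replace(/,/g, '')),
      unit: isMoney ? 'dollars' : 'percent', // assumes "$" means dollars
    };
    // "an average of $2,376" -> average: true
    if (/\baverage\b/i.test(before)) entry.average = true;
    // "$2,376 per person" -> per: 'person'
    const per = /^\s*per (\w+)/.exec(after);
    if (per) entry.per = per[1];
    // "42% of GDP spending" -> of: 'GDP spending' (at most two words)
    const of = /^\s*of (\w+(?: \w+)?)/.exec(after);
    if (of) entry.of = of[1];
    results.push(entry);
    lastEnd = m.index + m[0].length;
  }
  return results;
}

const exampleSentence =
  "In 2010 national governments spent an average of $2,376 per person, " +
  "while the average for the world's 20 largest economies (in terms of GDP) " +
  "was $16,110 per person.";
console.log(annotateValues(exampleSentence));
```

A sketch like this misses descriptors that come before the value (for example "a GDP per capita of $54,629"), which is exactly where real contextual analysis would be needed.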

nlp(paragraph).sentences()

this is a sentence fragment

In 2010 nationa... per person,

this is another part of a sentence

while the average... per person.

this would also be useful information.
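The fragment idea above could be sketched roughly as follows. This is plain JavaScript, not compromise's API; `splitClauses` is a hypothetical name, and the short list of conjunctions is an assumption chosen only so the example sentence splits where described.

```javascript
// Hypothetical helper, not part of compromise: split a sentence into rough
// clause fragments at a comma followed by a subordinating word.
function splitClauses(sentence) {
  return sentence
    // The lookahead keeps the conjunction with the following fragment, and
    // requiring whitespace after the comma avoids splitting "$2,376".
    .split(/,\s+(?=(?:while|whereas|although|but)\b)/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const clauseExample =
  "In 2010 national governments spent an average of $2,376 per person, " +
  "while the average for the world's 20 largest economies (in terms of GDP) " +
  "was $16,110 per person.";
console.log(splitClauses(clauseExample));
```

Each fragment could then be analysed on its own, so that a descriptor such as "average" is attached only to the value in the same fragment.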

grand scheme

context trees, or some other form of contextual analysis.

Announcement avatar Nov 09 '17 18:11 Announcement

hey Jacob, yeah, neat idea. For the values one, what sort of API/result format were you thinking of? "9 books", "9 litres of water", "9 degrees in the red car"?

spencermountain avatar Nov 10 '17 00:11 spencermountain

basically that as the string, but it would be nice to have it tagged, like {value: 9, describes: 'book'} or {value: 9, describes: 'water', units: 'litres'}. "9 degrees in the car" would be quite odd without a context tree, though.

value: 9, units: 'degrees' (with possibly the implicit descriptor of temperature) [in] subject: 'car'.
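As a rough illustration of that tagged format, here is a minimal sketch in plain JavaScript. `parseValuePhrase` is a hypothetical helper, not something compromise provides, and it only handles the three example phrases' shapes ("N noun", "N unit of thing", "N unit in place"). It also keeps the noun as written rather than singularizing it, and keeps any adjectives in the subject.

```javascript
// Hypothetical helper, not part of compromise: turn a short value phrase
// into a { value, units, describes, subject } style object.
function parseValuePhrase(phrase) {
  // number, a noun, then optionally "of <thing>" or "in [the] <place>"
  const pattern = /^(\d+(?:\.\d+)?)\s+(\w+)(?:\s+(of|in)\s+(?:the\s+)?(.+))?$/;
  const m = pattern.exec(phrase.trim());
  if (!m) return null; // phrase does not fit this naive shape
  const [, num, noun, prep, rest] = m;
  const value = Number(num);
  if (prep === 'of') return { value, units: noun, describes: rest };
  if (prep === 'in') return { value, units: noun, subject: rest };
  return { value, describes: noun };
}

for (const phrase of ['9 books', '9 litres of water', '9 degrees in the red car']) {
  console.log(phrase, '->', parseValuePhrase(phrase));
}
```

The implicit "temperature" descriptor for degrees is not inferred here; that would need the kind of context tree mentioned above.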

Sorry I've been reading a lot of documents on language processing in other languages, and am really excited about all of the fun possibilities (y)

Announcement avatar Nov 10 '17 01:11 Announcement

So I was a bit worried that this wasn't possible, but it turns out the Stanford parser does pretty much what I was looking for, with a bit of finessing and data massaging. It definitely provides the required data.

Announcement avatar Nov 28 '17 13:11 Announcement

nice! I like that result format. Is the Stanford one similar? If you poke around the match-syntax stuff, you may find all the parts are there if you wanted to start pulling out 'describes' and 'units'. I'd merge a PR, even a rough one, that started doing that.

spencermountain avatar Nov 28 '17 16:11 spencermountain

no, not at all. The Stanford one gives you an AST that you have to parse by hand, but it has dependencies, relationships, and such attached.

but yeah the match stuff is ehhh

Announcement avatar Dec 09 '17 18:12 Announcement