natural
Interested in CYK and Earley chart parsers?
I implemented CYK and Earley chart parsers in Node. Would you be interested in including them in the natural package? Please let me know.
Regards, Hugo
Hugo,
I've been really interested in parsers lately (I'm working on something like an LR(1) parser myself right now). I'd love to check them out, and we can talk about the best way to integrate them into natural.
-Ken
Hi Ken,
I created a repository with the chart parsers: https://github.com/Hugo-ter-Doest/chart_parsers. The parsers are in /routes. I built a web app around it for testing and demonstration purposes. I think we/I can strip the parsers down to the algorithms and integrate them into natural. Test code and documentation also need to be added.
Regards, Hugo
Hugo,
Sorry for the delay on this; it looks pretty solid. I think if you could isolate the parsers and stand some tests up around them, we could definitely merge them in.
-Ken
Here is a sign of life... I did a lot of work on the parsers:
- Parsers are now working with the same Chart, Item and Grammar objects.
- The data structure of items follows the format for InfoVis so that parse trees can be visualised easily.
- The Grammar object is based on a PEG built with http://pegjs.majda.cz/. It reads unification grammars as well; that's for later, when I add feature structures and unification.
- Added Left Corner parser.
- Earley and Left Corner inherit from a generic Chart Parser.
- Turned the recognisers into parsers: they now record the parse by keeping track of the children used to recognise an item (a rule).
Right now I'm working on tests using the Jasmine framework. At first I started writing tests using assert, but I saw that natural uses Jasmine.
Maybe we should look at the interface between the existing modules in natural and the parsers. At the moment the parsers accept a tagged sentence of the form: [['I', 'NP'], ['saw', 'V'], ['the', 'DET'], ['man', 'N'], ['with', 'P'], ['the', 'DET'], ['telescope', 'N']]
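For illustration, using the parsers on such a sentence would look roughly like this; the constructor and method names in the sketch are placeholders, not necessarily the final API:

```javascript
// Sketch only: GrammarParser.parse() and parser.parse() are placeholder
// names for illustration; the tagged-sentence format is the real input.
var fs = require('fs');
var GrammarParser = require('./lib/GrammarParser');
var EarleyParser = require('./lib/EarleyParser');

// Read a context-free grammar from a text file (placeholder path)
var grammarText = fs.readFileSync('./grammar.txt', 'utf8');
var grammar = GrammarParser.parse(grammarText);   // placeholder API
var parser = new EarleyParser(grammar);           // placeholder API

var taggedSentence = [['I', 'NP'], ['saw', 'V'], ['the', 'DET'],
                      ['man', 'N'], ['with', 'P'], ['the', 'DET'],
                      ['telescope', 'N']];

// Parsing fills a chart; complete items spanning the whole sentence
// correspond to parse trees.
var chart = parser.parse(taggedSentence);          // placeholder API
console.log(JSON.stringify(chart, null, 2));
```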
Best regards, Hugo
Hugo,
Looks really exciting! I'll take a closer look tonight if I can. To be honest, I'll have to quickly look through how our parsers work, but it sounds like the tagged sentence format makes sense.
-Ken
Hugo,
Sorry for the delay on this; it's still on my list, I've just been traveling for work the past week, so I've been a little distracted. Going to pull everything down now, look at our parsers, and see what makes sense.
-Ken
Tests are in the spec folder. ChartParser_spec will fail because I'm using it to test the Head-Corner parser, which is still under development. If you switch these lines (uncomment the first and comment out the second):
//[LeftCornerParser, EarleyParser].forEach(function(ChartParser) {
[HeadCornerParser].forEach(function(ChartParser) {
the Left-Corner and Earley parsers will be tested.
Regards, Hugo
@Hugo-ter-Doest, looking over some of the stuff today. I'm trying to think of what the best fit with our existing code will be.
It seems to me that if the parser requires the sentence to be tagged, for now one would have to use the WordNet module, since we don't have a working POS tagger currently. It would be cool to have an example of that in the docs.
Something seems wrong in the unit tests: I tried changing a few values in an effort to break them and they didn't break. I'll work on them a bit and see what I can come up with.
I will take a look at the WordNet module to see how I can connect it to the parsers.
Regarding the unit tests, I cannot see what is going on. In my environment they fail if I change expected values in the spec files. For instance, if you exchange the indices of parse_trees in lines 71/72 of ChartParser_spec.js, the test will/should fail. If you let me know what happens, I will look into this.
Regards, Hugo
OK, I'll take a look; I was just running through it quickly, maybe I missed something. Also, re: WordNet, I don't think it needs to be integrated with the parsers, but maybe an example of taking a string, tokenizing it, looking it up in WordNet, and producing a list of tagged tokens to pass to the parser. I think it'll be a useful example for people who want to use the parsers in the future.
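Something along these lines is what I have in mind; just a rough sketch using natural's tokenizer and WordNet lookup (it assumes the WordNet data files are installed, and 'unknown' is a made-up fallback tag):

```javascript
// Rough sketch: tokenize a string, look each token up in WordNet, and
// build a tagged sentence to hand to the parser. Assumes the WordNet
// data files (WNdb) are available; 'unknown' is a made-up fallback tag.
var natural = require('natural');
var tokenizer = new natural.WordTokenizer();
var wordnet = new natural.WordNet();

function tagSentence(sentence, callback) {
  var tokens = tokenizer.tokenize(sentence);
  var tagged = [];
  var pending = tokens.length;

  if (!pending) return callback(tagged);

  tokens.forEach(function(token, i) {
    wordnet.lookup(token, function(results) {
      // Collect the distinct POS codes (n, v, a, s, r) WordNet returns
      var tags = [];
      results.forEach(function(result) {
        if (tags.indexOf(result.pos) === -1) {
          tags.push(result.pos);
        }
      });
      tagged[i] = [token].concat(tags.length ? tags : ['unknown']);
      if (--pending === 0) {
        callback(tagged);
      }
    });
  });
}

tagSentence('I saw the man with the telescope', function(taggedSentence) {
  // Each entry is [token, tag, tag, ...], ready to pass to the parser
  console.log(taggedSentence);
});
```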
Thanks!
Regarding WordNet, I will create an example.
Unit tests: it's a good idea to pull the latest code, because I'm working on it on a daily basis.
Hugo
Yeah, I think it was my mistake on the unit tests, sorry about that; I was just playing with them and everything seems fine.
Would you be able to outline for me just the files we would need to include the parsers in natural, as well as what new external dependencies we would have to add? That way I know exactly what I need to do to integrate them.
Thanks!
External dependencies are: lodash, fs, log4js, and typeof, plus jasmine-node for testing.
Files you need from lib are: Agenda.js, ChartParser.js, CYKParser.js, EarleyItem.js, GrammarParser.js, LeftCornerParser.js, Chart.js, CYK_Item.js, DoubleDottedItem.js, EarleyParser.js, GoalItem.js, HeadCornerParser.js, and PEG-grammar-for-unification-grammar.txt.
And from spec: ChartParser_spec.js, Chart_spec.js, CYKParser_spec.js, EarleyItem_spec.js, GrammarParser_spec.js, and HeadCornerParser_spec.js.
Hugo
I forgot to mention the data files for the unit tests. From data you need: math_expressions.txt, test_grammar_for_CYK.txt, minimal_grammar.txt, and test_grammar_for_CFG.txt.
Did some work on the example with WordNet. I found out that WordNet supports the following POS tags: n (noun), v (verb), a (adjective), s (adjective satellite), and r (adverb), which is quite limited for full parsing. I will think of an example that makes some sense. In the meantime you can check the example. It is in the example folder (where else :-) of chart_parsers.
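For the example I will probably map these codes onto grammar categories, something like the following; the category names are only placeholders and would have to match whatever the grammar uses:

```javascript
// Illustrative mapping from WordNet's POS codes to grammar categories;
// the category names are placeholders, not the ones the grammars use.
var wordnetPosToCategory = {
  n: 'N',    // noun
  v: 'V',    // verb
  a: 'ADJ',  // adjective
  s: 'ADJ',  // adjective satellite
  r: 'ADV'   // adverb
};
```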
Also, I made the parsers work with a tagged sentence that may have multiple tags per token, like this: [["I","s","n"],["saw","v","n"],["the","unknown"],["man","v","n"],["with","unknown"],["the","unknown"],["telescope","v","n"]]
Hugo
OK, cool, I was just looking for some end-to-end kind of example, but I agree WordNet may not be the best. I think a better method for POS tagging is definitely a high priority for me as soon as I get some time to work on it.
I did something else to complement the WordNet tags: I wrote a module, FunctionWordTagger, that reads a set of files with function words and uses them to tag the rest of the sentence. The module is in the lib folder; the dictionary files are in the data folder.
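Roughly, the idea is to read the word lists and add a tag to any token that appears in one of them; the file names and one-word-per-line format in this sketch are simplified, not the actual module:

```javascript
// Simplified sketch of the function-word tagging idea; file names and
// the one-word-per-line format are placeholders, not the actual module.
var fs = require('fs');

var functionWordFiles = {
  DET: 'data/determiners.txt',
  P: 'data/prepositions.txt',
  PRO: 'data/pronouns.txt'
};

// Build a lookup table: function word -> tag
var functionWords = {};
Object.keys(functionWordFiles).forEach(function(tag) {
  fs.readFileSync(functionWordFiles[tag], 'utf8').split('\n').forEach(function(line) {
    var word = line.trim().toLowerCase();
    if (word) {
      functionWords[word] = tag;
    }
  });
});

// Add a function-word tag to tokens found in the lists; other tokens keep
// the tags (e.g. from WordNet) they already have.
function tagFunctionWords(taggedSentence) {
  return taggedSentence.map(function(entry) {
    var tag = functionWords[entry[0].toLowerCase()];
    return tag ? entry.concat(tag) : entry;
  });
}
```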
Hugo
Hugo,
Sorry again for the slowness on this; it's been a crazy few weeks at work. Can you just let me know if there are any lodash features you use that aren't included in underscore? I don't really want to have both dependencies, since they're pretty interchangeable.
I think eventually we could port natural over to lodash, but for now I just want to find the quickest path to getting the parsers integrated.
I just created a branch for this, so it's at the top of my list; hopefully by the end of the weekend I'll have made some serious progress.
-Ken
I will take a look at the underscore features. If I remember right, I use lodash for deep comparison of objects only. I checked, and this is supported by underscore as well, so it's probably no problem to swap them.
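Concretely, the only thing the parsers need is deep structural equality of items, and underscore's _.isEqual provides that just like lodash's does; the item shape below is only an example:

```javascript
// _.isEqual exists in both lodash and underscore and performs a deep,
// structural comparison; the item shape here is only an example.
var _ = require('underscore');   // previously require('lodash')

var item1 = { rule: { lhs: 'S', rhs: ['NP', 'VP'] }, dot: 1, from: 0, to: 2 };
var item2 = { rule: { lhs: 'S', rhs: ['NP', 'VP'] }, dot: 1, from: 0, to: 2 };

console.log(item1 === item2);          // false: different references
console.log(_.isEqual(item1, item2));  // true: structurally equal
```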
Hugo
Oh and there's no hurry. I'm writing this stuff just for fun and to learn a new language and libraries.
I replaced lodash with underscore!
Hugo
Perfect! I was playing around with it and replaced it in a few places as well, so I had a feeling it would work.
This looks amazing. We are still missing a CCG or minimalist grammar. But that is a great list.
Thanks! The plan is to integrate this into the natural module.
Regards, Hugo
I published the chart parsers on npm: npm install chart-parsers
Regards, Hugo