frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
I have a corpus of 25 billion words that I want to 'frog', and for that I have a 32-core/128GB RAM machine. My plan is, with 16 separate instances of frog, to...
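Splitting the corpus up front is one way to keep 16 independent Frog processes equally busy. Below is a minimal sketch, assuming the corpus is stored as many files whose sizes are known. `shard_files` is a hypothetical helper, not part of Frog. It assigns the largest files first, each to the currently least-loaded shard (greedy longest-processing-time balancing).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Frog): distribute input files over
// n_shards independent Frog instances, balancing total bytes greedily.
// Each input is (path, size in bytes); returns one path list per shard.
std::vector<std::vector<std::string>>
shard_files(std::vector<std::pair<std::string, std::uint64_t>> files,
            std::size_t n_shards) {
  std::vector<std::vector<std::string>> shards(n_shards);
  if (n_shards == 0) return shards;
  std::vector<std::uint64_t> load(n_shards, 0);
  // Largest files first, so small files fill the gaps at the end.
  std::sort(files.begin(), files.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
  for (const auto& [path, size] : files) {
    // min_element returns the first shard with the smallest load.
    auto it = std::min_element(load.begin(), load.end());
    std::size_t idx = static_cast<std::size_t>(it - load.begin());
    shards[idx].push_back(path);
    *it += size;
  }
  return shards;
}
```

Each shard's file list can then be handed to its own Frog process. Balancing by bytes is only a proxy for processing time, but with millions of documents it is usually close enough.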
Frog handles options and configuration details in a rather messy way: information is stored both in FrogOptions and in a TiCC::Configuration part. This could be simplified a lot...
When resolving MWUs (in frog_data::resolve_mwus()), the **deep_morphs** structure is lost; only the **deep_morph_string** member is resolved. This is a drawback, as it makes it impossible to retrieve the separate deep_morphs and...
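One way to avoid losing the structure is to keep each part's `deep_morphs` alongside the joined string when the parts of an MWU are merged. The sketch below illustrates the idea only. `MorphToken`, `MwuToken` and `merge_mwu` are hypothetical, simplified stand-ins, not Frog's actual `frog_data` classes. The `_` separator is also an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token record; Frog's real frog_data classes differ.
struct MorphToken {
  std::string word;
  std::vector<std::string> deep_morphs;  // assumed: one string per morpheme
  std::string deep_morph_string;         // flattened, printable form
};

// Hypothetical merged MWU: keeps one deep_morphs vector per part.
struct MwuToken {
  std::string word;
  std::string deep_morph_string;
  std::vector<std::vector<std::string>> deep_morphs;
};

// Merge the parts of a multi-word unit into one token. Besides joining the
// strings, keep each part's structured deep_morphs so they stay retrievable.
MwuToken merge_mwu(const std::vector<MorphToken>& parts,
                   const std::string& sep = "_") {
  MwuToken merged;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) {
      merged.word += sep;
      merged.deep_morph_string += sep;
    }
    merged.word += parts[i].word;
    merged.deep_morph_string += parts[i].deep_morph_string;
    merged.deep_morphs.push_back(parts[i].deep_morphs);  // structure preserved
  }
  return merged;
}
```

Storing one vector per part, rather than a single flattened list, keeps the part boundaries recoverable: `deep_morphs[i]` is exactly the analysis of the i-th word of the MWU.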
The 'tabbed' format is quite rigid and sometimes difficult to read (especially when some modules are skipped). It might be handy to offer JSON output as an alternative. This might...
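As a rough illustration of the JSON idea, the sketch below converts one line of tab-separated output into a flat JSON object. It assumes the ten-column layout described in Frog's documentation: token number, token, lemma, morphological segmentation, PoS tag, PoS confidence, named entity, chunk, dependency head and dependency relation. The key names are our own choice, not an existing Frog format. Empty columns, such as those left by skipped modules, are simply omitted, which is what makes the output easier to read.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Escape a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
  std::string out;
  for (char c : s) {
    if (c == '"') {
      out += "\\\"";
    } else if (c == '\\') {
      out += "\\\\";
    } else if (static_cast<unsigned char>(c) < 0x20) {
      // Control characters must be written as \u00XX in JSON.
      char buf[7];
      std::snprintf(buf, sizeof buf, "\\u%04x",
                    static_cast<unsigned>(static_cast<unsigned char>(c)));
      out += buf;
    } else {
      out += c;
    }
  }
  return out;
}

// Convert one line of tabbed output to a JSON object (all values as strings).
// Assumes the ten-column layout; empty columns are left out.
std::string tabbed_line_to_json(const std::string& line) {
  static const std::array<const char*, 10> keys = {
      "index", "word", "lemma", "morph", "pos",
      "pos_conf", "ner", "chunk", "head", "deprel"};
  std::istringstream in(line);
  std::string field;
  std::string out = "{";
  bool first = true;
  for (std::size_t col = 0;
       col < keys.size() && std::getline(in, field, '\t'); ++col) {
    if (field.empty()) continue;  // e.g. the column of a skipped module
    if (!first) out += ",";
    first = false;
    out += "\"" + std::string(keys[col]) + "\":\"" + json_escape(field) + "\"";
  }
  return out + "}";
}
```

Keeping every value as a string avoids guessing types per column. A real implementation would probably emit numbers for the index and head fields and use a proper JSON library.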
In `--deep_morph` mode, MBMA can detect all kinds of compounds, and even outputs them. It would be very useful if we could add some code to give the logical splitting...
Frog now adds provenance data to FoLiA, which among other things allows us to detect a rerun of (parts of) Frog on a FoLiA document. BUT: handling this is quite dangerous and...
It might be interesting to map our CGN PoS tags to Universal POS tags, a PoS tag vocabulary from the Universal Dependencies project that is in more widespread use (but...
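A coarse first version of such a mapping can be made from the CGN main tag plus one or two features. The sketch below is a simplification and not the full conversion used in the UD Dutch treebanks. It always maps WW to VERB (never AUX), VNW to PRON (never DET), and SPEC to X. `cgn_to_upos` is a hypothetical function name.

```cpp
#include <string>

// Coarse mapping from a CGN tag such as "N(soort,ev,basis,zijd,stan)" to a
// Universal Dependencies UPOS tag. Most features are ignored, so the result
// is only an approximation (e.g. AUX and DET are never produced).
std::string cgn_to_upos(const std::string& tag) {
  const auto paren = tag.find('(');
  const std::string head = tag.substr(0, paren);  // whole tag if no '('
  const std::string feats =
      paren == std::string::npos ? "" : tag.substr(paren);
  if (head == "N")
    return feats.find("eigen") != std::string::npos ? "PROPN" : "NOUN";
  if (head == "ADJ") return "ADJ";
  if (head == "WW") return "VERB";
  if (head == "VNW") return "PRON";
  if (head == "LID") return "DET";
  if (head == "VZ") return "ADP";
  if (head == "VG")
    return feats.find("onder") != std::string::npos ? "SCONJ" : "CCONJ";
  if (head == "BW") return "ADV";
  if (head == "TW") return "NUM";
  if (head == "TSW") return "INTJ";
  if (head == "LET") return "PUNCT";
  return "X";  // SPEC and anything unrecognised
}
```

A more faithful version would inspect more features, for instance to separate auxiliary from main verbs or determiner-like pronouns from real pronouns.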
mblem.lex contains a lot of suspicious lemmas. For example, `wezen` for the verb (WW) `zijn`:
```
1 wezen wees N(soort,mv,basis)
2 wezen wezen N(soort,ev,basis,onz,stan)
3 wezen wezen WW(inf,nom,zonder,zonder-n)
4 wezen wezen WW(inf,prenom,zonder)
...
```
Consider this example: ```xml nld test twee test aha Een brief voor de koning. ``` At the moment Frog will _ignore_ the two words in the paragraph and only handle...
Hi, and thanks for the great work! Is there any documentation on which datasets were used to train each specific module of Frog (e.g. the PoS tagger or the dependency parser)?...