Patrice Lopez
For instance, using French Wikipedia (1B words) + the FrWac corpus (1.6B words) as a reference, it will require 2-3 GeForce GTX 1080Ti GPUs.
For ELMo, which uses a reduced batch size because of memory constraints, it might be necessary to review how the batches are created to ensure that rare classes are well...
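A minimal sketch of what such a batch-creation review could look like: oversample examples of under-represented classes when building the sampling pool, so that even small batches see rare classes regularly. The function name and the `rare_boost` parameter are hypothetical, not part of the current code.

```python
import random
from collections import defaultdict

def balanced_batches(examples, labels, batch_size, rare_boost=3, seed=42):
    """Build batches in which examples of rare classes are oversampled.
    `rare_boost` (hypothetical parameter) is how many times an example of
    a rare class is duplicated in the sampling pool."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, lab in zip(examples, labels):
        by_label[lab].append(ex)
    # classes with fewer examples than the average are considered rare
    avg = len(examples) / len(by_label)
    pool = []
    for lab, exs in by_label.items():
        boost = rare_boost if len(exs) < avg else 1
        for ex in exs:
            pool.extend([(ex, lab)] * boost)
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
```

A proper stratified sampler or class-weighted loss would be alternatives; the sketch only illustrates the simplest oversampling option.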
Branch 0.0.3 contains a corpus-based evaluation together with most of the usual NED corpora (ACE, AQUAINT, AIDA-CONLL, MSNBC, ...). However, it would be good to plug the tool into GERBIL for...
lmdbjava is apparently better maintained (more features & more OS builds) and faster... We also never got the zero-copy mode working reliably with lmdbjni, so it is worth trying lmdbjava...
Wikipedia redirects and anchors cover most of the frequent morphosyntactic variants (e.g. plurals), but not in an exhaustive manner - we could add a process (or pre-process) to support them.
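Such a pre-process could be sketched as a simple variant generator applied to the mention lexicon. The rules below are a naive English-only illustration (the function name is hypothetical); real coverage would need a language-specific lemmatizer or inflection tables.

```python
def morph_variants(term):
    """Generate simple English plural variants for a mention, to
    complement the variants already covered by Wikipedia redirects
    and anchors. Naive rule-based sketch."""
    variants = {term}
    if term.endswith('y') and len(term) > 2 and term[-2] not in 'aeiou':
        variants.add(term[:-1] + 'ies')        # city -> cities
    elif term.endswith(('s', 'x', 'z', 'ch', 'sh')):
        variants.add(term + 'es')              # box -> boxes
    else:
        variants.add(term + 's')               # cat -> cats
    return variants
```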
See the Java client written in anHALytics-core as a starting point (the multithreaded version): https://github.com/anHALytics/anhalytics-core/blob/master/anhalytics-annotate/src/main/java/fr/inria/anhalytics/annotate/services/NerdService.java https://github.com/anHALytics/anhalytics-core/blob/master/anhalytics-annotate/src/main/java/fr/inria/anhalytics/annotate/Annotator.java
The Wikidata dump became very big, with 1.2 billion statements, which makes the initial loading of the bz2 dump into lmdb particularly slow. To speed up this step, we could try:...
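One direction worth trying is to amortize transaction overhead by streaming the bz2 dump and writing in large batches, one lmdb transaction per batch instead of per statement. A sketch, where `write_batch` is a placeholder for the actual lmdb writer:

```python
import bz2
import itertools

def load_dump(path, write_batch, batch_size=10000):
    """Stream a bz2-compressed dump line by line and hand the lines to
    `write_batch` (placeholder for the lmdb writer) in large chunks, so
    each write transaction covers many statements instead of one."""
    with bz2.open(path, 'rt', encoding='utf-8') as f:
        while True:
            batch = list(itertools.islice(f, batch_size))
            if not batch:
                break
            write_batch(batch)
```

Decompression itself could further be parallelized (e.g. a decompression worker feeding parser workers), but batched writes are the simplest first step.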
Some disambiguation fails for terms present in Wikidata (as labels) because there is no usage information in the Wikipedia of this target language. The difficulty is that without any statistical...
It would be good to make the lmdb map a bit more dynamic, selecting the right weight encoding based on the actual value range, so that the mechanism can...
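The selection could look like the sketch below: inspect the observed range of the weights and pick the smallest encoding that fits. The function names and the choice of `struct` formats are assumptions for illustration; the actual entity-fishing encoding may differ.

```python
import struct

def pick_weight_format(values):
    """Pick the smallest struct format able to encode the observed
    weight range, so the lmdb map stores compact values."""
    lo, hi = min(values), max(values)
    if all(float(v).is_integer() for v in values):
        for fmt, (fmin, fmax) in (('b', (-128, 127)),
                                  ('h', (-32768, 32767)),
                                  ('i', (-2**31, 2**31 - 1))):
            if fmin <= lo and hi <= fmax:
                return fmt
        return 'q'
    return 'f'  # fall back to 32-bit float for fractional weights

def encode_weights(values):
    """Encode a list of weights with the format chosen above."""
    fmt = pick_weight_format(values)
    return fmt, struct.pack('<%d%s' % (len(values), fmt),
                            *(int(v) if fmt != 'f' else v for v in values))
```

The chosen format character would need to be stored alongside the value (or in the map metadata) so reads can decode it back.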
The production of the training data is currently single-threaded and really slow for the selection model. A straightforward improvement is to use several workers for this, as this task...
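A minimal sketch of the worker fan-out, assuming the per-article work can be isolated in one function (`build_example` here is a hypothetical stand-in for the real feature extraction):

```python
from concurrent.futures import ThreadPoolExecutor

def build_example(article):
    """Placeholder for the per-article work producing one training
    example for the selection model."""
    return article.lower()  # stand-in for the real feature extraction

def produce_training_data(articles, workers=4):
    """Fan the per-article work out to several workers; `map` preserves
    the order of the produced examples."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_example, articles))
```

If the extraction is CPU-bound pure Python, a `ProcessPoolExecutor` would be the better fit; the thread version keeps the sketch simple and suits work dominated by I/O or native code.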