Patrice Lopez

Results 77 issues of Patrice Lopez

For instance, using French Wikipedia (1B words) + the FrWac corpus (1.6B words). As a reference, it will require 2-3 GeForce GTX 1080 Ti GPUs.

For ELMo, which uses a reduced batch size because of memory constraints, it might be necessary to review how the batches are created to ensure that rare classes are well...

Branch 0.0.3 contains a corpus-based evaluation together with most of the usual NED corpora (ACE, AQUAINT, AIDA-CONLL, MSNBC, ...). However, it would be good to plug the tool into GERBIL for...

help wanted

lmdbjava is apparently better maintained (more features and builds for more operating systems) and faster... also, the zero-copy mode never worked reliably with lmdbjni, so it is worth trying lmdbjava...

enhancement
test needed

Wikipedia redirects and anchors cover most of the frequent morphosyntactic variants (e.g. plurals), but not exhaustively. We could add a process (or pre-process) to support them.

enhancement

See the Java client written in anHALytics-core as a starting point (the multithreaded version): https://github.com/anHALytics/anhalytics-core/blob/master/anhalytics-annotate/src/main/java/fr/inria/anhalytics/annotate/services/NerdService.java https://github.com/anHALytics/anhalytics-core/blob/master/anhalytics-annotate/src/main/java/fr/inria/anhalytics/annotate/Annotator.java

The Wikidata dump has become very large, with 1.2 billion statements, which makes the initial loading of the bz2 dump into lmdb particularly slow. To speed up this step, we could try:...

enhancement

Some disambiguations fail for terms present in Wikidata (as labels) because there is no usage information in the Wikipedia of the target language. The difficulty is that without any statistical...

enhancement

It would be good to make the lmdb map a bit more dynamic, selecting the right weight encoding based on the actual value range, so that the mechanism can...

enhancement
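
As a rough illustration of the range-based encoding idea in the issue above, here is a minimal Python sketch. It is not the project's implementation: it assumes the weights are integers, and the names `choose_format`, `encode`, and `decode` as well as the one-byte header layout are hypothetical. It picks the narrowest fixed-width integer type that covers the observed minimum and maximum, and stores the chosen type code in front of the packed values so that a reader can decode them.

```python
import struct
from typing import List, Sequence

# (struct format code, smallest value, largest value), narrowest type first.
# Assumption: weights are signed integers; floats would need another scheme.
_INT_FORMATS = [
    ("b", -2**7, 2**7 - 1),    # 1 byte
    ("h", -2**15, 2**15 - 1),  # 2 bytes
    ("i", -2**31, 2**31 - 1),  # 4 bytes
    ("q", -2**63, 2**63 - 1),  # 8 bytes
]


def choose_format(values: Sequence[int]) -> str:
    """Return the narrowest format code covering all values (non-empty input)."""
    low, high = min(values), max(values)
    for code, fmt_min, fmt_max in _INT_FORMATS:
        if fmt_min <= low and high <= fmt_max:
            return code
    raise ValueError("values do not fit in a 64-bit signed integer")


def encode(values: Sequence[int]) -> bytes:
    """Pack values little-endian, prefixed by a one-byte format header."""
    code = choose_format(values)
    return code.encode("ascii") + struct.pack(f"<{len(values)}{code}", *values)


def decode(payload: bytes) -> List[int]:
    """Read the format header, then unpack the remaining bytes."""
    code = chr(payload[0])
    width = struct.calcsize("<" + code)
    count = (len(payload) - 1) // width
    return list(struct.unpack(f"<{count}{code}", payload[1:]))
```

With this layout, a map whose weights all fit in one byte costs one byte per weight plus the header, and a wider type is used only when the actual values require it.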

The production of the training data is currently single-threaded and really slow for the selection model. A straightforward improvement is to use several workers for this, as this task...
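
The multi-worker idea above can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: `build_examples` is a hypothetical stand-in for the real per-document processing, and the sketch assumes each document can be processed independently of the others. It uses a thread pool from Python's standard library; for CPU-bound work in Python, a process pool would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def build_examples(document: str) -> List[str]:
    """Hypothetical stand-in for the per-document training data extraction."""
    return [token.lower() for token in document.split()]


def produce_training_data(documents: Sequence[str], workers: int = 4) -> List[str]:
    """Process documents on several workers and merge results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the inputs,
        # so the merged output matches a single-threaded run.
        per_document = pool.map(build_examples, documents)
        return [example for examples in per_document for example in examples]
```

Because the results are merged in input order, the output can be compared directly against the single-threaded version when checking that the parallel run produces the same training data.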