Jimmy Lin

251 comments by Jimmy Lin

The final forceMerge merges all the individual index segments into a single one for better retrieval performance (this used to be the "optimize" method in earlier versions of Lucene)....

@yb1 Sorry for not being clear... your current implementation fetches the images from IA, which requires web access and thus is slow. We already have the images in our archives,...

This should be doable and relatively simple. We can modify build.xml to roll up all the individual jars into a big distribution jar. We can then store the jar identified...

https://twitter.com/zehavoc/status/999325064247087106 The mega-thread: http://pauillac.inria.fr/~seddah/May_thread.html

Unfortunately, I'm tied up with EMNLP deadlines until next week... but PR welcome?!

> I don't see other author names meshed together.

Here's an example: https://aclanthology.org/2022.deeplo-1.14/ The mangled author profile: https://aclanthology.org/people/k/kelechi-ogueji-and-jimmy-lin/

As a sanity check, I went to the data directory and did this: ``` $ gunzip -c *.gz | grep -a "Content-Length" | cut -d ' ' -f 2 |...

@adamyy thanks for the update. What about a simple fix like adding a content length filter and skipping records above a certain threshold? We can define the threshold in a...
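A minimal sketch of the filter suggested above, assuming each record exposes its declared content length as an integer. The `Record` type, the `skip_oversized` helper, and the 100 MB default are all hypothetical placeholders for illustration, not names or values from the actual codebase:

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for an archive record."""
    url: str
    content_length: int  # declared Content-Length, in bytes


# Hypothetical default; the comment above suggests making this configurable.
MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def skip_oversized(records: Iterable[Record],
                   max_length: int = MAX_CONTENT_LENGTH) -> Iterator[Record]:
    """Yield only the records whose content length is at most max_length."""
    for record in records:
        if record.content_length > max_length:
            # Skip the record instead of failing the whole job.
            continue
        yield record
```

Because the helper is a generator, it can wrap an existing record stream without loading the collection into memory.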

@adamyy Why don't we do that to mute the issue for now, so that it's not blocking the processing of existing collections? Send PR? @ruebot Thoughts? We can then close...

Having heard no follow-up, closing...