Jimmy Lin comments

Results: 251 comments of Jimmy Lin

Annotation methodology of QA resources

Hi @Timoeller - Thanks for your response. We've also been working on building test collections, but via a slightly different approach: https://arxiv.org/abs/2004.11339 I was wondering if you'd be interested in more...

Annotation methodology of QA resources

What's your email? Or you can find mine on my website: https://cs.uwaterloo.ca/~jimmylin/index.html

Does BEIR support tuning config k and b of BM25?

You can find Anserini regressions for BEIR here: https://github.com/castorini/anserini#regressions-for-beir-v100 You can reuse the tuning script implemented here: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md#bm25-tuning
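For readers unfamiliar with what the two parameters control, here is a minimal sketch of one common BM25 variant, not the Anserini tuning script itself. The function names, the exact IDF form, and the grid values are illustrative assumptions; `k1` (the "k" in the question) controls term-frequency saturation and `b` controls document-length normalization.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1, b):
    """Score of a single query term in a single document (one common BM25 variant)."""
    # Inverse document frequency; the "+ 1" inside the log keeps it non-negative.
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalization: b = 0 ignores document length, b = 1 applies it fully.
    norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


def parameter_grid(k1_values, b_values):
    """All (k1, b) pairs to try in a grid search (hypothetical helper)."""
    return [(k1, b) for k1 in k1_values for b in b_values]


# Illustrative grid only; a real sweep would rank by a retrieval metric on held-out queries.
candidates = parameter_grid([0.6, 0.9, 1.2], [0.4, 0.6, 0.8])
```

Tuning then amounts to running retrieval once per `(k1, b)` pair and keeping the pair with the best score on a development set.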

Improve concurrency and space efficiency of parse_collection

Unless there are scientifically interesting questions you want to explore, I would advocate just giving up on indexing and letting Anserini/Lucene do it for you via CIFF. Why waste engineering...

Improve concurrency and space efficiency of parse_collection

Also, we're one issue away from the entire indexing pipeline in Anserini being pip installable: https://github.com/castorini/pyserini/issues/77 Something like: ``` $ pip install pyserini ... $ python -m pyserini.index --collection...

Pretrained weights via model zoo

@wxp16 @richard3983 maybe you'd be interested in taking this on?

Custom dataset error

The error message seems pretty informative - have you checked the length of your input samples?

Grab Semantic Scholar data for ACL papers

Oh nice! For example: https://api.semanticscholar.org/v1/paper/ACL:J07-1005 for https://www.aclweb.org/anthology/J07-1005/
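A small sketch of the mapping in that example, assuming the URL pattern generalizes from the single ID shown (J07-1005) to other ACL Anthology IDs. The helper names are hypothetical, and nothing here performs a network request.

```python
def semantic_scholar_url(acl_id):
    """Build the Semantic Scholar API URL for an ACL Anthology ID (pattern assumed from one example)."""
    return f"https://api.semanticscholar.org/v1/paper/ACL:{acl_id}"


def anthology_url(acl_id):
    """Build the matching ACL Anthology page URL (pattern assumed from the same example)."""
    return f"https://www.aclweb.org/anthology/{acl_id}/"
```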

Grab Semantic Scholar data for ACL papers

Even better, there's a dump: https://api.semanticscholar.org/corpus/download/

JsonVectorCollection weights are not obeyed for long terms

Wow, what an obscure bug! How about we just drop all terms longer than 255 chars? They are unlikely to be meaningful anyway.
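The proposed workaround could look roughly like this. It is a sketch only: it assumes the vector is a plain mapping from term to weight and that length is counted in characters, and the names are hypothetical rather than taken from the actual JsonVectorCollection code.

```python
MAX_TERM_LENGTH = 255  # threshold proposed in the comment above


def drop_long_terms(weights, max_len=MAX_TERM_LENGTH):
    """Return a copy of a term -> weight mapping without terms longer than max_len characters."""
    return {term: weight for term, weight in weights.items() if len(term) <= max_len}
```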