Marek Šuppa
@imenelydiaker unfortunately, it's still running. With `sentence-t5-xxl` some of the datasets take days to evaluate (e.g. FEVER or MSMARCO). I am running this on a single H100 and didn't really...
@imenelydiaker I am on it :)
@KennethEnevoldsen ugh, is there a chance just running the code in https://github.com/embeddings-benchmark/mteb/commit/4e1bab4ef964291aadacf808465d54e93d3db4cc will save wrong data?
Thanks @KennethEnevoldsen. The processing has failed with the following anyhow:

```
INFO:mteb.evaluation.evaluators.RetrievalEvaluator:For evaluation, we ignore identical query and document ids (default), please explicitly set ``ignore_identical_ids=False`` to ignore this.
ERROR:mteb.evaluation.MTEB:Error while...
```
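For context, the "ignore identical ids" behaviour that the `INFO` line mentions roughly amounts to dropping a document from a query's ranking whenever its id equals the query id (relevant for datasets like QuoraRetrieval, where queries double as documents). A minimal sketch of that filtering, assuming the usual `query_id -> {doc_id: score}` run format; this is an illustration, not MTEB's actual implementation:

```python
def drop_identical_ids(run):
    """Remove entries where the document id equals the query id.

    `run` maps query_id -> {doc_id: score}, the same shape pytrec_eval
    consumes. This mirrors (as an assumption) what the default
    ``ignore_identical_ids=True`` behaviour would do before scoring.
    """
    return {
        qid: {did: score for did, score in docs.items() if did != qid}
        for qid, docs in run.items()
    }


run = {"q1": {"q1": 0.99, "d2": 0.5}, "q2": {"d3": 0.7}}
filtered = drop_identical_ids(run)
# "q1" is dropped from its own ranking; all other entries are untouched.
```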
Would it make sense to merge your changes in and re-run the whole pipeline?
@KennethEnevoldsen that's the tough part -- this is running on a very old branch intentionally (the code in https://github.com/embeddings-benchmark/mteb/commit/4e1bab4ef964291aadacf808465d54e93d3db4cc builds off of the `1.2.0` release), to show the differences between...
Thanks @Muennighoff and @KennethEnevoldsen. Unfortunately, `pytrec_eval` hasn't been updated in the past 3 years (https://pypi.org/project/pytrec-eval/), so it doesn't seem like that's the culprit. I'll try the smaller model just...
The `sentence-t5-xxl` processing has finally finished -- the result can be seen in https://github.com/embeddings-benchmark/mteb/compare/main...mrshu:mteb:mrshu/port-carbon-emissions-estimation
Thanks for the report and your patience @DimitriSam! This is the first time I've seen a list of column names in the definition of tests, though -- could you...
We'd be totally for it @sdebruyn! I guess in order for that to happen we'd first need to support the format these "coverage-reporting-as-a-service" tools (like https://coveralls.io/) actually expect (c.f. #39)...