Sean MacAvaney

224 comments by Sean MacAvaney

This issue relates to #163 as well

It would be nice if there were an iterator version as well, to avoid dataframes when indexing. Maybe add an optional `transform_iter()` function to the transformer spec?
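A minimal sketch of the idea, assuming a hypothetical `Transformer` base class in which `transform_iter()` processes records (plain dicts) one at a time and `transform()` is the batch entry point. The class names, the record layout and the `index()` stand-in are illustrative only, not an existing library's API.

```python
from typing import Dict, Iterable, Iterator, List

# A record is a plain dict (e.g. {"docno": ..., "text": ...}); this layout is
# an assumption for illustration.
Record = Dict[str, str]


class Transformer:
    """Hypothetical transformer base class, not an existing library API."""

    def transform_iter(self, inp: Iterable[Record]) -> Iterator[Record]:
        # Streaming interface: subclasses yield output records as they go,
        # so no dataframe is ever materialised.
        raise NotImplementedError

    def transform(self, inp: List[Record]) -> List[Record]:
        # Batch interface: by default it simply drains transform_iter().
        # A dataframe-based library could build its frame here instead.
        return list(self.transform_iter(inp))


class Lowercase(Transformer):
    """Toy transformer that lowercases the "text" field of each record."""

    def transform_iter(self, inp: Iterable[Record]) -> Iterator[Record]:
        for rec in inp:
            yield {**rec, "text": rec["text"].lower()}


def index(docs: Iterable[Record]) -> Dict[str, str]:
    # Stand-in for an indexer that consumes documents lazily, one at a time.
    return {d["docno"]: d["text"] for d in docs}


# Usage: documents flow from a generator straight into the indexer.
docs = ({"docno": f"d{i}", "text": f"Doc {i}"} for i in range(3))
idx = index(Lowercase().transform_iter(docs))
```

Treating the iterator method as the primitive and deriving the batch method from it means a streaming consumer such as an indexer never needs the whole collection in memory at once.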

I'd love it if there were a way we could fit sentence segmentation into this as well. Splitting mid-sentence isn't ideal, and since [most models are pretty sensitive to surface-level features...
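A rough sketch of sentence-aware passage splitting, assuming passages are capped by a word count and that a naive regular-expression splitter is good enough for illustration. `split_sentences()` and `sentence_passages()` are hypothetical helpers, and the regex is only a stand-in for a proper sentence segmenter.

```python
import re
from typing import List

# Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
# A real pipeline would more likely use a trained sentence segmenter.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s for s in _SENT_END.split(text.strip()) if s]


def sentence_passages(text: str, max_words: int = 100) -> List[str]:
    """Greedily pack whole sentences into passages of at most max_words words.

    A single sentence longer than max_words becomes its own (oversized)
    passage instead of being cut mid-sentence.
    """
    passages: List[str] = []
    current: List[str] = []
    count = 0
    for sent in split_sentences(text):
        n = len(sent.split())
        # Start a new passage if adding this sentence would exceed the cap.
        if current and count + n > max_words:
            passages.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        passages.append(" ".join(current))
    return passages
```

For example, `sentence_passages("One two three. Four five! Six seven eight nine? Ten.", max_words=5)` keeps every sentence intact and yields two passages.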

This sounds like a good addition, and I'm in favor of adding a `dataset.qrels.binary_relevance_cutoff()` function (or similar), especially considering how frequently this causes folks problems. The current solution is to...
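A minimal sketch of what such a cutoff could do, written as a free function over a hypothetical `Qrel` record with `query_id`, `doc_id` and `relevance` fields. Both the record type and the function signature are assumptions for illustration, not the actual `dataset.qrels` API.

```python
from typing import Iterable, List, NamedTuple


class Qrel(NamedTuple):
    """Hypothetical relevance judgment record."""
    query_id: str
    doc_id: str
    relevance: int


def binary_relevance_cutoff(qrels: Iterable[Qrel], cutoff: int = 1) -> List[Qrel]:
    """Map graded judgments to binary: relevance >= cutoff -> 1, else 0.

    Negative grades (used by some collections for spam or junk) also map to 0.
    """
    return [q._replace(relevance=int(q.relevance >= cutoff)) for q in qrels]
```

Making the cutoff an explicit parameter matters because collections disagree on whether a grade of 1 means "relevant" or only "partially relevant".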

still to do: documentation

Awesome! Given LongEval's temporal focus, I think it should be encoded at a higher level in the dataset ids, e.g.:

- `longeval` (placeholder)
- `/[2023-07|2023-09|...]` (placeholder)
- `/[en|fr|...]`...
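A small sketch of how the three levels listed above could compose into dataset ids, using only the placeholder prefix, snapshots and languages named there. The `dataset_ids()` helper is hypothetical, and whether the intermediate ids (e.g. `longeval/2023-07`) would themselves be datasets is an assumption.

```python
from typing import List

# Placeholder components from the proposal; the full lists are not given.
PREFIX = "longeval"
SNAPSHOTS = ["2023-07", "2023-09"]
LANGUAGES = ["en", "fr"]


def dataset_ids(prefix: str, snapshots: List[str], languages: List[str]) -> List[str]:
    """Build hierarchical ids: collection, then snapshot, then language."""
    ids = [prefix]
    for snap in snapshots:
        ids.append(f"{prefix}/{snap}")
        ids.extend(f"{prefix}/{snap}/{lang}" for lang in languages)
    return ids


ids = dataset_ids(PREFIX, SNAPSHOTS, LANGUAGES)
```

Putting the snapshot above the language means all languages for one time period share a common prefix, which makes comparing collections across time the natural top-level split.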

That would be awesome! I love when folks release data in standard formats :-)

Great, thanks! I won't have much time to contribute directly to this for the next week or so. Regarding the tasks:

> WAPO in v3 (2021) has no clearly associated...

Given that Python 3.7 has reached end of life, I think it's reasonable to bump the minimum Python version to 3.8, especially if there are features from the core library that you want...

The generic classes can move outside the cast directory. I'm keen to apply them in other settings.