Sean MacAvaney

224 comments by Sean MacAvaney

Thanks for reporting! I’ll look into it.

Wow-- thanks! Seems to be coming along nicely. The vdom structure is a bit complicated, but I guess it needs to be in order to properly represent the data.

Awesome, thanks! A few other nits:
- Looks like `cached_property` isn't supported in Python 3.7 or below
- It seems something requires that the clueweb22 exists, which we cannot assume....
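For the `cached_property` nit, one possible workaround is a small fallback for interpreters where `functools.cached_property` is missing (it was added in Python 3.8). This is only a sketch, not the package's actual fix; the `Doc` class and its `tokens` property are hypothetical names used for illustration.

```python
try:
    # Available in the standard library from Python 3.8 onward.
    from functools import cached_property
except ImportError:
    class cached_property:
        """Minimal fallback: compute once, then store on the instance."""

        def __init__(self, func):
            self.func = func
            self.attrname = None
            self.__doc__ = func.__doc__

        def __set_name__(self, owner, name):
            # Called at class creation time (Python 3.6+).
            self.attrname = name

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = self.func(instance)
            # Storing under the same name shadows this non-data descriptor,
            # so later lookups read the instance __dict__ directly.
            instance.__dict__[self.attrname] = value
            return value


class Doc:
    """Hypothetical document wrapper, for illustration only."""

    def __init__(self, raw):
        self.raw = raw
        self.parse_calls = 0

    @cached_property
    def tokens(self):
        self.parse_calls += 1
        return self.raw.split()
```

Unlike the standard library version in 3.8 through 3.11, this fallback takes no lock, so it may compute the value more than once under concurrent first access.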

Thinking a bit more about this... I sorta feel that the primary case will be the `_Txt` version. Might it make sense to have the alternate formats as separate datasets,...

I see your points and I think I agree with some of them. I could probably be convinced. However, let me make a more complete case in favor of a...

Thanks! Looks like there are still some py36 incompatibilities: `ImportError: cannot import name 'Final' from 'typing'`. My main hesitation remains that in my experience so far with the package, it...

Awesome, thanks!

Maybe I'd feel a bit more comfortable if we had some performance benchmarks. E.g., how fast is it to iterate the first 100k documents for the combined vs text-only versions?
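A rough sketch of what such a benchmark could look like, assuming each version exposes its documents as a plain Python iterable. The helper name `time_first_n` and the stand-in generator are hypothetical; the real document iterators would be passed in instead.

```python
import itertools
import time


def time_first_n(docs, n=100_000):
    """Iterate at most the first n items of docs; return (count, seconds)."""
    start = time.perf_counter()
    # islice stops after n items, so the rest of the collection is never read.
    count = sum(1 for _ in itertools.islice(docs, n))
    elapsed = time.perf_counter() - start
    return count, elapsed


def fake_docs(total):
    """Stand-in generator; replace with the real document iterator."""
    for i in range(total):
        yield {"doc_id": str(i), "text": "lorem ipsum"}


if __name__ == "__main__":
    count, elapsed = time_first_n(fake_docs(200_000))
    print(f"{count} docs in {elapsed:.3f}s")
```

Running it once per version (combined vs. text-only) on the same machine, ideally a few times each to smooth out disk caching effects, would give comparable numbers.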

Great news, my copy of the CW22 drive arrived.

Sorry -- the only thing blocking is finding the time to run through the tests on my end.