Sean MacAvaney
Thanks for reporting! I’ll look into it.
Wow-- thanks! Seems to be coming along nicely. The vdom structure is a bit complicated, but I guess it needs to be in order to properly represent the data.
Awesome, thanks! A few other nits:
- Looks like `cached_property` isn't supported in Python 3.7 or below
- It seems something requires that the clueweb22 exists, which we cannot assume....
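For the `cached_property` nit, one possible workaround is a small fallback, sketched below. This is only an illustration, not what the package does: the stand-in descriptor and the `Example` class are hypothetical, and the stand-in skips the extras of the standard-library version.

```python
import functools

try:
    # functools.cached_property is available from Python 3.8 onward.
    cached_property = functools.cached_property
except AttributeError:
    class cached_property:
        """Minimal stand-in for Python 3.7 and below (hypothetical helper)."""

        def __init__(self, func):
            self.func = func
            self.name = func.__name__
            self.__doc__ = func.__doc__

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            # Compute once, then store on the instance so later lookups
            # find the stored value before reaching this descriptor.
            value = self.func(instance)
            instance.__dict__[self.name] = value
            return value


class Example:
    """Hypothetical class used only to demonstrate the decorator."""

    def __init__(self):
        self.calls = 0

    @cached_property
    def expensive(self):
        self.calls += 1
        return 42
```

Reading `Example().expensive` twice should run the method body only once on either code path.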
Thinking a bit more about this... I sorta feel that the primary case will be the `_Txt` version. Might it make sense to have the alternate formats as separate datasets,...
I see your points and I think I agree with some of them. I could probably be convinced. However, let me make a more complete case in favor of a...
Thanks! Looks like there are still some py36 incompatibilities: `ImportError: cannot import name 'Final' from 'typing'`. My main hesitation remains that in my experience so far with the package, it...
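The `Final` import error could be handled with a guarded import, roughly as sketched here. It assumes the `typing_extensions` backport is acceptable as a dependency on older interpreters, which may not hold for this package; `MAX_DOCS` is a hypothetical name used only to show the annotation.

```python
try:
    # typing.Final exists from Python 3.8 onward.
    from typing import Final
except ImportError:
    # Assumption: the typing_extensions backport is installed on older Pythons.
    from typing_extensions import Final

# Hypothetical constant, annotated so type checkers flag reassignment.
MAX_DOCS: Final = 100_000
```

`Final` only affects static type checking, so the runtime behavior is the same on both branches.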
Awesome, thanks!
Maybe I'd feel a bit more comfortable if we had some performance benchmarks. E.g., how fast is it to iterate the first 100k documents for the combined vs text-only versions?
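Something like the following sketch could produce those numbers. `time_first_n` is a hypothetical helper, not part of the package; it only assumes each version exposes an iterable of documents.

```python
import itertools
import time


def time_first_n(docs, n=100_000):
    """Iterate up to n items from docs; return (items_seen, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    # islice stops after n items without consuming the rest of the iterator.
    for _ in itertools.islice(docs, n):
        count += 1
    return count, time.perf_counter() - start
```

With ir_datasets, `docs` would be the result of `ir_datasets.load(...).docs_iter()` for each of the two versions; the dataset IDs are omitted here because they depend on how the formats end up being exposed. Running each a few times and comparing the elapsed seconds should be enough for a rough comparison, keeping in mind that disk caching can make the first pass slower than later ones.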
Great news, my copy of the CW22 drive arrived.
Sorry -- the only thing blocking is finding the time to run through the tests on my end.