Nick Crews
Thanks Tom. Glad that someone found it useful! Depending on what you or Forest or others think about how universally valuable this is, perhaps we add support for this sort...
I don't remember exactly now, should have been more detailed. But I think it was constructing a DF from a pyarrow timestamp array, doing some operations on it, and...
@ljmartin The size of the row groups *definitely* affected the performance for me, but that was a more extreme case when my ~100 million rows had row groups in the...
I'm experiencing this too. I assume there aren't any tests for domains. I started PR #1115 as an attempt at a fix. Note that it's not just a problem...
Totally drive-by-ing here, so I can't help much more than this, but the short term fix is to downgrade `doit` to a version below `0.36.0` (version [0.35.0](https://pypi.org/project/doit/#history) looks like it...
Added a PR above that *MIGHT* be a more permanent fix.
Pointing out here for inspiration that [splink](https://github.com/moj-analytical-services/splink) does pretty much everything in SQL. It is even backend-agnostic, so it can run locally with DuckDB or distributed on Spark.
> Perhaps it is worth bumping jellyfish to a more recent version? As a clarification, since this is a library, I believe that this shouldn't be pinned to a specific...
I think this is actually an upstream issue with jellyfish: https://github.com/jamesturk/jellyfish/issues/160 but updating to jellyfish>=0.9.0 DID fix it for me. So I'm not quite sure what version of jellyfish /...
Driveby-ing here, but perhaps you could look at https://github.com/burnash/gspread and see what they do. These two projects seem to be very related, perhaps you could share the load and come...