Jérôme Dockès comments

Results 396 comments of Jérôme Dockès

Misc improvements of table report

> Also, the TableReport returns an error when the dataframe contains a list: thanks for reporting it. it is because the unique value counts are stored in a dict and...

FEA Add `TextEncoder`

I also wonder about pickling. maybe we should not store the model in an attribute but only store the name, and reload the model as necessary? or if we are...

FEA Add `TextEncoder`

> cache it in an attribute that does not get pickled this can be done by defining `__getstate__` as shown [here](https://docs.python.org/3/library/pickle.html#handling-stateful-objects)
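
A minimal sketch of the `__getstate__` idea, not the actual `TextEncoder` code: `load_model`, the `Encoder` class and its `_model` attribute are hypothetical names made up for illustration. The loaded model is kept on the instance but replaced by `None` in the pickled state, so only the name travels with the pickle and the model gets reloaded on first use.

```python
import pickle


def load_model(name):
    # Hypothetical stand-in for an expensive load (e.g. downloading weights).
    return {"name": name, "weights": [0.0] * 1000}


class Encoder:
    def __init__(self, name):
        self.name = name
        self._model = None  # filled lazily, never pickled

    def _get_model(self):
        # Load the model the first time it is needed, then reuse it.
        if self._model is None:
            self._model = load_model(self.name)
        return self._model

    def __getstate__(self):
        # Copy the instance dict so the live object keeps its loaded model,
        # and drop the model from the state that gets pickled.
        state = self.__dict__.copy()
        state["_model"] = None
        return state


encoder = Encoder("toy-model")
encoder._get_model()  # model is now cached on the live object

restored = pickle.loads(pickle.dumps(encoder))
print(restored.name)    # toy-model
print(restored._model)  # None: the model was not pickled
```

No `__setstate__` is needed here because the default unpickling behaviour restores the returned dict into the new instance's `__dict__`.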

here is a toy example with `functools.cached_property`

`encoder.py`

```python
import functools


class Encoder:
    def __init__(self, name):
        self.name = name

    def transform(self, x):
        return self._estimator(x)

    @functools.cached_property
    def _estimator(self):
        print(f'loading {self.name}')
        return...
```
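
The excerpt above is cut off, so here is a separate self-contained sketch of how `functools.cached_property` behaves, under my own assumptions: `LazyEncoder`, `load_count` and the use of `str.upper` as a fake "model" are all made up for illustration. It shows that the property body runs once, that the cached value ends up in the instance `__dict__` (which is why it would be pickled by default), and that deleting the attribute clears the cache.

```python
import functools


class LazyEncoder:
    def __init__(self, name):
        self.name = name
        self.load_count = 0  # only here to make the caching visible

    @functools.cached_property
    def _estimator(self):
        # Hypothetical stand-in for loading a real model.
        self.load_count += 1
        return str.upper

    def transform(self, x):
        return self._estimator(x)


encoder = LazyEncoder("toy-model")
print("_estimator" in vars(encoder))  # False: nothing loaded yet

encoder.transform("a")
encoder.transform("b")
print(encoder.load_count)             # 1: loaded once, then reused
print("_estimator" in vars(encoder))  # True: cached in the instance __dict__

# Deleting the attribute clears the cache; the next access reloads.
del encoder._estimator
encoder.transform("c")
print(encoder.load_count)             # 2
```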

FEA Add `TextEncoder`

> TextEmbedding is nice, the only issue is vectors and embeddings are close concepts, so people might not understand its difference from other string-based encoders like MinHash. What about LLMEncoder?...

FEA Add `TextEncoder`

> Downloading the weights might be a bit slow though. we can use the circleci cache so that downloading models does not happen often (if/when that turns out to be...

FEA Add `TextEncoder`

> Note that loading using names wouldn't work for local models in another context. yes that's true, I was assuming the main use-case would be to use a pre-trained model...

FEA Add `TextEncoder`

> I don't see how this could help with pickling, however. @jeromedockes WDYT? Indeed my question was not really about memory usage or sharing state between objects but rather whether...

FEA Add `TextEncoder`

screenshot from the rendered example :sweat_smile: : ![screenshot_2024-10-22T12:48:21+02:00](https://github.com/user-attachments/assets/dd4c5c25-29ec-4f94-99ca-ca2c0c2cc3c3)

Improve the functionality of the TableReport plots when `order_by` is set

as @Vincent-Maladiere says `order_by` is something that was kept from early versions of the TableReport (skrubview at the time) for time series. as it's not really documented I doubt anyone...