Results 396 comments of Jérôme Dockès

> Also, the TableReport returns an error when the dataframe contains a list:

thanks for reporting it. it is because the unique value counts are stored in a dict and...
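The comment is cut off before it finishes explaining the cause, so the following is only a sketch of one way a dict of value counts can fail on list cells. It assumes the counts are keyed by the cell values themselves. `count_values` is a hypothetical helper, not skrub's code. Python lists are unhashable, so they cannot be used as dict keys.

```python
def count_values(values):
    """Count occurrences of each value, using the values as dict keys."""
    counts = {}
    for value in values:
        # Looking up a key hashes it; this raises TypeError for a list.
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hashable values (strings) work as expected.
print(count_values(["a", "b", "a"]))

# A column whose cells are lists cannot be counted this way.
try:
    count_values([[1, 2], [1, 2]])
except TypeError as error:
    print(f"TypeError: {error}")
```

A common workaround in this kind of situation is to convert each cell to a hashable form (for example a tuple or its `repr`) before counting, though whether that is what the report ended up doing is not stated here.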

I also wonder about pickling. maybe we should not store the model in an attribute but only store the name, and reload the model as necessary? or if we are...

> cache it in an attribute that does not get pickled

this can be done by defining `__getstate__` as shown [here](https://docs.python.org/3/library/pickle.html#handling-stateful-objects)
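A minimal sketch of the `__getstate__` idea, under stated assumptions: `NamedEncoder`, its `_model` attribute and the lambda used as a stand-in for a loaded model are all hypothetical and only serve the illustration. The cached object is deliberately something `pickle` cannot serialize, so the round trip only works because the cache is dropped from the state.

```python
import pickle


class NamedEncoder:
    """Keeps only the model name in its pickled state (illustrative)."""

    def __init__(self, name):
        self.name = name
        self._model = None  # cache, filled on first use

    def get_model(self):
        if self._model is None:
            # Hypothetical stand-in for an expensive load. A lambda is
            # used because it cannot be pickled.
            self._model = lambda x: f"{self.name}:{x}"
        return self._model

    def __getstate__(self):
        # Copy the instance dict and reset the cache so it is not pickled.
        state = self.__dict__.copy()
        state["_model"] = None
        return state


encoder = NamedEncoder("demo")
encoder.get_model()  # fill the cache

restored = pickle.loads(pickle.dumps(encoder))
print(restored.name)            # the name survives the round trip
print(restored._model is None)  # the cache was left out of the pickle
print(restored.get_model()("x"))  # the model is reloaded on demand
```

No `__setstate__` is needed in this sketch: by default, unpickling updates the instance `__dict__` with the returned state, so the restored object starts with an empty cache.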

here is a toy example with `functools.cached_property`

`encoder.py`

```python
import functools


class Encoder:
    def __init__(self, name):
        self.name = name

    def transform(self, x):
        return self._estimator(x)

    @functools.cached_property
    def _estimator(self):
        print(f'loading {self.name}')
        return...
```
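The toy example above is truncated, so here is a separate runnable sketch of the same lazy-loading pattern. `load_model`, `LOAD_CALLS` and `LazyEncoder` are hypothetical names, and `str.upper` is an arbitrary callable standing in for whatever the elided `return` produced. It illustrates two properties of `functools.cached_property`: the load runs only on first access, and the result is stored in the instance `__dict__` under the property's name, which is why it would be included in a pickle unless it is removed from the state.

```python
import functools

LOAD_CALLS = []  # records each (hypothetical) model load


def load_model(name):
    """Hypothetical stand-in for an expensive model load."""
    LOAD_CALLS.append(name)
    return str.upper  # any callable works for this sketch


class LazyEncoder:
    def __init__(self, name):
        self.name = name

    def transform(self, x):
        return self._estimator(x)

    @functools.cached_property
    def _estimator(self):
        # Runs once per instance; the result is then cached.
        return load_model(self.name)


encoder = LazyEncoder("demo")
print("_estimator" in vars(encoder))  # nothing has been loaded yet

encoder.transform("abc")
encoder.transform("def")
print(len(LOAD_CALLS))  # number of loads after two transform calls

print("_estimator" in vars(encoder))  # the cached value lives in __dict__
```

Deleting the cached entry (`del encoder._estimator`) clears it, and the next access triggers a fresh load.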

> TextEmbedding is nice, the only issue is vectors and embeddings are close concepts, so people might not understand its difference from other string-based encoders like MinHash. What about LLMEncoder?...

> Downloading the weights might be a bit slow though.

we can use the circleci cache so that downloading models does not happen often (if/when that turns out to be...

> Note that loading using names wouldn't work for local models in another context.

yes that's true, I was assuming the main use-case would be to use a pre-trained model...

> I don't see how this could help with pickling, however. @jeromedockes WDYT?

Indeed my question was not really about memory usage or sharing state between objects but rather whether...

screenshot from the rendered example :sweat_smile: : ![screenshot_2024-10-22T12:48:21+02:00](https://github.com/user-attachments/assets/dd4c5c25-29ec-4f94-99ca-ca2c0c2cc3c3)

as @Vincent-Maladiere says `order_by` is something that was kept from early versions of the tablereport (skrubview at the time) for time series. as it's not really documented I doubt anyone...