Jérôme Dockès
> Also, the TableReport returns an error when the dataframe contains a list:

Thanks for reporting it. It is because the unique value counts are stored in a dict and...
I also wonder about pickling. Maybe we should not store the model in an attribute but only store the name, and reload the model as necessary? Or if we are...
> cache it in an attribute that does not get pickled

This can be done by defining `__getstate__` as shown [here](https://docs.python.org/3/library/pickle.html#handling-stateful-objects)
Here is a toy example with `functools.cached_property`

`encoder.py`

```python
import functools


class Encoder:
    def __init__(self, name):
        self.name = name

    def transform(self, x):
        return self._estimator(x)

    @functools.cached_property
    def _estimator(self):
        print(f'loading {self.name}')
        return...
```
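Since the toy example is cut off at its `return`, here is a self-contained variant that can be run as-is. What `_estimator` returns (`str.upper`), the `n_loads` counter and the `"toy-model"` name are assumptions made for illustration, not the original code.

```python
import functools


class Encoder:
    def __init__(self, name):
        self.name = name
        self.n_loads = 0  # illustration only: counts how often we "load"

    def transform(self, x):
        return self._estimator(x)

    @functools.cached_property
    def _estimator(self):
        # Hypothetical stand-in for loading a pre-trained model by name.
        print(f"loading {self.name}")
        self.n_loads += 1
        return str.upper


encoder = Encoder("toy-model")
first = encoder.transform("abc")   # first access runs the load
second = encoder.transform("def")  # cached: no second load
print(first, second, encoder.n_loads)
```

Note that `cached_property` stores the computed value in the instance `__dict__` under the attribute name, so it would be pickled along with the object unless `__getstate__` removes it.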
> TextEmbedding is nice, the only issue is vectors and embeddings are close concepts, so people might not understand its difference from other string-based encoders like MinHash. What about LLMEncoder?...
> Downloading the weights might be a bit slow though.

We can use the CircleCI cache so that downloading models does not happen often (if/when that turns out to be...
> Note that loading using names wouldn't work for local models in another context.

Yes, that's true, I was assuming the main use-case would be to use a pre-trained model...
> I don't see how this could help with pickling, however. @jeromedockes WDYT?

Indeed, my question was not really about memory usage or sharing state between objects, but rather whether...
screenshot from the rendered example :sweat_smile:
As @Vincent-Maladiere says, `order_by` is something that was kept from early versions of the TableReport (skrubview at the time) for time series. As it's not really documented, I doubt anyone...