TopicNet
TopicNet copied to clipboard
Model's _fit should accept Dataset also, not just BatchVectorizer
Seems more natural for a model to fit on Dataset.
Maybe better to use Union[artm.BatchVectorizer, topicnet.cooking_machine.Dataset]
instead of just artm.BatchVectorizer
(Union — for compatibility)?
off-topic (although not quite): BaseModel has TODO in _fit
's docstring for dataset_trainable
Did you mean Union
instead of Tuple
? Or am I confused about OR operator in typing
?
Exactly! The owls are not what they seem. Corrected!
First, _fit
is "protected" method, meaning we do not guarantee that it should work nice and easy for the user and that everything will work. Meaning, that normally user should not use it to train a model and it exists so we can hook up library components with this method.
Given that we go forward and implement this enhancement we will have to change some of the core architecture: making method "legal" to use makes it so that we have to 1) add a cube information to the fit 2) check that the fit is not overlapping with previous actions 3) train model in a separate thread and save/load it afterwards...
See where it's going? the nice and simple method grows into something that duplicates existing functionality and puts it into the "models" class that we already wanted to "separate" from the training action.
I think you are moving the goalposts. We do not provide guarantees on _fit
, but it does not forbid the user to use it. Making this method a bit more flexible does not change that.
Also, training a model without Cubes + Experiment overhead is exactly why one would consider using the method (e.g. for very dirty prototyping or perhaps for cases not covered by Cubes + Experiment yet).
First, _fit is "protected" method, meaning we do not guarantee that it should work nice and easy for the user and that everything will work
Ok, but it doesn't mean that we shouldn't think about how to make the method better :slightly_smiling_face: