Softmax Classifier & partial training

Open • ElGigi opened this issue 2 years ago • 1 comment

Hi,

In the documentation it is stated that partial training can be used to reduce memory consumption.

I tried to train a Softmax classifier on several datasets, calling train() on the first one and partial() on the rest. But only the labels present in the dataset passed to train() are known to the model. If new labels appear in a dataset given to the partial() method, they are not taken into account.

Can a Dataset object retain the set of all labels after the Labeled::fold() method is called?

Regards.

ElGigi • Sep 12 '23 23:09

Yes, the first training set defines all the possible labels for the model. If you want to fold your dataset so that each fold contains samples from every class in the master dataset, you can use the stratifiedFold() method.

$folds = $dataset->stratifiedFold(5);

https://docs.rubixml.com/2.0/datasets/labeled.html#stratification
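
For illustration, here is a minimal sketch of how stratified folds could be combined with train() and partial(). The toy data, the three label names, and the vendor/autoload.php path are assumptions made up for this example; only Labeled, stratifiedFold(), SoftmaxClassifier, train(), and partial() come from the library.

```php
<?php

// Assumes rubix/ml was installed with Composer in the current directory.
require 'vendor/autoload.php';

use Rubix\ML\Classifiers\SoftmaxClassifier;
use Rubix\ML\Datasets\Labeled;

// Hypothetical toy data: 3 classes, 10 samples each, 2 continuous features.
$samples = [];
$labels = [];

foreach (['red' => 0.0, 'green' => 5.0, 'blue' => 10.0] as $label => $center) {
    for ($i = 0; $i < 10; ++$i) {
        $samples[] = [$center + $i / 10, $center - $i / 10];
        $labels[] = $label;
    }
}

$dataset = new Labeled($samples, $labels);

// Stratified folds keep the class proportions of the full dataset,
// so each of the 5 folds holds samples of all three labels.
$folds = $dataset->stratifiedFold(5);

$estimator = new SoftmaxClassifier();

// The first call to train() fixes the set of labels the model knows.
$estimator->train(array_shift($folds));

// The remaining folds only update the already initialized model.
foreach ($folds as $fold) {
    $estimator->partial($fold);
}
```

Because the fold passed to train() already contains every label, the later partial() calls never meet a class the model has not seen. Note that a class with fewer samples than the number of folds may be missing from some folds, so the number of folds should not exceed the size of the smallest class.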

andrewdalpino • Sep 13 '23 21:09