
Multiclass Classifier Consumes Large Memory

Open lordm opened this issue 10 months ago • 1 comment

Describe the bug

Training a multiclass classifier produced a model with a file size of 78 MB. Later, when the model is loaded for testing and prediction, it consumes around 1 GB of memory.

The large memory usage occurs with both the Python bindings and the VW daemon. It happens only with multiclass models, even when the number of classes is just 2; binary scalar models do not exhibit the same behaviour.

How to reproduce

Used this command line:

```
vw -d train.vw -f model.model -c --holdout_after 671358 --oaa 2 --probabilities --sgd -b 28 --decay_learning_rate 0.960291948391061 -l 0.058964302633529454 --l1 7.3193537905481405e-06 --l2 1.472820769966616e-07 --loss_function logistic --passes 60 --power_t 0.013904659435534318 --random_seed 17
```

Number of examples: 600K
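A back-of-envelope estimate suggests the memory usage comes from `-b 28` rather than from the model's non-zero weights. The sketch below assumes VW keeps a dense float32 weight table with 2^b feature slots and (roughly) one weight per class per slot; the exact per-class padding inside VW is an implementation detail, so treat this only as an order-of-magnitude estimate.

```python
# Estimate the in-memory size of a dense weight table, assuming one
# float32 weight per feature slot per class (an assumption about VW's
# internal layout, not a confirmed figure).
def dense_table_gib(bits, num_classes, bytes_per_weight=4):
    """2^bits feature slots, one weight per class, bytes_per_weight each."""
    return (1 << bits) * num_classes * bytes_per_weight / 2**30

# -b 28 alone implies a table on the order of a gigabyte in memory,
# independent of the 78 MB on-disk model, which stores only non-zero
# weights.
print(dense_table_gib(28, 1))  # 1.0 (GiB) for a single weight vector
print(dense_table_gib(28, 2))  # 2.0 (GiB) if --oaa 2 doubles the table
```

This would also explain why the saved model is much smaller than the loaded one: the file holds only the weights that were actually touched during training, while loading materializes the full dense table.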

Version

9.8.0

OS

Linux

Language

Python, CLI

Additional context

No response

lordm avatar Apr 02 '24 01:04 lordm

Can you quantify how much memory the model is using as a function of the number of classes?

If only a small number of the parameters are non-zero, you can use the sparse weight representation (`--sparse_weights`).
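For example, a sketch of loading the saved model for prediction only with sparse weights (the file names `model.model`, `test.vw`, and `preds.txt` follow the reproduction command above and are placeholders):

```shell
# Test-only load (-t) of the trained model, keeping the weights in a
# sparse hash-map representation instead of a dense 2^28-entry array.
vw -i model.model -t --sparse_weights -d test.vw -p preds.txt
```

Sparse storage trades some per-lookup overhead for memory proportional to the number of non-zero weights rather than to 2^b.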

JohnLangford avatar Apr 11 '24 15:04 JohnLangford

Closing for now, but reopen if you want to pursue.

JohnLangford avatar Aug 01 '24 15:08 JohnLangford