[Question] Multithreaded behaviour of Vowpal Wabbit CATS mode using C Wrapper
Hi everyone. I am working on building a realtime prediction service that uses a Vowpal Wabbit model in CATS mode to return an action and its pdf value. The service is multithreaded (thread count > 1000) and is used only to get predictions; all training is done offline.
I am using the C wrapper of Vowpal Wabbit, built with the following configuration:
RUN cmake -S . -B build -G Ninja \
-DCMAKE_BUILD_TYPE:STRING="Release" \
-DFMT_SYS_DEP:BOOL="OFF" \
-DRAPIDJSON_SYS_DEP:BOOL="OFF" \
-DSPDLOG_SYS_DEP:BOOL="OFF" \
-DVW_BOOST_MATH_SYS_DEP:BOOL="OFF" \
-DVW_GTEST_SYS_DEP:BOOL="OFF" \
-DVW_ZLIB_SYS_DEP:BOOL="OFF" \
-DBUILD_TESTING:BOOL="OFF"
On the service side, the high-level flow is:
- Loading the model: initialize it with VW_InitializeA("--json -q :: --quiet --predict_only_model -i model.vw")
- Getting predictions, as follows (this flow is accessed by multiple threads sharing a single instance of the VW model):
- Convert the input to the Vowpal Wabbit JSON format specified in: https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/python_cats.html
- Parse this JSON string into a VW_EXAMPLE by adding the following function to vwdll.cc (I couldn't find a JSON parsing method in the C wrapper, so I wrote my own):
VW_DLL_PUBLIC VW_EXAMPLE VW_CALLING_CONV VW_Read_Json_Example(VW_HANDLE handle, const char* line)
{
    auto* vw = static_cast<VW::workspace*>(handle);
    VW::multi_ex examples;
    // Pull an example object from VW's internal example pool.
    examples.push_back(&VW::get_unused_example(vw));
    // With --json, text_reader points at the JSON line parser.
    vw->example_parser->text_reader(vw, line, strlen(line), examples);
    VW::example* ex = examples[0];
    // Fill in the derived fields VW needs before predict can run.
    VW::setup_example(*vw, ex);
    return static_cast<VW_EXAMPLE>(ex);
}
- Getting the action and pdf value using a VW_Get_Cats_Action_Pdf_Value method (also my own addition):
VW_DLL_PUBLIC void VW_CALLING_CONV VW_Get_Cats_Action_Pdf_Value(VW_HANDLE handle, VW_EXAMPLE example, float action_and_pdf_value[2])
{
    auto* vw = static_cast<VW::workspace*>(handle);
    auto* ex = static_cast<VW::example*>(example);
    // Run prediction; CATS writes its result into pred.pdf_value.
    vw->predict(*ex);
    action_and_pdf_value[0] = ex->pred.pdf_value.action;
    action_and_pdf_value[1] = ex->pred.pdf_value.pdf_value;
}
- Calling VW_FinishExample() to release any memory held by the example created in step 2
- Returning the action and pdf value
- Clearing model-allocated memory: calling VW_Finish() at shutdown (end-to-end sketch below)
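Putting the pieces above together, a single prediction request looks roughly like this (a minimal sketch, assuming vwdll.h from the C wrapper is on the include path; VW_Read_Json_Example and VW_Get_Cats_Action_Pdf_Value are my custom additions from above, not part of the stock C wrapper):

#include "vwdll.h"

// Created once at startup and shared across all request threads.
VW_HANDLE model = VW_InitializeA("--json -q :: --quiet --predict_only_model -i model.vw");

void handle_request(const char* json_line, float out[2])
{
    // Parse the CATS JSON payload into an example (custom helper above).
    VW_EXAMPLE ex = VW_Read_Json_Example(model, json_line);
    // Predict and extract {action, pdf_value} (custom helper above).
    VW_Get_Cats_Action_Pdf_Value(model, ex, out);
    // Return the example to VW's internal example pool.
    VW_FinishExample(model, ex);
}

// At shutdown: VW_Finish(model);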
My questions are:
- Is this the right way of dealing with a multithreaded environment?
- Am I deallocating memory correctly?
@jackgerrits
Hi @0110G
Yes, you are managing the examples correctly: calling finish_example returns the example objects to the internal VW example pool (where get_unused_example gets them from), and memory is deallocated when VW shuts down.
Regarding calling VW predict from multiple threads: VW is not guaranteed to be thread safe, as a predict call may change the internal state of the VW instance, and it does so without taking thread safety into account. It might happen to work for a specific reduction (e.g. CATS), but it would be safer to have a VW instance available for each thread to call predict on. This would bloat your memory consumption, but it is the safer option.
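For illustration, here is a rough sketch of the per-thread approach (thread_local is just one way to get a private workspace per thread; a single shared handle guarded by a std::mutex around the parse/predict/finish sequence is the usual lower-memory alternative):

#include "vwdll.h"

// RAII wrapper so each thread's private model is released at thread exit.
struct ThreadModel
{
    VW_HANDLE handle;
    ThreadModel() : handle(VW_InitializeA("--json -q :: --quiet --predict_only_model -i model.vw")) {}
    ~ThreadModel() { VW_Finish(handle); }
};

VW_HANDLE get_thread_model()
{
    // One private workspace per thread: no two threads share mutable VW state.
    thread_local ThreadModel model;
    return model.handle;
}

With 1000+ threads this means 1000+ copies of the model in memory, which is the bloat mentioned above; a small pool of handles handed out under a lock is a middle ground.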
@olgavrou @jackgerrits My use case involves the following flow, repeated after a fixed time interval (sketched in code below):
- Download and save multiple new VW CATS model files
- Load the multiple VW models into memory sequentially using VW_InitializeA("--json -q :: --quiet --predict_only_model -i <downloaded model file>")
- Delete the downloaded model files
- Release ALL the resources held by the previously loaded models by calling VW_Finish() on their handles
- Use these models to serve action and pdf values for inputs
- Repeat
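In code, one refresh cycle looks roughly like this (a sketch of my flow; reload_models and the file paths are illustrative names):

#include "vwdll.h"
#include <string>
#include <vector>

// One refresh cycle: load the new generation of models, then release the old one.
std::vector<VW_HANDLE> reload_models(const std::vector<VW_HANDLE>& old_handles,
                                     const std::vector<std::string>& model_files)
{
    std::vector<VW_HANDLE> fresh;
    for (const auto& file : model_files)
    {
        // Load each newly downloaded model sequentially.
        std::string args = "--json -q :: --quiet --predict_only_model -i " + file;
        fresh.push_back(VW_InitializeA(args.c_str()));
    }
    // Release ALL resources held by the previous generation of models.
    // (Assumes no thread is still mid-predict on these old handles.)
    for (VW_HANDLE h : old_handles) { VW_Finish(h); }
    return fresh;
}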
After each cycle, I am observing an increase in RSS when the new models are loaded.
My questions are:
- Does VW_Finish release all the resources held by the VW model, including the example pool you mentioned in the previous answer? Am I missing any other dealloc calls?
- Since the model is used only for predictions, are the args --json -q :: --quiet --predict_only_model -i model.vw correct? While going through the docs, I came across --testonly and mention of a cache file that VW maintains. Is it possible that, due to incorrect args, VW is accumulating some metadata that is causing the increase in RSS?
Please guide.
Hi @0110G, are the models in each cycle the same, or are they newly trained models? Do you know if the model size increases between cycles? If a model is newer, it likely has more learned weights, resulting in higher memory usage. As far as allocations go, finish example and vw finish should be all that is needed.
Regarding the CLI arguments: you don't really need --predict_only_model, as it is an argument that affects the exporting of a model (see here). Since you are calling predict explicitly, you also don't really need the --testonly argument, and these should not affect the memory footprint between loaded VW instances.
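So for prediction-only serving, the initialization could be trimmed to something like:

VW_HANDLE model = VW_InitializeA("--json -q :: --quiet -i model.vw");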
@0110G, it looks like Olga answered your questions. Please feel free to reopen or create a new issue if you have more to ask.