
[QST] Performance differences between `encode()` vs `__call__()` on tf Encoder block in CPU

Open lecardozo opened this issue 2 years ago • 4 comments

❓ Questions & Help

What is the preferred way of generating predictions from a trained Encoder from a TwoTowerModelV2? There seem to be at least two ways of doing that, with apparently huge performance differences.

Details

After training a TwoTowerModelV2, I noticed a huge performance difference between calling each tower's model.query_encoder.encode() method versus calling the tower directly via model.query_encoder(), on a single node with CPU.

Setup

import pandas as pd
import nvtabular as nvt

# Encoder
query_encoder = trained_two_tower_model.query_encoder

# Raw Features
features = pd.DataFrame(...)

# Transformed features with nvt.Workflow
query_preprocessor = workflow.get_subworkflow("query_preprocessor")
data = nvt.Dataset(features, schema=self._user_schema)
transformed_data = query_preprocessor.transform(data)

Calling encode()

This takes more than 1 hour on 434,457 rows. Resource usage metrics show that the CPU is idle most of the time, which is quite unexpected.

outputs = query_encoder.encode(transformed_data, batch_size=1024, index=Tags.USER_ID).compute()

I tried increasing the number of partitions of the transformed dataset and setting .compute(scheduler='processes') to benefit from Dask's parallelization, but it didn't work (it failed with serialization issues).

Calling __call__() with Loader

This takes ~30 seconds on 434,457 rows. As my data fits into memory, this ended up being the clear winner.

import numpy as np
import merlin.models.tf as mm

outputs = []
for inputs, _ in mm.Loader(transformed_data, batch_size=1024, shuffle=False):
    outputs.append(query_encoder(inputs))

output = np.concatenate(outputs)
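For illustration, the batch-then-concatenate pattern above is framework-agnostic. Here is a minimal NumPy sketch with a stand-in encoder; the `encode_batch` function, the feature matrix, and all shapes are hypothetical, not part of the Merlin API:

```python
import numpy as np

def encode_batch(batch):
    # Stand-in for query_encoder(inputs): project each row to an 8-dim embedding
    # with fixed "weights" (hypothetical, for the sketch only).
    weights = np.ones((batch.shape[1], 8))
    return batch @ weights

features = np.random.rand(4345, 16)  # hypothetical raw feature matrix
batch_size = 1024

outputs = []
for start in range(0, len(features), batch_size):
    # Slice a batch, encode it, and collect the result.
    outputs.append(encode_batch(features[start:start + batch_size]))

embeddings = np.concatenate(outputs)  # shape: (4345, 8)
```

The key point is that each forward pass handles a full batch, and the per-batch results are only concatenated once at the end, avoiding repeated reallocation.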

Is this difference expected or am I doing something wrong?

lecardozo avatar Oct 04 '23 10:10 lecardozo

@lecardozo you can check out the Generate top-K recommendations section in this example notebook, which shows how to generate top-K recommendations for a given batch; you can then loop over the batches and concatenate the outputs.

rnyak avatar Oct 04 '23 20:10 rnyak

Thanks for the answer @rnyak!

Sorry, I think I wasn't clear before. I'm looking specifically for a way of generating embeddings for query/candidates independently, instead of generating recommendations. The idea is to have candidate embeddings indexed on an external vector search engine and use ANN for retrieval later.
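As a sketch of the downstream use described above: once candidate embeddings are indexed, retrieval reduces to a nearest-neighbor search over the embedding space. A brute-force NumPy version, standing in for an ANN engine (all arrays and sizes here are hypothetical), assuming cosine similarity:

```python
import numpy as np

# Hypothetical pre-computed embeddings: 1000 candidates and 1 query, dim 8.
candidate_embs = np.random.rand(1000, 8)
query_emb = np.random.rand(8)

# Cosine similarity = dot product of L2-normalized vectors.
cand_norm = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
query_norm = query_emb / np.linalg.norm(query_emb)

scores = cand_norm @ query_norm          # one similarity score per candidate
top_k = np.argsort(-scores)[:10]         # indices of the 10 most similar candidates
```

A real vector search engine replaces the exact `argsort` with an approximate index, trading a little recall for sub-linear query time.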

lecardozo avatar Oct 05 '23 01:10 lecardozo

@lecardozo the same notebook shows how to generate candidate and query embeddings.

from merlin.io import Dataset
from merlin.models.utils.dataset import unique_rows_by_features
from merlin.schema import Tags

queries = model.query_embeddings(Dataset(user_features, schema=schema.select_by_tag(Tags.USER)),
                                 batch_size=1024, index=Tags.USER_ID)
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

item_features = (
    unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID).compute().reset_index(drop=True)
)
item_embs = model.candidate_embeddings(Dataset(item_features, schema=schema.select_by_tag(Tags.ITEM)),
                                       batch_size=1024, index=Tags.ITEM_ID)
item_embs_df = item_embs.compute(scheduler="synchronous").reset_index()

hope that helps.

rnyak avatar Oct 05 '23 11:10 rnyak

That was my first try, as I followed the whole notebook. Since these methods are just thin wrappers around Encoder.encode(), we end up with the same performance issues I mentioned before (which is what made me look at the source code of these methods in the first place).

lecardozo avatar Oct 05 '23 11:10 lecardozo