[QST] Performance differences between `encode()` vs `__call__()` on tf Encoder block in CPU
## ❓ Questions & Help
What is the preferred way of generating predictions from a trained `Encoder` of a `TwoTowerModelV2`? There seem to be at least two ways of doing it, with apparently huge performance differences.
### Details
After training a `TwoTowerModelV2`, I noticed a huge performance difference between calling each tower's `model.query_encoder.encode()` method and calling the encoder directly via `model.query_encoder()`, on a single node with CPU.
#### Setup

```python
import pandas as pd
import nvtabular as nvt

# Encoder from the trained two-tower model
query_encoder = trained_two_tower_model.query_encoder

# Raw features
features = pd.DataFrame(...)

# Features transformed with an nvt.Workflow
query_preprocessor = workflow.get_subworkflow("query_preprocessor")
data = nvt.Dataset(features, schema=self._user_schema)
transformed_data = query_preprocessor.transform(data)
```
#### Calling `encode()`
This takes more than 1 hour on 434,457 rows. Resource usage metrics show that the CPU is idle most of the time, which is quite unexpected.
```python
outputs = query_encoder.encode(transformed_data, batch_size=1024, index=Tags.USER_ID).compute()
```
I tried increasing the number of partitions of the transformed dataset and setting `.compute(scheduler='processes')` to benefit from Dask's parallelization, but that didn't work (it failed with serialization issues).
#### Calling `__call__()` with `Loader`
This takes ~30 seconds on the same 434,457 rows. As my data fits in memory, this ended up being the clear winner.
```python
import numpy as np
import merlin.models.tf as mm

outputs = []
for inputs, _ in mm.Loader(transformed_data, batch_size=1024, shuffle=False):
    outputs.append(query_encoder(inputs))
output = np.concatenate(outputs)
```
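For readers without a trained model at hand, the batching pattern above can be reproduced with a stand-in encoder (`dummy_encoder`, the array shapes, and the embedding dimension below are all hypothetical, not part of Merlin):

```python
import numpy as np

# Hypothetical stand-in for query_encoder: any callable that maps a batch of
# feature rows to a (batch_size, dim) embedding matrix.
def dummy_encoder(batch):
    return batch @ np.ones((batch.shape[1], 8), dtype="float32") / batch.shape[1]

X = np.random.rand(100, 4).astype("float32")  # placeholder for transformed rows
batch_size = 32

# Same pattern as the Loader loop: encode batch by batch, then concatenate
# the per-batch outputs into one (n_rows, dim) array.
outputs = []
for start in range(0, len(X), batch_size):
    outputs.append(dummy_encoder(X[start : start + batch_size]))
embeddings = np.concatenate(outputs)
```

Because the loop runs the encoder directly on already-materialized batches, there is no Dask graph involved, which is consistent with the timing difference reported above.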
Is this difference expected or am I doing something wrong?
@lecardozo you can check out the "Generate top-K recommendations" section in this example notebook, which shows how to generate top-K recommendations for a given batch; you can loop over the batches and then concatenate the outputs.
Thanks for the answer @rnyak!
Sorry, I think I wasn't clear before. I'm looking specifically for a way of generating embeddings for query/candidates independently, instead of generating recommendations. The idea is to have candidate embeddings indexed on an external vector search engine and use ANN for retrieval later.
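As a concrete sketch of that retrieval workflow (all arrays below are random placeholders; a real ANN engine such as FAISS or Milvus replaces the brute-force step with an approximate index), exact top-k search over normalized embeddings looks like:

```python
import numpy as np

# Placeholder embeddings as they would come out of the two towers:
# candidate (item) embeddings indexed offline, query embedding computed online.
rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(1000, 16)).astype("float32")
query_vec = rng.normal(size=(16,)).astype("float32")

# L2-normalize so that a dot product equals cosine similarity, a common
# metric for vector search engines.
item_vecs /= np.linalg.norm(item_vecs, axis=1, keepdims=True)
query_vec /= np.linalg.norm(query_vec)

# Brute-force top-k: an ANN engine approximates exactly this computation.
scores = item_vecs @ query_vec
top_k = np.argsort(-scores)[:10]
```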
@lecardozo the same notebook shows how to generate candidate and query embeddings.
```python
queries = model.query_embeddings(
    Dataset(user_features, schema=schema.select_by_tag(Tags.USER)),
    batch_size=1024,
    index=Tags.USER_ID,
)
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

item_features = (
    unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID).compute().reset_index(drop=True)
)

item_embs = model.candidate_embeddings(
    Dataset(item_features, schema=schema.select_by_tag(Tags.ITEM)),
    batch_size=1024,
    index=Tags.ITEM_ID,
)
```
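Once computed, a common next step (not shown in the notebook) is to flatten the embedding DataFrame into id → vector records for bulk upload to the search engine. A minimal sketch with placeholder data — the DataFrame below stands in for the computed result, assuming one embedding column per dimension and the item id in the index (as `index=Tags.ITEM_ID` would produce):

```python
import numpy as np
import pandas as pd

# Placeholder for the computed candidate embeddings: 5 items, 8 dimensions.
dim = 8
item_embs_df = pd.DataFrame(
    np.random.rand(5, dim),
    index=pd.Index([11, 22, 33, 44, 55], name="item_id"),
)

# One record per item: (id, vector) pairs ready for a bulk-upsert API.
records = [
    {"id": int(item_id), "vector": row.to_numpy().tolist()}
    for item_id, row in item_embs_df.iterrows()
]
```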
hope that helps.
That was my first try; I followed along the whole notebook. As these methods are just thin wrappers around `Encoder.encode()`, we end up with the same performance issue I mentioned before (which is what made me look at the source code of these methods in the first place).