LibRecommender

Remove a node in Pinsage

Open leeway-liu opened this issue 8 months ago • 7 comments

Hey,

Thanks very much for implementing the library. I am experimenting on a large dataset using PinSageDGL, and I found that GPU memory usage creeps up over time, even though the batch size stays the same:

Image

Do you know why this might happen? I am wondering whether it's because the graph gets bigger over time, so it needs to hold more weights, user identification, etc. If so, do you know whether I can delete an inactive user node or item node without a full retraining?

leeway-liu avatar Jun 05 '25 10:06 leeway-liu

Hi,

I don't think the growing graph is causing the problem, since the graph remains on CPU and only the sampled batches and subgraphs are transferred to GPU.

Given that you mentioned using a large dataset, are you using incremental training?

massquantity avatar Jun 06 '25 04:06 massquantity

Hello,

Thanks for the quick reply. Yes, I am using incremental training: I load my data in batches and use merge_dataset, so my n_users and n_items keep increasing. Hence I am wondering whether it has to do with loading user identification, etc.?

leeway-liu avatar Jun 06 '25 07:06 leeway-liu

Each user and item requires an embedding vector in the model. With incremental training, the growing number of users and items will increase memory usage.
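For a rough sense of scale, here is a back-of-the-envelope sketch. It is not LibRecommender code: the function name, its parameters, and the float32 default are assumptions made for illustration, and it counts only the user and item embedding tables (plus optional per-parameter optimizer state, such as the two moment estimates Adam keeps).

```python
def embedding_memory_mb(n_users, n_items, embed_size,
                        bytes_per_param=4, optimizer_slots=0):
    """Estimate memory (in MiB) taken by user + item embedding tables.

    bytes_per_param: 4 for float32 (assumed default).
    optimizer_slots: extra per-parameter tensors kept by the optimizer,
                     e.g. 2 for Adam's first and second moments.
    """
    n_params = (n_users + n_items) * embed_size
    total_bytes = n_params * bytes_per_param * (1 + optimizer_slots)
    return total_bytes / 1024 ** 2


# Hypothetical growth over successive incremental-training rounds.
for n_users, n_items in [(100_000, 50_000), (500_000, 200_000), (2_000_000, 800_000)]:
    mb = embedding_memory_mb(n_users, n_items, embed_size=64, optimizer_slots=2)
    print(f"{n_users:>9} users, {n_items:>7} items -> {mb:.1f} MiB")
```

The estimate grows linearly with n_users + n_items, which matches the steady creep you would see as new users and items are merged in.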

massquantity avatar Jun 07 '25 14:06 massquantity

As those numbers increase, it will need more and more GPU memory. Can we remove the inactive users and items and reindex them so that the user and item embedding tables stay roughly the same size? Or do you recommend doing a full retraining from time to time?

leeway-liu avatar Jun 16 '25 09:06 leeway-liu

This library does not support the removal of non-active users and items, as it is designed for scenarios where the entire model fits into memory.

For larger, real-world applications with memory constraints, the "hashing trick" is a common solution: all user and item IDs are hashed into a fixed number of buckets, so the embedding tables keep a constant size no matter how many users and items appear. This is implemented in a new library I am developing.
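As a minimal sketch of the general idea (not the implementation in the new library, and not part of LibRecommender; `hash_bucket` and `HashedEmbedding` are hypothetical names), raw IDs can be mapped to a fixed number of rows like this. It uses only the standard library, and a real model would use a trainable embedding layer instead of plain lists:

```python
import hashlib
import random


def hash_bucket(raw_id, n_buckets, salt=""):
    """Map any raw ID to a stable bucket index in [0, n_buckets)."""
    # hashlib is used instead of built-in hash(), which is randomized
    # per process for strings and so is not stable across runs.
    digest = hashlib.md5(f"{salt}{raw_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets


class HashedEmbedding:
    """Fixed-size embedding table indexed by hashed IDs."""

    def __init__(self, n_buckets, embed_size, seed=0):
        rng = random.Random(seed)
        self.n_buckets = n_buckets
        self.table = [
            [rng.gauss(0.0, 0.01) for _ in range(embed_size)]
            for _ in range(n_buckets)
        ]

    def lookup(self, raw_id):
        # Different IDs may collide and share a row; that is the
        # accuracy cost paid for a bounded memory footprint.
        return self.table[hash_bucket(raw_id, self.n_buckets)]


users = HashedEmbedding(n_buckets=1000, embed_size=8)
for i in range(50_000):  # far more users than buckets
    users.lookup(f"user_{i}")
print(len(users.table))  # table size is unchanged
```

New users never add rows, so no reindexing or node removal is needed; the trade-off is that colliding IDs share one embedding vector.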

massquantity avatar Jun 18 '25 18:06 massquantity

Do you have any papers or blog posts on the "hashing trick" to share? Also, just curious: why a new library rather than extending this one? :)

leeway-liu avatar Jul 01 '25 07:07 leeway-liu

Just ask Gemini 2.5 Pro, "What is hashing trick in recommender system?"

This library uses TensorFlow, which is hard to refactor and almost dead nowadays. The new library will use PyTorch and include a number of new features.

massquantity avatar Jul 01 '25 16:07 massquantity