DialogRPT
Performance issues with DialogRPT + DialoGPT
Hi again @golsun,
I've been working with DialogRPT using DialoGPT-large for dialog generation and have hit some performance issues that aren't present when using DialoGPT-large alone. With CPU inference, round-trip responses take just a few seconds with gpt2-large, but whenever DialogRPT is used together with the DialoGPT-large checkpoint, performance grinds to a halt. With GPU inference I can run gpt2-large on a 6GB GPU, but with DialogRPT I get OOM. I understand that the combination of DialogRPT + DialoGPT runs multiple models, which is the obvious culprit. Is there any way to serialize execution of the two models to avoid these resource consumption issues?
hi @pablogranolabar,
I can think of several potential reasons for OOM:
- `torch.no_grad()`, which prevents gradients from taking up memory: it was already applied in the scorer, but not in generation.py -- I've updated it here, please take a look.
- the number of candidates to be scored -- if it's too large, you can split the candidates into several batches and send them to DialogRPT, similar to this (see the sketch after this list).
- if that still doesn't work, I guess you can use two machines, one just for DialoGPT-large and one for DialogRPT, with an API to communicate between them.
- how many DialogRPT models are you using? I recommend at least `updown` and `human_vs_rand`, because `updown` alone doesn't capture context-response relevance.
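To illustrate the batching suggestion, here's a minimal sketch. It assumes `get_model` and `predict` from score.py (the same functions used later in this thread) and that `predict` returns one score per hypothesis; the batch size of 8 is an arbitrary illustration.

```python
# Minimal sketch: score candidates in small batches to cap peak memory.
# Assumes `get_model` and `predict` from score.py, and that `predict`
# returns one score per hypothesis.
import torch
from score import get_model, predict

def score_in_batches(model, cxt, hyps, batch_size=8):
    scores = []
    with torch.no_grad():  # inference only: don't keep gradient buffers
        for i in range(0, len(hyps), batch_size):
            scores.extend(list(predict(model, cxt, hyps[i:i + batch_size])))
    return scores
```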
Hi @golsun, thanks for the quick response!
The two-machine idea makes sense; I think I can do that with relative ease if it comes to that.
For the DialogRPT models I'm just using `updown`. So I should ensemble at least `updown` + `human_vs_rand`? This application is for a conversational agent that can rerank dialog based on human scoring of the chatbot responses.
yes, `human_vs_rand` (together with `updown`) should help in that case.
if memory is a concern, a low-memory alternative to `human_vs_rand` is to decode responses with a small `top_k` or `top_p`; this should also help the response stay relevant to the context. but I guess the performance depends on the scenario.
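As a sketch of what small `top_k` / `top_p` decoding looks like: DialogRPT's own generation.py has its own decoder, so this example instead uses the Hugging Face transformers API purely for illustration; the prompt, `max_new_tokens`, and the specific `top_k`/`top_p` values are arbitrary assumptions.

```python
# Illustration of constrained sampling with small top_k / top_p
# (not DialogRPT's generation.py; values here are arbitrary).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

cxt = "Can you recommend a good book?" + tokenizer.eos_token
inputs = tokenizer(cxt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=10,   # sample only from the 10 most likely tokens
    top_p=0.8,  # ...within the smallest set covering 80% of probability mass
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```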
Hi again @golsun. I'm working on ensembling `human_vs_rand` with `updown` per your advice, but I'm unsure how to proceed with `ensemble.yml`. Should `human_vs_rand` and `updown` both be part of `prior` with equal weights? Or should `human_vs_rand` be `prior` and `updown` be `conditional`? For the performance reasons above, I'm trying to do this with just a two-model ensemble, as you suggested.
hi, in this case I guess a simple way, without dealing with `ensemble.yml`, is:

```python
# `get_model` and `predict` are functions from score.py
import numpy as np
from score import get_model, predict

# cxt: context string; hyps: list of candidate responses
hvr = get_model('restore/human_vs_rand.pth')
updown = get_model('restore/updown.pth')
score_hvr = predict(hvr, cxt, hyps)
score_updown = predict(updown, cxt, hyps)
score_overall = np.sqrt(score_updown * score_hvr)  # use this as the final score
```
I used the geometric mean for `score_overall`, but you can play with a weighted arithmetic mean.
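For example, the last line could be replaced with a weighted arithmetic mean; the 0.7/0.3 weights below are arbitrary, just for illustration.

```python
# Weighted arithmetic mean as an alternative combination
# (weights are illustrative; tune them for your application).
w_updown, w_hvr = 0.7, 0.3
score_overall = w_updown * score_updown + w_hvr * score_hvr
```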