Performance improvement: GPT-J and BERT Offline scenario
The current implementations of GPT-J and BERT run predictions sequentially, one query at a time. Could their performance in the Offline scenario be improved by dispatching predictions to multiple threads instead of processing them sequentially?
GPT-J ref: https://github.com/mlcommons/inference/blob/fa4fe53e53379dee27a216695a2b710d122154c7/language/gpt-j/backend.py#L72
BERT ref: https://github.com/mlcommons/inference/blob/fa4fe53e53379dee27a216695a2b710d122154c7/language/bert/pytorch_SUT.py#L68
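
For illustration, here is a minimal sketch of what thread-based dispatch might look like, using Python's `concurrent.futures`. The names `model_predict` and `query_samples` are hypothetical stand-ins for the backend's actual predict call and the LoadGen query batch, not the repository's API:

```python
# Hypothetical sketch: replacing a sequential prediction loop with a
# thread pool. `model_predict` and `query_samples` are illustrative
# stand-ins, not the names used in backend.py or pytorch_SUT.py.
from concurrent.futures import ThreadPoolExecutor

def issue_queries_parallel(query_samples, model_predict, num_workers=4):
    """Run predictions concurrently instead of one after another.

    Caveat: CPython's GIL serializes pure-Python code, so threads only
    help when the backend (e.g. PyTorch or ONNX Runtime kernels)
    releases the GIL during inference.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves input order, so responses stay aligned
        # with the queries they answer.
        results = list(pool.map(model_predict, query_samples))
    return results
```

Whether this actually helps depends on how much of the inference time is spent inside GIL-releasing backend kernels; batching multiple queries into a single forward pass may be an alternative (or complementary) way to exploit the Offline scenario's lack of latency constraints.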