Ezhil Raj Selvaraj

3 comments by Ezhil Raj Selvaraj

Use BetterTransformer; this can reduce inference time by roughly 20-30 percent.

```python
# BetterTransformer ships with Hugging Face Optimum, not transformers itself
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model, keep_original_model=False)
```

Hi, I think I have a temporary solution for this issue. On line 48 of the file `metavoice/metavoice-src/fam/llm/fast_inference_utils.py`, you have to comment out the line below: torch._inductor.config.fx_graph_cache = ( True...
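A sketch of the workaround described above (the right-hand side of the assignment is truncated in the original comment, so it is left as-is here):

```python
# fam/llm/fast_inference_utils.py, around line 48:
# temporary workaround -- comment out the fx_graph_cache setting:
# torch._inductor.config.fx_graph_cache = ( True...
```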