[examples/bert/build.py]: Load weights for BertModel and RobertaModel if `--model_dir` is provided
Currently, the example TensorRT-LLM engine builder for BERT models ignores model weights even when they are present in the model directory: it reads only the config.json file, which makes it essentially impossible to generate a working engine from a pretrained model. This change fixes that.
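The gist of the fix can be sketched as follows. This is a minimal, dependency-free illustration rather than the actual TensorRT-LLM builder code: the checkpoint file names follow Hugging Face `save_pretrained` conventions, and `build_inputs` is a hypothetical helper standing in for the builder's setup step.

```python
import json
import os

# Weight checkpoint file names conventionally produced by Hugging Face
# `save_pretrained`; a real builder may also need to handle sharded checkpoints.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")

def find_checkpoint(model_dir):
    """Return the path of a weight checkpoint in model_dir, or None."""
    for name in WEIGHT_FILES:
        path = os.path.join(model_dir, name)
        if os.path.isfile(path):
            return path
    return None

def read_config(model_dir):
    """Read the config.json that describes the model architecture."""
    with open(os.path.join(model_dir, "config.json")) as f:
        return json.load(f)

# Hypothetical build flow: the original builder effectively used only
# read_config(), leaving the engine with uninitialized weights; the fix
# additionally locates and loads the pretrained checkpoint (the actual
# tensor loading would go through torch.load or safetensors, omitted
# here to keep the sketch self-contained).
def build_inputs(model_dir):
    config = read_config(model_dir)
    ckpt = find_checkpoint(model_dir)  # None => engine gets random weights
    return config, ckpt
```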
Created a bug: #2197
@symphonylyh Could you please take a look at this PR? Thanks~
Hi @tkhanipov , thanks for the PR! We're currently doing a refinement of the BERT workflow, will address this problem and merge your PR 👍
Hi @symphonylyh! Thank you for the response. Just curious: isn't the refinement of the BERT workflow you mention related to supporting BERT in the Executor API? So far, AFAIU (please correct me if I am wrong), encoder-only models are not supported there.
You did quite an extensive refinement and my PR is now obsolete, thus closing it. Thank you!
@tkhanipov also want to point out that we now have BERT supported in the so-called PyTorch workflow (which doesn't involve building a TRT engine but still benefits from the optimized kernels, with a more Pythonic interface). You can try it out: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/pytorch#supported-models