examples/bert/build.py does not use model weights

Open tkhanipov opened this issue 1 year ago • 1 comments

System Info

Currently, the example TensorRT-LLM engine builder for BERT models ignores the model weights even when they are present in the model directory. It reads only the config.json file, which makes it essentially impossible to generate a working engine from a pretrained model.

A possible fix is available in #2187

Who can help?

@byshiue

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Scenario 1 (simplest)

  1. Prepare a pre-trained model (i.e. have a directory with config.json and the weights file).
  2. Replace the weights file with some random content (e.g. any text file).
  3. Run examples/bert/build.py --model_dir input_model

Scenario 2 (use of the weights)

  1. Prepare a pre-trained model (i.e. have a directory with config.json and the weights file).
  2. Run examples/bert/build.py --model_dir input_model
  3. Execute the generated TensorRT-LLM engine with some input and check the output tensor.
  4. Execute the input model with the same input and check the output tensor.

Expected behavior

Scenario 1 (simplest)

build.py should fail with an error message reporting that the weights file is invalid.
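For illustration only, here is a minimal sketch of the kind of check that would catch Scenario 1. It assumes the weights are stored in the safetensors format (the issue does not say which format is used) and relies on that format's layout: an 8-byte little-endian header length followed by a JSON header. The function name `check_safetensors_header` is hypothetical; it is not part of build.py and is not the fix proposed in #2187.

```python
import json
import struct
from pathlib import Path


def check_safetensors_header(path):
    """Return the parsed JSON header, or raise ValueError if it is implausible.

    Hypothetical helper: assumes the weights file uses the safetensors layout.
    """
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError(f"{path}: too short to be a safetensors file")
        # The first 8 bytes hold the JSON header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            raise ValueError(f"{path}: header length {header_len} exceeds file size")
        try:
            header = json.loads(f.read(header_len).decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            raise ValueError(f"{path}: header is not valid JSON") from exc
    if not isinstance(header, dict):
        raise ValueError(f"{path}: header is not a JSON object")
    return header
```

A plain text file put in place of the weights would fail this check, since its first 8 bytes would not encode a plausible header length. Other weight formats would need a different check.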

Scenario 2 (use of the weights)

The two output tensors should be numerically close, element by element.
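One way to make "numerically close" concrete is sketched below with NumPy. The function name `compare_outputs` and the tolerance values are assumptions chosen for illustration (loose enough for a float16 engine); the issue does not specify them, and the outputs are assumed to have already been converted to arrays of the same shape.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-2, atol=1e-2):
    """Return (is_close, max_abs_diff) for two same-shaped output arrays.

    Hypothetical helper; the tolerances are illustrative assumptions.
    """
    ref = np.asarray(reference, dtype=np.float32)
    cand = np.asarray(candidate, dtype=np.float32)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    # Largest element-wise deviation, useful when reporting a mismatch.
    max_abs_diff = float(np.max(np.abs(ref - cand))) if ref.size else 0.0
    # np.allclose(a, b) tests |a - b| <= atol + rtol * |b|, so b is the reference.
    is_close = bool(np.allclose(cand, ref, rtol=rtol, atol=atol))
    return is_close, max_abs_diff
```

Here `reference` would be the output of the input model and `candidate` the output of the generated engine for the same input.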

Actual behavior

Scenario 1 (simplest)

build.py finishes successfully and generates bert_outputs/config.json and bert_outputs/BertModel_float16_tp1_rank0.engine.

Scenario 2 (use of the weights)

The two output tensors differ substantially and appear unrelated to each other.

Additional notes

The problem is that the script only loads the config and never loads the weights. A fix is available in #2187.

tkhanipov avatar Sep 05 '24 14:09 tkhanipov

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Oct 06 '24 02:10 github-actions[bot]

This issue was closed because it has been stalled for 15 days with no activity.

github-actions[bot] avatar Oct 22 '24 02:10 github-actions[bot]

@symphonylyh Perhaps this bug should be reopened, given that the problem persists? A similar issue was mentioned in a (much later) bug, #2379, but as far as I understand, the fix did not cover the original case.

tkhanipov avatar Nov 06 '24 12:11 tkhanipov