fastertransformer_backend
T5: Triton Model Repository (containing model weights and configuration) on S3 doesn't work as expected
Description
It appears that Triton Server with the FasterTransformer backend doesn't work as expected when loading the model repository from S3 (containing both configuration and model weights).
Release: v1.2
GPU: V100
Command used to invoke Triton Server:
`CUDA_VISIBLE_DEVICES=0 /opt/tritonserver/bin/tritonserver --model-repository s3://*/users/dhaval-doshi/t5-3b/triton-model-store/t5/ --log-info true`
The invocation fails with the following error:
[ERROR] Can't load '/tmp/folderXaegJB/1/t5/config.ini'
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] Assertion fail: /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/triton_backend/t5/T5TritonModel.cc:91
The model repository structure on S3 is as follows:
s3://*/triton-model-store/t5/fastertransformer/config.pbtxt
s3://*/triton-model-store/t5/fastertransformer/1/<weight files and config.ini>
The above structure is in line with how the model repository is created in this repo for T5, at this path:
all_models/t5/fastertransformer/…
Details:
It looks like when you start Triton Server with an S3 path as the model repository, it downloads the contents into a temp folder inside the Docker container at startup:
/tmp/folderXaegJB/
ls /tmp/folderXaegJB/
1 config.pbtxt
These are basically the contents of the S3 model repository directory s3://*/triton-model-store/t5/fastertransformer/.
However, when Triton tries to construct the model_checkpoint_path to pass to FT for loading T5, using the line of code below:
https://github.com/triton-inference-server/fastertransformer_backend/blob/225b57898b830a13b5634ee10b812c96bad802b0/src/libfastertransformer.cc#L265
it constructs the path below, which of course doesn't exist:
/tmp/folderXaegJB/1/t5/config.ini
Hence there is an inconsistency between how the model repository is expected to be structured and how it is downloaded and resolved from S3.
I also cannot explicitly pass model_checkpoint_path, because Triton downloads everything from S3 into a temp folder whose name I don't know beforehand.
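To make the mismatch concrete, the listing below compares the path the backend tries to open with what is actually in the temp folder (the temp folder name is random on every start, so these paths are only illustrative):

```shell
# Path the FT backend tries to open (model name appended under the version dir):
ls /tmp/folderXaegJB/1/t5/config.ini
# -> No such file or directory

# What Triton actually downloaded from the S3 repository:
ls /tmp/folderXaegJB/
# -> 1  config.pbtxt
ls /tmp/folderXaegJB/1/
# -> config.ini  <weight files>
```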
Note: It also appears that the FasterTransformer backend's model repository structure is different from the model repository guidance provided here: https://github.com/triton-inference-server/server/blob/e9ef15b0fc06d45ceca28861c98b31d0e7f9ee79/docs/user_guide/model_repository.md
The FasterTransformer backend and ensemble models also expect you to put files under a layout that differs from that guidance.
Please help investigate this issue.
### Reproduced Steps
```shell
1. Upload the model weights and model repository to an S3 bucket. You can copy the model repository in this repo and upload the 1-gpu weights (after running the conversion script) inside the 1/ folder.
2. Run Triton with the command shown in the description above.
```
The FT backend only supports a local directory for now. It cannot load the S3 folder directly.
I see. Is there a plan to support S3 folders directly? I was under the impression that this is already supported.
We will consider it. Thank you for the suggestion.
Thank you. So for now the suggestion is to download the assets from S3 into the local container via a shell script? NVIDIA solutions architects told me that this was supported, hence my impression.
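Concretely, something like the sketch below is what I had in mind as a stopgap; the bucket prefix and local directory are placeholders for my actual paths:

```shell
#!/bin/bash
# Stopgap: copy the model repository from S3 to a local directory first,
# then point Triton at the local path so the FT backend sees a normal
# filesystem layout.
set -e

S3_REPO="s3://<bucket>/users/dhaval-doshi/t5-3b/triton-model-store/t5"   # placeholder bucket/prefix
LOCAL_REPO="/workspace/triton-model-store/t5"                            # placeholder local path

mkdir -p "${LOCAL_REPO}"
aws s3 cp --recursive "${S3_REPO}" "${LOCAL_REPO}"

# Same launch command as in the description, but against the local copy
CUDA_VISIBLE_DEVICES=0 /opt/tritonserver/bin/tritonserver \
  --model-repository "${LOCAL_REPO}" \
  --log-info true
```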
We found that we don't need to modify anything to support loading the model from S3. You can refer to the document https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/t5_guide.md#loading-model-by-s3.