A-ML-ER
How do I convert the Llama model structure into the FasterTransformer structure? It seems to have 32 layers with LlamaRotaryEmbedding.
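For context on what a conversion involves: FasterTransformer loads weights from raw binary files, one file per tensor per tensor-parallel rank, rather than from a Hugging Face checkpoint. Below is a minimal, illustrative sketch of that layout only; the tensor name, hidden size, and output directory are assumptions for illustration, not the backend's actual converter (the official repo ships conversion scripts for GPT-style models that you would adapt for Llama's 32 layers).

```python
import os
import numpy as np

def save_ft_tensor(out_dir: str, name: str, tensor: np.ndarray, tp: int = 1) -> None:
    """Save one weight tensor in a raw-binary layout like FasterTransformer's,
    column-split across `tp` tensor-parallel ranks (one .bin file per rank)."""
    os.makedirs(out_dir, exist_ok=True)
    for rank, shard in enumerate(np.split(tensor, tp, axis=-1)):
        # FT-style converters store weights as flat fp16 binaries on disk
        shard.astype(np.float16).tofile(os.path.join(out_dir, f"{name}.{rank}.bin"))

# Illustrative example: split one layer's fused QKV weight for 2-way tensor parallelism
# (4096 is a hypothetical hidden size; real Llama sizes depend on the variant)
w = np.ones((4096, 3 * 4096), dtype=np.float32)
save_ft_tensor("ft_model/1-gpu", "model.layers.0.attention.query_key_value.weight", w, tp=2)
```

Each of a model's layers would get the same treatment, which is why the converter has to know the layer count and naming scheme of the source checkpoint.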
Should we build the project inside the container, or build it on ECS?
Any update on multi-node deployment?
I0404 14:43:41.957637 63955 server.cc:594]
+-------------------+---------+-----------------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                              |
+-------------------+---------+-----------------------------------------------------------------------------------------------------+
| fastertransformer | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer |
| ...
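A quick diagnostic sketch for this error: Triton loads each backend from a `libtriton_<backend>.so` under the backends directory, so the first thing to check is whether the built library is actually present at the path in the log (the library filename follows Triton's naming convention; adjust if your build places it elsewhere):

```shell
# Check whether the FasterTransformer backend library exists where Triton looks for it
BACKEND_DIR=/opt/tritonserver/backends/fastertransformer
if [ -f "$BACKEND_DIR/libtriton_fastertransformer.so" ]; then
    echo "backend library found"
else
    echo "backend library missing: rebuild the backend and copy libtriton_fastertransformer.so into $BACKEND_DIR"
fi
```

If the file is missing, the backend build step did not run or its install step copied the library to a different prefix.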
The source in /ft_workspace/fastertransformer_backend was git cloned from https://github.com/triton-inference-server/fastertransformer_backend.git
I'm using the main branch of https://github.com/triton-inference-server/fastertransformer_backend.git, the latest one.