llm-analysis How to get the analysis of model Qwen1.5-0.5B

[Open] qxpBlog opened this issue 10 months ago · 0 comments

@mvpatel2000 @cli99 @weimingzha0 @digger-yu @BhAem I want to get the analysis metrics Time to first token (s), Time for completion (s), and Tokens/second for the model Qwen1.5-0.5B. Do I just need to run the following command?

HF_ENDPOINT=https://hf-mirror.com
gpu_name='a100-sxm-80gb'
dtype_name="w16a16e16"
output_dir='outputs_infer'
model_name=Qwen/Qwen1.5-0.5B
batch_size_per_gpu=1
tp_size=2
output_file_suffix="-bs${batch_size_per_gpu}"
cost_per_gpu_hour=2.21
seq_len=128
num_tokens_to_generate=242
flops_efficiency=0.7
hbm_memory_efficiency=0.9
achieved_tflops=200                # meant to override flops_efficiency, but it is not passed to the command below, so it has no effect here
achieved_memory_bandwidth_GBs=1200 # meant to override hbm_memory_efficiency, but it is not passed to the command below, so it has no effect here

if [[ ! -e $output_dir ]]; then
    mkdir "$output_dir"
elif [[ ! -d $output_dir ]]; then
    echo "$output_dir already exists but is not a directory" 1>&2
    exit 1
fi

HF_ENDPOINT=$HF_ENDPOINT CUDA_VISIBLE_DEVICES=3 python -m llm_analysis.analysis infer --model_name=${model_name} --gpu_name=${gpu_name} --dtype_name=${dtype_name} --output_dir=${output_dir} --output_file_suffix=${output_file_suffix} \
    --seq_len=${seq_len} --num_tokens_to_generate=${num_tokens_to_generate} --batch_size_per_gpu=${batch_size_per_gpu} \
    --tp_size=${tp_size} \
    --cost_per_gpu_hour=${cost_per_gpu_hour} \
    --flops_efficiency=${flops_efficiency} --hbm_memory_efficiency=${hbm_memory_efficiency} --log_level DEBUG
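For a sanity check on whatever numbers come back, here is a rough back-of-envelope sketch of how the three metrics relate. It is NOT llm-analysis's actual cost model. It assumes prefill is compute-bound (about 2 FLOPs per parameter per prompt token), each decode step is memory-bound on reading the 16-bit weights once, KV-cache traffic and tensor-parallel communication are ignored, and the parameter count is the nominal 0.5B from the model name. `estimate_metrics` is a hypothetical helper; its inputs mirror the script above (seq_len=128, 242 generated tokens, batch 1, tp_size=2, 200 TFLOPS, 1200 GB/s).

```bash
#!/usr/bin/env bash
# Hypothetical roofline-style sketch, not llm-analysis's model.
# Assumptions: prefill is compute-bound, decode is memory-bound on weights,
# KV cache and TP communication are ignored.
estimate_metrics() {
  local param_count=$1 bytes_per_param=$2 seq_len=$3 num_tokens=$4
  local batch=$5 tp=$6 tflops=$7 bw_GBs=$8
  awk -v P="$param_count" -v B="$bytes_per_param" -v S="$seq_len" \
      -v N="$num_tokens" -v b="$batch" -v tp="$tp" \
      -v F="$tflops" -v W="$bw_GBs" 'BEGIN {
    # Prefill: ~2 FLOPs per parameter per prompt token, split across tp GPUs.
    prefill = 2 * P * S * b / (F * 1e12 * tp)
    # Decode: every step reads all weights once, split across tp GPUs.
    decode = P * B / (W * 1e9 * tp)
    # Convention used here: the prefill pass already yields the first token,
    # and each of the remaining N - 1 tokens needs one decode step.
    ttft = prefill
    completion = prefill + (N - 1) * decode
    printf "ttft_s=%.6f completion_s=%.6f tokens_per_s=%.1f\n", ttft, completion, b * N / completion
  }'
}

# Inputs taken from the issue's script; 0.5e9 is the nominal parameter count.
estimate_metrics 0.5e9 2 128 242 1 2 200 1200
```

With these inputs it prints `ttft_s=0.000320 completion_s=0.100737 tokens_per_s=2402.3`. Other conventions add one decode step to the time to first token, and llm-analysis accounts for more effects, so its reported values will likely differ from this estimate.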

Apr 13 '24 08:04 qxpBlog
