
Missing key(s) and size mismatch error for SeamlessM4T_medium

Open YJYJLee opened this issue 5 months ago • 8 comments

Hi, I am trying to use the SeamlessM4T_medium checkpoint for evaluation, but I am getting the following error while loading the checkpoint. I just added `--model_name seamlessM4T_medium` to the command — is there anything else I should do to use SeamlessM4T_medium?

        Missing key(s) in state_dict: "text_decoder.layers.4.self_attn_layer_norm.weight", "text_decoder.layers.4.self_attn_layer_norm.bias", "text_decoder.layers.4.self_attn.q_proj.weight", "text_decoder.layers.4.self_attn.q_proj.bias", "text_decoder.layers.4.self_attn.k_proj.weight", "text_decoder.layers.4.self_attn.k_proj.bias", "text_decoder.layers.4.self_attn.v_proj.weight", "text_decoder.layers.4.self_attn.v_proj.bias", "text_decoder.layers.4.self_attn.output_proj.weight", "text_decoder.layers.4.self_attn.output_proj.bias", "text_decoder.layers.4.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.4.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.4.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.4.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.4.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.4.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.4.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.4.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.4.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.4.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.4.ffn_layer_norm.weight", "text_decoder.layers.4.ffn_layer_norm.bias", "text_decoder.layers.4.ffn.inner_proj.weight", "text_decoder.layers.4.ffn.inner_proj.bias", "text_decoder.layers.4.ffn.output_proj.weight", "text_decoder.layers.4.ffn.output_proj.bias", "text_decoder.layers.5.self_attn_layer_norm.weight", "text_decoder.layers.5.self_attn_layer_norm.bias", "text_decoder.layers.5.self_attn.q_proj.weight", "text_decoder.layers.5.self_attn.q_proj.bias", "text_decoder.layers.5.self_attn.k_proj.weight", "text_decoder.layers.5.self_attn.k_proj.bias", "text_decoder.layers.5.self_attn.v_proj.weight", "text_decoder.layers.5.self_attn.v_proj.bias", "text_decoder.layers.5.self_attn.output_proj.weight", "text_decoder.layers.5.self_attn.output_proj.bias", "text_decoder.layers.5.encoder_decoder_attn_layer_norm.weight", 
"text_decoder.layers.5.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.5.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.5.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.5.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.5.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.5.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.5.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.5.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.5.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.5.ffn_layer_norm.weight", "text_decoder.layers.5.ffn_layer_norm.bias", "text_decoder.layers.5.ffn.inner_proj.weight", "text_decoder.layers.5.ffn.inner_proj.bias", "text_decoder.layers.5.ffn.output_proj.weight", "text_decoder.layers.5.ffn.output_proj.bias", "text_decoder.layers.6.self_attn_layer_norm.weight", "text_decoder.layers.6.self_attn_layer_norm.bias", "text_decoder.layers.6.self_attn.q_proj.weight", "text_decoder.layers.6.self_attn.q_proj.bias", "text_decoder.layers.6.self_attn.k_proj.weight", "text_decoder.layers.6.self_attn.k_proj.bias", "text_decoder.layers.6.self_attn.v_proj.weight", "text_decoder.layers.6.self_attn.v_proj.bias", "text_decoder.layers.6.self_attn.output_proj.weight", "text_decoder.layers.6.self_attn.output_proj.bias", "text_decoder.layers.6.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.6.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.6.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.6.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.6.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.6.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.6.ffn_layer_norm.weight", 
"text_decoder.layers.6.ffn_layer_norm.bias", "text_decoder.layers.6.ffn.inner_proj.weight", "text_decoder.layers.6.ffn.inner_proj.bias", "text_decoder.layers.6.ffn.output_proj.weight", "text_decoder.layers.6.ffn.output_proj.bias", "text_decoder.layers.7.self_attn_layer_norm.weight", "text_decoder.layers.7.self_attn_layer_norm.bias", "text_decoder.layers.7.self_attn.q_proj.weight", "text_decoder.layers.7.self_attn.q_proj.bias", "text_decoder.layers.7.self_attn.k_proj.weight", "text_decoder.layers.7.self_attn.k_proj.bias", "text_decoder.layers.7.self_attn.v_proj.weight", "text_decoder.layers.7.self_attn.v_proj.bias", "text_decoder.layers.7.self_attn.output_proj.weight", "text_decoder.layers.7.self_attn.output_proj.bias", "text_decoder.layers.7.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.7.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.7.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.7.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.7.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.7.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.7.ffn_layer_norm.weight", "text_decoder.layers.7.ffn_layer_norm.bias", "text_decoder.layers.7.ffn.inner_proj.weight", "text_decoder.layers.7.ffn.inner_proj.bias", "text_decoder.layers.7.ffn.output_proj.weight", "text_decoder.layers.7.ffn.output_proj.bias", "text_decoder.layers.8.self_attn_layer_norm.weight", "text_decoder.layers.8.self_attn_layer_norm.bias", "text_decoder.layers.8.self_attn.q_proj.weight", "text_decoder.layers.8.self_attn.q_proj.bias", "text_decoder.layers.8.self_attn.k_proj.weight", "text_decoder.layers.8.self_attn.k_proj.bias", "text_decoder.layers.8.self_attn.v_proj.weight", 
"text_decoder.layers.8.self_attn.v_proj.bias", "text_decoder.layers.8.self_attn.output_proj.weight", "text_decoder.layers.8.self_attn.output_proj.bias", "text_decoder.layers.8.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.8.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.8.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.8.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.8.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.8.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.8.ffn_layer_norm.weight", "text_decoder.layers.8.ffn_layer_norm.bias", "text_decoder.layers.8.ffn.inner_proj.weight", "text_decoder.layers.8.ffn.inner_proj.bias", "text_decoder.layers.8.ffn.output_proj.weight", "text_decoder.layers.8.ffn.output_proj.bias", "text_decoder.layers.9.self_attn_layer_norm.weight", "text_decoder.layers.9.self_attn_layer_norm.bias", "text_decoder.layers.9.self_attn.q_proj.weight", "text_decoder.layers.9.self_attn.q_proj.bias", "text_decoder.layers.9.self_attn.k_proj.weight", "text_decoder.layers.9.self_attn.k_proj.bias", "text_decoder.layers.9.self_attn.v_proj.weight", "text_decoder.layers.9.self_attn.v_proj.bias", "text_decoder.layers.9.self_attn.output_proj.weight", "text_decoder.layers.9.self_attn.output_proj.bias", "text_decoder.layers.9.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.9.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.9.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.9.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.9.encoder_decoder_attn.v_proj.weight", 
"text_decoder.layers.9.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.9.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.9.ffn_layer_norm.weight", "text_decoder.layers.9.ffn_layer_norm.bias", "text_decoder.layers.9.ffn.inner_proj.weight", "text_decoder.layers.9.ffn.inner_proj.bias", "text_decoder.layers.9.ffn.output_proj.weight", "text_decoder.layers.9.ffn.output_proj.bias", "text_decoder.layers.10.self_attn_layer_norm.weight", "text_decoder.layers.10.self_attn_layer_norm.bias", "text_decoder.layers.10.self_attn.q_proj.weight", "text_decoder.layers.10.self_attn.q_proj.bias", "text_decoder.layers.10.self_attn.k_proj.weight", "text_decoder.layers.10.self_attn.k_proj.bias", "text_decoder.layers.10.self_attn.v_proj.weight", "text_decoder.layers.10.self_attn.v_proj.bias", "text_decoder.layers.10.self_attn.output_proj.weight", "text_decoder.layers.10.self_attn.output_proj.bias", "text_decoder.layers.10.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.10.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.10.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.10.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.10.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.10.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.10.ffn_layer_norm.weight", "text_decoder.layers.10.ffn_layer_norm.bias", "text_decoder.layers.10.ffn.inner_proj.weight", "text_decoder.layers.10.ffn.inner_proj.bias", "text_decoder.layers.10.ffn.output_proj.weight", "text_decoder.layers.10.ffn.output_proj.bias", "text_decoder.layers.11.self_attn_layer_norm.weight", "text_decoder.layers.11.self_attn_layer_norm.bias", 
"text_decoder.layers.11.self_attn.q_proj.weight", "text_decoder.layers.11.self_attn.q_proj.bias", "text_decoder.layers.11.self_attn.k_proj.weight", "text_decoder.layers.11.self_attn.k_proj.bias", "text_decoder.layers.11.self_attn.v_proj.weight", "text_decoder.layers.11.self_attn.v_proj.bias", "text_decoder.layers.11.self_attn.output_proj.weight", "text_decoder.layers.11.self_attn.output_proj.bias", "text_decoder.layers.11.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.11.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.11.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.11.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.11.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.11.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.11.ffn_layer_norm.weight", "text_decoder.layers.11.ffn_layer_norm.bias", "text_decoder.layers.11.ffn.inner_proj.weight", "text_decoder.layers.11.ffn.inner_proj.bias", "text_decoder.layers.11.ffn.output_proj.weight", "text_decoder.layers.11.ffn.output_proj.bias". 
        Unexpected key(s) in state_dict: "target_letter_decoder.version", "target_letter_decoder.embed_tokens.weight", "target_letter_decoder.embed_positions._float_tensor", "target_letter_decoder.layers.0.self_attn.k_proj.weight", "target_letter_decoder.layers.0.self_attn.k_proj.bias", "target_letter_decoder.layers.0.self_attn.v_proj.weight", "target_letter_decoder.layers.0.self_attn.v_proj.bias", "target_letter_decoder.layers.0.self_attn.q_proj.weight", "target_letter_decoder.layers.0.self_attn.q_proj.bias", "target_letter_decoder.layers.0.self_attn.out_proj.weight", "target_letter_decoder.layers.0.self_attn.out_proj.bias", "target_letter_decoder.layers.0.self_attn_layer_norm.weight", "target_letter_decoder.layers.0.self_attn_layer_norm.bias", "target_letter_decoder.layers.0.encoder_attn.k_proj.weight", "target_letter_decoder.layers.0.encoder_attn.k_proj.bias", "target_letter_decoder.layers.0.encoder_attn.v_proj.weight", "target_letter_decoder.layers.0.encoder_attn.v_proj.bias", "target_letter_decoder.layers.0.encoder_attn.q_proj.weight", "target_letter_decoder.layers.0.encoder_attn.q_proj.bias", "target_letter_decoder.layers.0.encoder_attn.out_proj.weight", "target_letter_decoder.layers.0.encoder_attn.out_proj.bias", "target_letter_decoder.layers.0.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.0.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.0.fc1.weight", "target_letter_decoder.layers.0.fc1.bias", "target_letter_decoder.layers.0.fc2.weight", "target_letter_decoder.layers.0.fc2.bias", "target_letter_decoder.layers.0.final_layer_norm.weight", "target_letter_decoder.layers.0.final_layer_norm.bias", "target_letter_decoder.layers.1.self_attn.k_proj.weight", "target_letter_decoder.layers.1.self_attn.k_proj.bias", "target_letter_decoder.layers.1.self_attn.v_proj.weight", "target_letter_decoder.layers.1.self_attn.v_proj.bias", "target_letter_decoder.layers.1.self_attn.q_proj.weight", "target_letter_decoder.layers.1.self_attn.q_proj.bias", 
"target_letter_decoder.layers.1.self_attn.out_proj.weight", "target_letter_decoder.layers.1.self_attn.out_proj.bias", "target_letter_decoder.layers.1.self_attn_layer_norm.weight", "target_letter_decoder.layers.1.self_attn_layer_norm.bias", "target_letter_decoder.layers.1.encoder_attn.k_proj.weight", "target_letter_decoder.layers.1.encoder_attn.k_proj.bias", "target_letter_decoder.layers.1.encoder_attn.v_proj.weight", "target_letter_decoder.layers.1.encoder_attn.v_proj.bias", "target_letter_decoder.layers.1.encoder_attn.q_proj.weight", "target_letter_decoder.layers.1.encoder_attn.q_proj.bias", "target_letter_decoder.layers.1.encoder_attn.out_proj.weight", "target_letter_decoder.layers.1.encoder_attn.out_proj.bias", "target_letter_decoder.layers.1.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.1.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.1.fc1.weight", "target_letter_decoder.layers.1.fc1.bias", "target_letter_decoder.layers.1.fc2.weight", "target_letter_decoder.layers.1.fc2.bias", "target_letter_decoder.layers.1.final_layer_norm.weight", "target_letter_decoder.layers.1.final_layer_norm.bias", "target_letter_decoder.layers.2.self_attn.k_proj.weight", "target_letter_decoder.layers.2.self_attn.k_proj.bias", "target_letter_decoder.layers.2.self_attn.v_proj.weight", "target_letter_decoder.layers.2.self_attn.v_proj.bias", "target_letter_decoder.layers.2.self_attn.q_proj.weight", "target_letter_decoder.layers.2.self_attn.q_proj.bias", "target_letter_decoder.layers.2.self_attn.out_proj.weight", "target_letter_decoder.layers.2.self_attn.out_proj.bias", "target_letter_decoder.layers.2.self_attn_layer_norm.weight", "target_letter_decoder.layers.2.self_attn_layer_norm.bias", "target_letter_decoder.layers.2.encoder_attn.k_proj.weight", "target_letter_decoder.layers.2.encoder_attn.k_proj.bias", "target_letter_decoder.layers.2.encoder_attn.v_proj.weight", "target_letter_decoder.layers.2.encoder_attn.v_proj.bias", 
"target_letter_decoder.layers.2.encoder_attn.q_proj.weight", "target_letter_decoder.layers.2.encoder_attn.q_proj.bias", "target_letter_decoder.layers.2.encoder_attn.out_proj.weight", "target_letter_decoder.layers.2.encoder_attn.out_proj.bias", "target_letter_decoder.layers.2.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.2.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.2.fc1.weight", "target_letter_decoder.layers.2.fc1.bias", "target_letter_decoder.layers.2.fc2.weight", "target_letter_decoder.layers.2.fc2.bias", "target_letter_decoder.layers.2.final_layer_norm.weight", "target_letter_decoder.layers.2.final_layer_norm.bias", "target_letter_decoder.layers.3.self_attn.k_proj.weight", "target_letter_decoder.layers.3.self_attn.k_proj.bias", "target_letter_decoder.layers.3.self_attn.v_proj.weight", "target_letter_decoder.layers.3.self_attn.v_proj.bias", "target_letter_decoder.layers.3.self_attn.q_proj.weight", "target_letter_decoder.layers.3.self_attn.q_proj.bias", "target_letter_decoder.layers.3.self_attn.out_proj.weight", "target_letter_decoder.layers.3.self_attn.out_proj.bias", "target_letter_decoder.layers.3.self_attn_layer_norm.weight", "target_letter_decoder.layers.3.self_attn_layer_norm.bias", "target_letter_decoder.layers.3.encoder_attn.k_proj.weight", "target_letter_decoder.layers.3.encoder_attn.k_proj.bias", "target_letter_decoder.layers.3.encoder_attn.v_proj.weight", "target_letter_decoder.layers.3.encoder_attn.v_proj.bias", "target_letter_decoder.layers.3.encoder_attn.q_proj.weight", "target_letter_decoder.layers.3.encoder_attn.q_proj.bias", "target_letter_decoder.layers.3.encoder_attn.out_proj.weight", "target_letter_decoder.layers.3.encoder_attn.out_proj.bias", "target_letter_decoder.layers.3.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.3.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.3.fc1.weight", "target_letter_decoder.layers.3.fc1.bias", "target_letter_decoder.layers.3.fc2.weight", 
"target_letter_decoder.layers.3.fc2.bias", "target_letter_decoder.layers.3.final_layer_norm.weight", "target_letter_decoder.layers.3.final_layer_norm.bias", "target_letter_decoder.layers.4.self_attn.k_proj.weight", "target_letter_decoder.layers.4.self_attn.k_proj.bias", "target_letter_decoder.layers.4.self_attn.v_proj.weight", "target_letter_decoder.layers.4.self_attn.v_proj.bias", "target_letter_decoder.layers.4.self_attn.q_proj.weight", "target_letter_decoder.layers.4.self_attn.q_proj.bias", "target_letter_decoder.layers.4.self_attn.out_proj.weight", "target_letter_decoder.layers.4.self_attn.out_proj.bias", "target_letter_decoder.layers.4.self_attn_layer_norm.weight", "target_letter_decoder.layers.4.self_attn_layer_norm.bias", "target_letter_decoder.layers.4.encoder_attn.k_proj.weight", "target_letter_decoder.layers.4.encoder_attn.k_proj.bias", "target_letter_decoder.layers.4.encoder_attn.v_proj.weight", "target_letter_decoder.layers.4.encoder_attn.v_proj.bias", "target_letter_decoder.layers.4.encoder_attn.q_proj.weight", "target_letter_decoder.layers.4.encoder_attn.q_proj.bias", "target_letter_decoder.layers.4.encoder_attn.out_proj.weight", "target_letter_decoder.layers.4.encoder_attn.out_proj.bias", "target_letter_decoder.layers.4.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.4.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.4.fc1.weight", "target_letter_decoder.layers.4.fc1.bias", "target_letter_decoder.layers.4.fc2.weight", "target_letter_decoder.layers.4.fc2.bias", "target_letter_decoder.layers.4.final_layer_norm.weight", "target_letter_decoder.layers.4.final_layer_norm.bias", "target_letter_decoder.layers.5.self_attn.k_proj.weight", "target_letter_decoder.layers.5.self_attn.k_proj.bias", "target_letter_decoder.layers.5.self_attn.v_proj.weight", "target_letter_decoder.layers.5.self_attn.v_proj.bias", "target_letter_decoder.layers.5.self_attn.q_proj.weight", "target_letter_decoder.layers.5.self_attn.q_proj.bias", 
"target_letter_decoder.layers.5.self_attn.out_proj.weight", "target_letter_decoder.layers.5.self_attn.out_proj.bias", "target_letter_decoder.layers.5.self_attn_layer_norm.weight", "target_letter_decoder.layers.5.self_attn_layer_norm.bias", "target_letter_decoder.layers.5.encoder_attn.k_proj.weight", "target_letter_decoder.layers.5.encoder_attn.k_proj.bias", "target_letter_decoder.layers.5.encoder_attn.v_proj.weight", "target_letter_decoder.layers.5.encoder_attn.v_proj.bias", "target_letter_decoder.layers.5.encoder_attn.q_proj.weight", "target_letter_decoder.layers.5.encoder_attn.q_proj.bias", "target_letter_decoder.layers.5.encoder_attn.out_proj.weight", "target_letter_decoder.layers.5.encoder_attn.out_proj.bias", "target_letter_decoder.layers.5.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.5.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.5.fc1.weight", "target_letter_decoder.layers.5.fc1.bias", "target_letter_decoder.layers.5.fc2.weight", "target_letter_decoder.layers.5.fc2.bias", "target_letter_decoder.layers.5.final_layer_norm.weight", "target_letter_decoder.layers.5.final_layer_norm.bias", "target_letter_decoder.layers.6.self_attn.k_proj.weight", "target_letter_decoder.layers.6.self_attn.k_proj.bias", "target_letter_decoder.layers.6.self_attn.v_proj.weight", "target_letter_decoder.layers.6.self_attn.v_proj.bias", "target_letter_decoder.layers.6.self_attn.q_proj.weight", "target_letter_decoder.layers.6.self_attn.q_proj.bias", "target_letter_decoder.layers.6.self_attn.out_proj.weight", "target_letter_decoder.layers.6.self_attn.out_proj.bias", "target_letter_decoder.layers.6.self_attn_layer_norm.weight", "target_letter_decoder.layers.6.self_attn_layer_norm.bias", "target_letter_decoder.layers.6.encoder_attn.k_proj.weight", "target_letter_decoder.layers.6.encoder_attn.k_proj.bias", "target_letter_decoder.layers.6.encoder_attn.v_proj.weight", "target_letter_decoder.layers.6.encoder_attn.v_proj.bias", 
"target_letter_decoder.layers.6.encoder_attn.q_proj.weight", "target_letter_decoder.layers.6.encoder_attn.q_proj.bias", "target_letter_decoder.layers.6.encoder_attn.out_proj.weight", "target_letter_decoder.layers.6.encoder_attn.out_proj.bias", "target_letter_decoder.layers.6.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.6.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.6.fc1.weight", "target_letter_decoder.layers.6.fc1.bias", "target_letter_decoder.layers.6.fc2.weight", "target_letter_decoder.layers.6.fc2.bias", "target_letter_decoder.layers.6.final_layer_norm.weight", "target_letter_decoder.layers.6.final_layer_norm.bias", "target_letter_decoder.layers.7.self_attn.k_proj.weight", "target_letter_decoder.layers.7.self_attn.k_proj.bias", "target_letter_decoder.layers.7.self_attn.v_proj.weight", "target_letter_decoder.layers.7.self_attn.v_proj.bias", "target_letter_decoder.layers.7.self_attn.q_proj.weight", "target_letter_decoder.layers.7.self_attn.q_proj.bias", "target_letter_decoder.layers.7.self_attn.out_proj.weight", "target_letter_decoder.layers.7.self_attn.out_proj.bias", "target_letter_decoder.layers.7.self_attn_layer_norm.weight", "target_letter_decoder.layers.7.self_attn_layer_norm.bias", "target_letter_decoder.layers.7.encoder_attn.k_proj.weight", "target_letter_decoder.layers.7.encoder_attn.k_proj.bias", "target_letter_decoder.layers.7.encoder_attn.v_proj.weight", "target_letter_decoder.layers.7.encoder_attn.v_proj.bias", "target_letter_decoder.layers.7.encoder_attn.q_proj.weight", "target_letter_decoder.layers.7.encoder_attn.q_proj.bias", "target_letter_decoder.layers.7.encoder_attn.out_proj.weight", "target_letter_decoder.layers.7.encoder_attn.out_proj.bias", "target_letter_decoder.layers.7.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.7.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.7.fc1.weight", "target_letter_decoder.layers.7.fc1.bias", "target_letter_decoder.layers.7.fc2.weight", 
"target_letter_decoder.layers.7.fc2.bias", "target_letter_decoder.layers.7.final_layer_norm.weight", "target_letter_decoder.layers.7.final_layer_norm.bias", "target_letter_decoder.layers.8.self_attn.k_proj.weight", "target_letter_decoder.layers.8.self_attn.k_proj.bias", "target_letter_decoder.layers.8.self_attn.v_proj.weight", "target_letter_decoder.layers.8.self_attn.v_proj.bias", "target_letter_decoder.layers.8.self_attn.q_proj.weight", "target_letter_decoder.layers.8.self_attn.q_proj.bias", "target_letter_decoder.layers.8.self_attn.out_proj.weight", "target_letter_decoder.layers.8.self_attn.out_proj.bias", "target_letter_decoder.layers.8.self_attn_layer_norm.weight", "target_letter_decoder.layers.8.self_attn_layer_norm.bias", "target_letter_decoder.layers.8.encoder_attn.k_proj.weight", "target_letter_decoder.layers.8.encoder_attn.k_proj.bias", "target_letter_decoder.layers.8.encoder_attn.v_proj.weight", "target_letter_decoder.layers.8.encoder_attn.v_proj.bias", "target_letter_decoder.layers.8.encoder_attn.q_proj.weight", "target_letter_decoder.layers.8.encoder_attn.q_proj.bias", "target_letter_decoder.layers.8.encoder_attn.out_proj.weight", "target_letter_decoder.layers.8.encoder_attn.out_proj.bias", "target_letter_decoder.layers.8.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.8.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.8.fc1.weight", "target_letter_decoder.layers.8.fc1.bias", "target_letter_decoder.layers.8.fc2.weight", "target_letter_decoder.layers.8.fc2.bias", "target_letter_decoder.layers.8.final_layer_norm.weight", "target_letter_decoder.layers.8.final_layer_norm.bias", "target_letter_decoder.layers.9.self_attn.k_proj.weight", "target_letter_decoder.layers.9.self_attn.k_proj.bias", "target_letter_decoder.layers.9.self_attn.v_proj.weight", "target_letter_decoder.layers.9.self_attn.v_proj.bias", "target_letter_decoder.layers.9.self_attn.q_proj.weight", "target_letter_decoder.layers.9.self_attn.q_proj.bias", 
"target_letter_decoder.layers.9.self_attn.out_proj.weight", "target_letter_decoder.layers.9.self_attn.out_proj.bias", "target_letter_decoder.layers.9.self_attn_layer_norm.weight", "target_letter_decoder.layers.9.self_attn_layer_norm.bias", "target_letter_decoder.layers.9.encoder_attn.k_proj.weight", "target_letter_decoder.layers.9.encoder_attn.k_proj.bias", "target_letter_decoder.layers.9.encoder_attn.v_proj.weight", "target_letter_decoder.layers.9.encoder_attn.v_proj.bias", "target_letter_decoder.layers.9.encoder_attn.q_proj.weight", "target_letter_decoder.layers.9.encoder_attn.q_proj.bias", "target_letter_decoder.layers.9.encoder_attn.out_proj.weight", "target_letter_decoder.layers.9.encoder_attn.out_proj.bias", "target_letter_decoder.layers.9.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.9.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.9.fc1.weight", "target_letter_decoder.layers.9.fc1.bias", "target_letter_decoder.layers.9.fc2.weight", "target_letter_decoder.layers.9.fc2.bias", "target_letter_decoder.layers.9.final_layer_norm.weight", "target_letter_decoder.layers.9.final_layer_norm.bias", "target_letter_decoder.layers.10.self_attn.k_proj.weight", "target_letter_decoder.layers.10.self_attn.k_proj.bias", "target_letter_decoder.layers.10.self_attn.v_proj.weight", "target_letter_decoder.layers.10.self_attn.v_proj.bias", "target_letter_decoder.layers.10.self_attn.q_proj.weight", "target_letter_decoder.layers.10.self_attn.q_proj.bias", "target_letter_decoder.layers.10.self_attn.out_proj.weight", "target_letter_decoder.layers.10.self_attn.out_proj.bias", "target_letter_decoder.layers.10.self_attn_layer_norm.weight", "target_letter_decoder.layers.10.self_attn_layer_norm.bias", "target_letter_decoder.layers.10.encoder_attn.k_proj.weight", "target_letter_decoder.layers.10.encoder_attn.k_proj.bias", "target_letter_decoder.layers.10.encoder_attn.v_proj.weight", "target_letter_decoder.layers.10.encoder_attn.v_proj.bias", 
"target_letter_decoder.layers.10.encoder_attn.q_proj.weight", "target_letter_decoder.layers.10.encoder_attn.q_proj.bias", "target_letter_decoder.layers.10.encoder_attn.out_proj.weight", "target_letter_decoder.layers.10.encoder_attn.out_proj.bias", "target_letter_decoder.layers.10.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.10.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.10.fc1.weight", "target_letter_decoder.layers.10.fc1.bias", "target_letter_decoder.layers.10.fc2.weight", "target_letter_decoder.layers.10.fc2.bias", "target_letter_decoder.layers.10.final_layer_norm.weight", "target_letter_decoder.layers.10.final_layer_norm.bias", "target_letter_decoder.layers.11.self_attn.k_proj.weight", "target_letter_decoder.layers.11.self_attn.k_proj.bias", "target_letter_decoder.layers.11.self_attn.v_proj.weight", "target_letter_decoder.layers.11.self_attn.v_proj.bias", "target_letter_decoder.layers.11.self_attn.q_proj.weight", "target_letter_decoder.layers.11.self_attn.q_proj.bias", "target_letter_decoder.layers.11.self_attn.out_proj.weight", "target_letter_decoder.layers.11.self_attn.out_proj.bias", "target_letter_decoder.layers.11.self_attn_layer_norm.weight", "target_letter_decoder.layers.11.self_attn_layer_norm.bias", "target_letter_decoder.layers.11.encoder_attn.k_proj.weight", "target_letter_decoder.layers.11.encoder_attn.k_proj.bias", "target_letter_decoder.layers.11.encoder_attn.v_proj.weight", "target_letter_decoder.layers.11.encoder_attn.v_proj.bias", "target_letter_decoder.layers.11.encoder_attn.q_proj.weight", "target_letter_decoder.layers.11.encoder_attn.q_proj.bias", "target_letter_decoder.layers.11.encoder_attn.out_proj.weight", "target_letter_decoder.layers.11.encoder_attn.out_proj.bias", "target_letter_decoder.layers.11.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.11.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.11.fc1.weight", "target_letter_decoder.layers.11.fc1.bias", 
"target_letter_decoder.layers.11.fc2.weight", "target_letter_decoder.layers.11.fc2.bias", "target_letter_decoder.layers.11.final_layer_norm.weight", "target_letter_decoder.layers.11.final_layer_norm.bias", "target_letter_decoder.layer_norm.weight", "target_letter_decoder.layer_norm.bias", "target_letter_decoder.output_projection.weight", "synthesizer_encoder.layers.0.self_attn.k_proj.weight", "synthesizer_encoder.layers.0.self_attn.k_proj.bias", "synthesizer_encoder.layers.0.self_attn.v_proj.weight", "synthesizer_encoder.layers.0.self_attn.v_proj.bias", "synthesizer_encoder.layers.0.self_attn.q_proj.weight", "synthesizer_encoder.layers.0.self_attn.q_proj.bias", "synthesizer_encoder.layers.0.self_attn.out_proj.weight", "synthesizer_encoder.layers.0.self_attn.out_proj.bias", "synthesizer_encoder.layers.0.self_attn_layer_norm.weight", "synthesizer_encoder.layers.0.self_attn_layer_norm.bias", "synthesizer_encoder.layers.0.fc1.weight", "synthesizer_encoder.layers.0.fc1.bias", "synthesizer_encoder.layers.0.fc2.weight", "synthesizer_encoder.layers.0.fc2.bias", "synthesizer_encoder.layers.0.final_layer_norm.weight", "synthesizer_encoder.layers.0.final_layer_norm.bias", "synthesizer_encoder.layers.1.self_attn.k_proj.weight", "synthesizer_encoder.layers.1.self_attn.k_proj.bias", "synthesizer_encoder.layers.1.self_attn.v_proj.weight", "synthesizer_encoder.layers.1.self_attn.v_proj.bias", "synthesizer_encoder.layers.1.self_attn.q_proj.weight", "synthesizer_encoder.layers.1.self_attn.q_proj.bias", "synthesizer_encoder.layers.1.self_attn.out_proj.weight", "synthesizer_encoder.layers.1.self_attn.out_proj.bias", "synthesizer_encoder.layers.1.self_attn_layer_norm.weight", "synthesizer_encoder.layers.1.self_attn_layer_norm.bias", "synthesizer_encoder.layers.1.fc1.weight", "synthesizer_encoder.layers.1.fc1.bias", "synthesizer_encoder.layers.1.fc2.weight", "synthesizer_encoder.layers.1.fc2.bias", "synthesizer_encoder.layers.1.final_layer_norm.weight", 
"synthesizer_encoder.layers.1.final_layer_norm.bias", "synthesizer_encoder.layers.2.self_attn.k_proj.weight", "synthesizer_encoder.layers.2.self_attn.k_proj.bias", "synthesizer_encoder.layers.2.self_attn.v_proj.weight", "synthesizer_encoder.layers.2.self_attn.v_proj.bias", "synthesizer_encoder.layers.2.self_attn.q_proj.weight", "synthesizer_encoder.layers.2.self_attn.q_proj.bias", "synthesizer_encoder.layers.2.self_attn.out_proj.weight", "synthesizer_encoder.layers.2.self_attn.out_proj.bias", "synthesizer_encoder.layers.2.self_attn_layer_norm.weight", "synthesizer_encoder.layers.2.self_attn_layer_norm.bias", "synthesizer_encoder.layers.2.fc1.weight", "synthesizer_encoder.layers.2.fc1.bias", "synthesizer_encoder.layers.2.fc2.weight", "synthesizer_encoder.layers.2.fc2.bias", "synthesizer_encoder.layers.2.final_layer_norm.weight", "synthesizer_encoder.layers.2.final_layer_norm.bias", "synthesizer_encoder.layers.3.self_attn.k_proj.weight", "synthesizer_encoder.layers.3.self_attn.k_proj.bias", "synthesizer_encoder.layers.3.self_attn.v_proj.weight", "synthesizer_encoder.layers.3.self_attn.v_proj.bias", "synthesizer_encoder.layers.3.self_attn.q_proj.weight", "synthesizer_encoder.layers.3.self_attn.q_proj.bias", "synthesizer_encoder.layers.3.self_attn.out_proj.weight", "synthesizer_encoder.layers.3.self_attn.out_proj.bias", "synthesizer_encoder.layers.3.self_attn_layer_norm.weight", "synthesizer_encoder.layers.3.self_attn_layer_norm.bias", "synthesizer_encoder.layers.3.fc1.weight", "synthesizer_encoder.layers.3.fc1.bias", "synthesizer_encoder.layers.3.fc2.weight", "synthesizer_encoder.layers.3.fc2.bias", "synthesizer_encoder.layers.3.final_layer_norm.weight", "synthesizer_encoder.layers.3.final_layer_norm.bias", "synthesizer_encoder.layer_norm.weight", "synthesizer_encoder.layer_norm.bias". 
        size mismatch for text_decoder_frontend.embed.weight: copying a param with shape torch.Size([10082, 1024]) from checkpoint, the shape in current model is torch.Size([256206, 1024]).
        size mismatch for text_decoder.layers.0.ffn.inner_proj.weight: copying a param with shape torch.Size([8192, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for text_decoder.layers.0.ffn.inner_proj.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for text_decoder.layers.0.ffn.output_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for text_decoder.layers.1.ffn.inner_proj.weight: copying a param with shape torch.Size([8192, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for text_decoder.layers.1.ffn.inner_proj.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for text_decoder.layers.1.ffn.output_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for text_decoder.layers.2.ffn.inner_proj.weight: copying a param with shape torch.Size([8192, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for text_decoder.layers.2.ffn.inner_proj.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for text_decoder.layers.2.ffn.output_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for text_decoder.layers.3.ffn.inner_proj.weight: copying a param with shape torch.Size([8192, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
        size mismatch for text_decoder.layers.3.ffn.inner_proj.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for text_decoder.layers.3.ffn.output_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
        size mismatch for final_proj.weight: copying a param with shape torch.Size([10082, 1024]) from checkpoint, the shape in current model is torch.Size([256206, 1024]).
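For context on what this error is reporting: `load_state_dict` in strict mode compares every parameter name and shape in the checkpoint against the freshly constructed model, so a checkpoint saved for one architecture (here a 10082-entry vocab and 8192-wide FFN) cannot be loaded into a model built with a different config (256206 and 4096). The sketch below is not the seamless_communication code, just an illustrative pure-Python mock of that check; the key names are copied from the error above, and the helper name is made up.

```python
# Minimal mock of strict load_state_dict checking: given the expected
# parameter shapes of the current model and the shapes found in a
# checkpoint, collect missing keys, unexpected keys, and size
# mismatches the way torch.nn.Module.load_state_dict reports them.
def check_state_dict(model_shapes, ckpt_shapes):
    missing = [k for k in model_shapes if k not in ckpt_shapes]
    unexpected = [k for k in ckpt_shapes if k not in model_shapes]
    mismatched = [
        (k, ckpt_shapes[k], model_shapes[k])
        for k in model_shapes
        if k in ckpt_shapes and ckpt_shapes[k] != model_shapes[k]
    ]
    return missing, unexpected, mismatched

# Shapes echoing the error above: the checkpoint carries a 10082-entry
# vocab and 8192-wide FFN, while the constructed model expects 256206
# and 4096 -- i.e. the checkpoint and the model config describe two
# different architectures.
model = {
    "text_decoder_frontend.embed.weight": (256206, 1024),
    "text_decoder.layers.0.ffn.inner_proj.weight": (4096, 1024),
    "text_decoder.layers.4.self_attn.q_proj.weight": (1024, 1024),
}
ckpt = {
    "text_decoder_frontend.embed.weight": (10082, 1024),
    "text_decoder.layers.0.ffn.inner_proj.weight": (8192, 1024),
}
missing, unexpected, mismatched = check_state_dict(model, ckpt)
print(missing)      # keys present in the model but absent from the checkpoint
print(mismatched)   # (key, checkpoint shape, model shape) for shape conflicts
```

Seeing both "Missing key(s)" and "size mismatch" together therefore suggests the wrong checkpoint is being paired with the model card, rather than a corrupted download.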

YJYJLee avatar Jan 11 '24 14:01 YJYJLee

cc @kauterry

cbalioglu avatar Jan 11 '24 18:01 cbalioglu

Hi @YJYJLee, could you provide the command you used to encounter this error?

I checked out the main branch of the repo: https://github.com/facebookresearch/seamless_communication and ran this command:

m4t_predict /large_experiments/seamless/ust/data/TTS/vocoder_training/audio_wavs/multi_spkr/eng/eng_LJSpeech-1.1_0/LJ003-0001.wav --task asr --tgt_lang eng --model_name seamlessM4T_medium

And things go as expected, without the errors you report:

Downloading the checkpoint of seamlessM4T_medium... 100%|██████████████████████████████████████████████████████████| 6.37G/6.37G [02:51<00:00, 39.8MB/s]
Using the cached tokenizer of seamlessM4T_medium. Set force to True to download again.
/private/home/krs/miniconda3/envs/fairseq2/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm. warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-01-12 00:24:59,212 INFO -- seamless_communication.cli.m4t.predict.predict: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-01-12 00:24:59,212 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-01-12 00:24:59,212 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_ngram_filtering=False
2024-01-12 00:24:59,225 WARNING -- seamless_communication.inference.translator: Transposing audio tensor from (bsz, seq_len) -> (seq_len, bsz).
2024-01-12 00:25:00,078 INFO -- seamless_communication.cli.m4t.predict.predict: Translated text in eng: The Chronicles of Newgate, volume two, by Arthur Griffiths, section five, Newgate, down to 1818, part two.

kauterry avatar Jan 12 '24 00:01 kauterry

@kauterry Hi!

I used the command below for evaluation:

m4t_evaluate --data_file /fsx-ust/krs/datasets/S2ST/fleurs_eval/dev_fleurs_eng-spa.tsv --task s2tt --tgt_lang spa --output_path /fsx-checkpoints/yejinlee/debug --audio_root_dir /fsx-ust/data/audio_zips --ref_field _tgt_text --batch_size 32

YJYJLee avatar Jan 12 '24 00:01 YJYJLee

m4t_evaluate also works for me with the medium model.

kauterry avatar Jan 12 '24 00:01 kauterry

Are you sure you're using the latest version of main?

kauterry avatar Jan 12 '24 00:01 kauterry

In fact, could you try my m4t_predict command above and see if it works for you?

kauterry avatar Jan 12 '24 00:01 kauterry

@kauterry Yes, I am using the latest version, and it's strange that m4t_predict works fine for seamlessM4T_medium. It seems m4t_evaluate has a problem with seamlessM4T_medium.

Also, my mistake: the m4t_evaluate command I provided above uses seamlessM4T-large-v2. But I still get the error with m4t_evaluate. Could you try the command below again for me?

m4t_evaluate --data_file /fsx-ust/krs/datasets/S2ST/fleurs_eval/dev_fleurs_eng-spa.tsv --task s2tt --tgt_lang spa --output_path /fsx-checkpoints/yejinlee/debug --audio_root_dir /fsx-ust/data/audio_zips --ref_field _tgt_text --batch_size 32 --model_name seamlessM4T_medium

YJYJLee avatar Jan 12 '24 15:01 YJYJLee

I have a similar issue. In my case, I ran the following command:

m4t_evaluate --data_file .../covost_v2.en_de.test.tsv --task S2TT --tgt_lang deu --output_path .../tmp.csv --ref_field translation --audio_root_dir {audio_root_dir} --model_name seamlessM4T_large

but it prints:

  File ".../bin/m4t_evaluate", line 8, in <module>
    sys.exit(main())
  File ".../lib/python3.8/site-packages/seamless_communication/cli/m4t/evaluate/evaluate.py", line 396, in main
    translator = Translator(
  File ".../lib/python3.8/site-packages/seamless_communication/inference/translator.py", line 113, in __init__
    self.model = load_unity_model(model_name_or_card, device=device, dtype=dtype)
  File ".../lib/python3.8/site-packages/fairseq2/models/utils/generic_loaders.py", line 285, in __call__
    model.load_state_dict(state_dict)
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnitYModel:
        Missing key(s) in state_dict: "text_decoder.layers.6.self_attn_layer_norm.weight", "text_decoder.layers.6.self_attn_layer_norm.bias", "text_decoder.layers.6.self_attn.q_proj.weight", "text_decoder.layers.6.self_attn.q_proj.bias", "text_decoder.layers.6.self_attn.k_proj.weight", "text_decoder.layers.6.self_attn.k_proj.bias", "text_decoder.layers.6.self_attn.v_proj.weight", "text_decoder.layers.6.self_attn.v_proj.bias", "text_decoder.layers.6.self_attn.output_proj.weight", "text_decoder.layers.6.self_attn.output_proj.bias", "text_decoder.layers.6.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.6.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.6.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.6.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.6.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.6.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.6.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.6.ffn_layer_norm.weight", "text_decoder.layers.6.ffn_layer_norm.bias", "text_decoder.layers.6.ffn.inner_proj.weight", "text_decoder.layers.6.ffn.inner_proj.bias", "text_decoder.layers.6.ffn.output_proj.weight", "text_decoder.layers.6.ffn.output_proj.bias", "text_decoder.layers.7.self_attn_layer_norm.weight", "text_decoder.layers.7.self_attn_layer_norm.bias", "text_decoder.layers.7.self_attn.q_proj.weight", "text_decoder.layers.7.self_attn.q_proj.bias", "text_decoder.layers.7.self_attn.k_proj.weight", "text_decoder.layers.7.self_attn.k_proj.bias", "text_decoder.layers.7.self_attn.v_proj.weight", "text_decoder.layers.7.self_attn.v_proj.bias", "text_decoder.layers.7.self_attn.output_proj.weight", "text_decoder.layers.7.self_attn.output_proj.bias", "text_decoder.layers.7.encoder_decoder_attn_layer_norm.weight", 
"text_decoder.layers.7.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.7.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.7.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.7.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.7.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.7.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.7.ffn_layer_norm.weight", "text_decoder.layers.7.ffn_layer_norm.bias", "text_decoder.layers.7.ffn.inner_proj.weight", "text_decoder.layers.7.ffn.inner_proj.bias", "text_decoder.layers.7.ffn.output_proj.weight", "text_decoder.layers.7.ffn.output_proj.bias", "text_decoder.layers.8.self_attn_layer_norm.weight", "text_decoder.layers.8.self_attn_layer_norm.bias", "text_decoder.layers.8.self_attn.q_proj.weight", "text_decoder.layers.8.self_attn.q_proj.bias", "text_decoder.layers.8.self_attn.k_proj.weight", "text_decoder.layers.8.self_attn.k_proj.bias", "text_decoder.layers.8.self_attn.v_proj.weight", "text_decoder.layers.8.self_attn.v_proj.bias", "text_decoder.layers.8.self_attn.output_proj.weight", "text_decoder.layers.8.self_attn.output_proj.bias", "text_decoder.layers.8.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.8.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.8.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.8.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.8.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.8.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.8.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.8.ffn_layer_norm.weight", 
"text_decoder.layers.8.ffn_layer_norm.bias", "text_decoder.layers.8.ffn.inner_proj.weight", "text_decoder.layers.8.ffn.inner_proj.bias", "text_decoder.layers.8.ffn.output_proj.weight", "text_decoder.layers.8.ffn.output_proj.bias", "text_decoder.layers.9.self_attn_layer_norm.weight", "text_decoder.layers.9.self_attn_layer_norm.bias", "text_decoder.layers.9.self_attn.q_proj.weight", "text_decoder.layers.9.self_attn.q_proj.bias", "text_decoder.layers.9.self_attn.k_proj.weight", "text_decoder.layers.9.self_attn.k_proj.bias", "text_decoder.layers.9.self_attn.v_proj.weight", "text_decoder.layers.9.self_attn.v_proj.bias", "text_decoder.layers.9.self_attn.output_proj.weight", "text_decoder.layers.9.self_attn.output_proj.bias", "text_decoder.layers.9.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.9.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.9.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.9.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.9.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.9.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.9.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.9.ffn_layer_norm.weight", "text_decoder.layers.9.ffn_layer_norm.bias", "text_decoder.layers.9.ffn.inner_proj.weight", "text_decoder.layers.9.ffn.inner_proj.bias", "text_decoder.layers.9.ffn.output_proj.weight", "text_decoder.layers.9.ffn.output_proj.bias", "text_decoder.layers.10.self_attn_layer_norm.weight", "text_decoder.layers.10.self_attn_layer_norm.bias", "text_decoder.layers.10.self_attn.q_proj.weight", "text_decoder.layers.10.self_attn.q_proj.bias", "text_decoder.layers.10.self_attn.k_proj.weight", "text_decoder.layers.10.self_attn.k_proj.bias", "text_decoder.layers.10.self_attn.v_proj.weight", 
"text_decoder.layers.10.self_attn.v_proj.bias", "text_decoder.layers.10.self_attn.output_proj.weight", "text_decoder.layers.10.self_attn.output_proj.bias", "text_decoder.layers.10.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.10.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.10.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.10.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.10.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.10.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.10.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.10.ffn_layer_norm.weight", "text_decoder.layers.10.ffn_layer_norm.bias", "text_decoder.layers.10.ffn.inner_proj.weight", "text_decoder.layers.10.ffn.inner_proj.bias", "text_decoder.layers.10.ffn.output_proj.weight", "text_decoder.layers.10.ffn.output_proj.bias", "text_decoder.layers.11.self_attn_layer_norm.weight", "text_decoder.layers.11.self_attn_layer_norm.bias", "text_decoder.layers.11.self_attn.q_proj.weight", "text_decoder.layers.11.self_attn.q_proj.bias", "text_decoder.layers.11.self_attn.k_proj.weight", "text_decoder.layers.11.self_attn.k_proj.bias", "text_decoder.layers.11.self_attn.v_proj.weight", "text_decoder.layers.11.self_attn.v_proj.bias", "text_decoder.layers.11.self_attn.output_proj.weight", "text_decoder.layers.11.self_attn.output_proj.bias", "text_decoder.layers.11.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.11.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.11.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.11.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.11.encoder_decoder_attn.v_proj.weight", 
"text_decoder.layers.11.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.11.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.11.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.11.ffn_layer_norm.weight", "text_decoder.layers.11.ffn_layer_norm.bias", "text_decoder.layers.11.ffn.inner_proj.weight", "text_decoder.layers.11.ffn.inner_proj.bias", "text_decoder.layers.11.ffn.output_proj.weight", "text_decoder.layers.11.ffn.output_proj.bias", "text_decoder.layers.12.self_attn_layer_norm.weight", "text_decoder.layers.12.self_attn_layer_norm.bias", "text_decoder.layers.12.self_attn.q_proj.weight", "text_decoder.layers.12.self_attn.q_proj.bias", "text_decoder.layers.12.self_attn.k_proj.weight", "text_decoder.layers.12.self_attn.k_proj.bias", "text_decoder.layers.12.self_attn.v_proj.weight", "text_decoder.layers.12.self_attn.v_proj.bias", "text_decoder.layers.12.self_attn.output_proj.weight", "text_decoder.layers.12.self_attn.output_proj.bias", "text_decoder.layers.12.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.12.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.12.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.12.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.12.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.12.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.12.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.12.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.12.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.12.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.12.ffn_layer_norm.weight", "text_decoder.layers.12.ffn_layer_norm.bias", "text_decoder.layers.12.ffn.inner_proj.weight", "text_decoder.layers.12.ffn.inner_proj.bias", "text_decoder.layers.12.ffn.output_proj.weight", "text_decoder.layers.12.ffn.output_proj.bias", "text_decoder.layers.13.self_attn_layer_norm.weight", "text_decoder.layers.13.self_attn_layer_norm.bias", 
"text_decoder.layers.13.self_attn.q_proj.weight", "text_decoder.layers.13.self_attn.q_proj.bias", "text_decoder.layers.13.self_attn.k_proj.weight", "text_decoder.layers.13.self_attn.k_proj.bias", "text_decoder.layers.13.self_attn.v_proj.weight", "text_decoder.layers.13.self_attn.v_proj.bias", "text_decoder.layers.13.self_attn.output_proj.weight", "text_decoder.layers.13.self_attn.output_proj.bias", "text_decoder.layers.13.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.13.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.13.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.13.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.13.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.13.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.13.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.13.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.13.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.13.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.13.ffn_layer_norm.weight", "text_decoder.layers.13.ffn_layer_norm.bias", "text_decoder.layers.13.ffn.inner_proj.weight", "text_decoder.layers.13.ffn.inner_proj.bias", "text_decoder.layers.13.ffn.output_proj.weight", "text_decoder.layers.13.ffn.output_proj.bias", "text_decoder.layers.14.self_attn_layer_norm.weight", "text_decoder.layers.14.self_attn_layer_norm.bias", "text_decoder.layers.14.self_attn.q_proj.weight", "text_decoder.layers.14.self_attn.q_proj.bias", "text_decoder.layers.14.self_attn.k_proj.weight", "text_decoder.layers.14.self_attn.k_proj.bias", "text_decoder.layers.14.self_attn.v_proj.weight", "text_decoder.layers.14.self_attn.v_proj.bias", "text_decoder.layers.14.self_attn.output_proj.weight", "text_decoder.layers.14.self_attn.output_proj.bias", "text_decoder.layers.14.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.14.encoder_decoder_attn_layer_norm.bias", 
"text_decoder.layers.14.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.14.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.14.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.14.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.14.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.14.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.14.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.14.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.14.ffn_layer_norm.weight", "text_decoder.layers.14.ffn_layer_norm.bias", "text_decoder.layers.14.ffn.inner_proj.weight", "text_decoder.layers.14.ffn.inner_proj.bias", "text_decoder.layers.14.ffn.output_proj.weight", "text_decoder.layers.14.ffn.output_proj.bias", "text_decoder.layers.15.self_attn_layer_norm.weight", "text_decoder.layers.15.self_attn_layer_norm.bias", "text_decoder.layers.15.self_attn.q_proj.weight", "text_decoder.layers.15.self_attn.q_proj.bias", "text_decoder.layers.15.self_attn.k_proj.weight", "text_decoder.layers.15.self_attn.k_proj.bias", "text_decoder.layers.15.self_attn.v_proj.weight", "text_decoder.layers.15.self_attn.v_proj.bias", "text_decoder.layers.15.self_attn.output_proj.weight", "text_decoder.layers.15.self_attn.output_proj.bias", "text_decoder.layers.15.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.15.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.15.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.15.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.15.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.15.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.15.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.15.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.15.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.15.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.15.ffn_layer_norm.weight", "text_decoder.layers.15.ffn_layer_norm.bias", 
"text_decoder.layers.15.ffn.inner_proj.weight", "text_decoder.layers.15.ffn.inner_proj.bias", "text_decoder.layers.15.ffn.output_proj.weight", "text_decoder.layers.15.ffn.output_proj.bias", "text_decoder.layers.16.self_attn_layer_norm.weight", "text_decoder.layers.16.self_attn_layer_norm.bias", "text_decoder.layers.16.self_attn.q_proj.weight", "text_decoder.layers.16.self_attn.q_proj.bias", "text_decoder.layers.16.self_attn.k_proj.weight", "text_decoder.layers.16.self_attn.k_proj.bias", "text_decoder.layers.16.self_attn.v_proj.weight", "text_decoder.layers.16.self_attn.v_proj.bias", "text_decoder.layers.16.self_attn.output_proj.weight", "text_decoder.layers.16.self_attn.output_proj.bias", "text_decoder.layers.16.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.16.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.16.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.16.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.16.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.16.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.16.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.16.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.16.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.16.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.16.ffn_layer_norm.weight", "text_decoder.layers.16.ffn_layer_norm.bias", "text_decoder.layers.16.ffn.inner_proj.weight", "text_decoder.layers.16.ffn.inner_proj.bias", "text_decoder.layers.16.ffn.output_proj.weight", "text_decoder.layers.16.ffn.output_proj.bias", "text_decoder.layers.17.self_attn_layer_norm.weight", "text_decoder.layers.17.self_attn_layer_norm.bias", "text_decoder.layers.17.self_attn.q_proj.weight", "text_decoder.layers.17.self_attn.q_proj.bias", "text_decoder.layers.17.self_attn.k_proj.weight", "text_decoder.layers.17.self_attn.k_proj.bias", "text_decoder.layers.17.self_attn.v_proj.weight", "text_decoder.layers.17.self_attn.v_proj.bias", 
"text_decoder.layers.17.self_attn.output_proj.weight", "text_decoder.layers.17.self_attn.output_proj.bias", "text_decoder.layers.17.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.17.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.17.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.17.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.17.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.17.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.17.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.17.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.17.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.17.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.17.ffn_layer_norm.weight", "text_decoder.layers.17.ffn_layer_norm.bias", "text_decoder.layers.17.ffn.inner_proj.weight", "text_decoder.layers.17.ffn.inner_proj.bias", "text_decoder.layers.17.ffn.output_proj.weight", "text_decoder.layers.17.ffn.output_proj.bias", "text_decoder.layers.18.self_attn_layer_norm.weight", "text_decoder.layers.18.self_attn_layer_norm.bias", "text_decoder.layers.18.self_attn.q_proj.weight", "text_decoder.layers.18.self_attn.q_proj.bias", "text_decoder.layers.18.self_attn.k_proj.weight", "text_decoder.layers.18.self_attn.k_proj.bias", "text_decoder.layers.18.self_attn.v_proj.weight", "text_decoder.layers.18.self_attn.v_proj.bias", "text_decoder.layers.18.self_attn.output_proj.weight", "text_decoder.layers.18.self_attn.output_proj.bias", "text_decoder.layers.18.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.18.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.18.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.18.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.18.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.18.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.18.encoder_decoder_attn.v_proj.weight", 
"text_decoder.layers.18.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.18.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.18.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.18.ffn_layer_norm.weight", "text_decoder.layers.18.ffn_layer_norm.bias", "text_decoder.layers.18.ffn.inner_proj.weight", "text_decoder.layers.18.ffn.inner_proj.bias", "text_decoder.layers.18.ffn.output_proj.weight", "text_decoder.layers.18.ffn.output_proj.bias", "text_decoder.layers.19.self_attn_layer_norm.weight", "text_decoder.layers.19.self_attn_layer_norm.bias", "text_decoder.layers.19.self_attn.q_proj.weight", "text_decoder.layers.19.self_attn.q_proj.bias", "text_decoder.layers.19.self_attn.k_proj.weight", "text_decoder.layers.19.self_attn.k_proj.bias", "text_decoder.layers.19.self_attn.v_proj.weight", "text_decoder.layers.19.self_attn.v_proj.bias", "text_decoder.layers.19.self_attn.output_proj.weight", "text_decoder.layers.19.self_attn.output_proj.bias", "text_decoder.layers.19.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.19.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.19.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.19.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.19.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.19.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.19.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.19.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.19.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.19.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.19.ffn_layer_norm.weight", "text_decoder.layers.19.ffn_layer_norm.bias", "text_decoder.layers.19.ffn.inner_proj.weight", "text_decoder.layers.19.ffn.inner_proj.bias", "text_decoder.layers.19.ffn.output_proj.weight", "text_decoder.layers.19.ffn.output_proj.bias", "text_decoder.layers.20.self_attn_layer_norm.weight", "text_decoder.layers.20.self_attn_layer_norm.bias", 
"text_decoder.layers.20.self_attn.q_proj.weight", "text_decoder.layers.20.self_attn.q_proj.bias", "text_decoder.layers.20.self_attn.k_proj.weight", "text_decoder.layers.20.self_attn.k_proj.bias", "text_decoder.layers.20.self_attn.v_proj.weight", "text_decoder.layers.20.self_attn.v_proj.bias", "text_decoder.layers.20.self_attn.output_proj.weight", "text_decoder.layers.20.self_attn.output_proj.bias", "text_decoder.layers.20.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.20.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.20.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.20.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.20.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.20.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.20.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.20.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.20.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.20.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.20.ffn_layer_norm.weight", "text_decoder.layers.20.ffn_layer_norm.bias", "text_decoder.layers.20.ffn.inner_proj.weight", "text_decoder.layers.20.ffn.inner_proj.bias", "text_decoder.layers.20.ffn.output_proj.weight", "text_decoder.layers.20.ffn.output_proj.bias", "text_decoder.layers.21.self_attn_layer_norm.weight", "text_decoder.layers.21.self_attn_layer_norm.bias", "text_decoder.layers.21.self_attn.q_proj.weight", "text_decoder.layers.21.self_attn.q_proj.bias", "text_decoder.layers.21.self_attn.k_proj.weight", "text_decoder.layers.21.self_attn.k_proj.bias", "text_decoder.layers.21.self_attn.v_proj.weight", "text_decoder.layers.21.self_attn.v_proj.bias", "text_decoder.layers.21.self_attn.output_proj.weight", "text_decoder.layers.21.self_attn.output_proj.bias", "text_decoder.layers.21.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.21.encoder_decoder_attn_layer_norm.bias", 
"text_decoder.layers.21.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.21.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.21.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.21.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.21.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.21.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.21.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.21.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.21.ffn_layer_norm.weight", "text_decoder.layers.21.ffn_layer_norm.bias", "text_decoder.layers.21.ffn.inner_proj.weight", "text_decoder.layers.21.ffn.inner_proj.bias", "text_decoder.layers.21.ffn.output_proj.weight", "text_decoder.layers.21.ffn.output_proj.bias", "text_decoder.layers.22.self_attn_layer_norm.weight", "text_decoder.layers.22.self_attn_layer_norm.bias", "text_decoder.layers.22.self_attn.q_proj.weight", "text_decoder.layers.22.self_attn.q_proj.bias", "text_decoder.layers.22.self_attn.k_proj.weight", "text_decoder.layers.22.self_attn.k_proj.bias", "text_decoder.layers.22.self_attn.v_proj.weight", "text_decoder.layers.22.self_attn.v_proj.bias", "text_decoder.layers.22.self_attn.output_proj.weight", "text_decoder.layers.22.self_attn.output_proj.bias", "text_decoder.layers.22.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.22.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.22.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.22.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.22.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.22.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.22.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.22.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.22.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.22.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.22.ffn_layer_norm.weight", "text_decoder.layers.22.ffn_layer_norm.bias", 
"text_decoder.layers.22.ffn.inner_proj.weight", "text_decoder.layers.22.ffn.inner_proj.bias", "text_decoder.layers.22.ffn.output_proj.weight", "text_decoder.layers.22.ffn.output_proj.bias", "text_decoder.layers.23.self_attn_layer_norm.weight", "text_decoder.layers.23.self_attn_layer_norm.bias", "text_decoder.layers.23.self_attn.q_proj.weight", "text_decoder.layers.23.self_attn.q_proj.bias", "text_decoder.layers.23.self_attn.k_proj.weight", "text_decoder.layers.23.self_attn.k_proj.bias", "text_decoder.layers.23.self_attn.v_proj.weight", "text_decoder.layers.23.self_attn.v_proj.bias", "text_decoder.layers.23.self_attn.output_proj.weight", "text_decoder.layers.23.self_attn.output_proj.bias", "text_decoder.layers.23.encoder_decoder_attn_layer_norm.weight", "text_decoder.layers.23.encoder_decoder_attn_layer_norm.bias", "text_decoder.layers.23.encoder_decoder_attn.q_proj.weight", "text_decoder.layers.23.encoder_decoder_attn.q_proj.bias", "text_decoder.layers.23.encoder_decoder_attn.k_proj.weight", "text_decoder.layers.23.encoder_decoder_attn.k_proj.bias", "text_decoder.layers.23.encoder_decoder_attn.v_proj.weight", "text_decoder.layers.23.encoder_decoder_attn.v_proj.bias", "text_decoder.layers.23.encoder_decoder_attn.output_proj.weight", "text_decoder.layers.23.encoder_decoder_attn.output_proj.bias", "text_decoder.layers.23.ffn_layer_norm.weight", "text_decoder.layers.23.ffn_layer_norm.bias", "text_decoder.layers.23.ffn.inner_proj.weight", "text_decoder.layers.23.ffn.inner_proj.bias", "text_decoder.layers.23.ffn.output_proj.weight", "text_decoder.layers.23.ffn.output_proj.bias". 
        Unexpected key(s) in state_dict: "target_letter_decoder.version", "target_letter_decoder.embed_tokens.weight", "target_letter_decoder.embed_positions._float_tensor", "target_letter_decoder.layers.0.self_attn.k_proj.weight", "target_letter_decoder.layers.0.self_attn.k_proj.bias", "target_letter_decoder.layers.0.self_attn.v_proj.weight", "target_letter_decoder.layers.0.self_attn.v_proj.bias", "target_letter_decoder.layers.0.self_attn.q_proj.weight", "target_letter_decoder.layers.0.self_attn.q_proj.bias", "target_letter_decoder.layers.0.self_attn.out_proj.weight", "target_letter_decoder.layers.0.self_attn.out_proj.bias", "target_letter_decoder.layers.0.self_attn_layer_norm.weight", "target_letter_decoder.layers.0.self_attn_layer_norm.bias", "target_letter_decoder.layers.0.encoder_attn.k_proj.weight", "target_letter_decoder.layers.0.encoder_attn.k_proj.bias", "target_letter_decoder.layers.0.encoder_attn.v_proj.weight", "target_letter_decoder.layers.0.encoder_attn.v_proj.bias", "target_letter_decoder.layers.0.encoder_attn.q_proj.weight", "target_letter_decoder.layers.0.encoder_attn.q_proj.bias", "target_letter_decoder.layers.0.encoder_attn.out_proj.weight", "target_letter_decoder.layers.0.encoder_attn.out_proj.bias", "target_letter_decoder.layers.0.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.0.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.0.fc1.weight", "target_letter_decoder.layers.0.fc1.bias", "target_letter_decoder.layers.0.fc2.weight", "target_letter_decoder.layers.0.fc2.bias", "target_letter_decoder.layers.0.final_layer_norm.weight", "target_letter_decoder.layers.0.final_layer_norm.bias", "target_letter_decoder.layers.1.self_attn.k_proj.weight", "target_letter_decoder.layers.1.self_attn.k_proj.bias", "target_letter_decoder.layers.1.self_attn.v_proj.weight", "target_letter_decoder.layers.1.self_attn.v_proj.bias", "target_letter_decoder.layers.1.self_attn.q_proj.weight", "target_letter_decoder.layers.1.self_attn.q_proj.bias", 
"target_letter_decoder.layers.1.self_attn.out_proj.weight", "target_letter_decoder.layers.1.self_attn.out_proj.bias", "target_letter_decoder.layers.1.self_attn_layer_norm.weight", "target_letter_decoder.layers.1.self_attn_layer_norm.bias", "target_letter_decoder.layers.1.encoder_attn.k_proj.weight", "target_letter_decoder.layers.1.encoder_attn.k_proj.bias", "target_letter_decoder.layers.1.encoder_attn.v_proj.weight", "target_letter_decoder.layers.1.encoder_attn.v_proj.bias", "target_letter_decoder.layers.1.encoder_attn.q_proj.weight", "target_letter_decoder.layers.1.encoder_attn.q_proj.bias", "target_letter_decoder.layers.1.encoder_attn.out_proj.weight", "target_letter_decoder.layers.1.encoder_attn.out_proj.bias", "target_letter_decoder.layers.1.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.1.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.1.fc1.weight", "target_letter_decoder.layers.1.fc1.bias", "target_letter_decoder.layers.1.fc2.weight", "target_letter_decoder.layers.1.fc2.bias", "target_letter_decoder.layers.1.final_layer_norm.weight", "target_letter_decoder.layers.1.final_layer_norm.bias", "target_letter_decoder.layers.2.self_attn.k_proj.weight", "target_letter_decoder.layers.2.self_attn.k_proj.bias", "target_letter_decoder.layers.2.self_attn.v_proj.weight", "target_letter_decoder.layers.2.self_attn.v_proj.bias", "target_letter_decoder.layers.2.self_attn.q_proj.weight", "target_letter_decoder.layers.2.self_attn.q_proj.bias", "target_letter_decoder.layers.2.self_attn.out_proj.weight", "target_letter_decoder.layers.2.self_attn.out_proj.bias", "target_letter_decoder.layers.2.self_attn_layer_norm.weight", "target_letter_decoder.layers.2.self_attn_layer_norm.bias", "target_letter_decoder.layers.2.encoder_attn.k_proj.weight", "target_letter_decoder.layers.2.encoder_attn.k_proj.bias", "target_letter_decoder.layers.2.encoder_attn.v_proj.weight", "target_letter_decoder.layers.2.encoder_attn.v_proj.bias", 
"target_letter_decoder.layers.2.encoder_attn.q_proj.weight", "target_letter_decoder.layers.2.encoder_attn.q_proj.bias", "target_letter_decoder.layers.2.encoder_attn.out_proj.weight", "target_letter_decoder.layers.2.encoder_attn.out_proj.bias", "target_letter_decoder.layers.2.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.2.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.2.fc1.weight", "target_letter_decoder.layers.2.fc1.bias", "target_letter_decoder.layers.2.fc2.weight", "target_letter_decoder.layers.2.fc2.bias", "target_letter_decoder.layers.2.final_layer_norm.weight", "target_letter_decoder.layers.2.final_layer_norm.bias", "target_letter_decoder.layers.3.self_attn.k_proj.weight", "target_letter_decoder.layers.3.self_attn.k_proj.bias", "target_letter_decoder.layers.3.self_attn.v_proj.weight", "target_letter_decoder.layers.3.self_attn.v_proj.bias", "target_letter_decoder.layers.3.self_attn.q_proj.weight", "target_letter_decoder.layers.3.self_attn.q_proj.bias", "target_letter_decoder.layers.3.self_attn.out_proj.weight", "target_letter_decoder.layers.3.self_attn.out_proj.bias", "target_letter_decoder.layers.3.self_attn_layer_norm.weight", "target_letter_decoder.layers.3.self_attn_layer_norm.bias", "target_letter_decoder.layers.3.encoder_attn.k_proj.weight", "target_letter_decoder.layers.3.encoder_attn.k_proj.bias", "target_letter_decoder.layers.3.encoder_attn.v_proj.weight", "target_letter_decoder.layers.3.encoder_attn.v_proj.bias", "target_letter_decoder.layers.3.encoder_attn.q_proj.weight", "target_letter_decoder.layers.3.encoder_attn.q_proj.bias", "target_letter_decoder.layers.3.encoder_attn.out_proj.weight", "target_letter_decoder.layers.3.encoder_attn.out_proj.bias", "target_letter_decoder.layers.3.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.3.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.3.fc1.weight", "target_letter_decoder.layers.3.fc1.bias", "target_letter_decoder.layers.3.fc2.weight", 
"target_letter_decoder.layers.3.fc2.bias", "target_letter_decoder.layers.3.final_layer_norm.weight", "target_letter_decoder.layers.3.final_layer_norm.bias", "target_letter_decoder.layers.4.self_attn.k_proj.weight", "target_letter_decoder.layers.4.self_attn.k_proj.bias", "target_letter_decoder.layers.4.self_attn.v_proj.weight", "target_letter_decoder.layers.4.self_attn.v_proj.bias", "target_letter_decoder.layers.4.self_attn.q_proj.weight", "target_letter_decoder.layers.4.self_attn.q_proj.bias", "target_letter_decoder.layers.4.self_attn.out_proj.weight", "target_letter_decoder.layers.4.self_attn.out_proj.bias", "target_letter_decoder.layers.4.self_attn_layer_norm.weight", "target_letter_decoder.layers.4.self_attn_layer_norm.bias", "target_letter_decoder.layers.4.encoder_attn.k_proj.weight", "target_letter_decoder.layers.4.encoder_attn.k_proj.bias", "target_letter_decoder.layers.4.encoder_attn.v_proj.weight", "target_letter_decoder.layers.4.encoder_attn.v_proj.bias", "target_letter_decoder.layers.4.encoder_attn.q_proj.weight", "target_letter_decoder.layers.4.encoder_attn.q_proj.bias", "target_letter_decoder.layers.4.encoder_attn.out_proj.weight", "target_letter_decoder.layers.4.encoder_attn.out_proj.bias", "target_letter_decoder.layers.4.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.4.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.4.fc1.weight", "target_letter_decoder.layers.4.fc1.bias", "target_letter_decoder.layers.4.fc2.weight", "target_letter_decoder.layers.4.fc2.bias", "target_letter_decoder.layers.4.final_layer_norm.weight", "target_letter_decoder.layers.4.final_layer_norm.bias", "target_letter_decoder.layers.5.self_attn.k_proj.weight", "target_letter_decoder.layers.5.self_attn.k_proj.bias", "target_letter_decoder.layers.5.self_attn.v_proj.weight", "target_letter_decoder.layers.5.self_attn.v_proj.bias", "target_letter_decoder.layers.5.self_attn.q_proj.weight", "target_letter_decoder.layers.5.self_attn.q_proj.bias", 
"target_letter_decoder.layers.5.self_attn.out_proj.weight", "target_letter_decoder.layers.5.self_attn.out_proj.bias", "target_letter_decoder.layers.5.self_attn_layer_norm.weight", "target_letter_decoder.layers.5.self_attn_layer_norm.bias", "target_letter_decoder.layers.5.encoder_attn.k_proj.weight", "target_letter_decoder.layers.5.encoder_attn.k_proj.bias", "target_letter_decoder.layers.5.encoder_attn.v_proj.weight", "target_letter_decoder.layers.5.encoder_attn.v_proj.bias", "target_letter_decoder.layers.5.encoder_attn.q_proj.weight", "target_letter_decoder.layers.5.encoder_attn.q_proj.bias", "target_letter_decoder.layers.5.encoder_attn.out_proj.weight", "target_letter_decoder.layers.5.encoder_attn.out_proj.bias", "target_letter_decoder.layers.5.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.5.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.5.fc1.weight", "target_letter_decoder.layers.5.fc1.bias", "target_letter_decoder.layers.5.fc2.weight", "target_letter_decoder.layers.5.fc2.bias", "target_letter_decoder.layers.5.final_layer_norm.weight", "target_letter_decoder.layers.5.final_layer_norm.bias", "target_letter_decoder.layers.6.self_attn.k_proj.weight", "target_letter_decoder.layers.6.self_attn.k_proj.bias", "target_letter_decoder.layers.6.self_attn.v_proj.weight", "target_letter_decoder.layers.6.self_attn.v_proj.bias", "target_letter_decoder.layers.6.self_attn.q_proj.weight", "target_letter_decoder.layers.6.self_attn.q_proj.bias", "target_letter_decoder.layers.6.self_attn.out_proj.weight", "target_letter_decoder.layers.6.self_attn.out_proj.bias", "target_letter_decoder.layers.6.self_attn_layer_norm.weight", "target_letter_decoder.layers.6.self_attn_layer_norm.bias", "target_letter_decoder.layers.6.encoder_attn.k_proj.weight", "target_letter_decoder.layers.6.encoder_attn.k_proj.bias", "target_letter_decoder.layers.6.encoder_attn.v_proj.weight", "target_letter_decoder.layers.6.encoder_attn.v_proj.bias", 
"target_letter_decoder.layers.6.encoder_attn.q_proj.weight", "target_letter_decoder.layers.6.encoder_attn.q_proj.bias", "target_letter_decoder.layers.6.encoder_attn.out_proj.weight", "target_letter_decoder.layers.6.encoder_attn.out_proj.bias", "target_letter_decoder.layers.6.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.6.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.6.fc1.weight", "target_letter_decoder.layers.6.fc1.bias", "target_letter_decoder.layers.6.fc2.weight", "target_letter_decoder.layers.6.fc2.bias", "target_letter_decoder.layers.6.final_layer_norm.weight", "target_letter_decoder.layers.6.final_layer_norm.bias", "target_letter_decoder.layers.7.self_attn.k_proj.weight", "target_letter_decoder.layers.7.self_attn.k_proj.bias", "target_letter_decoder.layers.7.self_attn.v_proj.weight", "target_letter_decoder.layers.7.self_attn.v_proj.bias", "target_letter_decoder.layers.7.self_attn.q_proj.weight", "target_letter_decoder.layers.7.self_attn.q_proj.bias", "target_letter_decoder.layers.7.self_attn.out_proj.weight", "target_letter_decoder.layers.7.self_attn.out_proj.bias", "target_letter_decoder.layers.7.self_attn_layer_norm.weight", "target_letter_decoder.layers.7.self_attn_layer_norm.bias", "target_letter_decoder.layers.7.encoder_attn.k_proj.weight", "target_letter_decoder.layers.7.encoder_attn.k_proj.bias", "target_letter_decoder.layers.7.encoder_attn.v_proj.weight", "target_letter_decoder.layers.7.encoder_attn.v_proj.bias", "target_letter_decoder.layers.7.encoder_attn.q_proj.weight", "target_letter_decoder.layers.7.encoder_attn.q_proj.bias", "target_letter_decoder.layers.7.encoder_attn.out_proj.weight", "target_letter_decoder.layers.7.encoder_attn.out_proj.bias", "target_letter_decoder.layers.7.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.7.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.7.fc1.weight", "target_letter_decoder.layers.7.fc1.bias", "target_letter_decoder.layers.7.fc2.weight", 
"target_letter_decoder.layers.7.fc2.bias", "target_letter_decoder.layers.7.final_layer_norm.weight", "target_letter_decoder.layers.7.final_layer_norm.bias", "target_letter_decoder.layers.8.self_attn.k_proj.weight", "target_letter_decoder.layers.8.self_attn.k_proj.bias", "target_letter_decoder.layers.8.self_attn.v_proj.weight", "target_letter_decoder.layers.8.self_attn.v_proj.bias", "target_letter_decoder.layers.8.self_attn.q_proj.weight", "target_letter_decoder.layers.8.self_attn.q_proj.bias", "target_letter_decoder.layers.8.self_attn.out_proj.weight", "target_letter_decoder.layers.8.self_attn.out_proj.bias", "target_letter_decoder.layers.8.self_attn_layer_norm.weight", "target_letter_decoder.layers.8.self_attn_layer_norm.bias", "target_letter_decoder.layers.8.encoder_attn.k_proj.weight", "target_letter_decoder.layers.8.encoder_attn.k_proj.bias", "target_letter_decoder.layers.8.encoder_attn.v_proj.weight", "target_letter_decoder.layers.8.encoder_attn.v_proj.bias", "target_letter_decoder.layers.8.encoder_attn.q_proj.weight", "target_letter_decoder.layers.8.encoder_attn.q_proj.bias", "target_letter_decoder.layers.8.encoder_attn.out_proj.weight", "target_letter_decoder.layers.8.encoder_attn.out_proj.bias", "target_letter_decoder.layers.8.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.8.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.8.fc1.weight", "target_letter_decoder.layers.8.fc1.bias", "target_letter_decoder.layers.8.fc2.weight", "target_letter_decoder.layers.8.fc2.bias", "target_letter_decoder.layers.8.final_layer_norm.weight", "target_letter_decoder.layers.8.final_layer_norm.bias", "target_letter_decoder.layers.9.self_attn.k_proj.weight", "target_letter_decoder.layers.9.self_attn.k_proj.bias", "target_letter_decoder.layers.9.self_attn.v_proj.weight", "target_letter_decoder.layers.9.self_attn.v_proj.bias", "target_letter_decoder.layers.9.self_attn.q_proj.weight", "target_letter_decoder.layers.9.self_attn.q_proj.bias", 
"target_letter_decoder.layers.9.self_attn.out_proj.weight", "target_letter_decoder.layers.9.self_attn.out_proj.bias", "target_letter_decoder.layers.9.self_attn_layer_norm.weight", "target_letter_decoder.layers.9.self_attn_layer_norm.bias", "target_letter_decoder.layers.9.encoder_attn.k_proj.weight", "target_letter_decoder.layers.9.encoder_attn.k_proj.bias", "target_letter_decoder.layers.9.encoder_attn.v_proj.weight", "target_letter_decoder.layers.9.encoder_attn.v_proj.bias", "target_letter_decoder.layers.9.encoder_attn.q_proj.weight", "target_letter_decoder.layers.9.encoder_attn.q_proj.bias", "target_letter_decoder.layers.9.encoder_attn.out_proj.weight", "target_letter_decoder.layers.9.encoder_attn.out_proj.bias", "target_letter_decoder.layers.9.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.9.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.9.fc1.weight", "target_letter_decoder.layers.9.fc1.bias", "target_letter_decoder.layers.9.fc2.weight", "target_letter_decoder.layers.9.fc2.bias", "target_letter_decoder.layers.9.final_layer_norm.weight", "target_letter_decoder.layers.9.final_layer_norm.bias", "target_letter_decoder.layers.10.self_attn.k_proj.weight", "target_letter_decoder.layers.10.self_attn.k_proj.bias", "target_letter_decoder.layers.10.self_attn.v_proj.weight", "target_letter_decoder.layers.10.self_attn.v_proj.bias", "target_letter_decoder.layers.10.self_attn.q_proj.weight", "target_letter_decoder.layers.10.self_attn.q_proj.bias", "target_letter_decoder.layers.10.self_attn.out_proj.weight", "target_letter_decoder.layers.10.self_attn.out_proj.bias", "target_letter_decoder.layers.10.self_attn_layer_norm.weight", "target_letter_decoder.layers.10.self_attn_layer_norm.bias", "target_letter_decoder.layers.10.encoder_attn.k_proj.weight", "target_letter_decoder.layers.10.encoder_attn.k_proj.bias", "target_letter_decoder.layers.10.encoder_attn.v_proj.weight", "target_letter_decoder.layers.10.encoder_attn.v_proj.bias", 
"target_letter_decoder.layers.10.encoder_attn.q_proj.weight", "target_letter_decoder.layers.10.encoder_attn.q_proj.bias", "target_letter_decoder.layers.10.encoder_attn.out_proj.weight", "target_letter_decoder.layers.10.encoder_attn.out_proj.bias", "target_letter_decoder.layers.10.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.10.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.10.fc1.weight", "target_letter_decoder.layers.10.fc1.bias", "target_letter_decoder.layers.10.fc2.weight", "target_letter_decoder.layers.10.fc2.bias", "target_letter_decoder.layers.10.final_layer_norm.weight", "target_letter_decoder.layers.10.final_layer_norm.bias", "target_letter_decoder.layers.11.self_attn.k_proj.weight", "target_letter_decoder.layers.11.self_attn.k_proj.bias", "target_letter_decoder.layers.11.self_attn.v_proj.weight", "target_letter_decoder.layers.11.self_attn.v_proj.bias", "target_letter_decoder.layers.11.self_attn.q_proj.weight", "target_letter_decoder.layers.11.self_attn.q_proj.bias", "target_letter_decoder.layers.11.self_attn.out_proj.weight", "target_letter_decoder.layers.11.self_attn.out_proj.bias", "target_letter_decoder.layers.11.self_attn_layer_norm.weight", "target_letter_decoder.layers.11.self_attn_layer_norm.bias", "target_letter_decoder.layers.11.encoder_attn.k_proj.weight", "target_letter_decoder.layers.11.encoder_attn.k_proj.bias", "target_letter_decoder.layers.11.encoder_attn.v_proj.weight", "target_letter_decoder.layers.11.encoder_attn.v_proj.bias", "target_letter_decoder.layers.11.encoder_attn.q_proj.weight", "target_letter_decoder.layers.11.encoder_attn.q_proj.bias", "target_letter_decoder.layers.11.encoder_attn.out_proj.weight", "target_letter_decoder.layers.11.encoder_attn.out_proj.bias", "target_letter_decoder.layers.11.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.11.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.11.fc1.weight", "target_letter_decoder.layers.11.fc1.bias", 
"target_letter_decoder.layers.11.fc2.weight", "target_letter_decoder.layers.11.fc2.bias", "target_letter_decoder.layers.11.final_layer_norm.weight", "target_letter_decoder.layers.11.final_layer_norm.bias", "target_letter_decoder.layers.12.self_attn.k_proj.weight", "target_letter_decoder.layers.12.self_attn.k_proj.bias", "target_letter_decoder.layers.12.self_attn.v_proj.weight", "target_letter_decoder.layers.12.self_attn.v_proj.bias", "target_letter_decoder.layers.12.self_attn.q_proj.weight", "target_letter_decoder.layers.12.self_attn.q_proj.bias", "target_letter_decoder.layers.12.self_attn.out_proj.weight", "target_letter_decoder.layers.12.self_attn.out_proj.bias", "target_letter_decoder.layers.12.self_attn_layer_norm.weight", "target_letter_decoder.layers.12.self_attn_layer_norm.bias", "target_letter_decoder.layers.12.encoder_attn.k_proj.weight", "target_letter_decoder.layers.12.encoder_attn.k_proj.bias", "target_letter_decoder.layers.12.encoder_attn.v_proj.weight", "target_letter_decoder.layers.12.encoder_attn.v_proj.bias", "target_letter_decoder.layers.12.encoder_attn.q_proj.weight", "target_letter_decoder.layers.12.encoder_attn.q_proj.bias", "target_letter_decoder.layers.12.encoder_attn.out_proj.weight", "target_letter_decoder.layers.12.encoder_attn.out_proj.bias", "target_letter_decoder.layers.12.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.12.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.12.fc1.weight", "target_letter_decoder.layers.12.fc1.bias", "target_letter_decoder.layers.12.fc2.weight", "target_letter_decoder.layers.12.fc2.bias", "target_letter_decoder.layers.12.final_layer_norm.weight", "target_letter_decoder.layers.12.final_layer_norm.bias", "target_letter_decoder.layers.13.self_attn.k_proj.weight", "target_letter_decoder.layers.13.self_attn.k_proj.bias", "target_letter_decoder.layers.13.self_attn.v_proj.weight", "target_letter_decoder.layers.13.self_attn.v_proj.bias", 
"target_letter_decoder.layers.13.self_attn.q_proj.weight", "target_letter_decoder.layers.13.self_attn.q_proj.bias", "target_letter_decoder.layers.13.self_attn.out_proj.weight", "target_letter_decoder.layers.13.self_attn.out_proj.bias", "target_letter_decoder.layers.13.self_attn_layer_norm.weight", "target_letter_decoder.layers.13.self_attn_layer_norm.bias", "target_letter_decoder.layers.13.encoder_attn.k_proj.weight", "target_letter_decoder.layers.13.encoder_attn.k_proj.bias", "target_letter_decoder.layers.13.encoder_attn.v_proj.weight", "target_letter_decoder.layers.13.encoder_attn.v_proj.bias", "target_letter_decoder.layers.13.encoder_attn.q_proj.weight", "target_letter_decoder.layers.13.encoder_attn.q_proj.bias", "target_letter_decoder.layers.13.encoder_attn.out_proj.weight", "target_letter_decoder.layers.13.encoder_attn.out_proj.bias", "target_letter_decoder.layers.13.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.13.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.13.fc1.weight", "target_letter_decoder.layers.13.fc1.bias", "target_letter_decoder.layers.13.fc2.weight", "target_letter_decoder.layers.13.fc2.bias", "target_letter_decoder.layers.13.final_layer_norm.weight", "target_letter_decoder.layers.13.final_layer_norm.bias", "target_letter_decoder.layers.14.self_attn.k_proj.weight", "target_letter_decoder.layers.14.self_attn.k_proj.bias", "target_letter_decoder.layers.14.self_attn.v_proj.weight", "target_letter_decoder.layers.14.self_attn.v_proj.bias", "target_letter_decoder.layers.14.self_attn.q_proj.weight", "target_letter_decoder.layers.14.self_attn.q_proj.bias", "target_letter_decoder.layers.14.self_attn.out_proj.weight", "target_letter_decoder.layers.14.self_attn.out_proj.bias", "target_letter_decoder.layers.14.self_attn_layer_norm.weight", "target_letter_decoder.layers.14.self_attn_layer_norm.bias", "target_letter_decoder.layers.14.encoder_attn.k_proj.weight", "target_letter_decoder.layers.14.encoder_attn.k_proj.bias", 
"target_letter_decoder.layers.14.encoder_attn.v_proj.weight", "target_letter_decoder.layers.14.encoder_attn.v_proj.bias", "target_letter_decoder.layers.14.encoder_attn.q_proj.weight", "target_letter_decoder.layers.14.encoder_attn.q_proj.bias", "target_letter_decoder.layers.14.encoder_attn.out_proj.weight", "target_letter_decoder.layers.14.encoder_attn.out_proj.bias", "target_letter_decoder.layers.14.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.14.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.14.fc1.weight", "target_letter_decoder.layers.14.fc1.bias", "target_letter_decoder.layers.14.fc2.weight", "target_letter_decoder.layers.14.fc2.bias", "target_letter_decoder.layers.14.final_layer_norm.weight", "target_letter_decoder.layers.14.final_layer_norm.bias", "target_letter_decoder.layers.15.self_attn.k_proj.weight", "target_letter_decoder.layers.15.self_attn.k_proj.bias", "target_letter_decoder.layers.15.self_attn.v_proj.weight", "target_letter_decoder.layers.15.self_attn.v_proj.bias", "target_letter_decoder.layers.15.self_attn.q_proj.weight", "target_letter_decoder.layers.15.self_attn.q_proj.bias", "target_letter_decoder.layers.15.self_attn.out_proj.weight", "target_letter_decoder.layers.15.self_attn.out_proj.bias", "target_letter_decoder.layers.15.self_attn_layer_norm.weight", "target_letter_decoder.layers.15.self_attn_layer_norm.bias", "target_letter_decoder.layers.15.encoder_attn.k_proj.weight", "target_letter_decoder.layers.15.encoder_attn.k_proj.bias", "target_letter_decoder.layers.15.encoder_attn.v_proj.weight", "target_letter_decoder.layers.15.encoder_attn.v_proj.bias", "target_letter_decoder.layers.15.encoder_attn.q_proj.weight", "target_letter_decoder.layers.15.encoder_attn.q_proj.bias", "target_letter_decoder.layers.15.encoder_attn.out_proj.weight", "target_letter_decoder.layers.15.encoder_attn.out_proj.bias", "target_letter_decoder.layers.15.encoder_attn_layer_norm.weight", 
"target_letter_decoder.layers.15.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.15.fc1.weight", "target_letter_decoder.layers.15.fc1.bias", "target_letter_decoder.layers.15.fc2.weight", "target_letter_decoder.layers.15.fc2.bias", "target_letter_decoder.layers.15.final_layer_norm.weight", "target_letter_decoder.layers.15.final_layer_norm.bias", "target_letter_decoder.layers.16.self_attn.k_proj.weight", "target_letter_decoder.layers.16.self_attn.k_proj.bias", "target_letter_decoder.layers.16.self_attn.v_proj.weight", "target_letter_decoder.layers.16.self_attn.v_proj.bias", "target_letter_decoder.layers.16.self_attn.q_proj.weight", "target_letter_decoder.layers.16.self_attn.q_proj.bias", "target_letter_decoder.layers.16.self_attn.out_proj.weight", "target_letter_decoder.layers.16.self_attn.out_proj.bias", "target_letter_decoder.layers.16.self_attn_layer_norm.weight", "target_letter_decoder.layers.16.self_attn_layer_norm.bias", "target_letter_decoder.layers.16.encoder_attn.k_proj.weight", "target_letter_decoder.layers.16.encoder_attn.k_proj.bias", "target_letter_decoder.layers.16.encoder_attn.v_proj.weight", "target_letter_decoder.layers.16.encoder_attn.v_proj.bias", "target_letter_decoder.layers.16.encoder_attn.q_proj.weight", "target_letter_decoder.layers.16.encoder_attn.q_proj.bias", "target_letter_decoder.layers.16.encoder_attn.out_proj.weight", "target_letter_decoder.layers.16.encoder_attn.out_proj.bias", "target_letter_decoder.layers.16.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.16.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.16.fc1.weight", "target_letter_decoder.layers.16.fc1.bias", "target_letter_decoder.layers.16.fc2.weight", "target_letter_decoder.layers.16.fc2.bias", "target_letter_decoder.layers.16.final_layer_norm.weight", "target_letter_decoder.layers.16.final_layer_norm.bias", "target_letter_decoder.layers.17.self_attn.k_proj.weight", "target_letter_decoder.layers.17.self_attn.k_proj.bias", 
"target_letter_decoder.layers.17.self_attn.v_proj.weight", "target_letter_decoder.layers.17.self_attn.v_proj.bias", "target_letter_decoder.layers.17.self_attn.q_proj.weight", "target_letter_decoder.layers.17.self_attn.q_proj.bias", "target_letter_decoder.layers.17.self_attn.out_proj.weight", "target_letter_decoder.layers.17.self_attn.out_proj.bias", "target_letter_decoder.layers.17.self_attn_layer_norm.weight", "target_letter_decoder.layers.17.self_attn_layer_norm.bias", "target_letter_decoder.layers.17.encoder_attn.k_proj.weight", "target_letter_decoder.layers.17.encoder_attn.k_proj.bias", "target_letter_decoder.layers.17.encoder_attn.v_proj.weight", "target_letter_decoder.layers.17.encoder_attn.v_proj.bias", "target_letter_decoder.layers.17.encoder_attn.q_proj.weight", "target_letter_decoder.layers.17.encoder_attn.q_proj.bias", "target_letter_decoder.layers.17.encoder_attn.out_proj.weight", "target_letter_decoder.layers.17.encoder_attn.out_proj.bias", "target_letter_decoder.layers.17.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.17.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.17.fc1.weight", "target_letter_decoder.layers.17.fc1.bias", "target_letter_decoder.layers.17.fc2.weight", "target_letter_decoder.layers.17.fc2.bias", "target_letter_decoder.layers.17.final_layer_norm.weight", "target_letter_decoder.layers.17.final_layer_norm.bias", "target_letter_decoder.layers.18.self_attn.k_proj.weight", "target_letter_decoder.layers.18.self_attn.k_proj.bias", "target_letter_decoder.layers.18.self_attn.v_proj.weight", "target_letter_decoder.layers.18.self_attn.v_proj.bias", "target_letter_decoder.layers.18.self_attn.q_proj.weight", "target_letter_decoder.layers.18.self_attn.q_proj.bias", "target_letter_decoder.layers.18.self_attn.out_proj.weight", "target_letter_decoder.layers.18.self_attn.out_proj.bias", "target_letter_decoder.layers.18.self_attn_layer_norm.weight", "target_letter_decoder.layers.18.self_attn_layer_norm.bias", 
"target_letter_decoder.layers.18.encoder_attn.k_proj.weight", "target_letter_decoder.layers.18.encoder_attn.k_proj.bias", "target_letter_decoder.layers.18.encoder_attn.v_proj.weight", "target_letter_decoder.layers.18.encoder_attn.v_proj.bias", "target_letter_decoder.layers.18.encoder_attn.q_proj.weight", "target_letter_decoder.layers.18.encoder_attn.q_proj.bias", "target_letter_decoder.layers.18.encoder_attn.out_proj.weight", "target_letter_decoder.layers.18.encoder_attn.out_proj.bias", "target_letter_decoder.layers.18.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.18.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.18.fc1.weight", "target_letter_decoder.layers.18.fc1.bias", "target_letter_decoder.layers.18.fc2.weight", "target_letter_decoder.layers.18.fc2.bias", "target_letter_decoder.layers.18.final_layer_norm.weight", "target_letter_decoder.layers.18.final_layer_norm.bias", "target_letter_decoder.layers.19.self_attn.k_proj.weight", "target_letter_decoder.layers.19.self_attn.k_proj.bias", "target_letter_decoder.layers.19.self_attn.v_proj.weight", "target_letter_decoder.layers.19.self_attn.v_proj.bias", "target_letter_decoder.layers.19.self_attn.q_proj.weight", "target_letter_decoder.layers.19.self_attn.q_proj.bias", "target_letter_decoder.layers.19.self_attn.out_proj.weight", "target_letter_decoder.layers.19.self_attn.out_proj.bias", "target_letter_decoder.layers.19.self_attn_layer_norm.weight", "target_letter_decoder.layers.19.self_attn_layer_norm.bias", "target_letter_decoder.layers.19.encoder_attn.k_proj.weight", "target_letter_decoder.layers.19.encoder_attn.k_proj.bias", "target_letter_decoder.layers.19.encoder_attn.v_proj.weight", "target_letter_decoder.layers.19.encoder_attn.v_proj.bias", "target_letter_decoder.layers.19.encoder_attn.q_proj.weight", "target_letter_decoder.layers.19.encoder_attn.q_proj.bias", "target_letter_decoder.layers.19.encoder_attn.out_proj.weight", 
"target_letter_decoder.layers.19.encoder_attn.out_proj.bias", "target_letter_decoder.layers.19.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.19.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.19.fc1.weight", "target_letter_decoder.layers.19.fc1.bias", "target_letter_decoder.layers.19.fc2.weight", "target_letter_decoder.layers.19.fc2.bias", "target_letter_decoder.layers.19.final_layer_norm.weight", "target_letter_decoder.layers.19.final_layer_norm.bias", "target_letter_decoder.layers.20.self_attn.k_proj.weight", "target_letter_decoder.layers.20.self_attn.k_proj.bias", "target_letter_decoder.layers.20.self_attn.v_proj.weight", "target_letter_decoder.layers.20.self_attn.v_proj.bias", "target_letter_decoder.layers.20.self_attn.q_proj.weight", "target_letter_decoder.layers.20.self_attn.q_proj.bias", "target_letter_decoder.layers.20.self_attn.out_proj.weight", "target_letter_decoder.layers.20.self_attn.out_proj.bias", "target_letter_decoder.layers.20.self_attn_layer_norm.weight", "target_letter_decoder.layers.20.self_attn_layer_norm.bias", "target_letter_decoder.layers.20.encoder_attn.k_proj.weight", "target_letter_decoder.layers.20.encoder_attn.k_proj.bias", "target_letter_decoder.layers.20.encoder_attn.v_proj.weight", "target_letter_decoder.layers.20.encoder_attn.v_proj.bias", "target_letter_decoder.layers.20.encoder_attn.q_proj.weight", "target_letter_decoder.layers.20.encoder_attn.q_proj.bias", "target_letter_decoder.layers.20.encoder_attn.out_proj.weight", "target_letter_decoder.layers.20.encoder_attn.out_proj.bias", "target_letter_decoder.layers.20.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.20.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.20.fc1.weight", "target_letter_decoder.layers.20.fc1.bias", "target_letter_decoder.layers.20.fc2.weight", "target_letter_decoder.layers.20.fc2.bias", "target_letter_decoder.layers.20.final_layer_norm.weight", "target_letter_decoder.layers.20.final_layer_norm.bias", 
"target_letter_decoder.layers.21.self_attn.k_proj.weight", "target_letter_decoder.layers.21.self_attn.k_proj.bias", "target_letter_decoder.layers.21.self_attn.v_proj.weight", "target_letter_decoder.layers.21.self_attn.v_proj.bias", "target_letter_decoder.layers.21.self_attn.q_proj.weight", "target_letter_decoder.layers.21.self_attn.q_proj.bias", "target_letter_decoder.layers.21.self_attn.out_proj.weight", "target_letter_decoder.layers.21.self_attn.out_proj.bias", "target_letter_decoder.layers.21.self_attn_layer_norm.weight", "target_letter_decoder.layers.21.self_attn_layer_norm.bias", "target_letter_decoder.layers.21.encoder_attn.k_proj.weight", "target_letter_decoder.layers.21.encoder_attn.k_proj.bias", "target_letter_decoder.layers.21.encoder_attn.v_proj.weight", "target_letter_decoder.layers.21.encoder_attn.v_proj.bias", "target_letter_decoder.layers.21.encoder_attn.q_proj.weight", "target_letter_decoder.layers.21.encoder_attn.q_proj.bias", "target_letter_decoder.layers.21.encoder_attn.out_proj.weight", "target_letter_decoder.layers.21.encoder_attn.out_proj.bias", "target_letter_decoder.layers.21.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.21.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.21.fc1.weight", "target_letter_decoder.layers.21.fc1.bias", "target_letter_decoder.layers.21.fc2.weight", "target_letter_decoder.layers.21.fc2.bias", "target_letter_decoder.layers.21.final_layer_norm.weight", "target_letter_decoder.layers.21.final_layer_norm.bias", "target_letter_decoder.layers.22.self_attn.k_proj.weight", "target_letter_decoder.layers.22.self_attn.k_proj.bias", "target_letter_decoder.layers.22.self_attn.v_proj.weight", "target_letter_decoder.layers.22.self_attn.v_proj.bias", "target_letter_decoder.layers.22.self_attn.q_proj.weight", "target_letter_decoder.layers.22.self_attn.q_proj.bias", "target_letter_decoder.layers.22.self_attn.out_proj.weight", "target_letter_decoder.layers.22.self_attn.out_proj.bias", 
"target_letter_decoder.layers.22.self_attn_layer_norm.weight", "target_letter_decoder.layers.22.self_attn_layer_norm.bias", "target_letter_decoder.layers.22.encoder_attn.k_proj.weight", "target_letter_decoder.layers.22.encoder_attn.k_proj.bias", "target_letter_decoder.layers.22.encoder_attn.v_proj.weight", "target_letter_decoder.layers.22.encoder_attn.v_proj.bias", "target_letter_decoder.layers.22.encoder_attn.q_proj.weight", "target_letter_decoder.layers.22.encoder_attn.q_proj.bias", "target_letter_decoder.layers.22.encoder_attn.out_proj.weight", "target_letter_decoder.layers.22.encoder_attn.out_proj.bias", "target_letter_decoder.layers.22.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.22.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.22.fc1.weight", "target_letter_decoder.layers.22.fc1.bias", "target_letter_decoder.layers.22.fc2.weight", "target_letter_decoder.layers.22.fc2.bias", "target_letter_decoder.layers.22.final_layer_norm.weight", "target_letter_decoder.layers.22.final_layer_norm.bias", "target_letter_decoder.layers.23.self_attn.k_proj.weight", "target_letter_decoder.layers.23.self_attn.k_proj.bias", "target_letter_decoder.layers.23.self_attn.v_proj.weight", "target_letter_decoder.layers.23.self_attn.v_proj.bias", "target_letter_decoder.layers.23.self_attn.q_proj.weight", "target_letter_decoder.layers.23.self_attn.q_proj.bias", "target_letter_decoder.layers.23.self_attn.out_proj.weight", "target_letter_decoder.layers.23.self_attn.out_proj.bias", "target_letter_decoder.layers.23.self_attn_layer_norm.weight", "target_letter_decoder.layers.23.self_attn_layer_norm.bias", "target_letter_decoder.layers.23.encoder_attn.k_proj.weight", "target_letter_decoder.layers.23.encoder_attn.k_proj.bias", "target_letter_decoder.layers.23.encoder_attn.v_proj.weight", "target_letter_decoder.layers.23.encoder_attn.v_proj.bias", "target_letter_decoder.layers.23.encoder_attn.q_proj.weight", "target_letter_decoder.layers.23.encoder_attn.q_proj.bias", 
"target_letter_decoder.layers.23.encoder_attn.out_proj.weight", "target_letter_decoder.layers.23.encoder_attn.out_proj.bias", "target_letter_decoder.layers.23.encoder_attn_layer_norm.weight", "target_letter_decoder.layers.23.encoder_attn_layer_norm.bias", "target_letter_decoder.layers.23.fc1.weight", "target_letter_decoder.layers.23.fc1.bias", "target_letter_decoder.layers.23.fc2.weight", "target_letter_decoder.layers.23.fc2.bias", "target_letter_decoder.layers.23.final_layer_norm.weight", "target_letter_decoder.layers.23.final_layer_norm.bias", "target_letter_decoder.layer_norm.weight", "target_letter_decoder.layer_norm.bias", "target_letter_decoder.output_projection.weight", "synthesizer_encoder.layers.0.self_attn.k_proj.weight", "synthesizer_encoder.layers.0.self_attn.k_proj.bias", "synthesizer_encoder.layers.0.self_attn.v_proj.weight", "synthesizer_encoder.layers.0.self_attn.v_proj.bias", "synthesizer_encoder.layers.0.self_attn.q_proj.weight", "synthesizer_encoder.layers.0.self_attn.q_proj.bias", "synthesizer_encoder.layers.0.self_attn.out_proj.weight", "synthesizer_encoder.layers.0.self_attn.out_proj.bias", "synthesizer_encoder.layers.0.self_attn_layer_norm.weight", "synthesizer_encoder.layers.0.self_attn_layer_norm.bias", "synthesizer_encoder.layers.0.fc1.weight", "synthesizer_encoder.layers.0.fc1.bias", "synthesizer_encoder.layers.0.fc2.weight", "synthesizer_encoder.layers.0.fc2.bias", "synthesizer_encoder.layers.0.final_layer_norm.weight", "synthesizer_encoder.layers.0.final_layer_norm.bias", "synthesizer_encoder.layers.1.self_attn.k_proj.weight", "synthesizer_encoder.layers.1.self_attn.k_proj.bias", "synthesizer_encoder.layers.1.self_attn.v_proj.weight", "synthesizer_encoder.layers.1.self_attn.v_proj.bias", "synthesizer_encoder.layers.1.self_attn.q_proj.weight", "synthesizer_encoder.layers.1.self_attn.q_proj.bias", "synthesizer_encoder.layers.1.self_attn.out_proj.weight", "synthesizer_encoder.layers.1.self_attn.out_proj.bias", 
"synthesizer_encoder.layers.1.self_attn_layer_norm.weight", "synthesizer_encoder.layers.1.self_attn_layer_norm.bias", "synthesizer_encoder.layers.1.fc1.weight", "synthesizer_encoder.layers.1.fc1.bias", "synthesizer_encoder.layers.1.fc2.weight", "synthesizer_encoder.layers.1.fc2.bias", "synthesizer_encoder.layers.1.final_layer_norm.weight", "synthesizer_encoder.layers.1.final_layer_norm.bias", "synthesizer_encoder.layers.2.self_attn.k_proj.weight", "synthesizer_encoder.layers.2.self_attn.k_proj.bias", "synthesizer_encoder.layers.2.self_attn.v_proj.weight", "synthesizer_encoder.layers.2.self_attn.v_proj.bias", "synthesizer_encoder.layers.2.self_attn.q_proj.weight", "synthesizer_encoder.layers.2.self_attn.q_proj.bias", "synthesizer_encoder.layers.2.self_attn.out_proj.weight", "synthesizer_encoder.layers.2.self_attn.out_proj.bias", "synthesizer_encoder.layers.2.self_attn_layer_norm.weight", "synthesizer_encoder.layers.2.self_attn_layer_norm.bias", "synthesizer_encoder.layers.2.fc1.weight", "synthesizer_encoder.layers.2.fc1.bias", "synthesizer_encoder.layers.2.fc2.weight", "synthesizer_encoder.layers.2.fc2.bias", "synthesizer_encoder.layers.2.final_layer_norm.weight", "synthesizer_encoder.layers.2.final_layer_norm.bias", "synthesizer_encoder.layers.3.self_attn.k_proj.weight", "synthesizer_encoder.layers.3.self_attn.k_proj.bias", "synthesizer_encoder.layers.3.self_attn.v_proj.weight", "synthesizer_encoder.layers.3.self_attn.v_proj.bias", "synthesizer_encoder.layers.3.self_attn.q_proj.weight", "synthesizer_encoder.layers.3.self_attn.q_proj.bias", "synthesizer_encoder.layers.3.self_attn.out_proj.weight", "synthesizer_encoder.layers.3.self_attn.out_proj.bias", "synthesizer_encoder.layers.3.self_attn_layer_norm.weight", "synthesizer_encoder.layers.3.self_attn_layer_norm.bias", "synthesizer_encoder.layers.3.fc1.weight", "synthesizer_encoder.layers.3.fc1.bias", "synthesizer_encoder.layers.3.fc2.weight", "synthesizer_encoder.layers.3.fc2.bias", 
"synthesizer_encoder.layers.3.final_layer_norm.weight", "synthesizer_encoder.layers.3.final_layer_norm.bias", "synthesizer_encoder.layers.4.self_attn.k_proj.weight", "synthesizer_encoder.layers.4.self_attn.k_proj.bias", "synthesizer_encoder.layers.4.self_attn.v_proj.weight", "synthesizer_encoder.layers.4.self_attn.v_proj.bias", "synthesizer_encoder.layers.4.self_attn.q_proj.weight", "synthesizer_encoder.layers.4.self_attn.q_proj.bias", "synthesizer_encoder.layers.4.self_attn.out_proj.weight", "synthesizer_encoder.layers.4.self_attn.out_proj.bias", "synthesizer_encoder.layers.4.self_attn_layer_norm.weight", "synthesizer_encoder.layers.4.self_attn_layer_norm.bias", "synthesizer_encoder.layers.4.fc1.weight", "synthesizer_encoder.layers.4.fc1.bias", "synthesizer_encoder.layers.4.fc2.weight", "synthesizer_encoder.layers.4.fc2.bias", "synthesizer_encoder.layers.4.final_layer_norm.weight", "synthesizer_encoder.layers.4.final_layer_norm.bias", "synthesizer_encoder.layers.5.self_attn.k_proj.weight", "synthesizer_encoder.layers.5.self_attn.k_proj.bias", "synthesizer_encoder.layers.5.self_attn.v_proj.weight", "synthesizer_encoder.layers.5.self_attn.v_proj.bias", "synthesizer_encoder.layers.5.self_attn.q_proj.weight", "synthesizer_encoder.layers.5.self_attn.q_proj.bias", "synthesizer_encoder.layers.5.self_attn.out_proj.weight", "synthesizer_encoder.layers.5.self_attn.out_proj.bias", "synthesizer_encoder.layers.5.self_attn_layer_norm.weight", "synthesizer_encoder.layers.5.self_attn_layer_norm.bias", "synthesizer_encoder.layers.5.fc1.weight", "synthesizer_encoder.layers.5.fc1.bias", "synthesizer_encoder.layers.5.fc2.weight", "synthesizer_encoder.layers.5.fc2.bias", "synthesizer_encoder.layers.5.final_layer_norm.weight", "synthesizer_encoder.layers.5.final_layer_norm.bias", "synthesizer_encoder.layer_norm.weight", "synthesizer_encoder.layer_norm.bias". 
        size mismatch for text_decoder_frontend.embed.weight: copying a param with shape torch.Size([10082, 1024]) from checkpoint, the shape in current model is torch.Size([256102, 1024]).
        size mismatch for final_proj.weight: copying a param with shape torch.Size([10082, 1024]) from checkpoint, the shape in current model is torch.Size([256102, 1024]).
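The shape of the mismatch is informative: the checkpoint's embedding and `final_proj` use a 10082-token vocabulary, while the instantiated model expects 256102 tokens (the large model's text vocabulary), and the missing keys are extra `text_decoder` layers. This suggests the medium checkpoint is being loaded into a model built with the large architecture/tokenizer config, so check that the architecture and tokenizer selection also honor `--model_name seamlessM4T_medium`. Below is a minimal, hypothetical sketch of how to diagnose such a failure before calling `load_state_dict`; plain dicts of shape tuples stand in for real tensors, and the parameter names and sizes simply mirror the traceback above.

```python
def diff_state_dicts(model_sd, ckpt_sd):
    """Compare a model's expected parameters against a checkpoint.

    Both arguments map parameter name -> shape (as a tuple), standing in
    for real PyTorch state_dicts. Returns keys the checkpoint lacks, keys
    the model does not expect, and keys whose shapes disagree.
    """
    missing = sorted(k for k in model_sd if k not in ckpt_sd)
    unexpected = sorted(k for k in ckpt_sd if k not in model_sd)
    mismatched = sorted(
        k for k in model_sd if k in ckpt_sd and model_sd[k] != ckpt_sd[k]
    )
    return missing, unexpected, mismatched


# Illustrative shapes mirroring the error: the model was built with the
# large vocab (256102) but the medium checkpoint carries 10082 entries,
# and the checkpoint lacks some of the model's decoder layers.
model_sd = {
    "text_decoder_frontend.embed.weight": (256102, 1024),
    "text_decoder.layers.4.ffn.inner_proj.weight": (8192, 1024),
}
ckpt_sd = {
    "text_decoder_frontend.embed.weight": (10082, 1024),
}

missing, unexpected, mismatched = diff_state_dicts(model_sd, ckpt_sd)
print("missing:", missing)        # layers absent from the checkpoint
print("mismatched:", mismatched)  # the vocab-size disagreement
```

With real models, the same comparison can be run on `model.state_dict()` versus the loaded checkpoint dict (comparing `tensor.shape`), which quickly distinguishes "wrong architecture instantiated" from "corrupt download".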

coms1580 commented Feb 06 '24 06:02