
Failed to run custom model with convert_model.py

Open solaoi opened this issue 1 year ago • 4 comments

Hi, I can't get this to work, and I was wondering whether you could advise me if the cause is the custom model itself, the conversion method, or the way I'm using it.

The conversion itself completes, but at runtime I get the error: Tch tensor error: cannot find the tensor named model.encoder.embed_positions.weight.

I am trying to convert this model. https://huggingface.co/staka/fugumt-en-ja

I have added the model to your code as follows. https://github.com/solaoi/rust-bert/commit/e7d4e1d7351f4363736771b6d3f43521ab25e0ba

usage:

    pub fn translate_to_japanese(text: &str) -> anyhow::Result<String> {
        let model_resource = RemoteResource::from_pretrained(MarianModelResources::ENGLISH2JAPANESE);
        let config_resource =
            RemoteResource::from_pretrained(MarianConfigResources::ENGLISH2JAPANESE);
        let vocab_resource = RemoteResource::from_pretrained(MarianVocabResources::ENGLISH2JAPANESE);
        let merges_resource = RemoteResource::from_pretrained(MarianSpmResources::ENGLISH2JAPANESE);

        let source_languages = MarianSourceLanguages::ENGLISH2JAPANESE;
        let target_languages = MarianTargetLanguages::ENGLISH2JAPANESE;

        let translation_config = TranslationConfig::new(
            ModelType::Marian,
            model_resource,
            config_resource,
            vocab_resource,
            Some(merges_resource),
            source_languages,
            target_languages,
            Device::Cpu
        );
        let model = TranslationModel::new(translation_config)?;
        let output = model.translate(&[text], None, None)?;
        Ok(output.join(""))
    }
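To narrow this down, it may help to check which tensors actually made it into the converted model.npz archive (a minimal diagnostic sketch; the path is hypothetical):

```python
import numpy as np

def list_converted_tensors(npz_path):
    """Print and return every tensor name stored in a converted model.npz archive."""
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
    for name in names:
        print(name)
    return names

# Hypothetical usage after running convert_model.py:
# list_converted_tensors("path/to/model.npz")
```

If model.encoder.embed_positions.weight is missing from the output, the problem is in the conversion step rather than on the Rust side.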

solaoi avatar Apr 14 '23 15:04 solaoi

I've found that some models do not log model.encoder.embed_positions.weight during conversion.

The default model (e.g. OPUS-MT-EN-ROMANCE) produces the following conversion logs:

converted model.encoder.embed_positions.weight - 128 bytes
converted model.decoder.embed_positions.weight - 128 bytes

but this model doesn't produce these log lines either: https://huggingface.co/Helsinki-NLP/opus-tatoeba-en-ja
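This can be confirmed before converting by inspecting the keys of the checkpoint's state dict directly (a sketch; the checkpoint path is hypothetical):

```python
def find_position_embeddings(state_dict):
    """Return the keys in a checkpoint state dict that look like positional embeddings."""
    return sorted(k for k in state_dict if "embed_positions" in k)

# Hypothetical usage on a downloaded checkpoint:
# import torch
# state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# print(find_position_embeddings(state_dict))
```

An empty result would mean the checkpoint stores no positional-embedding tensors at all, so the conversion script has nothing to copy.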

solaoi avatar Apr 27 '23 11:04 solaoi

I added the two missing tensors (model.encoder.embed_positions.weight and model.decoder.embed_positions.weight) as shown below, and that resolved the error.

However, the translation quality is very poor, so something about the conversion must still be wrong.

# add this (requires "import math" at the top of convert_model.py)
def sinusoidal_positional_embedding(max_seq_len, d_model):
    position = torch.arange(max_seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pos_emb = torch.empty(max_seq_len, 1, d_model)
    pos_emb[:, 0, 0::2] = torch.sin(position * div_term)
    pos_emb[:, 0, 1::2] = torch.cos(position * div_term)
    return pos_emb

if __name__ == "__main__":
...
    nps = {}
    target_folder = Path(args.source_file[0]).parent

    # add this
    max_seq_len = 512  # config.json: max_position_embeddings
    d_model = 512      # config.json: d_model
    position_embeddings = sinusoidal_positional_embedding(max_seq_len, d_model)
    nps["model.encoder.embed_positions.weight"] = torch.nn.Parameter(position_embeddings.squeeze(1))
    nps["model.decoder.embed_positions.weight"] = torch.nn.Parameter(position_embeddings.squeeze(1))
    
    for source_file in args.source_file:
...
    # add this
    nps = {k: v.detach().numpy() if torch.is_tensor(v) else v for k, v in nps.items()}
    np.savez(target_folder / "model.npz", **nps)
...
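One possible cause of the quality drop (an assumption, not verified against this checkpoint): if I read the Hugging Face Marian implementation correctly, it initialises sinusoidal positional embeddings in the original fairseq layout, with all sine values concatenated in the first half of each row and all cosines in the second half, rather than interleaved as in the snippet above. A pure-NumPy sketch of that concatenated layout:

```python
import numpy as np

def marian_sinusoidal_embedding(max_seq_len, d_model):
    """Sinusoidal embeddings in the concatenated (fairseq-style) layout:
    sines fill columns [0, d_model/2), cosines fill [d_model/2, d_model)."""
    position = np.arange(max_seq_len, dtype=np.float32)[:, None]
    div_term = np.exp(np.arange(0, d_model, 2, dtype=np.float32) * (-np.log(10000.0) / d_model))
    out = np.empty((max_seq_len, d_model), dtype=np.float32)
    half = d_model // 2
    out[:, :half] = np.sin(position * div_term)   # all sines first
    out[:, half:] = np.cos(position * div_term)   # then all cosines
    return out
```

If the runtime expects this layout but receives interleaved values, every position past index 0 gets a wrong embedding, which would degrade translations without causing an error.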

solaoi avatar Apr 30 '23 23:04 solaoi

I think the num_beams field in config.json may not be handled properly. MarianConfig is defined as equivalent to BertConfig here: https://github.com/guillaume-be/rust-bert/blob/c37eb32857edb4de0b76066c39b5de52ac7db7dd/src/marian/marian_model.rs#L524

BertConfig has no num_beams field: https://github.com/guillaume-be/rust-bert/blob/c37eb32857edb4de0b76066c39b5de52ac7db7dd/src/bert/bert_model.rs#L141-L158
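One way to test this hypothesis (an assumption; the config parser may simply ignore unknown keys) is to write a copy of config.json with the generation-only fields stripped out and point the config resource at it. The field names below are common generation settings in Marian configs; the paths are hypothetical:

```python
import json

# Hypothetical generation-only fields seen in Marian config.json files:
GENERATION_KEYS = {"num_beams", "bad_words_ids", "max_length"}

def strip_generation_keys(config_path, out_path):
    """Write a copy of config.json without generation-only keys; return what was removed."""
    with open(config_path) as f:
        config = json.load(f)
    removed = {k: config.pop(k) for k in GENERATION_KEYS if k in config}
    with open(out_path, "w") as f:
        json.dump(config, f, indent=2)
    return removed
```

If loading the stripped config changes nothing, the num_beams field is not the culprit.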

solaoi avatar May 01 '23 01:05 solaoi

Hey @solaoi, I encountered a similar issue before: Tch tensor error: cannot find the tensor named distilbert.transformer.layer.5.sa_layer_norm.weight. It can be caused by the naming conventions or the internal structure of the model. To resolve it, I had to run the conversion script with an explicit prefix:

    python utils/convert_model.py --prefix distilbert. /path/to/msmarco-distilbert-base-v3/pytorch_model.bin
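For reference, the effect of such a prefix on tensor names can be sketched as a simple renaming pass. This is an assumption about what the flag does, inferred from the command above (note the trailing dot in "distilbert.") rather than from the script's source:

```python
def apply_prefix(state_dict, prefix):
    """Prepend a prefix (e.g. "distilbert.") to every tensor name so the stored
    names match what the Rust side looks up at load time."""
    return {prefix + name: tensor for name, tensor in state_dict.items()}
```

The idea is that the checkpoint stores names like transformer.layer.5.sa_layer_norm.weight while rust-bert expects distilbert.transformer.layer.5.sa_layer_norm.weight, so the rename bridges the two conventions.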

VirajKanse avatar May 16 '23 05:05 VirajKanse