ludwig
Using a specific transformer encoder model for Text Classification Task
Hi,
I would like to use a specific transformer encoder model such as roberta-large instead of the default roberta encoder, which loads roberta-base. Is there any way to do so?
Hi,
You can use the auto_transformer encoder, e.g.:
import pandas as pd
import yaml

from ludwig.api import LudwigModel

config = """
input_features:
  - name: text
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'roberta-large'
output_features:
  - name: category
    type: category
trainer:
  epochs: 1
"""

model = LudwigModel(yaml.safe_load(config), backend="local")

df = pd.DataFrame(
    {
        "text": ["Suomessa vaihtuu kesän aikana sekä pääministeri että valtiovarain"],
        "category": ["Suomi"],
    }
)

model.train(df)
model.predict(df)
Related discussion: https://github.com/ludwig-ai/ludwig/discussions/2057
Documentation: https://ludwig-ai.github.io/ludwig-docs/0.5/configuration/features/text_features/#autotransformer
Thanks @justinxzhao for the example!
I would like to get a better understanding of how Ludwig works.
Is the specified encoder (for example, roberta-large) the model that is trained after the text sequence is encoded, or is it only used to obtain the sequence embeddings, which are later fed to another model (I'm unsure what that black-box model would be)?
I'm asking because, if we have multiple text sequences in our data (for example, question (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'roberta-large') and passage (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'bert-base')), each with its own encoder, how do I specify the encoder model to which both sequence embeddings are fed?
Is the specified encoder (for example, roberta-large) the model that is trained after the text sequence is encoded, or is it only used to obtain the sequence embeddings, which are later fed to another model (I'm unsure what that black-box model would be)?
Here's a sequence of what's happening:
- Text sequences are pre-processed and tokenized with the same tokenizer that roberta-large uses.
- The encoded sequences are fed to the roberta-large model.
- The outputs of all Ludwig encoders for all features are fed to a combiner (by default, a simple concatenation).
- The output of the combiner is fed to the output feature's decoder; for the category feature this is by default a simple nn.Linear projection sized to the number of classes.
- The output of the decoder is post-processed into real predictions for metrics.
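To make the mapping between those steps and the configuration explicit, here is a sketch of the same config with the default combiner spelled out (the comments are illustrative only, and option names may differ between Ludwig versions):

input_features:
  - name: text
    type: text
    encoder: auto_transformer                      # tokenization + roberta-large forward pass
    pretrained_model_name_or_path: 'roberta-large'
combiner:
  type: concat                                     # default: concatenate all encoder outputs
output_features:
  - name: category
    type: category                                 # decoder: a projection over the output classes
trainer:
  epochs: 1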
I'm asking because, if we have multiple text sequences in our data (for example, question (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'roberta-large') and passage (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'bert-base')), each with its own encoder, how do I specify the encoder model to which both sequence embeddings are fed?
The Combiner takes the outputs of all input features encoders and combines them before providing the combined representation to the output feature decoder. You can specify which one to use in the combiner section of the configuration, and if you don't specify a combiner, the concat combiner will be used.
If you have multiple text features and you are using the concat combiner, then their encoder outputs (one from roberta-large for the question feature and one from bert-base for the passage feature) are concatenated, and this combined representation is fed to the decoder.
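For example, a two-feature setup like the one described in the question might look roughly like this (a sketch only; question and passage are the hypothetical column names from above, and bert-base-uncased is used as a concrete stand-in for "bert-base"):

input_features:
  - name: question
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'roberta-large'
  - name: passage
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'bert-base-uncased'
combiner:
  type: concat          # the two encoder outputs are concatenated here
output_features:
  - name: category
    type: category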
Having separate transformers for each text feature seems really heavyweight. One alternative technique to consider, commonly used for multilingual NMT or T5, is to concatenate your features offline and use a single model for the combined feature, e.g.
question: "...x..."
passage: "...y..."
Becomes:
combined_text: "question ...x... SEP passage ...y..."
The sequence length becomes longer, but one transformer for the combined text should train faster than one transformer for each text input feature.
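As a rough sketch of that offline concatenation in pandas (the column names, example rows, and the "SEP" marker below are placeholders, not something Ludwig requires):

import pandas as pd

# Hypothetical data with two separate text columns.
df = pd.DataFrame(
    {
        "question": ["...x..."],
        "passage": ["...y..."],
    }
)

# Build a single text column that one transformer encoder can consume.
df["combined_text"] = "question " + df["question"] + " SEP passage " + df["passage"]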
Thanks @justinxzhao for the detailed explanation. The example with two transformer models was meant to check whether the encoder in Ludwig performs both tasks: encoding the text sequence and training the model specified in the encoder (basically, whether it's just an encoder or an encoder plus trainable encoder layers).
Hi @justinxzhao! I recently came across a similar machine learning framework, Lightwood.
Comparing it with Ludwig, both have functionality to handle features of multiple input types, pre-process them based on their input type, encode the inputs, TRAIN A MODEL USING THE ENCODED INPUT (emphasis on the model), and decode the outputs of the trained model into output features.
In Lightwood, the possible models that can be trained are listed here: https://lightwood.io/mixer.html
Similarly, is there a list of possible models that can be trained using Ludwig? And if yes, how do we specify our preferred model to train (on the concatenated output from the combiner)?
Thanks!