ludwig
Using a specific transformer encoder model for Text Classification Task
Hi,
I would like to use a specific transformer encoder model such as roberta-large instead of the default roberta encoder, which loads roberta-base. Is there any way to do so?
Hi,
You can use the auto_transformer encoder, e.g.:
import pandas as pd
import yaml

from ludwig.api import LudwigModel

config = """
input_features:
  - name: text
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'roberta-large'
output_features:
  - name: category
    type: category
trainer:
  epochs: 1
"""

model = LudwigModel(yaml.safe_load(config), backend="local")

df = pd.DataFrame(
    {
        "text": ["Suomessa vaihtuu kesän aikana sekä pääministeri että valtiovarain"],
        "category": ["Suomi"],
    }
)

model.train(df)
model.predict(df)
Related discussion: https://github.com/ludwig-ai/ludwig/discussions/2057
Documentation: https://ludwig-ai.github.io/ludwig-docs/0.5/configuration/features/text_features/#autotransformer
Thanks @justinxzhao for the example!
I would like to get a better understanding of how Ludwig works.
Is the specified encoder (for example, roberta-large) the model that is trained after the text sequence is encoded, or is it only used to obtain the sequence embeddings, which are later fed to another model (I'm unsure what that black-box model would be)?
I'm asking because, if we have multiple text sequences in our data (for example, question (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'roberta-large') and passage (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'bert-base')), each with its own encoder, how do I specify the encoder model to which both sequence embeddings are fed?
Is the specified encoder (for example, roberta-large) the model that is trained after the text sequence is encoded, or is it only used to obtain the sequence embeddings, which are later fed to another model (I'm unsure what that black-box model would be)?
Here's a sequence of what's happening:
- Text sequences are pre-processed and tokenized with the same tokenizer that roberta-large uses.
- The encoded sequences are fed to the roberta-large model.
- The outputs of all Ludwig encoders for all features are fed to a combiner (by default, a simple concatenation).
- The output of the combiner is fed to the output feature's decoder; for the category feature this is by default a simple nn.Linear projection sized to the number of classes.
- The output of the decoder is post-processed into real predictions for metrics.
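To make the mapping between those steps and the configuration explicit, here is a sketch of the same config with the default combiner spelled out (the comments are illustrative only, and option names may differ between Ludwig versions):

input_features:
  - name: text
    type: text
    encoder: auto_transformer                      # tokenization + roberta-large forward pass
    pretrained_model_name_or_path: 'roberta-large'
combiner:
  type: concat                                     # default: concatenate all encoder outputs
output_features:
  - name: category
    type: category                                 # decoder: a projection over the output classes
trainer:
  epochs: 1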
I'm asking because, if we have multiple text sequences in our data (for example, question (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'roberta-large') and passage (type: text, encoder: auto_transformer, pretrained_model_name_or_path: 'bert-base')), each with its own encoder, how do I specify the encoder model to which both sequence embeddings are fed?
The Combiner takes the outputs of all input features encoders and combines them before providing the combined representation to the output feature decoder. You can specify which one to use in the combiner section of the configuration, and if you don't specify a combiner, the concat combiner will be used.
If you have multiple text features and you are using the concat combiner, then their encoder outputs (one from roberta-large for the question feature and one from bert-base for the passage feature) are concatenated, and this combined representation is fed to the decoder.
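For example, a two-feature setup like the one described in the question might look roughly like this (a sketch only; question and passage are the hypothetical column names from above, and bert-base-uncased is used as a concrete stand-in for "bert-base"):

input_features:
  - name: question
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'roberta-large'
  - name: passage
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'bert-base-uncased'
combiner:
  type: concat          # the two encoder outputs are concatenated here
output_features:
  - name: category
    type: category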
Having separate transformers for each text feature seems really heavyweight. One alternative technique to consider, commonly used for multilingual NMT or T5, is to concatenate your features offline and use a single model for the combined feature, e.g.
question: "...x..."
passage: "...y..."
Becomes:
combined_text: "question ...x... SEP passage ...y..."
The sequence length becomes longer, but one transformer for the combined text should train faster than one transformer for each text input feature.
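As a rough sketch of that offline concatenation in pandas (the column names, example rows, and the "SEP" marker below are placeholders, not something Ludwig requires):

import pandas as pd

# Hypothetical data with two separate text columns.
df = pd.DataFrame(
    {
        "question": ["...x..."],
        "passage": ["...y..."],
    }
)

# Build a single text column that one transformer encoder can consume.
df["combined_text"] = "question " + df["question"] + " SEP passage " + df["passage"]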
Thanks @justinxzhao for the detailed explanation. The example with two transformer models was meant to check whether the encoder in Ludwig performs both tasks: encoding the text sequence and training the model specified in the encoder (basically, whether it's just an encoder or an encoder plus trainable encoder layers).
Hi @justinxzhao! I recently came across a similar machine learning framework, Lightwood.
Comparing it with Ludwig, both have functionality to handle features of multiple input types, pre-process them based on their input type, encode the inputs, TRAIN A MODEL USING THE ENCODED INPUT (emphasis on the model), and decode the outputs of the trained model into output features.
In Lightwood, the possible models that can be trained are listed here: https://lightwood.io/mixer.html
Similarly, is there a list of possible models that can be trained using Ludwig? And if yes, how do we specify our preferred model to train (on the concatenated output from the combiner)?
Thanks!