MAX_INPUT_LENGTH automatically gets set to 512 after fine-tuning even though I initially set it to 1024
Following the tutorial Abstractive Summarization with Hugging Face Transformers, I created a text summarization model by fine-tuning t5-small on a custom dataset, setting MAX_INPUT_LENGTH = 1024.
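For reference, this is roughly how I applied MAX_INPUT_LENGTH during preprocessing, following the tutorial's pattern (a minimal sketch; the "summarize: " prefix, the "summary" column name, and MAX_TARGET_LENGTH = 128 are assumptions about my setup, while "original" is the text column used below):

from transformers import AutoTokenizer

MAX_INPUT_LENGTH = 1024   # the value I set instead of the tutorial's default
MAX_TARGET_LENGTH = 128   # assumed summary length; adjust to whatever was used

tokenizer = AutoTokenizer.from_pretrained("t5-small")

def preprocess_function(examples):
    # "original" holds the article text in my custom dataset;
    # "summary" (the reference summaries) is an assumed column name
    inputs = ["summarize: " + doc for doc in examples["original"]]
    model_inputs = tokenizer(inputs, max_length=MAX_INPUT_LENGTH, truncation=True)

    labels = tokenizer(examples["summary"], max_length=MAX_TARGET_LENGTH, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# raw_datasets is the custom dataset loaded earlier in the notebook
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)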
But when I try the model like this:
from transformers import pipeline

summarizer = pipeline("summarization", model=model, tokenizer=tokenizer, framework="tf")

summarizer(
    raw_datasets["test"][0]["original"],
    min_length=MIN_TARGET_LENGTH,
    max_length=MAX_TARGET_LENGTH,
)
This is the result I got:
Token indices sequence length is longer than the specified maximum sequence length for this model (655 > 512). Running this sequence through the model will result in indexing errors
[{'summary_text': 'The Pembina Trail was a 19th century trail used by Métis and European settlers to travel between Fort Garry and Fort Pemmbina in what is now the Canadian province of Manitoba and U.S. state of North Dakota. It was part of the larger Red River Trail network and is now a new version of it is now called the Lord Selkirk and Pembinea Highways in Manitoba. It is important because it allowed people to travel to and from the Red River for social or political reasons.'}]
But why does the output above say the maximum sequence length for this model is 512 when I initially set it to 1024?
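My guess is that the 512 in the warning comes from the tokenizer's model_max_length, which is 512 for the stock t5-small tokenizer, not from anything set during fine-tuning. Here is a minimal sketch of how to check that and two possible workarounds; whether either one is the right fix for this issue is an assumption on my part (the truncation argument to the pipeline call requires a reasonably recent transformers version):

from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("t5-small")
print(tokenizer.model_max_length)  # 512 -- the limit the warning refers to

# Option 1: tell the tokenizer the length the model was actually fine-tuned with,
# so inputs up to 1024 tokens no longer trigger the warning
tokenizer.model_max_length = 1024

# Option 2: let the pipeline truncate long inputs at call time
# (model, raw_datasets, MIN_TARGET_LENGTH and MAX_TARGET_LENGTH are defined earlier)
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer, framework="tf")
summary = summarizer(
    raw_datasets["test"][0]["original"],
    min_length=MIN_TARGET_LENGTH,
    max_length=MAX_TARGET_LENGTH,
    truncation=True,
)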
What model and tokenizer are you using?
Hi @seungjun-green, thanks for reporting this.
Could you provide a reproducible Colab with the error you're facing so we can investigate this issue?
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.