Pipeline(summarization) code example and documentation needs updating
System Info
Using Google Colab on macOS Ventura 13.2.1, Chrome Version 112.0.5615.137 (Official Build) (x86_64).
Installed with the following command:
!pip install transformers
Which downloads the following:

Who can help?
@Narsil
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
Reproduction
In the documentation for the summarization pipeline here, the example needs updating. The current example is:
```python
from transformers import pipeline

# use bart in pytorch
summarizer = pipeline("summarization")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
```
Produces the following output in Google Colab:

```
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 20, but you input_length is only 11. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)
[{'summary_text': ' An apple a day, keeps the doctor away from your doctor away, says Dr.'}]
```
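The first warning could presumably be avoided in the documented example by naming the checkpoint explicitly. A minimal sketch, assuming sshleifer/distilbart-cnn-12-6 is the checkpoint the bare summarization pipeline resolves to in this version:

```python
from transformers import pipeline

# Pin the checkpoint explicitly so the pipeline does not warn about running
# without a model name and revision in production.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
```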
The documentation doesn't state what min_length= and max_length= actually do, and the output doesn't tell you either.
- Is max_length the maximum token length of the output or of the input?
- Based on the output from running the code, does the input length affect the output? (A quick check is sketched below.)
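A rough way to check this empirically is to tokenize the returned summary_text and compare its token count against max_length. This is only a sketch, assuming the pipeline's own tokenizer is a fair way to count tokens (special tokens are included in the count):

```python
from transformers import pipeline

summarizer = pipeline("summarization")
for max_len in (10, 20, 40):
    result = summarizer("An apple a day, keeps the doctor away",
                        min_length=5, max_length=max_len)
    summary = result[0]["summary_text"]
    # Count tokens in the generated summary using the pipeline's tokenizer.
    n_tokens = len(summarizer.tokenizer(summary)["input_ids"])
    print(f"max_length={max_len}: {n_tokens} output tokens -> {summary!r}")
```

If the printed counts track max_length, that would suggest the parameters bound the generated (output) tokens rather than the input.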
Running this code:
```python
# use t5 in tf
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
```
Produces the following output in Google Colab:

```
Your max_length is set to 20, but you input_length is only 13. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=6)
/usr/local/lib/python3.10/dist-packages/transformers/generation/tf_utils.py:745: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
[{'summary_text': 'an apple a day, keeps the doctor away from the doctor .'}]
```
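As a side note on the "input_length is only 13" part of the warning: the length appears to be counted in tokens, not words. A small sketch to look at that, assuming the pipeline uses the t5-base tokenizer internally and prepends T5's "summarize: " task prefix (both assumptions on my part):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
# Tokenize the input as the pipeline presumably sees it, task prefix included;
# this may explain why the reported input_length (13) exceeds the raw word count.
ids = tokenizer("summarize: An apple a day, keeps the doctor away")["input_ids"]
print(len(ids), ids)
```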
Expected behavior
- Show the expected output by using longer text as the input (a possible version is sketched below).
- Provide a clear explanation of what min_length= and max_length= actually do.
- Avoid warnings when running example code from documentation, or specify a stable version to use.