
Pipeline(summarization) code example and documentation needs updating

Open · TomBerton opened this issue 1 year ago • 0 comments

System Info

Using Google Colab on macOS Ventura 13.2.1, with Chrome Version 112.0.5615.137 (Official Build) (x86_64).

Using the install command !pip install transformers, which downloads the following:

[Screenshot of the !pip install transformers output in Colab, taken 2023-04-28]
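Since the screenshot only captures the install log, the installed version can also be printed directly:

    import transformers

    # Record the installed transformers version for this report
    print(transformers.__version__)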

Who can help?

@Narsil

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [x] My own task or dataset (give details below)

Reproduction

In the documentation for the summarization pipeline here, the example needs updating. The current example is shown below:

    # use bart in pytorch
    from transformers import pipeline

    summarizer = pipeline("summarization")
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

Running it in Google Colab produces the following output:

    Using a pipeline without specifying a model name and revision in production is not recommended.
    Your max_length is set to 20, but you input_length is only 11. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)
    [{'summary_text': ' An apple a day, keeps the doctor away from your doctor away, says Dr.'}]
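The first line of that output suggests pinning a model name and revision. A minimal sketch of doing that, assuming the usual default summarization checkpoint sshleifer/distilbart-cnn-12-6 (the revision value below is illustrative):

    from transformers import pipeline

    # Pin an explicit checkpoint and revision instead of relying on the task default.
    # "sshleifer/distilbart-cnn-12-6" is the checkpoint the summarization task currently
    # defaults to; the revision here is illustrative and a fixed commit hash would be better.
    summarizer = pipeline(
        "summarization",
        model="sshleifer/distilbart-cnn-12-6",
        revision="main",
    )
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)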

The documentation doesn't state what min_length= and max_length= actually do, and the output doesn't tell you either (a small sketch for probing this empirically follows the questions below).

  1. Is max_length the maximum token length of the output or of the input?
  2. Based on the output from running the code, does the input length affect the output?
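For what it's worth, my reading is that min_length and max_length bound the length of the generated summary in tokens, not the input, which is why the warning fires when the input is shorter than max_length. A small sketch with a longer input for checking this (the paragraph is just filler text I wrote):

    from transformers import pipeline

    # use bart in pytorch, with an input long enough that max_length=20 is shorter
    # than the input, so the input-length warning should not fire
    summarizer = pipeline("summarization")

    text = (
        "An apple a day keeps the doctor away is a proverb that first appeared in print "
        "in the nineteenth century. It suggests that eating fruit regularly is associated "
        "with good health, although the saying is a simplification and no single food can "
        "substitute for a balanced diet and regular medical care."
    )

    # If min_length/max_length bound the generated summary, the output here should be
    # between 5 and 20 tokens regardless of the input length.
    summarizer(text, min_length=5, max_length=20)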

Running this code:

    # use t5 in tf
    from transformers import pipeline

    summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

Produces the following output in Google Colab:

    Your max_length is set to 20, but you input_length is only 13. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=6)
    /usr/local/lib/python3.10/dist-packages/transformers/generation/tf_utils.py:745: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
      warnings.warn(
    [{'summary_text': 'an apple a day, keeps the doctor away from the doctor .'}]
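The UserWarning points at a generation configuration file. I haven't verified that this silences it, but a sketch of what I think it is asking for, using the GenerationConfig class from the linked docs, looks roughly like this:

    from transformers import GenerationConfig, pipeline

    # use t5 in tf
    summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")

    # Move the generation settings into a GenerationConfig instead of relying on values
    # baked into the pretrained model config (my reading of the warning, not something
    # the documentation currently spells out).
    summarizer.model.generation_config = GenerationConfig(min_length=5, max_length=20)

    summarizer("An apple a day, keeps the doctor away")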

Expected behavior

  1. Show the expected output by using longer text as the input.
  2. Provide a clear explanation of what min_length= and max_length= actually do.
  3. Avoid warnings when running example code from the documentation, or specify a stable version to use (for example, by pinning the install, as sketched below).
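For the version-pinning option, the Colab install cell could pin an explicit release; the version number below is just the one that was current when this issue was filed and is shown for illustration:

    # Pin a specific transformers release instead of installing the latest
    !pip install transformers==4.28.1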

TomBerton • Apr 28 '23 22:04