Whisper do_sample through generation_config and generate() give different results
System Info
- transformers version: 4.38.1
- Platform: Linux-5.15.0-1049-aws-x86_64-with-glibc2.31
- Python version: 3.11.6
- Huggingface_hub version: 0.21.3
- Safetensors version: 0.4.2
- Accelerate version: 0.24.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
@sanchit-gandhi
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
I am running Whisper on a long-form audio file. Below is the script I run, where I use the `generation_config` to set the `do_sample` argument.
from __future__ import annotations
import copy
import os
from typing import Any
import numpy as np
import numpy.typing as npt
import torch
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
    set_seed,
)
from transformers.generation.configuration_utils import GenerationConfig
from transformers.generation.utils import GenerateBeamEncoderDecoderOutput
from transformers.pipelines.audio_utils import ffmpeg_read

set_seed(10)
MODEL: str = "openai/whisper-large-v3"
DEVICE: str = "cuda" if torch.cuda.is_available() else "cpu"
TORCH_DTYPE: torch.dtype = torch.float16 if torch.cuda.is_available() else torch.float32
# Load the processor.
processor: WhisperProcessor = WhisperProcessor.from_pretrained(
pretrained_model_name_or_path=MODEL,
torch_dtype=TORCH_DTYPE,
)
# Load the model.
model: WhisperForConditionalGeneration = (
WhisperForConditionalGeneration.from_pretrained(
pretrained_model_name_or_path=MODEL,
torch_dtype=TORCH_DTYPE,
)
)
model.to(DEVICE)
# Load the audio as bytes. `input_path` is a placeholder; point it at your long-form audio file.
input_path: str = "audio.wav"
with open(input_path, "rb") as f:
    input_bytes: bytes = f.read()
# Convert to numpy array.
inputs: npt.NDArray[np.float32] = ffmpeg_read(
bpayload=input_bytes,
sampling_rate=processor.feature_extractor.sampling_rate,
)
# Process the audio into chunks.
processed: dict[str, torch.Tensor] = processor.feature_extractor(
raw_speech=inputs,
truncation=False,
return_attention_mask=True,
padding="longest",
sampling_rate=processor.feature_extractor.sampling_rate,
return_tensors="pt",
)
processed = {
k: v.to(device=DEVICE, dtype=TORCH_DTYPE)
for k, v in processed.items()
}
# Generation config.
generation_config: GenerationConfig = copy.deepcopy(
model.generation_config
)
generation_config.do_sample = True
generation_config.num_beams = 5
generation_config.condition_on_prev_tokens = True
generation_config.logprob_threshold = -1.0
generation_config.return_dict_in_generate = True
# Create the generate args.
generate_args: dict[str, Any] = {
"generation_config": generation_config,
"task": "transcribe",
"language": "english",
"temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
"return_segments": True,
}
# Run the model.
model_output: GenerateBeamEncoderDecoderOutput | dict[str, Any] = (
model.generate(**processed, **generate_args)
)
# Decode the model outputs.
text: str
optional: dict[str, Any]
text, optional = processor.tokenizer._decode_asr(
model_outputs=[{"tokens": model_output["sequences"]}],
return_timestamps=False,
return_language="english",
time_precision=processor.feature_extractor.chunk_length
/ model.config.max_source_positions,
)
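As an aside, `tokenizer._decode_asr` is a private helper; if only the plain transcript is needed, a sketch using the public API would be:

```python
# Public-API alternative: decode the generated ids directly.
text: str = processor.batch_decode(
    model_output["sequences"], skip_special_tokens=True
)[0]
```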
The above produces one output. If, instead of using `generation_config`, I pass the arguments directly into `generate()`, e.g.
# Create the generate args.
generate_args: dict[str, Any] = {
"task": "transcribe",
"language": "english",
"temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
"return_segments": True,
"do_sample": True,
"num_beams": 5,
"condition_on_prev_tokens": True,
"logprob_threshold": -1.0,
"return_dict_in_generate": True,
}
# Run the model.
model_output: GenerateBeamEncoderDecoderOutput | dict[str, Any] = (
model.generate(**processed, **generate_args)
)
I get a different output, which shouldn't happen given that the seed is the same and nothing else changes. Experimenting with the arguments, I have isolated the issue to the `do_sample` argument: passing it through `generation_config` versus directly to `generate()` gives different results.
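For reference, here is a minimal sketch of a side-by-side comparison of the two routes, where `generate_args` is the `generation_config` variant from the first snippet and `generate_args2` is a hypothetical name for the kwargs variant from the second (both are called `generate_args` above). Since `do_sample=True` draws from the global RNG, the seed is reset before each call so the two runs see identical sampler state:

```python
# Side-by-side comparison sketch.
set_seed(10)
model_output = model.generate(**processed, **generate_args)    # generation_config route
set_seed(10)
model_output2 = model.generate(**processed, **generate_args2)  # kwargs route

text1 = processor.batch_decode(model_output["sequences"], skip_special_tokens=True)
text2 = processor.batch_decode(model_output2["sequences"], skip_special_tokens=True)
print(text1 == text2)  # expected True; prints False on 4.38.1
```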
Expected behavior
The two invocations above should produce the same output.
Gentle ping @sanchit-gandhi @ylacombe
Thanks @udeepam for the clear reproducer - could you take a look @kamilakesbi?
Hi @udeepam, thanks for this issue and the clear reproducer!
On the latest version of the main branch (transformers 4.40.0.dev0), I get the same results with and without `generation_config`, suggesting that this is already solved. I've run the following tests to compare the two results:
# model_output: run with generation_config; model_output2: run with kwargs.
assert torch.equal(model_output['sequences'], model_output2['sequences'])
for i in range(len(model_output['segments'][0])):
    assert torch.equal(model_output['segments'][0][i]['tokens'], model_output2['segments'][0][i]['tokens'])
    assert torch.equal(model_output['segments'][0][i]['start'], model_output2['segments'][0][i]['start'])
    assert torch.equal(model_output['segments'][0][i]['end'], model_output2['segments'][0][i]['end'])
assert torch.equal(model_output['segments'][0][-1]['result']['sequences'], model_output2['segments'][0][-1]['result']['sequences'])
assert torch.equal(model_output['segments'][0][-1]['result']['sequences_scores'], model_output2['segments'][0][-1]['result']['sequences_scores'])
assert torch.equal(model_output['segments'][0][-1]['result']['beam_indices'], model_output2['segments'][0][-1]['result']['beam_indices'])
assert model_output['segments'][0][-1]['result']['past_key_values'] == model_output2['segments'][0][-1]['result']['past_key_values']
for i in range(len(model_output['segments'][0][-1]['result']['scores'])):
    assert torch.equal(model_output['segments'][0][-1]['result']['scores'][i], model_output2['segments'][0][-1]['result']['scores'][i])
cc @amyeroberts @sanchit-gandhi
Thanks @kamilakesbi! As a quick tip @udeepam, you can get the latest version of Transformers by installing from source (e.g. `pip install git+https://github.com/huggingface/transformers`) or with an editable install.
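The installed version can then be checked from Python; on a source install it reports a dev version (here 4.40.0.dev0, which is where the matching results above were observed):

```python
import transformers

# A source install reports a dev version, e.g. 4.40.0.dev0.
print(transformers.__version__)
```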
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.