Fix regression on `Processor.save_pretrained` caused by #31691
What does this PR do?
Fix regression on Processor.save_pretrained caused by https://github.com/huggingface/transformers/pull/31691
tl;dr: a month ago, we made a change that removes `"chat_template"` from `processor_dict` when saving a processor. As a result, `processor_config.json` stopped getting saved at all. See:
```python
processor_dict = self.to_dict()
chat_template = processor_dict.pop("chat_template", None)
if chat_template is not None:
    chat_template_json_string = json.dumps({"chat_template": chat_template}, indent=2, sort_keys=True) + "\n"
    with open(output_chat_template_file, "w", encoding="utf-8") as writer:
        writer.write(chat_template_json_string)
    logger.info(f"chat template saved in {output_chat_template_file}")

# For now, let's not save to `processor_config.json` if the processor doesn't have extra attributes and
# `auto_map` is not specified.
if set(processor_dict.keys()) != {"processor_class"}:
    self.to_json_file(output_processor_file)
    logger.info(f"processor saved in {output_processor_file}")
```
but we kept these lines as-is:
```python
if set(self.to_dict().keys()) == {"processor_class"}:
    return []
return [output_processor_file]
```
So, for a month now, we haven't been saving `processor_config.json` but have still been returning it among the saved files. This caused a test in my other PR (https://github.com/huggingface/transformers/pull/32906) to fail.
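The mismatch can be reproduced with a minimal sketch. `ToyProcessor` below is hypothetical (not the real transformers class): it only mimics the shape of the two code paths, where the save step checks the dict *after* popping `"chat_template"` while the return step checks the un-popped dict, so the two conditions disagree.

```python
# Minimal repro sketch; ToyProcessor is a hypothetical stand-in for the real
# processor class, reduced to the two inconsistent checks.
import json
import os
import tempfile


class ToyProcessor:
    def __init__(self, chat_template=None):
        self.chat_template = chat_template

    def to_dict(self):
        d = {"processor_class": "ToyProcessor"}
        if self.chat_template is not None:
            d["chat_template"] = self.chat_template
        return d

    def save_pretrained(self, save_directory):
        output_processor_file = os.path.join(save_directory, "processor_config.json")
        processor_dict = self.to_dict()
        # The chat template is saved to its own file, so it is popped here.
        processor_dict.pop("chat_template", None)
        # Save step: after the pop, only "processor_class" remains, so the
        # config file is NOT written.
        if set(processor_dict.keys()) != {"processor_class"}:
            with open(output_processor_file, "w", encoding="utf-8") as f:
                json.dump(processor_dict, f)
        # Buggy return step: checks the UN-popped dict, which still contains
        # "chat_template", so the file is reported as saved anyway.
        if set(self.to_dict().keys()) == {"processor_class"}:
            return []
        return [output_processor_file]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        proc = ToyProcessor(chat_template="{{ messages }}")
        saved = proc.save_pretrained(tmp)
        # One file is reported as saved, but it does not exist on disk.
        print(len(saved), os.path.exists(saved[0]))  # 1 False
```

The fix is to make both checks use the same dict, e.g. reuse the already-popped `processor_dict` in the return step so the reported file list matches what was actually written.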
Who can review?
@amyeroberts @zucchini-nlp
I guess there was a test for this, but it wasn't triggered when the previous PR was merged. I only ran the VLM tests when merging that PR...
Okay, merging and patching!