feat: add option to disable timing metadata in ASR transcription
Feature: Option to disable timing metadata during ASR transcription
This PR adds a new feature that allows users to disable the printing of timing metadata in ASR (Automatic Speech Recognition) transcription output. By default, timing information like `[time: 0.0-2.5]` is included in the transcribed text, but users can now opt out of this behavior.
Changes Made:
Core Implementation:
- Added `include_time_metadata: bool = True` field to `InlineAsrOptions` class in `docling/datamodel/pipeline_options_asr_model.py`
- Modified `_ConversationItem.to_string()` method to accept an `include_time_metadata` parameter
- Updated both `_NativeWhisperModel` and `_MlxWhisperModel` to respect the new setting
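
To illustrate the `to_string()` change described above, here is a minimal, self-contained sketch. It is not the code from this PR: `ConversationItemSketch` is a hypothetical stand-in for `_ConversationItem`, its fields are assumed, and the output format is taken only from the `[time: 0.0-2.5]` example in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationItemSketch:
    """Hypothetical stand-in for _ConversationItem; field names are assumptions."""

    text: str
    start_time: Optional[float] = None
    end_time: Optional[float] = None

    def to_string(self, include_time_metadata: bool = True) -> str:
        # Only prepend the timing prefix when it is requested AND both
        # timestamps are available; otherwise return the bare text.
        has_timing = self.start_time is not None and self.end_time is not None
        if include_time_metadata and has_timing:
            return f"[time: {self.start_time}-{self.end_time}] {self.text}"
        return self.text


item = ConversationItemSketch(text="hello world", start_time=0.0, end_time=2.5)
print(item.to_string())                             # default keeps the timing prefix
print(item.to_string(include_time_metadata=False))  # timing prefix omitted
```

Keeping the parameter default at `True` is what preserves the existing output for callers that do not pass the new argument.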
CLI Integration:
- Added `--asr-no-timing` flag to disable timing metadata via CLI
- The flag is automatically documented through the CLI's auto-generated documentation
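
The relationship between the negative CLI flag and the positive option can be sketched as follows. This uses `argparse` purely for illustration and is an assumption, not docling's actual CLI code; `build_parser` and `include_time_metadata_from_args` are hypothetical helper names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag name comes from the PR description.
    parser = argparse.ArgumentParser(prog="asr-cli-sketch")
    parser.add_argument("source", help="Path to the audio file")
    parser.add_argument(
        "--asr-no-timing",
        action="store_true",
        help="Disable timing metadata in ASR transcription output",
    )
    return parser


def include_time_metadata_from_args(argv: list[str]) -> bool:
    # The flag is opt-out, so the option value is the flag's negation.
    args = build_parser().parse_args(argv)
    return not args.asr_no_timing


print(include_time_metadata_from_args(["audio.mp3"]))
print(include_time_metadata_from_args(["audio.mp3", "--asr-no-timing"]))
```

An opt-out flag keeps the default invocation unchanged, which matches the backward-compatibility goal stated in this PR.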
Tests:
- Added `test_asr_pipeline_without_time_metadata()` - verifies timing metadata can be disabled
- Added `test_asr_pipeline_with_time_metadata_default()` - verifies timing metadata is enabled by default
- Added `test_conversation_item_to_string_with_and_without_time()` - unit tests for the `to_string()` method
Backward Compatibility:
- Default behavior unchanged: timing metadata is included by default
- No breaking changes to existing APIs
Usage Examples
Programmatic:
from docling.datamodel import asr_model_specs
from docling.datamodel.pipeline_options import AsrPipelineOptions
pipeline_options = AsrPipelineOptions()
pipeline_options.asr_options = asr_model_specs.WHISPER_TINY.model_copy(deep=True)
pipeline_options.asr_options.include_time_metadata = False # Disable timing
CLI:
docling audio.mp3 --asr-no-timing
Issue resolved by this Pull Request: Resolves #2564
Checklist:
- [x] Documentation has been updated
  - CLI documentation auto-generates from code (includes new `--asr-no-timing` flag)
  - Code includes comprehensive docstrings explaining the feature
- [x] Examples have been added
  - The feature is straightforward and covered by tests
  - Usage is documented in code comments and test cases
- [x] Tests have been added
  - ✅ `test_asr_pipeline_without_time_metadata()` - Integration test for disabled timing
  - ✅ `test_asr_pipeline_with_time_metadata_default()` - Verifies default behavior
  - ✅ `test_conversation_item_to_string_with_and_without_time()` - Unit tests
  - ✅ All tests properly isolated (using `model_copy(deep=True)`)
  - ✅ No compilation/lint errors
✅ DCO Check Passed
Thanks @akanshajain231999, all your commits are properly signed off. 🎉
Related Documentation
Checked 3 published document(s) in 1 knowledge base(s). No updates required.
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
- [X] `title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:`
Hey @ceberam, can you please review this PR?
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
@ceberam Thanks for the detailed review. I will wait for your response until next week. Meanwhile, do you have any other issues which I can work on?
@akanshajain231999 you are very welcome to contribute to Docling, this is an open-source collaborative project 🙂 Feel free to pick up an issue and when you are ready to actively work on it, you can set yourself in the Assignees list. Here is a list of issues that I think could be easy wins. Some may be outdated, so please always consider the latest Docling release. 2626, 2515, 2465, 2487, 2476, 2367, 2298, 2351 (this one more ambitious)