MediaInfoLib
Seek to timecode / duration and set parse duration
Is it possible (or could it be) for MediaInfo to take a parameter as to where to start a seek for metadata and the duration?
We have a lot of trouble detecting captions in our files since the amount of content (slates, preroll, etc) can vary and arbitrarily increasing the parsing duration results in increased loading time with diminishing returns. However, we often have metadata denoting the start time of the actual content, and I think if we were able to somehow tell MediaInfo where to seek to, we'd have a greater chance of correctly identifying the correct metadata.
Not yet possible, but doable. It is not so easy, because seeking needs to be implemented per container format. Seek was implemented years ago for a specific project but has not been maintained, and it is harder with e.g. TS: there is no index, so we need to skim the file to find the indicated start PTS. We need to review the related code and implement the option you want.
An exception to "not yet possible": there are hidden options for TS only, and for duration only (no seek): --MpegTs_MaximumOffset=xxxx and --MpegTs_MaximumScanDuration=xxxx.
Primary interest on our end would be for MXF and MOV, which are (usually) thoroughly indexed (in well formed files).
MXF and MOV are not complex for that. We actually already do that for MXF, but without an option (it is hard-coded to the middle of the stream, by byte offset). I suggest classic parsing from the beginning, to avoid any collateral issues from seeking immediately, followed by a seek to the expected timestamp (or timecode) and a scan for the expected duration after the initial parsing.
@plondino @joe-sciame-wm we added an option that lets you set where to probe, for MXF (there was already a default probe of 64 MB at 50% of the file) and MP4 (new probe). The default for both MXF and MP4 is now: after 30 s of initial parsing, seek to 50% of the file and check for 30 s.
Option is --File_ProbeCaption=(Seek)[+(Duration)][+(format)]
Seek and duration can be a timestamp, a timecode, a byte size, a count of seconds, a percentage (of the file duration).
"(format)" can be empty (=for all), mp4, mxf.
Example: --File_ProbeCaption=00:01:00:00+30s+mp4,50%+64M means seek at 00:01:00:00 then scan during 30s for mp4, seek at 50% then scan during 64 MiB for others.
If you set for MP4 and not for others, the default is used for others.
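To make the option grammar above concrete, here is a minimal Python sketch that splits a --File_ProbeCaption value into its comma-separated entries and picks the entry that applies to a given container. It is not MediaInfo's own parser: the names `ProbeSpec`, `parse_probe_caption` and `spec_for` are hypothetical, and the sketch assumes that the only format names are `mp4` and `mxf`, as listed above.

```python
from typing import List, NamedTuple, Optional


class ProbeSpec(NamedTuple):
    seek: str                 # e.g. "00:01:00:00", "50%", "1403s"
    duration: Optional[str]   # e.g. "30s", "64M", or None if omitted
    fmt: Optional[str]        # "mp4", "mxf", or None (= all formats)


# Assumption: only the two format names mentioned in the thread exist.
KNOWN_FORMATS = {"mp4", "mxf"}


def parse_probe_caption(value: str) -> List[ProbeSpec]:
    """Split "(Seek)[+(Duration)][+(format)]" entries separated by commas."""
    specs = []
    for entry in value.split(","):
        parts = entry.split("+")
        seek, duration, fmt = parts[0], None, None
        for part in parts[1:]:
            if part.lower() in KNOWN_FORMATS:
                fmt = part.lower()
            elif part:
                duration = part
        specs.append(ProbeSpec(seek, duration, fmt))
    return specs


def spec_for(specs: List[ProbeSpec], container: str) -> Optional[ProbeSpec]:
    """Prefer a format-specific entry, then a generic one.

    Returns None when nothing applies, i.e. the built-in default is used.
    """
    for spec in specs:
        if spec.fmt == container:
            return spec
    for spec in specs:
        if spec.fmt is None:
            return spec
    return None


specs = parse_probe_caption("00:01:00:00+30s+mp4,50%+64M")
print(spec_for(specs, "mp4"))  # the mp4-specific entry
print(spec_for(specs, "mxf"))  # the generic "50%+64M" entry
```

With a value such as `0%+100%+mp4`, `spec_for(..., "mxf")` returns `None` in this sketch, which corresponds to the default probe being used for the other formats.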
During development I noticed that scanning captions is often fast for MP4 on SSD (it may be worse on S3 due to seek latency). It depends on how captions are muxed in the file: it is fast if they are in blocks of e.g. 1 second, slow if there is 1 CDP packet after each video frame. You may want to try increasing the scan duration, without adding too much analysis time, maybe up to the full file: --File_ProbeCaption=0%+100%+mp4.
I suggest something like --File_ProbeCaption=45%+10%+mp4,45%+120s, it should work for all files without too much performance impact and without having to set a specific timecode per content.
@JeromeMartinez thanks for the update - just to confirm, MP4 includes MOV as well?
> MP4 includes MOV as well?
Yes.
I just tested this out on an MXF file I had that previously was not detecting any captions in the default parse. The file starts at 00:00:00;00 and has a long slate, and the first caption displays at 00:01:09;10. I tested with:
mediainfo file.mxf --File_ProbeCaption=00:01:00:00
Sure enough, 4 streams were detected. However, I experimented with the starting timecode and found that even just setting it to 00:00:00:01 is enough to detect all streams, which is odd to me, since I would have expected that the captions are too far out to detect under normal usage and that I would have to get closer to find actual caption data. I will provide the link to the file; can you shed some light on this?
> However I experimented with the starting timecode and found that even just setting it to 00:00:00:01 is enough to detect all streams which is odd to me, since I would have expected that the captions are too far out to detect under normal usage and I would have to get closer to find actual caption data. I will provide the link to the file, can you shed some light on this?
There was a rounding issue that led to a seek precision of 1% of the MXF file, so the 00:00:00:01 seek did not really go to that exact timecode. The seek was improved in https://github.com/MediaArea/MediaInfoLib/pull/2024, now in dev snapshots, and the precision is now a few seconds (we don't parse the indexes, for a faster check, and we don't really need precision here; let me know if that is not the case for you). The behavior is now the expected one: with your file, a 00:00:00:01 seek, as well as e.g. a 00:00:30:00 seek, does not catch the captions, because the default scan duration of 30 seconds is used and the first caption is later.
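The arithmetic behind that expected behavior can be checked with a small Python sketch. It only models a scan window as "seek position plus scan duration", in seconds from the first frame; the function name `scan_window_covers` is hypothetical, and the first caption at 00:01:09;10 is rounded down to 69 s.

```python
def scan_window_covers(seek_s: float, scan_s: float, event_s: float) -> bool:
    """True if an event at event_s falls inside [seek_s, seek_s + scan_s)."""
    return seek_s <= event_s < seek_s + scan_s


FIRST_CAPTION_S = 69   # 00:01:09;10 with the frame part dropped
DEFAULT_SCAN_S = 30    # default scan duration mentioned in the thread

# Seek values from the thread, expressed as whole seconds.
for seek_s in (0, 30, 60):
    covered = scan_window_covers(seek_s, DEFAULT_SCAN_S, FIRST_CAPTION_S)
    print(f"seek={seek_s:>2}s covers first caption: {covered}")
```

Under these assumptions, only the 00:01:00:00 seek has a window (60 s to 90 s) that contains the first caption, which matches the corrected behavior described above.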
Hi, I know the issue is closed, but I am wondering if there is support for other formats in a generic way. I'm specifically looking for MKV container and MPEG2, H264, and H265 support.
Related to ffmpeg dropping support for CC detection, I used to use mediainfo and ffprobe, with ffprobe picking up H264 content, but no more.
> wondering if there is support for other formats in a generic way,
The common part is already there, but there is a need to implement seek in every container (because we didn't need to seek in e.g. MKV up to now, so we don't read indexes yet).
> I'm specifically looking for MKV container and MPEG2, H264, and H265 support.
It is doable but not the priority in free support, contact us if you wish to sponsor this request.
Thank you, a generic option to increase "probing scope" would be great.
I am still trying to figure out if I can do something using ffprobe and movie with [out0+subcc] or readeia608.
Follow-up, are pipes supported?
Per my testing the size of the file impacts the detection of EIA608 in some video streams, small file detected, large file not detected.
If I could use ffmpeg and pipe a section of the file into mediainfo I can control the size (I can use ffmpeg and pipe a TS video stream into ccextractor and that works, but deploying ccextractor on various platforms is non-trivial).
> Follow-up, are pipes supported?
In the meantime, pipes were implemented for a need we had (DV input); I am not 100% sure they would work well with anything else.
> small file detected, large file not detected.
The current implementation is to test the first 30 seconds.
Note that a sponsor is interested in probing in the middle of an MP4/AVC file (as we already do for MXF or MP4 with a dedicated caption track), so it will be implemented very soon, but not in TS until a current sponsor is interested in it or someone else is (really) interested in it.
By the way, the option "--File_EIA608_DisplayEmptyStream=1" may be interesting for you: it forces MediaInfo to display the EIA-608 stream even if no actual caption is found (only the EIA-608 stream, even if empty).
I have an MXF file that is exhibiting some odd behavior with the caption probe / strip empty options.
I know the file has English captions on 608-CC1 and 708-S1 and Spanish captions on 608-CC3 and 708-S2 (verified in Telestream Switch). I believe there are EDM commands on some of the unused 708 services (708-S3 - 708-S6).
If I use the --File_CommandOnlyMeansEmpty option, I still get 8 text tracks even though there is nothing on S3-S6. Also, if I use the --File_ProbeCaption=01:00:00:00 I get three streams 608-CC1, 608-CC3, and 708-S1, even though the MXF starts at 01:00:00;00 and I know there are captions on S2 just 5 seconds into the start. If I use the --File_CommandOnlyMeansEmpty in conjunction with the caption probe, I only get 708-S1.
Here are some experiments I ran:
| Params | Detected |
|---|---|
| (none) | CC1,CC3,S1,S2,S3,S4,S5,S6 |
| --File_CommandOnlyMeansEmpty | CC1,CC3,S1,S2,S3,S4,S5,S6 |
| --File_ProbeCaption=01:00:00:00 | CC1,CC3,S1 |
| --File_ProbeCaption=01:00:00:00 --File_CommandOnlyMeansEmpty | S1 |
| --File_ProbeCaption=1403s | CC1,CC3,S1,S2,S3,S4,S5,S6 |
| --File_ProbeCaption=1404s | CC1,CC3,S1,S3,S4,S5,S6 |
| --File_ProbeCaption=1405s | CC1,CC3,S1 |
Can you shed some light on what's going on? I am attaching the 64kb header but I assume you will need the whole file to fully test, I will provide a link via email.
> even though there is nothing on S3-S6
That is due to "content" not being flagged correctly in 708: there is service data in S3-S6 but no content. I have a fix for that on the way.
> even though the MXF starts at 01:00:00;00
The option does not manage MXF timecodes (which one to use? There are so many timecodes in MXF, a bit everywhere: MXF time code track, material vs source, ANC track, SDTI, system Scheme 1 time code...), so the start is at 00:00:00;00. "01:00:00;00" leads to a position after the end of the file, so only the beginning of the file is checked (hard-coded, cannot be removed for the moment).
Do you need to be able to provide a timecode seek based on the first material package timecode, the first source package timecode, or the first SDTI timecode?
We usually use the material package timecode, or the timecode of whichever package the primary package property of the file points to, but I can see how this could be confusing without an option to specify.
Does MOV/MP4 use the QuickTime timecode track? So if the above file was an MOV that started at 01:00:00;00 it would start at the beginning of the file, not the end?
> timestamp, a timecode, a byte size, a count of seconds, a percentage (of the file duration)
Are seconds values absolute in the file? Since we process a lot of both MXF and MOV, I am thinking that to ensure we're hitting the start of content we can do (Start of Content TC - Start of File TC) to seconds for both cases.
> We usually use the material package timecode, or the timecode of whichever package the primary package property of the file points to, but I can see how this could be confusing without an option to specify.
Exactly. At least by default, we prefer to avoid using timecodes for this option; we use a 00:00:00:00-based value. For the moment, if you don't know the start timecode, you need to parse the file a first time, pick the start timecode you want as reference, then pass the difference between it and the timecode you want to seek to in another call.
> Does MOV/MP4 use the QuickTime timecode track? So if the above file was an MOV that started at 01:00:00;00 it would start at the beginning of the file, not the end?
Similar to MXF, we don't use timecode tracks for the moment (which one would we use if you have 2 time code tracks in your file?).
> Are seconds values absolute in the file? Since we process a lot of both MXF and MOV, I am thinking that to ensure we're hitting the start of content we can do (Start of Content TC - Start of File TC) to seconds for both cases.
If we agree that "seconds values absolute" means that the value of the first frame is 0 second, yes.
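The "(Start of Content TC - Start of File TC) to seconds" approach can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the frame part is ignored (which is in line with a seek precision of a few seconds), and the timecode clock is treated as wall-clock time, which is an approximation for non-drop timecode at fractional frame rates. The example timecodes are made up.

```python
import re

# Accepts HH:MM:SS:FF (non-drop) or HH:MM:SS;FF (drop-frame).
_TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})[:;](\d{2})$")


def timecode_to_whole_seconds(timecode: str) -> int:
    """Convert a timecode to whole seconds, discarding the frame number."""
    match = _TIMECODE.match(timecode)
    if match is None:
        raise ValueError(f"not a timecode: {timecode!r}")
    hours, minutes, seconds, _frames = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def probe_option(content_start_tc: str, file_start_tc: str, scan: str = "30s") -> str:
    """Build a zero-based --File_ProbeCaption value from two timecodes."""
    offset = (timecode_to_whole_seconds(content_start_tc)
              - timecode_to_whole_seconds(file_start_tc))
    if offset < 0:
        raise ValueError("content starts before the file start timecode")
    return f"--File_ProbeCaption={offset}s+{scan}"


# Illustrative values: file starts at 01:00:00;00, content at 01:01:09;10.
print(probe_option("01:01:09;10", "01:00:00;00"))
```

The resulting option uses the "count of seconds" form, so it does not depend on which timecode track the file carries.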
Timecode works as well, as long as it's also absolute (i.e. first frame of media is 00:00:00:00), correct? Is drop/nondrop factored in at all in offset from 00:00:00:00? I assume not since that would be in the start timecode for MXF and MOV.
> Timecode works as well, as long as it's also absolute (i.e. first frame of media is 00:00:00:00)
Timecode works as long as it is also absolute, as we don't have the reference for which timecode we should use.
> Is drop/nondrop factored in at all in offset from 00:00:00:00? I assume not since that would be in the start timecode for MXF and MOV.
We convert the timecode to a time based on the drop/nondrop flag provided. But even if we didn't, it would not be a big deal: the difference is about a few milliseconds, not the most important thing here. We also don't seek to the exact frame number, because we don't know the frame rate when we seek; we seek to the timecode without the frame number (so always lower than what is provided), which is quicker to implement and has no impact in practice.