MarkersExtractor
Add Clip Keywords manifest field
Add Keywords manifest field. See below.
Is it complicated to add a Clip Keywords column? We could place this column after Clip Name.
I wonder if we can add the keywords tagged within the clip range that covers the marker's position?
When inspecting the FCPXML:
<asset-clip ref="r2" offset="0s" name="Nature Makes You Happy" start="550912/12800s" duration="9216/12800s" tcFormat="NDF" audioRole="dialogue">
<adjust-colorConform enabled="1" autoOrManual="manual" conformType="conformNone" peakNitsOfPQSource="1000" peakNitsOfSDRToPQSource="203"/>
<keyword start="99/5s" duration="25088/12800s" value="penguin"/>
<keyword start="1076/25s" duration="9216/12800s" value="flower, nature"/>
<marker start="1082/25s" duration="100/2500s" value="Marker 37"/>
<keyword start="257/5s" duration="6656/12800s" value="penguin"/>
<keyword start="1454/25s" duration="19456/12800s" value="nature"/>
<keyword start="1454/25s" duration="15872/12800s" value="flower"/>
<keyword start="297/5s" duration="3584/12800s" value="flower"/>
</asset-clip>
Is it the case that the associated keyword tags always appear above the marker?
<keyword start="1076/25s" duration="9216/12800s" value="flower, nature"/>
<marker start="1082/25s" duration="100/2500s" value="Marker 37"/>
<keyword start="61s" duration="27648/12800s" value="birds"/>
<marker start="1544/25s" duration="100/2500s" value="Marker 42"/>
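Document order alone is probably not a reliable way to associate keywords with markers. In the first excerpt the elements appear to be sorted by start time, so the "penguin" keyword at 99/5s sits above Marker 37 even though its range ends long before the marker. Comparing time values is more robust. The sketch below assumes the marker and keyword start values share the clip's time base and treats a range as half-open [start, start + duration). It parses FCPXML rational times exactly with Python's fractions module and tests containment; the function names are hypothetical.

```python
from fractions import Fraction

def parse_fcpxml_time(value: str) -> Fraction:
    """Convert an FCPXML time value such as "1076/25s" or "61s" to exact seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported FCPXML time value: {value!r}")
    # Fraction accepts both "1076/25" and "61", so rational times stay exact.
    return Fraction(value[:-1])

def contains(start: str, duration: str, point: str) -> bool:
    """True if point lies in the half-open range [start, start + duration)."""
    begin = parse_fcpxml_time(start)
    return begin <= parse_fcpxml_time(point) < begin + parse_fcpxml_time(duration)

marker_37 = "1082/25s"
print(contains("1076/25s", "9216/12800s", marker_37))  # True  ("flower, nature")
print(contains("99/5s", "25088/12800s", marker_37))    # False ("penguin")
```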
In our CSV, those values would automatically be converted to Multi-Select types in Notion and Airtable, giving users another layer of filtering options in their databases.
I did a bit of a deep dive on keywords this past week, so yes: they are already being parsed, just not used yet. In fact, I was going to open an issue proposing to add them to the manifest but haven't had a chance yet.
We can apply similar logic for out-of-bounds keywords as we do with out-of-bounds markers.
Since keywords can apply to an entire clip or a portion of a clip, a bit of extra math is involved. As long as some portion of a keyword's range on a clip is visible from the main timeline, we would include it in the output manifest.
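The "partially visible" rule amounts to an interval-overlap test between each keyword's range and the portion of the clip's media that is actually used. This sketch assumes the used portion is given by the asset-clip's own start/duration pair and ignores parent containers and nesting; it illustrates the idea and is not DAWFileKit's logic. Applied to the asset-clip excerpt earlier in the thread, only the "flower, nature" keyword overlaps the 43.04s to 43.76s range the clip uses.

```python
from fractions import Fraction

def seconds(value: str) -> Fraction:
    """Parse an FCPXML time such as "550912/12800s" or "61s" into exact seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported FCPXML time value: {value!r}")
    return Fraction(value[:-1])

def overlaps(start_a, duration_a, start_b, duration_b):
    """True if the half-open ranges [start, start + duration) share any time."""
    a, b = seconds(start_a), seconds(start_b)
    return a < b + seconds(duration_b) and b < a + seconds(duration_a)

# Used media range of the asset-clip (its start and duration attributes).
clip_start, clip_duration = "550912/12800s", "9216/12800s"

# (start, duration, value) for each keyword element in the excerpt.
clip_keywords = [
    ("99/5s", "25088/12800s", "penguin"),
    ("1076/25s", "9216/12800s", "flower, nature"),
    ("257/5s", "6656/12800s", "penguin"),
    ("1454/25s", "19456/12800s", "nature"),
    ("1454/25s", "15872/12800s", "flower"),
    ("297/5s", "3584/12800s", "flower"),
]

visible = [value for start, duration, value in clip_keywords
           if overlaps(start, duration, clip_start, clip_duration)]
print(visible)  # ['flower, nature']
```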
We can place Clip Keywords after Clip Duration.
This was more complex than anticipated since keywords can apply to an entire clip or to a range of a clip.
I've built out the parser logic necessary in DAWFileKit to extract keywords for each marker.
There are two possible options:
- Option 1: Respect the keyword ranges. That is, only keywords whose range contains the marker will be extracted for that marker.
- Option 2: Take all keywords from the marker's clip, ignoring the keyword ranges and treating all of its keywords as applying to the entire clip.
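A minimal sketch of the two options, using Marker 37 and its clip's keywords from the FCPXML excerpt. The respect_ranges parameter and the function names are hypothetical; the flag simply switches between Option 1 (keep keywords whose half-open range contains the marker) and Option 2 (keep every keyword on the clip).

```python
from fractions import Fraction

def seconds(value: str) -> Fraction:
    """Parse an FCPXML time such as "1082/25s" into exact seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported FCPXML time value: {value!r}")
    return Fraction(value[:-1])

def marker_keywords(marker_start, clip_keywords, respect_ranges=True):
    """Return keyword values for a marker.

    respect_ranges=True  -> Option 1: only keywords whose range contains the marker.
    respect_ranges=False -> Option 2: every keyword on the marker's clip.
    """
    marker = seconds(marker_start)
    selected = []
    for start, duration, value in clip_keywords:
        begin = seconds(start)
        if respect_ranges and not (begin <= marker < begin + seconds(duration)):
            continue  # keyword range does not cover the marker
        selected.append(value)
    return selected

clip_keywords = [
    ("99/5s", "25088/12800s", "penguin"),
    ("1076/25s", "9216/12800s", "flower, nature"),
    ("257/5s", "6656/12800s", "penguin"),
    ("1454/25s", "19456/12800s", "nature"),
    ("1454/25s", "15872/12800s", "flower"),
    ("297/5s", "3584/12800s", "flower"),
]

print(marker_keywords("1082/25s", clip_keywords))  # ['flower, nature']
print(len(marker_keywords("1082/25s", clip_keywords, respect_ranges=False)))  # 6
```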
My instinct would be to make Option 1 the default behavior, respecting the ranges. In that case, perhaps our manifest field should be called Keywords, since it contains keywords that apply only to the marker and not necessarily to its entire clip. That is, unless it isn't obvious that the keywords come from the marker's clip.
If you feel Option 2 is the better way to go, then our manifest field could be called Clip Keywords.
Also, this will be implemented in the manifest files the same way Audio Role & Subrole were.
For CSV/TSV it will be a comma-separated string. For JSON, it will be a string array.
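For illustration, here is how a keyword list could be encoded for each format with Python's standard csv and json modules. The ", " separator, the two-column row and the "keywords" JSON key are assumptions, not the tool's actual manifest layout. The CSV writer quotes the field because the joined string contains the delimiter, while the TSV writer does not need to.

```python
import csv
import io
import json

keywords = ["flower", "nature"]

# CSV: the list becomes one comma-separated string; csv quotes it because it contains ",".
csv_buffer = io.StringIO()
csv.writer(csv_buffer).writerow(["Marker 37", ", ".join(keywords)])
print(csv_buffer.getvalue().strip())  # Marker 37,"flower, nature"

# TSV: same string, but no quoting is needed because the delimiter is a tab.
tsv_buffer = io.StringIO()
csv.writer(tsv_buffer, delimiter="\t").writerow(["Marker 37", ", ".join(keywords)])
print(repr(tsv_buffer.getvalue().strip()))  # 'Marker 37\tflower, nature'

# JSON: the list stays a string array.
print(json.dumps({"keywords": keywords}))  # {"keywords": ["flower", "nature"]}
```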
> Respect the keyword ranges. That is, only keywords whose range contains the marker will be extracted for that marker.
I would prefer Option 1, and I believe advanced FCP users would prefer it too, since it respects the keyword ranges. That said, there may be cases where users prefer Option 2.
Should we have a flag called --keyword-range with marker (default) or clip options?
> Also, this will be implemented in the manifest files the same way Audio Role & Subrole were.
> For CSV/TSV it will be a comma-separated string. For JSON, it will be a string array.
Yeap. Thank you.
> Should we have a flag called --keyword-range with marker (default) or clip options?
Trying to avoid flag creep if possible. We can add it if there's a strong need for it later.
I'd maybe call it --keywords-source with inRange or allOnClip as arguments.
> Trying to avoid flag creep if possible. We can add it if there's a strong need for it later.
Noted!
I posted an alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha1
Please give it a try and check all the profiles to make sure they contain the new manifest field and that its contents are correct.
> I posted an alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha1
> Please give it a try and check all the profiles to make sure they contain the new manifest field and that its contents are correct.
Let me test and report back.
Some observations:
@samplue's CSV.
There is a "risk" of extremely long keywords: users might enter keywords with many characters. While that is the user's choice and workflow, there may be potential issues (in parsing and various edge cases) when uploading keywords containing spaces, commas (,) and periods (.).
Truncation also occurs in Notion and Airtable, depending on column width.
(Screenshots: Notion and Airtable)
Notion also imposes size limits on property values (https://developers.notion.com/reference/request-limits#size-limits). I couldn't find equivalent information for Airtable.
I wonder whether it makes sense to do the following:
- Convert all spaces, commas (,) and periods (.) to hyphens (-). For example, "vfx - cg helicopter" becomes "vfx-cg-helicopter" and "3. free line of fire" becomes "3-free-line-of-fire". That way all options listed in Notion or Airtable would be neater and more consistent, easier to sort, and less cluttered in the interface.
- Make all keywords lowercase, again for consistency.
- Limit to 99 keywords.
- Ignore any keyword with more than 99 characters, excluding it from the CSV/JSON.
These limits would also encourage users to be more precise and economical with their keywords.
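One possible reading of these proposed rules, sketched as it might look in a downstream tool. It assumes runs of spaces, commas, periods and hyphens collapse into a single hyphen (which is what the two examples imply), that the 99-character check applies after conversion, and that the first 99 keywords are kept. The constant and function names are hypothetical.

```python
import re

MAX_KEYWORDS = 99        # hypothetical cap on the number of keywords
MAX_KEYWORD_LENGTH = 99  # hypothetical cap on characters per keyword

def slugify_keyword(keyword: str) -> str:
    """Collapse runs of spaces, commas, periods and hyphens into one hyphen, then lowercase."""
    return re.sub(r"[\s,.\-]+", "-", keyword).strip("-").lower()

def normalize_keywords(keywords):
    """Apply the proposed rules: slugify, drop over-long or empty keywords, cap the count."""
    slugs = [slugify_keyword(k) for k in keywords]
    kept = [s for s in slugs if s and len(s) <= MAX_KEYWORD_LENGTH]
    return kept[:MAX_KEYWORDS]

print(normalize_keywords(["vfx - cg helicopter", "3. free line of fire"]))
# ['vfx-cg-helicopter', '3-free-line-of-fire']
```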
> - Convert all spaces, commas (,) and periods (.) to hyphens (-).
Unless there are known issues with specific illegal characters, users probably don't want us to sanitize the keyword text. MarkersExtractor's job is primarily to extract data; it shouldn't be making a lot of decisions or subjective assumptions about formatting that result in modification or loss of data.
> - Make all keywords lowercase, again for consistency.
> - Limit to 99 keywords.
> - Ignore any keyword with more than 99 characters.
You're free to alter the data in Marker Data as much as you like, of course, but the CLI tool should keep data unmodified as much as possible. These kinds of rules are arbitrary, and every user will want something different.
Test Library.
I am not sure whether this is an issue or not.
In this test example, the Penguin keyword is not used within the clip range.
However, the Penguin keyword is still added.
We should also include Clip Keywords for the --label burn-ins.
> We should also include Clip Keywords for the --label burn-ins.
Certainly possible, but there should be a hard character-count limit (100?) so that extremely long keyword lists don't create a huge mess on thumbnail images.
> The Penguin keyword is not used within the clip range.
Not sure what's going on there without more in-depth tests.
I also noticed that some keywords are being repeated in the manifest. I should add a de-duplication step to the keyword extraction and sort alphabetically for consistency.
> Certainly possible, but there should be a hard character-count limit (100?) so that extremely long keyword lists don't create a huge mess on thumbnail images.
Yes. 99 or 100 character limit would be ideal.
Actually, it's already possible with --label clipKeywords. I updated the README to include the latest CLI help block, which lists it.
But I will add a max char limit.
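A sketch of such a limit for burn-in labels, assuming keywords are joined with ", " and that whole keywords are dropped (with a trailing "...") once the label would exceed the limit; a single keyword that is itself too long is hard-truncated. The function name is hypothetical, and the 100-character default follows the figure floated in the discussion.

```python
def burn_in_label(keywords, max_chars=100, separator=", ", ellipsis="..."):
    """Join keywords for a thumbnail label without exceeding max_chars."""
    label = separator.join(keywords)
    if len(label) <= max_chars:
        return label
    # Keep as many whole keywords as fit, leaving room for the ellipsis.
    kept = ""
    for keyword in keywords:
        candidate = keyword if not kept else kept + separator + keyword
        if len(candidate) + len(ellipsis) > max_chars:
            break
        kept = candidate
    if kept:
        return kept + ellipsis
    # Even the first keyword is too long: hard-truncate it.
    return label[: max_chars - len(ellipsis)] + ellipsis

print(burn_in_label(["flower", "nature", "penguin"], max_chars=16))  # flower...
```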
I am now trimming leading and trailing whitespace, removing duplicates, and sorting alphabetically. This cleans up the keyword list a lot.
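A minimal sketch of that cleanup. It additionally assumes that a single keyword value may hold a comma-separated list (as the "flower, nature" element in the FCPXML excerpt suggests) and splits it before trimming. Sorting uses a case-insensitive key with the raw string as a tiebreak so the order is deterministic. The function name is hypothetical.

```python
def clean_keywords(values):
    """Split comma-separated values, trim whitespace, drop empties and duplicates, sort."""
    unique = set()
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part:
                unique.add(part)
    # Case-insensitive alphabetical order; the raw string breaks ties deterministically.
    return sorted(unique, key=lambda k: (k.casefold(), k))

print(clean_keywords(["penguin", "flower, nature", "penguin", " nature", "flower "]))
# ['flower', 'nature', 'penguin']
```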
Previous output before the keyword cleanup:
Awesome!
I have a theory regarding the spurious penguin keyword appearing.
In the keyword extraction function, if there is an error when reading a keyword's range, the keyword is included by default as a failsafe. For that first marker, "Yellow Flower", one of the clip regions that has the penguin keyword is outside of the used area of the media (media was trimmed on the timeline). Maybe it's hitting an error there.
OK, I believe I have fixed the spurious keyword extraction.
I posted a new alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha2
> I posted a new alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha2
Just tested on a couple of timelines; everything seems to be working as expected.