Add Clip Keywords manifest field

Open IAmVigneswaran opened this issue 1 year ago • 24 comments

Add Keywords manifest field. See below.

IAmVigneswaran avatar Nov 18 '23 14:11 IAmVigneswaran

Is it complicated to add a Clip Keywords column? We could place this column after Clip Name.

I wonder if we can add the keywords whose range on the clip covers the marker's position?

Marker-Keyword-01 Marker-Keyword-02

When inspecting the FCPXML:

  <asset-clip ref="r2" offset="0s" name="Nature Makes You Happy" start="550912/12800s" duration="9216/12800s" tcFormat="NDF" audioRole="dialogue">
      <adjust-colorConform enabled="1" autoOrManual="manual" conformType="conformNone" peakNitsOfPQSource="1000" peakNitsOfSDRToPQSource="203"/>
      <keyword start="99/5s" duration="25088/12800s" value="penguin"/>
      <keyword start="1076/25s" duration="9216/12800s" value="flower, nature"/>
      <marker start="1082/25s" duration="100/2500s" value="Marker 37"/>
      <keyword start="257/5s" duration="6656/12800s" value="penguin"/>
      <keyword start="1454/25s" duration="19456/12800s" value="nature"/>
      <keyword start="1454/25s" duration="15872/12800s" value="flower"/>
      <keyword start="297/5s" duration="3584/12800s" value="flower"/>
  </asset-clip>

It seems that the associated keyword tags are always above the marker?

                            <keyword start="1076/25s" duration="9216/12800s" value="flower, nature"/>
                            <marker start="1082/25s" duration="100/2500s" value="Marker 37"/>

                            <keyword start="61s" duration="27648/12800s" value="birds"/>
                            <marker start="1544/25s" duration="100/2500s" value="Marker 42"/>
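
Going by the start and duration attributes, the link between a keyword and a marker can also be checked by time range rather than by element order. A minimal Python sketch, using only the standard library and the first snippet above trimmed to the attributes it needs. The helper names `parse_time` and `keywords_at_marker` are made up for illustration and are not part of MarkersExtractor or DAWFileKit, and splitting `value` on commas is an assumption based on the "flower, nature" entry.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# The asset-clip from the snippet above, trimmed to the attributes used here.
SNIPPET = """
<asset-clip name="Nature Makes You Happy" start="550912/12800s" duration="9216/12800s">
    <keyword start="99/5s" duration="25088/12800s" value="penguin"/>
    <keyword start="1076/25s" duration="9216/12800s" value="flower, nature"/>
    <marker start="1082/25s" duration="100/2500s" value="Marker 37"/>
    <keyword start="257/5s" duration="6656/12800s" value="penguin"/>
    <keyword start="1454/25s" duration="19456/12800s" value="nature"/>
    <keyword start="1454/25s" duration="15872/12800s" value="flower"/>
    <keyword start="297/5s" duration="3584/12800s" value="flower"/>
</asset-clip>
"""


def parse_time(value: str) -> Fraction:
    """Turn a rational time string such as '1082/25s' or '61s' into exact seconds."""
    return Fraction(value.rstrip("s"))


def keywords_at_marker(clip: ET.Element, marker: ET.Element) -> list[str]:
    """Hypothetical helper: keywords whose [start, start + duration) contains the marker."""
    position = parse_time(marker.get("start"))
    found = []
    for kw in clip.findall("keyword"):
        start = parse_time(kw.get("start"))
        end = start + parse_time(kw.get("duration"))
        if start <= position < end:
            # Assumption: one keyword element can carry several comma-separated keywords.
            found.extend(part.strip() for part in kw.get("value").split(","))
    return found


clip = ET.fromstring(SNIPPET.strip())
marker_37_keywords = keywords_at_marker(clip, clip.find("marker"))
```

With these values only the "flower, nature" range (1076/25s for 9216/12800s) contains Marker 37 at 1082/25s, regardless of where the elements sit in the file.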

In our CSV -

CSV-With Keywords

Those values would automatically be converted to Multi-Select types in Notion and Airtable.

Users would gain another layer of filtering options in their database.

IAmVigneswaran avatar Nov 25 '23 01:11 IAmVigneswaran

I did a bit of a deep dive this past week on keywords, so yes. They are already being parsed, just not used yet. In fact I was going to open an Issue to propose adding them to the manifest but haven't had a chance yet.

We can apply similar logic for out-of-bounds keywords as we do with out-of-bounds markers.

Since keywords can apply to an entire clip or a portion of a clip, a bit of extra math is involved. As long as some portion of the keyword's range of a clip is visible from the main timeline, we would include it in the output manifest.
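
The "extra math" is essentially an interval-overlap test. A minimal Python sketch, assuming the keyword's start and the clip's start are expressed in the same time base and that both ranges are half-open; `is_visible` is a hypothetical name, not the DAWFileKit implementation.

```python
from fractions import Fraction


def parse_time(value: str) -> Fraction:
    """Turn an FCPXML rational time string such as '99/5s' into exact seconds."""
    return Fraction(value.rstrip("s"))


def is_visible(kw_start: str, kw_duration: str, clip_start: str, clip_duration: str) -> bool:
    """True if any part of the keyword range falls inside the used part of the clip."""
    kw_s = parse_time(kw_start)
    kw_e = kw_s + parse_time(kw_duration)
    clip_s = parse_time(clip_start)
    clip_e = clip_s + parse_time(clip_duration)
    # Two half-open ranges overlap when each one starts before the other ends.
    return kw_s < clip_e and clip_s < kw_e


# Values taken from the asset-clip posted earlier in this thread.
CLIP = ("550912/12800s", "9216/12800s")
flower_nature_visible = is_visible("1076/25s", "9216/12800s", *CLIP)
penguin_visible = is_visible("99/5s", "25088/12800s", *CLIP)
```

For that clip, the "flower, nature" range overlaps the used media while the first "penguin" range ends long before the clip's start, so only the former would reach the manifest under this rule.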

orchetect avatar Nov 25 '23 01:11 orchetect

We can place Clip Keywords after Clip Duration.

Clip Keywords

IAmVigneswaran avatar Mar 04 '24 00:03 IAmVigneswaran

This was more complex than anticipated since keywords can apply to an entire clip or to a range of a clip.

I've built out the parser logic necessary in DAWFileKit to extract keywords for each marker.

One of two options is possible:

  1. Respect the keyword ranges. Meaning, only keywords that contain the marker within their range will be extracted for that marker.
  2. Take all keywords from the marker's clip, ignoring the keyword ranges and pretending that all of them apply to the entire clip.

My instinct would be to make Option 1 the default behavior, respecting the ranges. In that case, perhaps our manifest field should be called Keywords, since it contains only the keywords that apply to the marker and not necessarily to its entire clip. Unless it's not obvious that the keywords are coming from the marker's clip.

If you feel Option 2 is the better way to go, then our manifest field could be called Clip Keywords.
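
To make the difference concrete, here is a minimal Python sketch of both options. The `Keyword` type and the function names are hypothetical, and times are simplified to float seconds in the clip's local time; this is not how DAWFileKit models them.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    start: float      # seconds, in the clip's local time (simplified)
    duration: float   # seconds
    value: str


def keywords_in_range(keywords: list[Keyword], marker_pos: float) -> list[str]:
    """Option 1: keep only keywords whose range contains the marker."""
    return [k.value for k in keywords if k.start <= marker_pos < k.start + k.duration]


def keywords_on_clip(keywords: list[Keyword]) -> list[str]:
    """Option 2: ignore the ranges and return every keyword on the clip."""
    return [k.value for k in keywords]


clip_keywords = [
    Keyword(start=19.8, duration=1.96, value="penguin"),
    Keyword(start=43.04, duration=0.72, value="flower"),
]
marker_position = 43.28

option_1 = keywords_in_range(clip_keywords, marker_position)
option_2 = keywords_on_clip(clip_keywords)
```

With a marker at 43.28 s, Option 1 yields only "flower", while Option 2 also reports "penguin" even though that range is nowhere near the marker.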

orchetect avatar Apr 25 '24 06:04 orchetect

Also, this will be implemented in the manifest files the same way Audio Role & Subrole was.

For CSV/TSV it will be a comma-separated string. For JSON, it will be a string array.
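
A minimal Python sketch of the two shapes, using the standard `csv` and `json` modules. The column names come from this thread; the ", " separator and the helper names `to_csv` and `to_json` are assumptions for illustration, not the actual manifest writer.

```python
import csv
import io
import json

row = {
    "Clip Name": "Nature Makes You Happy",
    "Clip Keywords": ["flower", "nature"],
}


def to_csv(record: dict) -> str:
    """CSV/TSV style: the keyword list is flattened into one comma-separated cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    # Assumption: keywords are joined with ", "; the csv module quotes the cell
    # automatically because it contains the delimiter.
    writer.writerow({**record, "Clip Keywords": ", ".join(record["Clip Keywords"])})
    return buffer.getvalue()


def to_json(record: dict) -> str:
    """JSON style: the keyword list stays a string array."""
    return json.dumps(record)


csv_text = to_csv(row)
json_text = to_json(row)
```

Because the cell itself contains commas, a CSV writer has to quote it; consumers that split the cell on commas can then recover the individual keywords.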

orchetect avatar Apr 25 '24 06:04 orchetect

Respect the keyword ranges. Meaning, only keywords that contain the marker within their range will be extracted for that marker.

I would prefer Option 1, and I believe advanced FCP users would prefer it too, since it respects the keyword ranges. But I am sure there will be cases where users prefer Option 2.

Should we have a flag called --keyword-range with marker (default) or clip options?

Also, this will be implemented in the manifest files the same way Audio Role & Subrole was.

For CSV/TSV it will be a comma-separated string. For JSON, it will be a string array.

Yep. Thank you.

IAmVigneswaran avatar Apr 25 '24 06:04 IAmVigneswaran

Should we have a flag called --keyword-range with marker (default) or clip options?

Trying to avoid flag creep if possible. We can add it if there's a strong need for it later.

I'd maybe call it --keywords-source with inRange or allOnClip as arguments.

orchetect avatar Apr 25 '24 06:04 orchetect

Trying to avoid flag creep if possible. We can add it if there's a strong need for it later.

Noted!

IAmVigneswaran avatar Apr 25 '24 06:04 IAmVigneswaran

I posted an alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha1

Please give it a try and check all the profiles to make sure they contain the new manifest field and its contents are correct.

orchetect avatar Apr 25 '24 06:04 orchetect

I posted an alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha1

Please give it a try and check all the profiles to make sure they contain the new manifest field and its contents are correct.

Let me test and report back.

IAmVigneswaran avatar Apr 25 '24 06:04 IAmVigneswaran

Some observations:

@samplue's CSV.

Clip Keywords

There is a "risk" of extremely long keywords, since users may enter keywords with many characters. While that is the user's choice and workflow, there might be issues (in parsing and in various edge cases) when uploading keywords that contain spaces, commas (,) and periods (.).

Truncation occurs (depending on column width) in Notion and Airtable.

Notion Notion-Long-Keywords

Airtable Airtable-Long-Keywords

Notion also limits the size of property values. I can't find the equivalent information for Airtable. https://developers.notion.com/reference/request-limits#size-limits


I wonder whether it makes sense to do the following:

  1. Convert all spaces, commas (,) and periods (.) to hyphens (-).
    Examples: "- vfx - cg helicopter" becomes "vfx-cg-helicopter", and "3. free line of fire" becomes "3-free-line-of-fire". That way all options listed in Notion or Airtable would be neater, more consistent, and easier to sort, without cluttering the interface.

  2. Make all keywords lower case, again for consistency.

  3. Limit the output to 99 keywords.

  4. Ignore any keyword longer than 99 characters and leave it out of the CSV/JSON.

It would also encourage users to be more precise and economical with their keywords.

IAmVigneswaran avatar Apr 25 '24 17:04 IAmVigneswaran

  1. Convert all spaces, commas (,) and periods (.) to hyphens (-).

Unless there are known issues with specific illegal characters, users probably don't want us to sanitize the keyword text. MarkersExtractor's job is primarily to extract data; it shouldn't be making a lot of decisions or subjective assumptions about formatting that result in modification or loss of data.

  1. Make all of the keywords in lower case. Again for consistency.
  2. Limit to 99 keywords.
  3. Any keyword that has more than 99 characters will be ignored and not included.

You're free to alter the data in Marker Data as much as you like, of course, but the CLI tool should keep data unmodified as much as possible. These kinds of rules are arbitrary and every user will want something different.

orchetect avatar Apr 25 '24 19:04 orchetect

Test Library.

Keywrods-Test.fcpbundle.zip

I am not sure whether this is an issue or not.

In this test example, the Penguin keyword is not used for the clip range.

Keyword-Test-01

However, the Penguin keyword is added.

Keyword-Test-02

IAmVigneswaran avatar Apr 26 '24 00:04 IAmVigneswaran

We should also include Clip Keywords for the --label burn-ins.

IAmVigneswaran avatar Apr 26 '24 16:04 IAmVigneswaran

We should also include Clip Keywords for the --label burn-ins.

Certainly possible but there should be a hard character count limit (100?) for it so extremely long keyword lists don't potentially create a huge mess on thumbnail images.
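
A minimal Python sketch of such a cap. The limit of 100 is the figure floated above; cutting the joined string and appending "..." is an assumed strategy, and `label_text` is a hypothetical name rather than anything in MarkersExtractor.

```python
def label_text(keywords: list[str], limit: int = 100) -> str:
    """Join keywords for a burn-in label, never exceeding `limit` characters."""
    text = ", ".join(keywords)
    if len(text) <= limit:
        return text
    # Assumption: cut the joined string and mark the cut with an ASCII ellipsis.
    return text[: limit - 3].rstrip() + "..."


short_label = label_text(["flower", "nature"])
long_label = label_text(["keyword"] * 50)
```

An alternative would be to drop whole keywords until the text fits, which avoids cutting a keyword in half at the cost of slightly more logic.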

orchetect avatar May 01 '24 05:05 orchetect

Penguin Keyword is not used for the clip range.

Not sure what's going on there without more in-depth tests.

I also noticed that some keywords are being repeated in the manifest. I should add a de-duplication step to the keyword extraction and sort alphabetically for consistency.

orchetect avatar May 01 '24 06:05 orchetect

Certainly possible but there should be a hard character count limit (100?) for it so extremely long keyword lists don't potentially create a huge mess on thumbnail images.

Yes. A 99- or 100-character limit would be ideal.

IAmVigneswaran avatar May 01 '24 06:05 IAmVigneswaran

Actually, it's already possible: --label clipKeywords. I updated the README to include the updated CLI help block with it.

But I will add a max char limit.

orchetect avatar May 01 '24 06:05 orchetect

keyword-formatting

I am now trimming leading and trailing whitespace, removing duplicates, and sorting alphabetically. This cleans up the keyword list a lot.
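
The three cleanup steps can be sketched in a few lines of Python. This is an illustration rather than the actual implementation; dropping empty strings and using a plain case-sensitive sort are assumptions, and `clean_keywords` is a hypothetical name.

```python
def clean_keywords(raw: list[str]) -> list[str]:
    """Trim surrounding whitespace, remove duplicates, and sort alphabetically."""
    trimmed = (keyword.strip() for keyword in raw)
    # A set removes duplicates; empty strings left after trimming are dropped (assumption).
    unique = {keyword for keyword in trimmed if keyword}
    return sorted(unique)


cleaned = clean_keywords([" nature", "flower ", "nature", "flower", "  "])
```

Trimming before de-duplicating matters: " nature" and "nature" only collapse into one entry once the whitespace is gone.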

Previous output before the keyword cleanup:

keyword-formatting-old

orchetect avatar May 01 '24 06:05 orchetect

Awesome!

IAmVigneswaran avatar May 01 '24 07:05 IAmVigneswaran

I have a theory regarding the spurious penguin keyword appearing.

In the keyword extraction function, if there is an error when reading a keyword's range, the keyword is included by default as a failsafe. For that first marker, "Yellow Flower", one of the clip regions that has the penguin keyword is outside of the used area of the media (media was trimmed on the timeline). Maybe it's hitting an error there.
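
A minimal Python sketch of how an include-on-error failsafe could produce a spurious keyword. It is purely illustrative: an unreadable range is modeled as `None`, the function name and the `include_on_error` switch are hypothetical, and none of this is DAWFileKit's code or necessarily the remedy that was applied.

```python
def keywords_for_marker(keywords, marker_pos, include_on_error=True):
    """keywords: list of (value, (start, end)) pairs; the range is None if it could not be read."""
    result = []
    for value, time_range in keywords:
        try:
            start, end = time_range          # raises TypeError when the range is None
            hit = start <= marker_pos < end
        except TypeError:
            # Failsafe: when the range cannot be read, fall back to the configured default.
            hit = include_on_error
        if hit:
            result.append(value)
    return result


clip_keywords = [
    ("flower", (43.04, 43.76)),
    ("penguin", None),  # stands in for a range outside the used area of the media
]

with_failsafe = keywords_for_marker(clip_keywords, 43.28)
without_failsafe = keywords_for_marker(clip_keywords, 43.28, include_on_error=False)
```

Under these assumptions, the failsafe returns "penguin" alongside "flower", while excluding unreadable ranges returns only "flower".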

orchetect avatar May 01 '24 07:05 orchetect

OK, I believe I have fixed the spurious keyword extraction.

keywords-csv

orchetect avatar May 01 '24 07:05 orchetect

I posted a new alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha2

orchetect avatar May 01 '24 07:05 orchetect

I posted a new alpha release for testing: https://github.com/TheAcharya/MarkersExtractor/releases/tag/0.3.6-alpha2

Just tested on a couple of timelines; everything seems to be working as expected.

IAmVigneswaran avatar May 01 '24 10:05 IAmVigneswaran