Special character in file name (subtitles)
A special character in a subtitle file name (ç in my case) causes a fatal ingest error on Doremi servers, yet the error is not reported.
@liloneum Thanks, noted - shall be fixed.
As for special characters in filenames in general, I put some notes here: https://dcpomatic.com/mantis/view.php?id=2465
Repeating below:
The Interop (UDF) constraints are a bit messy, so I think it would be easier to just enforce the SMPTE rules for Interop as well. Mainly:
- Each path segment shall match [a-zA-Z0-9-_.]
- No path segment shall have more than 100 characters
- The value of the Path element shall not exceed 100 characters in length
- A Path element value shall have no more than 10 segments
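The four rules above can be sketched as a small check. This is only an illustration, not code from any existing tool; the function name `check_path` and the message wording are made up for the example.

```python
import re

# Character set from the rules above: a-z, A-Z, 0-9, "-", "_" and ".".
# The "+" also rejects empty segments (e.g. from "a//b" or a leading "/").
SEGMENT_RE = re.compile(r"[A-Za-z0-9._-]+")
MAX_SEGMENT_LENGTH = 100
MAX_PATH_LENGTH = 100
MAX_SEGMENTS = 10


def check_path(path: str) -> list[str]:
    """Return a list of human-readable problems; empty if the path passes."""
    problems = []
    if len(path) > MAX_PATH_LENGTH:
        problems.append(f"path is longer than {MAX_PATH_LENGTH} characters")
    segments = path.split("/")
    if len(segments) > MAX_SEGMENTS:
        problems.append(f"path has more than {MAX_SEGMENTS} segments")
    for segment in segments:
        if len(segment) > MAX_SEGMENT_LENGTH:
            problems.append(f"segment is longer than {MAX_SEGMENT_LENGTH} characters")
        if not SEGMENT_RE.fullmatch(segment):
            problems.append(f"segment {segment!r} is empty or has illegal characters")
    return problems
```

With this sketch, `check_path("subs/reel_1.xml")` returns an empty list, while a name containing ç is reported as having illegal characters.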
References:
SMPTE:
ST 429-9:2014
7.1 Path
The Path element indicates the complete path for the Chunk, represented as a URI per [RFC 3986]. Its semantics and format are delivery-medium dependent, and constrained by each Map Profile (see Section 9). The value is encoded as an xs:anyURI. Note: Annex A presents a basic Map Profile.
Annex A Basic Map Profile v2 (Normative)
A.2 Path
Each Path element value shall be a relative-path reference as specified in RFC 3986. No query or fragment component shall be present. Given a Path element in an Asset Map, the relative-path reference shall be resolved, as specified in RFC 3986, relative to a Base URI consisting of the location of the Asset Map. (...) Each path segment, as specified in IETF RFC 3986, shall consist of characters from the set a-z, A-Z, 0-9, “-” (dash), “_” (underscore) and “.” (period). No segment shall have more than 100 characters, and the value of the Path element shall not exceed 100 characters in length. A Path element value shall have no more than 10 segments. The Path element value shall preserve case (the path and the filename on the filesystem shall have identical case). No two paths in an Asset Map shall have identical value, regardless of case.
INTEROP:
https://interop-docs.cinepedia.com/Document_Release_2.0/mpeg_ii_am_spec.pdf
6.4 Chunk Path Format
The path and filename shall conform to the UDF specification.
http://www.osta.org/specs/pdf/udf201.pdf
Basic Restrictions & Requirements
File Name Length: Maximum of 255 bytes
4.2.2.1 char FileIdentifier
... [this section and its subsections contain quite involved algorithms for translating "illegal" names for use on specific OSes]
@matmat thanks for the notes and reminders!
Added checks for characters outside the allowed set in AM asset paths. Depending on the AM type (SMPTE/Interop), the result will be Error (SMPTE) or Hint (Interop), respectively. What do you think? (4c3977cc2b14d55a024a375f133dbde347ede8eb)
Also added a length check for AM asset paths that should have been in there 10 years ago -_-
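For readers following along, the Error/Hint split described above could look roughly like this. It is a hypothetical sketch, not the code from the commit; `Finding`, `check_asset_path` and the `"SMPTE"`/`"Interop"` keys are names invented for the example. Only the character check is shown, since the thread does not say which severity the length check uses.

```python
import re
from dataclasses import dataclass

# Allowed characters per path segment (SMPTE Basic Map Profile v2).
ALLOWED_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")

# Assumption for this sketch: the AM type is passed in as one of these strings.
SEVERITY_BY_AM_TYPE = {"SMPTE": "Error", "Interop": "Hint"}


@dataclass
class Finding:
    severity: str
    message: str


def check_asset_path(path: str, am_type: str) -> list[Finding]:
    """Flag illegal characters, with severity chosen by Asset Map type."""
    severity = SEVERITY_BY_AM_TYPE[am_type]
    if all(ALLOWED_SEGMENT.fullmatch(seg) for seg in path.split("/")):
        return []
    return [Finding(severity, f"Illegal characters in asset path: {path}")]
```

The same path then yields an Error for a SMPTE Asset Map and a Hint for an Interop one, which matches the behaviour described in the comment.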
Looks good, thank you!
In practice I think lots of DCPs will fail this but still play back without problems (in most cases). But that's how it is, I guess.
The festival is coming up and I will battle test this in the coming weeks! :)
If/when you have time, these additional checks would be nice to have (though some of them may be unnecessary):
- No two paths in an Asset Map shall have identical value, regardless of case.
- Check that the path is relative
- No more than 10 path segments
- Check that it is a valid path according to RFC 3986
- No query component
- No fragment
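A rough sketch of how most of the checks in this list might be approached, in case it helps. The function names are hypothetical, and `urllib.parse.urlsplit` is a lenient parser rather than a strict RFC 3986 validator, so the "valid path according to RFC 3986" item is only approximated here.

```python
from collections import defaultdict
from urllib.parse import urlsplit

MAX_SEGMENTS = 10


def check_reference(path: str) -> list[str]:
    """Check one Path value for relative-ness, query, fragment, segment count."""
    problems = []
    try:
        parts = urlsplit(path)
    except ValueError:
        return ["not parseable as a URI reference"]
    # A relative-path reference has no scheme, no authority, no leading "/".
    if parts.scheme or parts.netloc or path.startswith("/"):
        problems.append("not a relative-path reference")
    # Look at the raw string so that an empty query ("a?") is caught too.
    if "?" in path:
        problems.append("query component present")
    if "#" in path:
        problems.append("fragment component present")
    if len(path.split("/")) > MAX_SEGMENTS:
        problems.append("more than 10 path segments")
    return problems


def find_case_insensitive_duplicates(paths: list[str]) -> list[list[str]]:
    """Group paths that are identical when case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

The duplicate check needs the whole Asset Map at once, which is why it is kept separate from the per-path checks.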