detected mime type of m4a uploaded in zip
What this PR does / why we need it:
store the proper mime type (and show the previewer) for m4a files uploaded in a zip
Which issue(s) this PR closes:
- Closes #10922
Special notes for your reviewer:
Suggestions on how to test this:
See issue
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
files with extension m4a uploaded in a zip will get content-type audio/x-m4a rather than application/octet-stream
Additional documentation:
Huh, the API tests passed but afterward there was an Ansible error at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10923/1/console
Ansible error
Branch 10922-content-type-of-m4a-in-zip from https://github.com/DANS-KNAW-jp/dataverse has been deployed to http://ec2-3-95-195-223.compute-1.amazonaws.com/ destroying AWS instance { "TerminatingInstances": [ { "CurrentState": { "Code": 32, "Name": "shutting-down" }, "InstanceId": "i-05db8c82e47c5a34f", "PreviousState": { "Code": 16, "Name": "running" } } ] } removing EC2 PEM [Pipeline] } [Pipeline] // stage [Pipeline] stage [Pipeline] { (Post) [Pipeline] junit Recording test results [Checks API] No suitable checks publisher found. [Pipeline] jacoco [JaCoCo plugin] Collecting JaCoCo coverage data... [JaCoCo plugin] Version: 3.3.6 [JaCoCo plugin] target/.exec;target/classes;src/main/java; locations are configured [JaCoCo plugin] Number of found exec files for pattern target/.exec: 1 [JaCoCo plugin] Saving matched execfiles: /home/worker/workspace/SS-Dataverse-Develop-PR_PR-10923/target/jacoco-unit.exec [JaCoCo plugin] Saving matched class directories for class-pattern: target/classes: [JaCoCo plugin] - /home/worker/workspace/SS-Dataverse-Develop-PR_PR-10923/target/classes 1124 files [JaCoCo plugin] Saving matched source directories for source-pattern: src/main/java: [JaCoCo plugin] Source Inclusions: /*.java,/.groovy,**/.kt,**/.kts [JaCoCo plugin] Source Exclusions: [JaCoCo plugin] - /home/worker/workspace/SS-Dataverse-Develop-PR_PR-10923/src/main/java 919 files [JaCoCo plugin] Loading inclusions files.. [JaCoCo plugin] inclusions: [] [JaCoCo plugin] exclusions: [src/test] [JaCoCo plugin] Thresholds: JacocoHealthReportThresholds [minClass=0, maxClass=0, minMethod=0, maxMethod=0, minLine=0, maxLine=0, minBranch=0, maxBranch=0, minInstruction=0, maxInstruction=0, minComplexity=0, maxComplexity=0] [JaCoCo plugin] Publishing the results.. [JaCoCo plugin] Loading packages.. [JaCoCo plugin] Done. [JaCoCo plugin] Overall coverage: class: 44.10933, method: 24.452742, line: 20.689976, branch: 16.892052, instruction: 21.34378, complexity: 16.780916 [Pipeline] script [Pipeline] { [Pipeline] fileExists [Pipeline] error [Pipeline] } [Pipeline] // script [Pipeline] } [Pipeline] // stage [Pipeline] } [Pipeline] // withEnv [Pipeline] } [Pipeline] // node [Pipeline] End of Pipeline ERROR: Ansible run terminated abnormally, failing build.
GitHub has been notified of this commit’s build result
Finished: FAILURE
It seems unrelated to this PR. Thanks for the PR, @jo-pol
@jo-pol - given https://mimetype.io/audio/x-m4a - should this use audio/mp4 instead? I'm not certain, but I think using audio/mp4 would make files uploaded as audio/x-m4a end up with the audio/mp4 type. (changing existing files would require running the redetect API). I see we have both in the MimeTypeDisplay file: https://github.com/IQSS/dataverse/blob/9339e4492732f65771f026ce2613935a1ef52e8f/src/main/java/propertyFiles/MimeTypeDisplay.properties#L204-L205 but we currently don't have an example with audio/mp4 for the Previewers.
Also note that the jenkins build problem here is the same as on #10899 - somehow the version of count.pl being picked up by ansible is different that the one in the IQSS/develop branch:
TASK [dataverse : place Leonid's PERL scripts] *********************************
changed: [localhost] => (item=https://raw.githubusercontent.com/DANS-KNAW-jp/dataverse/develop/scripts/database/querycount/parse.pl)
changed: [localhost] => (item=https://raw.githubusercontent.com/DANS-KNAW-jp/dataverse/develop/scripts/database/querycount/count.pl)
TASK [dataverse : get name of Postgres logfile] ********************************
ok: [localhost]
TASK [dataverse : there should be only one] ************************************
ok: [localhost]
TASK [dataverse : fire at will] ************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "/usr/bin/perl count.pl --non-interactive /var/lib/pgsql/13/data/log/postgresql-Mon.log > /tmp/query_count.out", "delta": "0:00:00.045359", "end": "2024-10-14 10:54:38.761200", "msg": "non-zero return code", "rc": 2, "start": "2024-10-14 10:54:38.715841", "stderr": "usage: ./count.pl <PGLOGFILE>", "stderr_lines": ["usage: ./count.pl <PGLOGFILE>"], "stdout": "", "stdout_lines": []}
Re - the build failure - it looks like this is from a bug in dataverse-ansible and doesn't have anything to do with your branches. We're working on a fix but won't hold up the PR for that once we resolve the audio/mp4 x-m4a issue.
Re (x-)m4a When uploaded stand alone firefox provides mp4, chrome and edge x-m4a I simply followed the major browsers. However, https://mimetype.io/audio/x-m4a seems pretty clear.
Having said that and beyond the scope of the issue: shouldn't we overrule the browser provided mime type in FileUtil.useRecognizedType for deprecated types?
A second aside: it looks to me that CreateNewDataFilesCommand does a pretty inefficient job scanning a file multiple times to recognize the type and throwing that result away to use the supplied type.
How about adding both with a comment that the x- version is obsolete but still used in 2024? That would support all browser now (assuming people register the player for both types).
Re: shouldn't we overrule the browser provided mime type in FileUtil.useRecognizedType for deprecated types? - probably, the note on useRecognizedType indicates this was thought about (in general, not specific to obsolete types) and just not implemented so far.
Re: it looks to me that CreateNewDataFilesCommand does a pretty inefficient job scanning a file multiple times to recognize the type and throwing that result away to use the supplied type. - It's hard to tell, but I think any given file is only scanned once, but there are different paths through that command, so there are different places where we recognize normally uploaded files, direct uploaded files, files in zips, shapefile components, etc. It does look like we want to always try recognizing once, since we could discover the real type is one of the ingestible ones even though the browser says otherwise.
Re: add both Both deprecated mime-types are added both to dataverse-previewers.sh with https://github.com/gdcc/dataverse-ansible/pull/375/files. That supports browsers. But my question (beyond the scope of this PR) is: why don't we overrule deprecated values passed to us by browsers.
Re: scanned once detectTabularDataFormat, checkNetcdfOrHdf5 and JHove all read the file (and the JHove configuration is also read) before examining the extension or browser provide type. The other way around seems more efficient to me. But also this is beyond the scope.
Fix looks good. Merging PR. Testing of 10923.docx