add shn/shorten support -- duration
Yes, I know it is 2025, but, you know, hippies*... There are still .shn files on Lossless Legs and the Internet Archive's Live Music Archive; a lot of stuff is still being [ethically] shared and doesn't have FLAC transcodes.
pwd: /library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf
$ ll
.rw-rw-r-- 841 15 Feb 11:58 Garcia1978-03-18.txt
.rw-rw-r-- 280 15 Feb 11:58 jgb78-03-18-D1.md5
.rw-rw-r-- 336 15 Feb 11:58 jgb78-03-18-D2.md5
.rw-rw-r-- 841 15 Feb 11:58 jgb78-03-18_Time_Sheet.txt
.rw-rw-r-- 9.9k 15 Feb 11:58 jgb78-03-18_sssb.txt
.rw-rw-r-- 25M 15 Feb 11:58 jgb78-03-18d2t03.shn
.rw-rw-r-- 21M 15 Feb 11:58 jgb78-03-18d2t05.shn
.rw-rw-r-- 60M 15 Feb 11:58 jgb78-03-18d1t02.shn
.rw-rw-r-- 76M 15 Feb 11:58 jgb78-03-18d2t04.shn
.rw-rw-r-- 78M 15 Feb 11:58 jgb78-03-18d1t04.shn
.rw-rw-r-- 15M 15 Feb 11:58 jgb78-03-18d2t06.shn
.rw-rw-r-- 37M 15 Feb 11:58 jgb78-03-18d1t05.shn
.rw-rw-r-- 82M 15 Feb 11:58 jgb78-03-18d1t01.shn
.rw-rw-r-- 51M 15 Feb 11:58 jgb78-03-18d2t01.shn
.rw-rw-r-- 40M 15 Feb 11:58 jgb78-03-18d2t02.shn
.rw-rw-r-- 56M 15 Feb 11:58 jgb78-03-18d1t03.shn
[Guessing this goes in the Lib rather than in the Bin repo.]
* Can't even get them to use 4-digit-year timestamps.
sample file please, then we see when the core team has time and/or someone else is interested.
FFmpeg parser for it: https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/shortendec.c
https://archive.org/details/gd78-11-24.sbd.prefm.13948.sbefail.shnf
https://archive.org/download/gd78-11-24.sbd.prefm.13948.sbefail.shnf/GD-11-24-78-D1-T01.shn (Jack Straw: "We can share the women; we can share the wine...")
huh. this would at least obviate the need for something in addition to mplayer/mediainfo/ffprobe.
FWIW, I'm primarily interested in duration and, after recompiling ffprobe/ffmpeg with
--enable-decoder=shorten \
--enable-demuxer=shorten
there's still no duration listed:
Input #0, shn, from '/Library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf/jgb78-03-18d1t01.shn':
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Audio: shorten, 44100 Hz, 2 channels, s16p
The only thing I've been able to find is shntool:
$ shntool len -q /library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf/jgb78-03-18d1t01.shn
length expanded size cdr WAVE problems fmt ratio filename
13:07.24 138882712 B -b- -- ---xx shn 0.5899 /library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf/jgb78-03-18d1t01.shn
13:07.24 138882712 B 0.5899 (1 file)
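For what it's worth, that length looks consistent with plain arithmetic on the expanded size. A minimal sketch, assuming 16-bit stereo at 44100 Hz (as ffprobe reports above), a canonical 44-byte WAV header included in the expanded size, and that the fractional part of shntool's length is CD frames (1/75 s) rounded to the nearest frame. The function name and constants are only for illustration:

```python
# Assumptions: 44.1 kHz / 2 channels / 16-bit (from the ffprobe output),
# a 44-byte WAV header inside the "expanded size", and a length format of
# m:ss.ff where ff counts CD frames (75 per second).
SAMPLE_RATE = 44_100
CHANNELS = 2
BYTES_PER_SAMPLE = 2
WAV_HEADER_BYTES = 44          # assumed canonical header size
CD_FRAMES_PER_SECOND = 75


def cd_length(expanded_size: int) -> str:
    """Turn an expanded WAV size in bytes into an m:ss.ff string."""
    data_bytes = expanded_size - WAV_HEADER_BYTES
    bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE   # 176400
    bytes_per_frame = bytes_per_second // CD_FRAMES_PER_SECOND     # 2352
    # Round to the nearest CD frame.
    total_frames = (data_bytes + bytes_per_frame // 2) // bytes_per_frame
    seconds, frames = divmod(total_frames, CD_FRAMES_PER_SECOND)
    minutes, seconds = divmod(seconds, 60)
    return f"{minutes}:{seconds:02d}.{frames:02d}"


if __name__ == "__main__":
    print(cd_length(138_882_712))
```

So the duration itself is easy once the decoded size (or sample count) is known; the hard part is getting that number without shntool.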
If there is a specification document for this format then it'll be easier to implement in MediaInfo.
I don't know if there's a specification document that would still be available, but there is this: https://github.com/bayun/shntool/blob/master/src/core_mode.c
From what I've been able to hack around with this morning, the existing code may be portable from the shntool source.
So I see that MediaInfo currently only has limited detection for shn here:
https://github.com/MediaArea/MediaInfoLib/blob/2a86cf5eba9f93f643b2f569d7b2d8bb2e26a161/Source/MediaInfo/File_Other.cpp#L168-L179
If you can do C++ programming, then the fastest way to get MediaInfo to support shn is to do it yourself and submit a contribution via pull request (PR); otherwise it may take years if no one does it, especially for a rare format like this. If you want to add full support with lots of code, then it is better to have a dedicated parser for it. I can guide you if you need help.
Here is an example of WebP being moved from File_Other.cpp into its own parser, with full support added.
https://github.com/MediaArea/MediaInfoLib/commit/3c78c35a10c238971b682d66a81200d5ba466c51
As much as I would like to do that, I really don't know anything other than bash. I was thinking that I might be able to extract enough of the code from shntool to get info & length but before seriously doing anything in that vein, I would need clarification as to license issues for using the GPL-2.0-licensed code in mediainfo.
Except for small external libraries (utilities, not a parser), we require copyright assignment so we don't accept external code, especially not if the license mandates the license of the final binary.
Here the external code would be the "spec", no more, and MediaInfo coding style (especially the MediaTrace feature) is required.
I'm not a lawyer but I believe GPL code cannot be used in any project that is not GPL.
License issues aside, parsing should be using MediaInfo's style and using internal functions integrated with existing flow so I do not think one can simply copy paste some external code and expect it to work.
Here's the 'bitstream spec' for the header part, written by Google Gemini from the FFmpeg code. I did not check the accuracy.
Shorten Bitstream Header Specification
The demuxer checks for a specific signature and then reads parameters using a variation of Unsigned Rice (UR) codes defined for the Shorten format, implemented by the function get_ur_golomb_shorten.
File Signature
The first 4 bytes must match the magic number:
- Bytes 0-3: 0x616a6b67 (ASCII for 'ajkg') in big-endian (AV_RB32). This acts as the file identifier.
💡 Understanding get_ur_golomb_shorten(k)
The get_ur_golomb_shorten(gb, k) function reads a single unsigned integer $N$ from the bitstream.
The value $N$ is decoded by:
- Counting the number of preceding zeroes ($q$) until a one ('1') is encountered (the unary part).
- Reading the next $k$ bits ($r$) (the remainder part).
The decoded value is $N = (q \cdot 2^k) + r$.
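A minimal sketch of that decoding step, to make the bit order concrete. This is not FFmpeg's code: `BitReader` and `read_ur_shorten` are names made up for the illustration, and it assumes bits are consumed MSB-first within each byte.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self) -> int:
        if self.pos >= len(self.data) * 8:
            raise EOFError("ran out of bits")
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ur_shorten(reader: BitReader, k: int) -> int:
    """Decode one unsigned Rice value: unary quotient, then k remainder bits."""
    q = 0
    while reader.read_bit() == 0:   # count zeroes up to the terminating '1'
        q += 1
    r = reader.read_bits(k)         # the next k bits are the remainder
    return (q << k) + r             # N = q * 2^k + r
```

For example, with $k = 2$ the bits `001 10` give $q = 2$, $r = 2$, so $N = 2 \cdot 4 + 2 = 10$; padded to a byte that is `0x30`, and `read_ur_shorten(BitReader(b"\x30"), 2)` returns 10.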
Version 0 Header
If the fifth byte is 0 (i.e., version == 0), the header follows this structure:
| Field | Decoding Method | Resulting Value | Description |
|---|---|---|---|
| Version | 8 bits (Byte 4) | 0 | Must be 0. |
| Internal Ftype | get_ur_golomb_shorten(k=4) | $N$ | Determines the internal file format/encoding type. |
| Channels | get_ur_golomb_shorten(k=0) | $N$ | The number of audio channels. |
| Blocksize | Constant | 256 | Fixed to 256. |
- Note: The header parsing for version 0 starts reading bits from the byte after the Version byte (p->buf + 5).
Version > 0 Header
If the fifth byte is greater than 0 (version > 0), the header follows a slightly more complex structure where the parameter's Golomb code $k$ is also encoded:
| Field | Decoding Method | Resulting Value | Description |
|---|---|---|---|
| Version | 8 bits (Byte 4) | $> 0$ | Any value $> 0$. |
| Internal Ftype $k_{val}$ | get_ur_golomb_shorten(k=2) | $k_{ftype}$ | The k value used for the next field. Must be $\le 31$. |
| Internal Ftype | get_ur_golomb_shorten(k=$k_{ftype}$) | $N$ | Determines the internal file format/encoding type. |
| Channels $k_{val}$ | get_ur_golomb_shorten(k=2) | $k_{chan}$ | The k value used for the next field. |
| Channels | get_ur_golomb_shorten(k=$k_{chan}$) | $N$ | The number of audio channels. |
| Blocksize $k_{val}$ | get_ur_golomb_shorten(k=2) | $k_{block}$ | The k value used for the next field. |
| Blocksize | get_ur_golomb_shorten(k=$k_{block}$) | $N$ | The block size used for encoding. |
- Note: The header parsing for version $> 0$ also starts reading bits from the byte after the Version byte (p->buf + 5).
Validation Constraints
The demuxer performs validity checks on the parsed parameters:
- Internal Ftype: Must be 2 (TYPE_U8), 3 (TYPE_S16HL), or 5 (TYPE_S16LH) to be considered a valid Shorten stream.
- Channels: Must be between 1 and 8 (inclusive).
- Blocksize: Must be between 1 and 65535 (inclusive).
If these checks pass, FFmpeg determines the input is a Shorten file and proceeds to decode the raw audio stream that follows the header.
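Putting the signature, the two header layouts, and the validation checks together, here is a rough sketch of the probe as described above. It is written from this description only, not ported from FFmpeg or shntool; `Bits`, `probe_shorten`, and the constant names are made up for the illustration, and applying the $\le 31$ bound to all three k values (the table only states it for the ftype) is an assumption.

```python
from typing import Optional

MAGIC = b"ajkg"              # 0x616a6b67
FTYPE_K, CHANNELS_K = 4, 0   # fixed k values for version 0
PREFIX_K = 2                 # k used to read each per-field k (version > 0)
DEFAULT_BLOCKSIZE = 256
VALID_FTYPES = (2, 3, 5)


class Bits:
    """Hypothetical MSB-first bit reader with Shorten-style Rice decoding."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def bit(self) -> int:
        if self.pos >= 8 * len(self.data):
            raise EOFError("header truncated")
        b = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return b

    def rice(self, k: int) -> int:
        q = 0
        while self.bit() == 0:          # unary part
            q += 1
        r = 0
        for _ in range(k):              # k remainder bits
            r = (r << 1) | self.bit()
        return (q << k) + r

    def prefixed(self) -> int:
        """Read k with a fixed k=2 code, then the value itself with that k."""
        k = self.rice(PREFIX_K)
        if k > 31:                      # assumed to apply to every field
            raise ValueError("k out of range")
        return self.rice(k)


def probe_shorten(buf: bytes) -> Optional[dict]:
    """Return the parsed header fields, or None if buf does not look like shn."""
    if len(buf) < 5 or buf[:4] != MAGIC:
        return None
    version = buf[4]
    bits = Bits(buf[5:])                # bit parsing starts after the version byte
    try:
        if version == 0:
            ftype = bits.rice(FTYPE_K)
            channels = bits.rice(CHANNELS_K)
            blocksize = DEFAULT_BLOCKSIZE
        else:
            ftype = bits.prefixed()
            channels = bits.prefixed()
            blocksize = bits.prefixed()
    except (EOFError, ValueError):
        return None
    if ftype not in VALID_FTYPES:
        return None
    if not 1 <= channels <= 8:
        return None
    if not 1 <= blocksize <= 65535:
        return None
    return {"version": version, "ftype": ftype,
            "channels": channels, "blocksize": blocksize}
```

With hand-built bytes (not taken from a real file), `probe_shorten(b"ajkg\x02\xfb\xb1\x70\x00")` returns version 2, ftype 5, 2 channels, blocksize 256. Note that none of these fields carries a sample count, which is why the header alone does not give a duration.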
If I understand this format correctly, to determine the duration, we need to do a full parse of the entire file and calculate it from the number of samples (if that is even actually easily obtainable without decoding).