add shn/shorten support -- duration

Open iconoclasthero opened this issue 2 weeks ago • 11 comments

Yes, I know it is 2025, but, you know, hippies*... There are still .shn files on Lossless Legs and Internet Archive's Live Music Archive; a lot of stuff is still being [ethically] shared and doesn't have flac transcodes.

pwd: /library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf 
$ ll
.rw-rw-r--  841 15 Feb 11:58 Garcia1978-03-18.txt
.rw-rw-r--  280 15 Feb 11:58 jgb78-03-18-D1.md5
.rw-rw-r--  336 15 Feb 11:58 jgb78-03-18-D2.md5
.rw-rw-r--  841 15 Feb 11:58 jgb78-03-18_Time_Sheet.txt
.rw-rw-r-- 9.9k 15 Feb 11:58 jgb78-03-18_sssb.txt
.rw-rw-r--  25M 15 Feb 11:58 jgb78-03-18d2t03.shn
.rw-rw-r--  21M 15 Feb 11:58 jgb78-03-18d2t05.shn
.rw-rw-r--  60M 15 Feb 11:58 jgb78-03-18d1t02.shn
.rw-rw-r--  76M 15 Feb 11:58 jgb78-03-18d2t04.shn
.rw-rw-r--  78M 15 Feb 11:58 jgb78-03-18d1t04.shn
.rw-rw-r--  15M 15 Feb 11:58 jgb78-03-18d2t06.shn
.rw-rw-r--  37M 15 Feb 11:58 jgb78-03-18d1t05.shn
.rw-rw-r--  82M 15 Feb 11:58 jgb78-03-18d1t01.shn
.rw-rw-r--  51M 15 Feb 11:58 jgb78-03-18d2t01.shn
.rw-rw-r--  40M 15 Feb 11:58 jgb78-03-18d2t02.shn
.rw-rw-r--  56M 15 Feb 11:58 jgb78-03-18d1t03.shn

[Guessing this goes in the Lib rather than in the Bin repo.]

* Can't even get them to use 4-digit-year timestamps.

iconoclasthero avatar Dec 05 '25 18:12 iconoclasthero

sample file please, then we see when the core team has time and/or someone else is interested.

JeromeMartinez avatar Dec 05 '25 20:12 JeromeMartinez

FFmpeg parser for it: https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/shortendec.c

cjee21 avatar Dec 06 '25 06:12 cjee21

sample file please, then we see when the core team has time and/or someone else is interested.

https://archive.org/details/gd78-11-24.sbd.prefm.13948.sbefail.shnf

https://archive.org/download/gd78-11-24.sbd.prefm.13948.sbefail.shnf/GD-11-24-78-D1-T01.shn (Jack Straw: "We can share the women; we can share the wine...")

FFmpeg parser for it: https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/shortendec.c

huh. this would at least obviate the need for something in addition to mplayer/mediainfo/ffprobe.

iconoclasthero avatar Dec 06 '25 12:12 iconoclasthero

FFmpeg parser for it: https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/shortendec.c

FWIW, I'm primarily interested in duration and, after recompiling ffprobe/ffmpeg with

--enable-decoder=shorten \
--enable-demuxer=shorten

there's still no duration listed:

Input #0, shn, from '/Library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf/jgb78-03-18d1t01.shn':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Audio: shorten, 44100 Hz, 2 channels, s16p

The only thing I've been able to find is shntool:

$ shntool len -q /library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf/jgb78-03-18d1t01.shn
    length     expanded size    cdr  WAVE problems  fmt   ratio  filename
    13:07.24      138882712 B   -b-   --   ---xx    shn  0.5899  /library/torrent/downloads/2025/jgb1978-03-18L.sbd.serafin.7326.sbefail.shnf/jgb78-03-18d1t01.shn
    13:07.24      138882712 B                            0.5899  (1 file)
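
For reference, the shntool length can be reproduced from the expanded size by plain arithmetic. This is a minimal sketch, not shntool's actual code: the sample rate, channel count and 16-bit depth are taken from the ffprobe output above, while the 44-byte WAVE header and the reading of ".24" as CD frames (75 per second, rounded) are assumptions. The function names are made up for illustration.

```python
def pcm_duration_seconds(expanded_size, sample_rate=44100, channels=2,
                         bytes_per_sample=2, header_bytes=44):
    """Duration of the PCM payload of a WAVE file of the given total size.

    header_bytes=44 assumes a canonical WAVE header (an assumption, not
    something shntool reported).
    """
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return (expanded_size - header_bytes) / bytes_per_second


def format_cd_frames(seconds):
    """Format as m:ss.ff, where ff counts CD frames (75 per second), rounded."""
    total_frames = round(seconds * 75)
    minutes, rest = divmod(total_frames, 75 * 60)
    secs, frames = divmod(rest, 75)
    return f"{minutes}:{secs:02d}.{frames:02d}"


duration = pcm_duration_seconds(138882712)  # expanded size from shntool above
print(f"{duration:.3f} s")         # 787.317 s
print(format_cd_frames(duration))  # 13:07.24
```

So the missing piece is only the decoded size (or sample count); the rest is division.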

iconoclasthero avatar Dec 06 '25 13:12 iconoclasthero

If there is a specification document for this format then it'll be easier to implement in MediaInfo.

cjee21 avatar Dec 06 '25 13:12 cjee21

If there is a specification document for this format then it'll be easier to implement in MediaInfo.

I don't know if there's a specification document that would still be available, but there is this: https://github.com/bayun/shntool/blob/master/src/core_mode.c

From what I've been able to hack around with this morning, the existing code may be portable from the shntool source.

iconoclasthero avatar Dec 06 '25 13:12 iconoclasthero

So I see that MediaInfo currently only has limited detection for shn here:

https://github.com/MediaArea/MediaInfoLib/blob/2a86cf5eba9f93f643b2f569d7b2d8bb2e26a161/Source/MediaInfo/File_Other.cpp#L168-L179

If you can do C++ programming, then the fastest way to get MediaInfo to support shn is to do it yourself and submit a contribution via pull request (PR); otherwise it may take years if no one does it, especially for a rare format like this. If you want to add full support with lots of code, then it is better to have a dedicated parser for it. I can guide you if you need help.

Here is an example of WebP being moved from File_Other.cpp into its own parser and full support added. https://github.com/MediaArea/MediaInfoLib/commit/3c78c35a10c238971b682d66a81200d5ba466c51

cjee21 avatar Dec 06 '25 16:12 cjee21

As much as I would like to do that, I really don't know anything other than bash. I was thinking that I might be able to extract enough of the code from shntool to get info & length but before seriously doing anything in that vein, I would need clarification as to license issues for using the GPL-2.0-licensed code in mediainfo.

iconoclasthero avatar Dec 07 '25 13:12 iconoclasthero

I would need clarification as to license issues for using the GPL-2.0-licensed code in mediainfo

Except for small external libraries (utilities, not a parser), we require copyright assignment so we don't accept external code, especially not if the license mandates the license of the final binary.

Here the external code would be the "spec", no more, and MediaInfo coding style (especially the MediaTrace feature) is required.

JeromeMartinez avatar Dec 07 '25 13:12 JeromeMartinez

I'm not a lawyer but I believe GPL code cannot be used in any project that is not GPL.

License issues aside, parsing should be using MediaInfo's style and using internal functions integrated with existing flow so I do not think one can simply copy paste some external code and expect it to work.

cjee21 avatar Dec 07 '25 13:12 cjee21

Here's the 'bitstream spec' for the header part, written by Google Gemini from the FFmpeg code. I did not check the accuracy.

Shorten Bitstream Header Specification

The demuxer checks for a specific signature and then reads parameters using a variation of Unsigned Rice (UR) codes defined for the Shorten format, implemented by the function get_ur_golomb_shorten.


File Signature

The first 4 bytes must match the magic number:

  • Bytes 0-3: 0x616a6b67 (ASCII for 'ajkg') in big-endian (AV_RB32). This acts as the file identifier.

💡 Understanding get_ur_golomb_shorten(k)

The get_ur_golomb_shorten(gb, k) function reads a single unsigned integer $N$ from the bitstream.

The value $N$ is decoded by:

  1. Counting the number of preceding zeroes ($q$) until a one ('1') is encountered (the unary part).
  2. Reading the next $k$ bits ($r$) (the remainder part).

The decoded value is $N = (q \cdot 2^k) + r$.
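
The decoding rule above can be sketched in a few lines of Python. This is an illustration of the description only, not FFmpeg's implementation: `BitReader` and `read_ur_shorten` are hypothetical names, and MSB-first bit order is an assumption.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self):
        if self.pos >= len(self.data) * 8:
            raise EOFError("ran out of bits")
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ur_shorten(reader, k):
    """Decode one unsigned Rice value: unary quotient q, then k remainder bits."""
    q = 0
    while reader.read_bit() == 0:  # count zeros up to the terminating 1
        q += 1
    r = reader.read_bits(k)
    return (q << k) + r            # N = q * 2^k + r


# Bits 0 0 1 | 1 0 with k=2: q=2, r=2, so N = 2*4 + 2 = 10
print(read_ur_shorten(BitReader(bytes([0b00110000])), 2))  # 10
# Bits 0 0 0 1 with k=0: q=3, no remainder bits, so N = 3
print(read_ur_shorten(BitReader(bytes([0b00010000])), 0))  # 3
```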


Version 0 Header

If the fifth byte is 0 (i.e., version == 0), the header follows this structure:

| Field | Decoding Method | Resulting Value | Description |
|---|---|---|---|
| Version | 8 bits (Byte 4) | 0 | Must be 0. |
| Internal Ftype | get_ur_golomb_shorten(k=4) | $N$ | Determines the internal file format/encoding type. |
| Channels | get_ur_golomb_shorten(k=0) | $N$ | The number of audio channels. |
| Blocksize | Constant | 256 | Fixed to 256. |

  • Note: The header parsing for version 0 starts reading bits from the byte after the Version byte (p->buf + 5).

Version > 0 Header

If the fifth byte is greater than 0 (version > 0), the header follows a slightly more complex structure where the parameter's Golomb code $k$ is also encoded:

| Field | Decoding Method | Resulting Value | Description |
|---|---|---|---|
| Version | 8 bits (Byte 4) | $> 0$ | Any value $> 0$. |
| Internal Ftype $k_{val}$ | get_ur_golomb_shorten(k=2) | $k_{ftype}$ | The k value used for the next field. Must be $\le 31$. |
| Internal Ftype | get_ur_golomb_shorten(k=$k_{ftype}$) | $N$ | Determines the internal file format/encoding type. |
| Channels $k_{val}$ | get_ur_golomb_shorten(k=2) | $k_{chan}$ | The k value used for the next field. |
| Channels | get_ur_golomb_shorten(k=$k_{chan}$) | $N$ | The number of audio channels. |
| Blocksize $k_{val}$ | get_ur_golomb_shorten(k=2) | $k_{block}$ | The k value used for the next field. |
| Blocksize | get_ur_golomb_shorten(k=$k_{block}$) | $N$ | The block size used for encoding. |

  • Note: The header parsing for version $> 0$ also starts reading bits from the byte after the Version byte (p->buf + 5).

Validation Constraints

The demuxer performs validity checks on the parsed parameters:

  • Internal Ftype: Must be 2 (unsigned 8-bit), 3 (signed 16-bit, big-endian), or 5 (signed 16-bit, little-endian) to be considered a valid Shorten stream.
  • Channels: Must be between 1 and 8 (inclusive).
  • Blocksize: Must be between 1 and 65535 (inclusive).

If these checks pass, FFmpeg determines the input is a Shorten file and proceeds to decode the raw audio stream that follows the header.
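
Putting the signature, the two header layouts and the validation together, a probe could look roughly like the Python below. It is a sketch of the description above, not of MediaInfo or FFmpeg code: every name is hypothetical, MSB-first bit order is assumed, the $k \le 31$ check is applied to all three fields (the table states it only for Ftype), and the two test headers are hand-built synthetic bytes, not taken from a real .shn file.

```python
MAGIC = b"ajkg"  # bytes 0-3, i.e. 0x616a6b67


class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper)."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def read_bit(self):
        if self.pos >= len(self.data) * 8:
            raise EOFError("ran out of bits")
        bit = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit


def read_ur_shorten(reader, k):
    """Unsigned Rice value: count zeros up to a 1 (q), then read k bits (r)."""
    q = 0
    while reader.read_bit() == 0:
        q += 1
    r = 0
    for _ in range(k):
        r = (r << 1) | reader.read_bit()
    return (q << k) + r


def parse_shorten_header(buf):
    """Return (version, ftype, channels, blocksize), or None if the checks fail."""
    if len(buf) < 5 or buf[:4] != MAGIC:
        return None
    version = buf[4]
    bits = BitReader(buf[5:])  # bit parsing starts right after the version byte
    try:
        if version == 0:
            ftype = read_ur_shorten(bits, 4)
            channels = read_ur_shorten(bits, 0)
            blocksize = 256
        else:
            fields = []
            for _ in range(3):  # ftype, channels, blocksize
                k = read_ur_shorten(bits, 2)
                if k > 31:      # assumption: same limit for every field
                    return None
                fields.append(read_ur_shorten(bits, k))
            ftype, channels, blocksize = fields
    except EOFError:
        return None
    if ftype not in (2, 3, 5):
        return None
    if not 1 <= channels <= 8:
        return None
    if not 1 <= blocksize <= 65535:
        return None
    return version, ftype, channels, blocksize


# Synthetic version-2 header: ftype=5 (k=3), channels=2 (k=0), blocksize=256 (k=9)
print(parse_shorten_header(MAGIC + bytes([2, 0xFB, 0x09, 0x70, 0x00])))  # (2, 5, 2, 256)
# Synthetic version-0 header: ftype=5, channels=2, blocksize fixed to 256
print(parse_shorten_header(MAGIC + bytes([0, 0xA9])))                    # (0, 5, 2, 256)
```

Note that nothing here yields a sample count or a sample rate, so the header alone does not give a duration.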



If I understand this format correctly, to determine the duration, we need to do a full parse of the entire file and calculate it from the number of samples (if that is even actually easily obtainable without decoding).

cjee21 avatar Dec 07 '25 18:12 cjee21