Audio stream recording file extension is not detected

Open F0rzend opened this issue 2 years ago • 6 comments

I have the following bytes:

header := []byte{112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56, 128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130, 94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99, 220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167, 238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255, 247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3, 24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144, 252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156, 64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105, 74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73, 249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198}

It is a recording of the radio-t.com stream. I expected to get an mp3 file extension, but the detected MIME type was application/octet-stream.

Version of the library you are using: v1.4.1

Output of go version: go version go1.18.1 linux/amd64

Additional context: I wrote a test for this.

package test

import (
	"testing"

	"github.com/gabriel-vasile/mimetype"
)

func TestFileDetection(t *testing.T) {
	t.Parallel()

	// Radio-T stream header
	header := []byte{
		112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56,
		128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130,
		94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99,
		220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167,
		238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255,
		247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3,
		24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144,
		252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156,
		64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105,
		74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73,
		249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198,
	}

	mime := mimetype.Detect(header)
	fileExtension := mime.Extension()

	t.Log(mime)

	if fileExtension == "" {
		t.Errorf("File extension not detected")
	}
}

I also have a recording of another segment of this stream. The type of this file is not detected either: https://drive.google.com/file/d/1sL18cF-zwN7txDfm30QnZoLlozcG4-5g/view?usp=sharing

F0rzend avatar Jul 16 '22 20:07 F0rzend

I'm looking into this. The Linux file utility (which is, I'd say, the best file format detection tool) also fails to detect the samples.

What program/library was used to create these samples?

gabriel-vasile avatar Jul 21 '22 11:07 gabriel-vasile

MP3 files are made up of frames. The problem seems to be that the test recordings start with an incomplete frame (maybe because they have been streamed?). The first complete MP3 frame starts at index 126 in the test case you provided.

package test

import (
	"testing"

	"github.com/gabriel-vasile/mimetype"
)

func TestFileDetection(t *testing.T) {
	t.Parallel()

	// Radio-T stream header
	header := []byte{
		112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56,
		128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130,
		94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99,
		220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167,
		238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255,
		247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3,
		24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144,
		252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156,
		64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105,
		74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73,
		249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198,
	}

	mime := mimetype.Detect(header[126:])
	fileExtension := mime.Extension()

	t.Log(mime)

	if fileExtension == "" {
		t.Errorf("File extension not detected")
	}
}
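
The offset 126 passed to Detect can also be located programmatically. The following is a minimal sketch and not mimetype's implementation: the helper name findLayer3Sync is hypothetical, and it inspects only the first two header bytes (all 11 sync bits set, a non-reserved MPEG version, and the Layer III bits), so on arbitrary binary data it can report false positives. The input is the 13-byte excerpt header[117:130] of the sample from this issue.

```go
package main

import "fmt"

// findLayer3Sync returns the index of the first two-byte sequence that looks
// like the start of an MPEG audio Layer III frame header, or -1 if there is none.
// Hypothetical helper for illustration; it is not part of mimetype.
func findLayer3Sync(data []byte) int {
	for i := 0; i+1 < len(data); i++ {
		if data[i] != 0xFF || data[i+1]&0xE0 != 0xE0 {
			continue // the 11 sync bits are not all set
		}
		version := (data[i+1] >> 3) & 0x03 // 0b01 is reserved
		layer := (data[i+1] >> 1) & 0x03   // 0b01 means Layer III
		if version != 0x01 && layer == 0x01 {
			return i
		}
	}
	return -1
}

func main() {
	// Bytes 117..129 of the sample from this issue.
	excerpt := []byte{154, 255, 254, 111, 255, 247, 87, 255, 255, 255, 227, 7, 0}
	if i := findLayer3Sync(excerpt); i >= 0 {
		fmt.Println("candidate frame at sample offset", 117+i) // offset 126
	}
}
```

The 255 bytes at offsets 118, 121, 124 and 125 are skipped because the byte that follows them carries the Layer I bits, not the Layer III bits.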

That being said, I'm not sure if mimetype should search for the first frame in the input. Looking at what other projects do, file/file and apache/tika don't search for the frame header either.

On the other hand, the mp3 specification says that decoders should search for the beginning of a frame if they don't find one at index 0 of the input (that's why the recording plays fine, even though it is truncated).
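
A common way for a decoder to confirm a candidate header while searching is to decode it, compute the frame length, and check that another header follows at that distance. The sketch below covers only the length computation, and only for MPEG-1 Layer III (MPEG-2 and MPEG-2.5 use different tables and are omitted). The function name mpeg1Layer3FrameLen is hypothetical, and the input bytes 255, 251, 144, 196 are the ones that appear near the end of the sample in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// Bitrates in kbit/s for MPEG-1 Layer III, indexed by the 4-bit bitrate field.
// Index 0 ("free format") and index 15 (invalid) are rejected below.
var bitratesKbps = [16]int{0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0}

// Sample rates in Hz for MPEG-1, indexed by the 2-bit sample rate field.
// Index 3 is reserved.
var sampleRatesHz = [4]int{44100, 48000, 32000, 0}

// mpeg1Layer3FrameLen decodes a 4-byte frame header and returns the length of
// the whole frame in bytes. Hypothetical helper for illustration only.
func mpeg1Layer3FrameLen(h []byte) (int, error) {
	if len(h) < 4 {
		return 0, errors.New("need 4 header bytes")
	}
	// Sync bits + MPEG-1 + Layer III; the protection bit is masked out.
	if h[0] != 0xFF || h[1]&0xFE != 0xFA {
		return 0, errors.New("not an MPEG-1 Layer III header")
	}
	bitrate := bitratesKbps[h[2]>>4] * 1000
	sampleRate := sampleRatesHz[(h[2]>>2)&0x03]
	padding := int((h[2] >> 1) & 0x01)
	if bitrate == 0 || sampleRate == 0 {
		return 0, errors.New("unsupported bitrate or sample rate index")
	}
	return 144*bitrate/sampleRate + padding, nil
}

func main() {
	n, err := mpeg1Layer3FrameLen([]byte{255, 251, 144, 196})
	fmt.Println(n, err) // 417 <nil>
}
```

For this header the result is 417 bytes (128 kbit/s at 44.1 kHz, no padding), so a decoder would expect the next header 417 bytes later.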

gabriel-vasile avatar Jul 25 '22 13:07 gabriel-vasile

What program/library was used to create these samples?

It is a recording of the radio-t stream, created using io.Copy: https://github.com/F0rzend/radiot_dumper/blob/master/copier/stream_copier.go#L82
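
One possible recorder-side workaround is to drop the leading partial frame before writing anything. This is a sketch under stated assumptions, not the radiot_dumper code: copyFromFirstSync and recordBytes are hypothetical names, and the two-byte header check (sync bits, non-reserved version, Layer III) can match by accident inside audio data.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// copyFromFirstSync discards bytes from src until it sees two bytes that look
// like an MPEG audio Layer III frame header, then copies the rest to dst.
// It returns the number of bytes written. Hypothetical helper for illustration.
func copyFromFirstSync(dst io.Writer, src io.Reader) (int64, error) {
	br := bufio.NewReader(src)
	for {
		b, err := br.Peek(2)
		if err == io.EOF {
			return 0, nil // stream ended before any candidate header
		}
		if err != nil {
			return 0, err
		}
		sync := b[0] == 0xFF && b[1]&0xE0 == 0xE0
		version := (b[1] >> 3) & 0x03 // 0b01 is reserved
		layer := (b[1] >> 1) & 0x03   // 0b01 means Layer III
		if sync && version != 0x01 && layer == 0x01 {
			break
		}
		if _, err := br.Discard(1); err != nil {
			return 0, err
		}
	}
	// The peeked header bytes are still buffered, so they are copied too.
	return io.Copy(dst, br)
}

// recordBytes runs copyFromFirstSync over an in-memory input.
func recordBytes(in []byte) ([]byte, error) {
	var out bytes.Buffer
	_, err := copyFromFirstSync(&out, bytes.NewReader(in))
	return out.Bytes(), err
}

func main() {
	// Two stray bytes followed by a header-like sequence and one data byte.
	out, err := recordBytes([]byte{0x70, 0x2C, 0xFF, 0xFB, 0x90, 0xC4, 0x00})
	fmt.Println(len(out), err) // 5 <nil>
}
```

In a recorder, dst would be the output file and src the HTTP response body, in place of a plain io.Copy between the two.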

F0rzend avatar Jul 27 '22 17:07 F0rzend

Is this file: https://drive.google.com/file/d/1sL18cF-zwN7txDfm30QnZoLlozcG4-5g/view?usp=sharing the original mp3 from radio-t.com, or was it saved through radiot_dumper? I think there are some problems with the way StreamCopier saves files.

gabriel-vasile avatar Jul 28 '22 22:07 gabriel-vasile

Unfortunately, the problem has not been resolved. Apparently the point is that this is a stream, and not just a recording.

I'm not sure about that. I saved some mp3 segments from these radio stations and they are all detected correctly, for example: https://stream.rcast.net/200399.mp3 https://stream.rcast.net/200292.mp3 https://stream.rcast.net/200167.mp3

Can you provide the URL to the radio stream that reproduces the issue?

gabriel-vasile avatar Jul 31 '22 08:07 gabriel-vasile

Can you provide the URL to the radio stream that reproduces the issue?

I record from https://stream.radio-t.com/, but the stream only runs once a week, on Saturdays at 20:00 UTC.

F0rzend avatar Jul 31 '22 17:07 F0rzend