mimetype
Audio stream recording file extension is not detected
I have the following bytes:
header := []byte{112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56, 128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130, 94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99, 220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167, 238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255, 247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3, 24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144, 252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156, 64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105, 74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73, 249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198}
It is a recording of the radio-t.com stream.
I expected to get an mp3 file extension, but the detected MIME type was application/octet-stream.
Version of the library you are using: v1.4.1
Output of go version: go version go1.18.1 linux/amd64
Additional context: I wrote a test for this.
package test
import (
	"testing"

	"github.com/gabriel-vasile/mimetype"
)
func TestFileDetection(t *testing.T) {
t.Parallel()
// Radio-T stream header
header := []byte{
112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56,
128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130,
94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99,
220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167,
238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255,
247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3,
24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144,
252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156,
64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105,
74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73,
249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198,
}
mime := mimetype.Detect(header)
fileExtension := mime.Extension()
t.Log(mime)
if fileExtension == "" {
t.Errorf("File extension not detected")
}
}
I also have a recording of another segment of this stream. The type of that file is not detected either: https://drive.google.com/file/d/1sL18cF-zwN7txDfm30QnZoLlozcG4-5g/view?usp=sharing
I'm looking into this.
The Linux file utility (which is, I'd say, the best file format detection tool) also fails to detect the samples.
What program/library was used to create these samples?
MP3 files are made up of frames. The problem seems to be that the test recordings start with an incomplete frame (maybe because they have been streamed?). The first complete MP3 frame starts at index 126 in the test case you provided.
package test
import (
	"testing"

	"github.com/gabriel-vasile/mimetype"
)
func TestFileDetection(t *testing.T) {
t.Parallel()
// Radio-T stream header
header := []byte{
112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56,
128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130,
94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99,
220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167,
238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255,
247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3,
24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144,
252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156,
64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105,
74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73,
249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198,
}
mime := mimetype.Detect(header[126:])
fileExtension := mime.Extension()
t.Log(mime)
if fileExtension == "" {
t.Errorf("File extension not detected")
}
}
That being said, I'm not sure if mimetype should search for the first frame in the input.
Looking at what other projects are doing, file/file and apache/tika don't search for the header either.
On the other hand, the MP3 specification says that decoders should search for the beginning of a frame if they don't find one at index 0 of the input (that's why the recording plays fine, even though it is truncated).
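For illustration, the kind of search a decoder performs could be sketched roughly as below. This is a minimal sketch and not how mimetype works: findFrameSync is a hypothetical helper, the sample bytes are made up, and it only inspects the first three header bytes without confirming that another frame follows, so it can report false positives on arbitrary data.

```go
package main

import "fmt"

// findFrameSync is a hypothetical helper. It returns the offset of the first
// position that looks like an MPEG audio Layer III frame header, or -1 if no
// such position exists. It does not verify that a second frame follows, so a
// match inside compressed audio data is possible.
func findFrameSync(data []byte) int {
	for i := 0; i+2 < len(data); i++ {
		// Sync word: 11 set bits (0xFF plus the top 3 bits of the next byte).
		if data[i] != 0xFF || data[i+1]&0xE0 != 0xE0 {
			continue
		}
		version := (data[i+1] >> 3) & 0x03    // 01 is reserved
		layer := (data[i+1] >> 1) & 0x03      // 01 means Layer III
		bitrate := data[i+2] >> 4             // 1111 is not allowed
		sampleRate := (data[i+2] >> 2) & 0x03 // 11 is reserved
		if version == 0x01 || layer != 0x01 || bitrate == 0x0F || sampleRate == 0x03 {
			continue
		}
		return i
	}
	return -1
}

func main() {
	// Made-up input: two stray bytes from a truncated frame, followed by a
	// typical MPEG-1 Layer III header (0xFF 0xFB 0x90 0xC4).
	sample := []byte{0x12, 0x34, 0xFF, 0xFB, 0x90, 0xC4}
	fmt.Println(findFrameSync(sample)) // prints 2
}
```

A caller that wants this behavior could run such a search itself and pass the slice starting at the returned offset to mimetype.Detect, which leaves the library's detection rules unchanged.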
What program/library was used to create these samples?
It is a recording of the radio-t stream, created using io.Copy: https://github.com/F0rzend/radiot_dumper/blob/master/copier/stream_copier.go#L82
Is this file: https://drive.google.com/file/d/1sL18cF-zwN7txDfm30QnZoLlozcG4-5g/view?usp=sharing the original mp3 from radio-t.com, or was it saved through radiot_dumper? I think there are some problems with the way StreamCopier saves files.
Unfortunately, the problem has not been resolved. Apparently the point is that this is a stream, and not just a recording
I'm not sure about that. I saved some mp3 segments from these radio stations and they are all detected correctly. For example: https://stream.rcast.net/200399.mp3 https://stream.rcast.net/200292.mp3 https://stream.rcast.net/200167.mp3
Can you provide the URL to the radio stream that reproduces the issue?
I record from https://stream.radio-t.com/, but the stream only runs once a week, on Saturdays at 20:00 UTC.