Soundfile cannot read the full FLAC file it generated

Open MaximilianHoffmann opened this issue 6 years ago • 19 comments

I am generating a sound file by appending to audioFile = sf.SoundFile(conf['pSavAudio'], mode='w', samplerate=samplerate, channels=nChan, subtype='PCM_16', format='FLAC')

But when I try to read all frames from it with arrA = audio.read(audio.frames, dtype='int16'), I get an empty array, and extra_info changes to

"File : 'C:\Users\hoffmmax\Documents\Recordings\20190319_1807_TestCongruency13\20190319_1807_TestCongruency13_Audio.flac' (utf-8 converted from ucs-2)\nLength : 48373662\nFLAC Stream Metadata\n Channels : 5\n Sample rate : 192000\n Frames : 26000000\n Bit width : 16\nVorbis Comment Metadata\nEnd\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216 channels 5\nError: pflac->remain 16777216"

I understand that this is an error from the underlying library relating to the length not being divisible by the number of channels, but I only interact with the sound file via soundfile.SoundFile.write, and MATLAB/Audacity don't complain.

MaximilianHoffmann avatar Mar 19 '19 17:03 MaximilianHoffmann

Is this error raised by SoundFile, or a different library?

bastibe avatar Mar 20 '19 07:03 bastibe

It is not really an error: the message is written to the extra_info property of the SoundFile. I expect that it is raised by the wrapped library but not handled?

The Python code does not throw any errors, but returns an array of zero length. Strangely, I found that it works on a different computer. I will look at the difference between the environments shortly...

MaximilianHoffmann avatar Mar 20 '19 12:03 MaximilianHoffmann

It works in this environment

attrs 18.2.0 py_0 conda-forge audioread 2.1.6 py37_0 conda-forge backcall 0.1.0 py_0 conda-forge blas 1.0 mkl bleach 3.1.0 py_0 conda-forge ca-certificates 2019.3.9 hecc5488_0 conda-forge certifi 2019.3.9 py37_0 conda-forge cffi 1.11.5 py37hfa6e2cd_1001 conda-forge colorama 0.4.1 py_0 conda-forge cycler 0.10.0 py_1 conda-forge decorator 4.3.0 py_0 conda-forge entrypoints 0.3 py37_1000 conda-forge freetype 2.9.1 h5db478b_1005 conda-forge h5py 2.9.0 nompi_py37h3cb27cb_1102 conda-forge hdf5 1.10.4 nompi_hcc15c50_1105 conda-forge icc_rt 2019.0.0 h0cc432a_1 icu 58.2 ha66f8fd_1 intel-openmp 2019.1 144 ipykernel 5.1.0 py37h39e3cac_1001 conda-forge ipython 7.2.0 py37h39e3cac_1000 conda-forge ipython_genutils 0.2.0 py_1 conda-forge jedi 0.13.2 py37_1000 conda-forge jinja2 2.10 py_1 conda-forge joblib 0.13.2 py_0 conda-forge jpeg 9c hfa6e2cd_1001 conda-forge jsonschema 3.0.0a3 py37_1000 conda-forge jupyter_client 5.2.4 py_0 conda-forge jupyter_core 4.4.0 py_0 conda-forge kiwisolver 1.0.1 py37he980bc4_1002 conda-forge libflang 5.0.0 h6538335_20180525 conda-forge libpng 1.6.36 h7602738_1000 conda-forge librosa 0.6.3 py_0 conda-forge libsodium 1.0.16 h2fa13f4_1001 conda-forge llvm-meta 5.0.0 0 conda-forge llvmlite 0.28.0 py37_0 conda-forge m2w64-gcc-libgfortran 5.3.0 6 m2w64-gcc-libs 5.3.0 7 m2w64-gcc-libs-core 5.3.0 7 m2w64-gmp 6.1.0 2 m2w64-libwinpthread-git 5.0.0.4634.697f757 2 markupsafe 1.1.0 py37hfa6e2cd_1000 conda-forge matplotlib 3.0.2 py37hc8f65d3_0 matplotlib-base 3.0.2 py37h3e3dc42_1001 conda-forge mistune 0.8.4 py37hfa6e2cd_1000 conda-forge mkl 2019.1 144 mkl_fft 1.0.10 py37hfa6e2cd_1 conda-forge mkl_random 1.0.2 py37h830ac7b_2 conda-forge msys2-conda-epoch 20160418 1 nb_conda 2.2.1 py37_0 nb_conda_kernels 2.2.0 py37_1000 conda-forge nbconvert 5.3.1 py_1 conda-forge nbformat 4.4.0 py_1 conda-forge notebook 5.7.4 py37_1000 conda-forge numba 0.43.0 py37hf9181ef_0 numpy 1.15.4 py37h19fb1c0_0 numpy-base 1.15.4 py37hc3f5095_0 openblas 0.3.3 h535eed3_1001 conda-forge 
openmp 5.0.0 vc14_1 conda-forge openssl 1.1.1b hfa6e2cd_1 conda-forge pandoc 2.5 1 conda-forge pandocfilters 1.4.2 py_1 conda-forge parso 0.3.1 py_0 conda-forge pickleshare 0.7.5 py37_1000 conda-forge pip 18.1 py37_1000 conda-forge portaudio 19.6.0 hca4a3dc_2 conda-forge prometheus_client 0.5.0 py_0 conda-forge prompt_toolkit 2.0.7 py_0 conda-forge pyaudio 0.2.11 py37hfa6e2cd_1 pycparser 2.19 py_0 conda-forge pygments 2.3.1 py_0 conda-forge pyparsing 2.3.1 py_0 conda-forge pyqt 5.9.2 py37h6538335_2 pyreadline 2.1 py37_1000 conda-forge pyrsistent 0.14.9 py37hfa6e2cd_1000 conda-forge PySoundFile 0.9.0.post1 python 3.7.1 hc182675_1000 conda-forge python-dateutil 2.7.5 py_0 conda-forge pytz 2018.9 py_0 conda-forge pywinpty 0.5.5 py37_1000 conda-forge pyzmq 17.1.2 py37hf576995_1001 conda-forge qt 5.9.7 vc14h73c81de_0 resampy 0.2.1 py_1 conda-forge scikit-learn 0.20.3 py37h343c172_0 scipy 1.2.0 py37h29ff71c_0 send2trash 1.5.0 py_0 conda-forge setuptools 40.6.3 py37_0 conda-forge sip 4.19.8 py37h6538335_1000 conda-forge six 1.12.0 py37_1000 conda-forge sqlite 3.26.0 hfa6e2cd_1000 conda-forge terminado 0.8.1 py37_1001 conda-forge testpath 0.4.2 py37_1000 conda-forge tornado 5.1.1 py37hfa6e2cd_1000 conda-forge traitlets 4.3.2 py37_1000 conda-forge vc 14.1 h0510ff6_4 vs2015_runtime 14.15.26706 h3a45250_0 wcwidth 0.1.7 py_1 conda-forge webencodings 0.5.1 py_1 conda-forge wheel 0.32.3 py37_0 conda-forge wincertstore 0.2 py37_1002 conda-forge winpty 0.4.3 4 conda-forge zeromq 4.2.5 he025d50_1006 conda-forge zlib 1.2.11 h2fa13f4_1004 conda-forge

Does not work here:

audioread 2.1.6 py36_0 conda-forge backcall 0.1.0 py36_0 blas 1.0 mkl bleach 3.1.0 py36_0 bokeh 1.0.4 py36_0 ca-certificates 2019.3.9 hecc5488_0 conda-forge certifi 2019.3.9 py36_0 conda-forge cffi 1.11.5 colorama 0.4.1 py36_0 cycler 0.10.0 py36h009560c_0 decorator 4.3.0 py36_0 entrypoints 0.3 py36_0 freetype 2.9.1 ha9979f8_1 hdf5 1.8.20 hac2f561_1 icc_rt 2019.0.0 h0cc432a_1 icu 58.2 ha66f8fd_1 intel-openmp 2019.1 144 ipykernel 5.1.0 py36h39e3cac_0 ipython 7.2.0 py36h39e3cac_0 ipython_genutils 0.2.0 py36h3c5d0ee_0 jedi 0.13.2 py36_0 jinja2 2.10 py36_0 joblib 0.13.2 py_0 conda-forge jpeg 9c hfa6e2cd_1001 conda-forge jsonschema 2.6.0 py36h7636477_0 jupyter_client 5.2.4 py36_0 jupyter_core 4.4.0 py36_0 kiwisolver 1.0.1 py36h6538335_0 libopencv 3.4.2 h20b85fd_0 libpng 1.6.36 h2a8f88b_0 librosa 0.6.3 py_0 conda-forge libsodium 1.0.16 h9d3ae62_0 libtiff 4.0.10 h2929a5b_1001 llvmlite 0.28.0 py36_0 conda-forge m2w64-gcc-libgfortran 5.3.0 6 m2w64-gcc-libs 5.3.0 7 m2w64-gcc-libs-core 5.3.0 7 m2w64-gmp 6.1.0 2 m2w64-libwinpthread-git 5.0.0.4634.697f757 2 markupsafe 1.1.0 py36he774522_0 matplotlib 3.0.2 py36hc8f65d3_0 mistune 0.8.4 py36he774522_0 mkl 2018.0.2 1 mkl_fft 1.0.1 py36h452e1ab_0 mkl_random 1.0.1 py36h9258bd6_0 msys2-conda-epoch 20160418 1 nb_conda 2.2.1 py36_0 nb_conda_kernels 2.2.0 py36_0 nbconvert 5.3.1 py36_0 nbformat 4.4.0 py36h3a5bc1b_0 nidaqmx 0.5.7 notebook 5.7.4 py36_0 numba 0.43.0 py36hf9181ef_0 numpy 1.14.3 py36h9fa60d3_1 numpy-base 1.14.3 py36h555522e_1 olefile 0.46 py36_0 opencv-python 4.0.0.21 openssl 1.1.1b hfa6e2cd_1 conda-forge packaging 19.0 py36_0 pandoc 2.2.3.2 0 pandocfilters 1.4.2 py36_1 parso 0.3.1 py36_0 pickleshare 0.7.5 py36_0 Pillow 5.4.1 pip 18.1 py36_0 portaudio 19.6.0 hfa6e2cd_3 anaconda prometheus_client 0.5.0 py36_0 prompt_toolkit 2.0.7 py36_0 py-opencv 3.4.2 py36hc319ecb_0 pyaudio 0.2.11 py36hfa6e2cd_1 anaconda pycparser 2.19 pygments 2.3.1 py36_0 pyparsing 2.3.1 py36_0 pypylon 1.4.0 pyqt 5.9.2 py36h6538335_2 python 3.6.8 h9f7ef89_0 
python-dateutil 2.7.5 py36_0 pytz 2018.9 py36_0 pywinpty 0.5.5 py36_1000 pyyaml 3.13 py36hfa6e2cd_0 pyzmq 17.1.2 py36hfa6e2cd_0 qt 5.9.7 vc14h73c81de_0 resampy 0.2.1 py_1 conda-forge scikit-learn 0.19.1 py36h53aea1b_0 scipy 1.1.0 py36h672f292_0 send2trash 1.5.0 py36_0 setuptools 40.6.3 py36_0 sip 4.19.8 py36h6538335_0 six 1.12.0 py36_0 six 1.12.0 SoundFile 0.10.2 spinnaker-python 1.20.0.15 sqlite 3.26.0 he774522_0 terminado 0.8.1 py36_1 testpath 0.4.2 py36_0 tk 8.6.9 hfa6e2cd_1000 conda-forge tornado 5.1.1 py36hfa6e2cd_0 traitlets 4.3.2 py36h096827d_0 vc 14.1 h21ff451_3 anaconda vs2015_runtime 15.5.2 3 anaconda wcwidth 0.1.7 py36h3d5aa90_0 webencodings 0.5.1 py36_1 wheel 0.32.3 py36_0 wincertstore 0.2 py36h7fe50ca_0 winpty 0.4.3 4 yaml 0.1.7 hc54c509_2 zeromq 4.2.5 he025d50_1 zlib 1.2.11 h62dcd97_3

PySoundfile vs Soundfile?

MaximilianHoffmann avatar Mar 20 '19 12:03 MaximilianHoffmann

PySoundfile vs Soundfile?

SoundFile is the current, correct name. PySoundFile is deprecated, and now outdated.

bastibe avatar Mar 20 '19 13:03 bastibe

I understand, but the problem occurs in the SoundFile environment.

MaximilianHoffmann avatar Mar 20 '19 13:03 MaximilianHoffmann

Can you show me a full error message and stack trace? I still don't understand whether it is our code that is throwing the error, or a different library.

bastibe avatar Mar 20 '19 15:03 bastibe

Thanks for looking into this. Unfortunately there is no stack trace, since no exception is thrown. The extra string written into extra_info is the only sign. Otherwise the read command completes but returns an empty (shape 0 x channels) array.
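Since the failure described here is silent, one way to surface it is to compare the number of frames returned against the frame count the file reports. The following is only a minimal sketch: the helper name read_all_frames is made up for illustration, and it assumes f is an already-open soundfile.SoundFile (it only uses its seek, frames, read and extra_info members).

```python
def read_all_frames(f, dtype="int16"):
    """Read every frame from an open SoundFile and fail loudly on a short read.

    `f` is assumed to be an open soundfile.SoundFile in read mode.
    """
    f.seek(0)                      # start from the first frame
    expected = f.frames            # frame count reported by the file header
    data = f.read(expected, dtype=dtype)
    if len(data) != expected:
        # No exception is raised by the read itself, so raise one here and
        # include the tail of extra_info, where the library log ends up.
        raise RuntimeError(
            f"expected {expected} frames but got {len(data)}; "
            f"extra_info ends with: {f.extra_info[-200:]!r}"
        )
    return data

# Usage (assumes the soundfile package is installed):
#   import soundfile as sf
#   with sf.SoundFile("test.flac") as f:
#       data = read_all_frames(f)
```

This does not fix the read; it only turns the empty result into an error that comes with a stack trace and the relevant part of the log.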

MaximilianHoffmann avatar Mar 21 '19 08:03 MaximilianHoffmann

https://owncloud-ext.charite.de/owncloud/index.php/s/huO8IPctCz25bOW

It contains an audio file for which this happens and a screenshot of the error.

MaximilianHoffmann avatar Mar 21 '19 08:03 MaximilianHoffmann

Can you read the whole file if you read('filename.flac')?

bastibe avatar Mar 21 '19 08:03 bastibe

No, exactly the same behaviour.

MaximilianHoffmann avatar Mar 21 '19 09:03 MaximilianHoffmann

This is really strange. libsndfile seems to have problems reading more than a certain amount (between 3M and 4M) of frames from the file:

>>> import soundfile as sf
>>> f = sf.SoundFile('test.flac')
>>> len(f)
6800000
>>> a = f.read(4_000_000)
>>> a.shape
(0, 5)
>>> b = f.read(3_000_000)
>>> b.shape
(3000000, 5)
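A possible reading of these numbers, offered as an assumption rather than a confirmed diagnosis: the extra_info log reports "pflac->remain 16777216", and 16777216 is 2**24. If that value acts as a limit on the number of samples (frames times channels) per read, the largest safe read for a 5-channel file falls between 3M and 4M frames, which matches the session above. The helper name below is made up for illustration.

```python
def max_frames_per_read(n_channels, sample_limit=2**24):
    """Largest frame count whose total sample count stays within sample_limit.

    sample_limit = 2**24 is an assumption taken from the 'pflac->remain 16777216'
    lines in extra_info; it is not a documented libsndfile constant.
    """
    return sample_limit // n_channels

print(max_frames_per_read(5))  # 3355443
```

With 5 channels this gives 3355443 frames, so a read of 3,000,000 frames stays under the assumed limit and a read of 4,000,000 frames exceeds it.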

mgeier avatar Mar 21 '19 09:03 mgeier

...Yes, I observed this too, but rest assured that I created this file using the writing routines of SoundFile only.

MaximilianHoffmann avatar Mar 21 '19 13:03 MaximilianHoffmann

Is there a difference in the underlying libsndfile versions of the two environments?
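To compare the two environments, the versions can be printed from Python in each of them. This is a small sketch: the helper name backend_versions is made up, and it falls back to "unknown" in case an older release does not expose one of the attributes.

```python
def backend_versions(sf_module):
    """Return the Python wrapper version and the libsndfile version string.

    `sf_module` is assumed to be the imported soundfile module.
    """
    return {
        "soundfile": getattr(sf_module, "__version__", "unknown"),
        "libsndfile": getattr(sf_module, "__libsndfile_version__", "unknown"),
    }

# Usage (run once in each environment):
#   import soundfile as sf
#   print(backend_versions(sf))
```

If the libsndfile strings differ between the working and the failing environment, that points at the bundled library rather than at the Python wrapper.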

MaximilianHoffmann avatar Mar 22 '19 08:03 MaximilianHoffmann

I exchanged the DLLs; this solves the problem.

MaximilianHoffmann avatar Mar 26 '19 13:03 MaximilianHoffmann

Good to know! What version of DLLs did you use?

bastibe avatar Mar 27 '19 08:03 bastibe

I came across something like this recently, is it the same issue as https://github.com/erikd/libsndfile/issues/431 ?

3ll3d00d avatar Sep 06 '19 07:09 3ll3d00d

I ran into this issue again, even with different libraries. Strangely, it depends on the blocksize when iterating through the respective sound file. I didn't quite get to the bottom of the FLAC issue above, but it could be related. However, my recording has 4 channels.

MaximilianHoffmann avatar Oct 30 '20 21:10 MaximilianHoffmann

@MaximilianHoffmann Strangely, I ran into the same problem as you when I tried to read a 7-channel .flac file, but I managed to code a crude function that works around this issue (based on the advice given in libsndfile/libsndfile#431) by reading the file chunk by chunk, so if it helps, I've placed it in a repo at https://github.com/kenowr/read_flac and replicated the function below:

import numpy as np
import soundfile as sf

def read_flac(file, chunk=None, **kwargs):
    """Read a FLAC file chunk by chunk (workaround for libsndfile/libsndfile#431)."""
    # Zero-length read: only used to get the channel count and sample rate.
    x, sr = sf.read(file, start=0, stop=0, **kwargs)
    n_channels = x.shape[1] if x.ndim == 2 else 1
    # For 1, 2, 4 or 8 channels, read the whole file directly.
    if n_channels in (1, 2, 4, 8):
        return sf.read(file, **kwargs)
    if chunk is None:
        chunk = (2**24) // n_channels

    # Read chunk by chunk until a short read signals the end of the file.
    parts = []
    i = 0
    while True:
        x, sr = sf.read(file, start=i * chunk, stop=(i + 1) * chunk, **kwargs)
        if x.shape[0] != 0:
            parts.append(x)
        if x.shape[0] < chunk:
            break
        i += 1

    # An empty file yields no parts; return the zero-length array as-is.
    if not parts:
        return x, sr
    # Concatenating keeps the dtype requested via kwargs (np.zeros forced float64).
    return np.concatenate(parts, axis=0), sr

kenowr avatar Nov 27 '20 08:11 kenowr

A really bad workaround I found is to re-encode the file with sox.
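For reference, the re-encoding step could be scripted roughly as follows. This is a sketch under assumptions: it requires a sox executable on PATH that was built with FLAC support, it uses sox's basic "sox input output" form with default settings, and the helper names are made up for illustration.

```python
import shutil
import subprocess

def sox_command(src, dst):
    """Build the command line for a plain re-encode: sox <input> <output>."""
    return ["sox", str(src), str(dst)]

def reencode_with_sox(src, dst):
    """Re-encode src into dst by calling the external sox program."""
    if shutil.which("sox") is None:
        raise FileNotFoundError("sox was not found on PATH")
    # check=True raises CalledProcessError if sox exits with an error.
    subprocess.run(sox_command(src, dst), check=True)
    return dst

# Usage:
#   reencode_with_sox("recording.flac", "recording_reencoded.flac")
```

Writing to a new file rather than overwriting keeps the original recording available in case the re-encoded copy shows the same problem.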

MaximilianHoffmann avatar Nov 27 '20 14:11 MaximilianHoffmann