
error msg: more than one duration for timebins across spectrogram files

Open avanikop opened this issue 3 years ago • 7 comments

Getting this error on the vak prep predict step:

(vak040b6) C:\Users\avanikoparkar\Documents\analysis\tolabel\bu38o58\toml_pred1>vak prep bu38o58_predict_020317.toml
determined that purpose of config file is: predict
will add 'csv_path' option to 'PREDICT' section
purpose for dataset: predict
will not split dataset
making array files containing spectrograms from audio files in: C:\Users\avanikoparkar\Documents\analysis\tolabel\bu38o58\02Mar2017_WNrepeat\020317
creating array files with spectrograms
[######### ] | 24% Completed | 17.2s
D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\spect.py:74: UserWarning: Only one segment is calculated since parameter NFFT (=512) >= signal length (=0).
  spect, freqbins, timebins = specgram(dat, fft_size, samp_freq, noverlap=noverlap)[
D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\spect.py:80: RuntimeWarning: invalid value encountered in true_divide
  spect /= spect.max() # volume normalize to max 1
[############### ] | 38% Completed | 26.6s
D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\spect.py:74: UserWarning: Only one segment is calculated since parameter NFFT (=512) >= signal length (=0).
  spect, freqbins, timebins = specgram(dat, fft_size, samp_freq, noverlap=noverlap)[
D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\spect.py:80: RuntimeWarning: invalid value encountered in true_divide
  spect /= spect.max() # volume normalize to max 1
[########################################] | 100% Completed | 1min 2.4s
creating dataset from spectrogram files in: C:\Users\avanikoparkar\Documents\analysis\tolabel\bu38o58\02Mar2017_WNrepeat\020317\spectrograms_generated_220104_140750
validating set of spectrogram files
[######### ] | 24% Completed | 7.5s
D:\anacondnavi\envs\vak040b6\lib\site-packages\numpy\core\fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
D:\anacondnavi\envs\vak040b6\lib\site-packages\numpy\core_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
[############### ] | 38% Completed | 8.8s
D:\anacondnavi\envs\vak040b6\lib\site-packages\numpy\core\fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
D:\anacondnavi\envs\vak040b6\lib\site-packages\numpy\core_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
[########################################] | 100% Completed | 11.6s
Traceback (most recent call last):
  File "D:\anacondnavi\envs\vak040b6\lib\runpy.py", line 194, in _run_module_as_main
    return run_code(code, main_globals, None,
  File "D:\anacondnavi\envs\vak040b6\lib\runpy.py", line 87, in run_code
    exec(code, run_globals)
  File "D:\anacondnavi\envs\vak040b6\Scripts\vak.exe_main.py", line 7, in
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak_main.py", line 45, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\cli\cli.py", line 30, in cli
    COMMAND_FUNCTION_MAPcommand
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\cli\prep.py", line 132, in prep
    vak_df, csv_path = core.prep(
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\core\prep.py", line 205, in prep
    vak_df = dataframe.from_files(
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\io\dataframe.py", line 167, in from_files
    vak_df = spect.to_dataframe(**to_dataframe_kwargs, logger=logger)
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\io\spect.py", line 199, in to_dataframe
    files.spect.is_valid_set_of_spect_files(
  File "D:\anacondnavi\envs\vak040b6\lib\site-packages\vak\files\spect.py", line 209, in is_valid_set_of_spect_files
    raise ValueError(
ValueError: Found more than one duration for time bins across spectrogram files. Durations found were: [0.002 nan]'

Cannot figure out whether/which file is generating this error.

avanikop avatar Jan 04 '22 13:01 avanikop

Hmm. Yes, this would be a good place for the error message to tell you which files are generating the error.
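
For illustration, a more informative check might group file paths by the duration found in each, so that the error names the outliers. This is only a sketch: `check_single_timebin_dur` and its `durs_by_path` argument are hypothetical names, not part of vak, and the real function may be organized quite differently.

```python
import math
from collections import defaultdict


def check_single_timebin_dur(durs_by_path):
    """Raise ValueError naming the files behind each duration if more than one is found.

    ``durs_by_path`` maps a spectrogram file path to its time bin duration.
    Hypothetical helper for illustration only, not part of vak.
    """
    paths_by_dur = defaultdict(list)
    for path, dur in durs_by_path.items():
        # NaN never compares equal to itself, so group all NaNs under one string key
        key = "nan" if math.isnan(dur) else dur
        paths_by_dur[key].append(str(path))

    if len(paths_by_dur) > 1:
        lines = [f"  {dur}: {paths}" for dur, paths in paths_by_dur.items()]
        raise ValueError(
            "Found more than one duration for time bins across spectrogram files.\n"
            "Durations found, with the files that produced them:\n" + "\n".join(lines)
        )
```

With a message like this, the file with the `nan` duration would show up directly in the traceback.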

Can you try the following and see what it gives you?

import pandas as pd

import vak.files.spect
import vak.timebins

vak_df = pd.read_csv('C:\\Users\\avanikop\\path\\to\\prep.csv')
spect_paths = vak_df.spect_path.values

n_decimals_trunc = 5
timebins_key = 't'
spect_format = 'npz'

# collect one time bin duration per spectrogram file
timebin_durs = []
for spect_path in spect_paths:
    spect_dict = vak.files.spect.load(spect_path, spect_format)
    time_bins = spect_dict[timebins_key]
    timebin_durs.append(
        vak.timebins.timebin_dur_from_vec(time_bins, n_decimals_trunc)
    )

vak_df['timebin_dur'] = timebin_durs
# show rows whose duration differs from the expected 0.002 (NaN also counts as different)
print(vak_df[vak_df['timebin_dur'] != 0.002])

I think that should either print some rows in the DataFrame that will give you file names, or give you nothing, in which case we know there's something else going on.

Let me know if it's not clear what I'm trying to do or you get some weird error.

NickleDave avatar Jan 04 '22 13:01 NickleDave

Ignore my previous answer, I ran the code on the wrong csv file. For this folder, no prep.csv gets made, so I don't understand what to do.

avanikop avatar Jan 04 '22 13:01 avanikop

Ah right of course there's no csv because you're telling me that vak prep crashed. Got it.

And I left out some variable definitions from that snippet.
This is why I should have tested it.

Can you find the spectrograms_{timestamp} directory that gets created before the crash and check using this updated snippet?

from pathlib import Path

import vak.files.spect
import vak.timebins

# below, replace 'timestamp' with the actual timestamp, e.g. 'spectrograms_generated_220103_101738'
spect_paths = sorted(Path('C:\\Users\\avanikop\\path\\to\\spectrograms_generated_timestamp').glob('*.npz'))

n_decimals_trunc = 5
timebins_key = 't'
spect_format = 'npz'

# collect one time bin duration per spectrogram file;
# the list must be created before the loop so it is not reset on every iteration
timebin_durs = []
for spect_path in spect_paths:
    spect_dict = vak.files.spect.load(spect_path, spect_format)
    time_bins = spect_dict[timebins_key]
    timebin_durs.append(
        vak.timebins.timebin_dur_from_vec(time_bins, n_decimals_trunc)
    )

# print any file whose duration differs from the expected 0.002
for spect_path, timebin_dur in zip(spect_paths, timebin_durs):
    if timebin_dur != 0.002:
        print(spect_path, timebin_dur)

I did just test and that at least works for me locally (but it doesn't print anything at the end because all the timebin_durs are equal to 0.002)

And can you also try doing this?

import numpy as np

for spect_path, timebin_dur in zip(spect_paths, timebin_durs):
    if np.isnan(timebin_dur):
        print(spect_path, timebin_dur)

Just to verify there is some path where somehow we get a timebin_dur that's np.nan

NickleDave avatar Jan 04 '22 14:01 NickleDave

No output for either :/

avanikop avatar Jan 04 '22 15:01 avanikop

Ok, not sure what's going on.
Can we try tech support through Zoom in the next couple days? Please email me.
I think I need a little more context that's not easy to provide through GitHub Issues 😝

NickleDave avatar Jan 04 '22 16:01 NickleDave

Turns out, this was an error caused by a cbin file that was 0kb.
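
A plausible chain from the empty file to the `nan` in the error, pieced together from the warnings in the log above: a signal of length 0 yields a spectrogram with only one segment, so there is a single time bin and no spacing between bins to average. The sketch below assumes the duration is estimated as the mean difference between consecutive time bins, which the "Mean of empty slice" warning suggests; `timebin_dur_estimate` is a hypothetical stand-in, not vak's actual code.

```python
import warnings

import numpy as np


def timebin_dur_estimate(time_bins):
    """Estimate time bin duration as the mean spacing between consecutive bins.

    Assumption: a stand-in for the calculation vak performs, not its real implementation.
    """
    with warnings.catch_warnings():
        # silence "Mean of empty slice" so the example output stays readable
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.mean(np.diff(time_bins))


# three bins spaced 0.002 apart, as in a normal spectrogram file
normal = timebin_dur_estimate(np.array([0.001, 0.003, 0.005]))
# a single bin: np.diff returns an empty array, and its mean is NaN
one_bin = timebin_dur_estimate(np.array([0.001]))

print(normal, one_bin)
```

Because NaN is not equal to any number, a file like this shows up as a second "duration" next to 0.002.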

avanikop avatar Jan 05 '22 15:01 avanikop

Thank you @avanikop

Notes on this:

  • the error message should print the file names to be more informative (and anything else that could be useful). The paths are already available in the function; it just doesn't print them
  • should add advice somewhere in the docs about cleaning up data before running vak prep, possibly with a script to remove files that are either size 0 or size "gigantic", where "gigantic" is an argument to the script
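
A minimal sketch of what such a cleanup check could look like, using only the standard library. The function name `find_suspect_files`, the `*.cbin` default pattern, and the default size threshold are all assumptions made for illustration; the function only reports files and leaves it to the caller to decide whether to delete or move them.

```python
from pathlib import Path


def find_suspect_files(data_dir, pattern="*.cbin", max_bytes=500_000_000):
    """Return (empty, oversized) lists of files in ``data_dir`` that match ``pattern``.

    ``max_bytes`` is the threshold above which a file counts as "gigantic";
    the default here is arbitrary and should be tuned to the data.
    """
    empty, oversized = [], []
    for path in sorted(Path(data_dir).glob(pattern)):
        size = path.stat().st_size  # file size in bytes
        if size == 0:
            empty.append(path)
        elif size > max_bytes:
            oversized.append(path)
    return empty, oversized
```

Running something like `empty, oversized = find_suspect_files('path/to/audio')` before `vak prep` and inspecting the two lists would have flagged the 0kb file in this issue.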

NickleDave avatar Jan 05 '22 15:01 NickleDave