shared_list does not have data_set in forward block with TIMIT tutorial

Open hajime9652 opened this issue 5 years ago • 27 comments

------------------------------ Epoch 23 / 23 ------------------------------
 
----- Summary epoch 23 / 23
Training on ['TIMIT_tr']
Loss = 0.932 | err = 0.298 
-----
Validating on TIMIT_dev
Loss = 1.811 | err = 0.468 
-----
Learning rate on architecture1 = 0.08 
-----
Elapsed time (s) = 574

 
Testing TIMIT_test chunk = 1 / 1
shared list []
shared list [None, None, None, {'mfcc': ['mfcc', 'exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst', 'apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5_0827_test/data/test/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5_0827_test/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |', '5', '5']}, {}, {'MLP_layers1': ['architecture1', 'MLP_layers1', 0]}, {'input': None, 'ref': None}]
output folder exp/TIMIT_MLP_basic
data_set_dict <class 'dict'>
data_set_dict {'input': None, 'ref': None}
Traceback (most recent call last):
  File "run_exp.py", line 340, in <module>
    data_set_inp, data_set_ref = convert_numpy_to_torch(data_set_dict, save_gpumem, use_cuda)
  File "/home/sysadmin/pytorch-kaldi/core.py", line 46, in convert_numpy_to_torch
    data_set_inp=torch.from_numpy(data_set_dict['input']).float()
TypeError: expected np.ndarray (got NoneType)

hajime9652 avatar Aug 29 '19 01:08 hajime9652

# --------FORWARD--------#
for forward_data in forward_data_lst:

    # Compute the number of chunks
    N_ck_forward = compute_n_chunks(out_folder, forward_data, ep, N_ep_str_format, 'forward')
    N_ck_str_format = '0' + str(max(math.ceil(np.log10(N_ck_forward)), 1)) + 'd'

    processes = list()
    info_files = list()
    for ck in range(N_ck_forward):

        if not is_production:
            print('Testing %s chunk = %i / %i' % (forward_data, ck + 1, N_ck_forward))
        else:
            print('Forwarding %s chunk = %i / %i' % (forward_data, ck + 1, N_ck_forward))

        # output file
        info_file = out_folder + '/exp_files/forward_' + forward_data + '_ep' + format(ep, N_ep_str_format) + '_ck' + format(ck, N_ck_str_format) + '.info'
        config_chunk_file = out_folder + '/exp_files/forward_' + forward_data + '_ep' + format(ep, N_ep_str_format) + '_ck' + format(ck, N_ck_str_format) + '.cfg'

        # Do forward if the chunk was not already processed
        if not os.path.exists(info_file):

            # Doing forward

            # getting the next chunk
            next_config_file = cfg_file_list[op_counter]

            # run chunk processing
            if _run_forwarding_in_subprocesses(config):
                shared_list = list()
                print("shared list", shared_list)
                output_folder = config['exp']['out_folder']
                save_gpumem = strtobool(config['exp']['save_gpumem'])
                use_cuda = strtobool(config['exp']['use_cuda'])
                p = read_next_chunk_into_shared_list_with_subprocess(read_lab_fea, shared_list, config_chunk_file, is_production, output_folder, wait_for_process=True)
                data_name, data_end_index_fea, data_end_index_lab, fea_dict, lab_dict, arch_dict, data_set_dict = extract_data_from_shared_list(shared_list)
                print("shared list", shared_list)
                print("output folder", output_folder)
                print("data_set_dict", type(data_set_dict))
                print("data_set_dict", data_set_dict)
                data_set_inp, data_set_ref = convert_numpy_to_torch(data_set_dict, save_gpumem, use_cuda)
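
For what it's worth, a hypothetical guard placed just before the convert_numpy_to_torch call (it is not code from the repo, and only uses variables visible in the snippet above) would at least turn the opaque TypeError into a pointer to the data-loading step:

# hypothetical sanity check, inserted right before convert_numpy_to_torch(...)
if data_set_dict is None or data_set_dict.get('input') is None:
    raise RuntimeError(
        'read_lab_fea returned no data for %s -- check log.log and the '
        'feature/label paths in %s' % (forward_data, config_chunk_file))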

hajime9652 avatar Aug 29 '19 01:08 hajime9652

When is shared_list overwritten, and how can I get the correct data_set?

hajime9652 avatar Aug 29 '19 01:08 hajime9652

Hi ! Isn't it simply a problem with the path of the test dataset in the config file ?

TParcollet avatar Aug 29 '19 08:08 TParcollet

Yes, it looks like that!

mravanelli avatar Aug 29 '19 15:08 mravanelli

I will check again.

hajime9652 avatar Aug 30 '19 06:08 hajime9652

I'm still in trouble.

ERROR MSG

------------------------------ Epoch 23 / 23 ------------------------------
 
----- Summary epoch 23 / 23
Training on ['TIMIT_tr']
Loss = 0.932 | err = 0.298 
-----
Validating on TIMIT_dev
Loss = 1.812 | err = 0.468 
-----
Learning rate on architecture1 = 0.08 
-----
Elapsed time (s) = 489

 
Testing TIMIT_test chunk = 1 / 1
config chunk file exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0.cfg
shared list [None, None, None, {'mfcc': ['mfcc', 'exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst', 'apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5/data/test/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |', '5', '5']}, {}, {'MLP_layers1': ['architecture1', 'MLP_layers1', 0]}, {'input': None, 'ref': None}]
Traceback (most recent call last):
  File "run_exp.py", line 338, in <module>
    data_set_inp, data_set_ref = convert_numpy_to_torch(data_set_dict, save_gpumem, use_cuda)
  File "/home/sysadmin/pytorch-kaldi/core.py", line 46, in convert_numpy_to_torch
    data_set_inp=torch.from_numpy(data_set_dict['input']).float()
TypeError: expected np.ndarray (got NoneType)

cfg

[dataset1]
data_name = TIMIT_tr
fea = fea_name=mfcc
        fea_lst=/home/sysadmin/kaldi/egs/timit/s5/data/train/feats.scp
        fea_opts=apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5/data/train/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5/mfcc/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
        cw_left=5
        cw_right=5
        

lab = lab_name=lab_cd
        lab_folder=/home/sysadmin/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
        lab_opts=ali-to-pdf
        lab_count_file=auto
        lab_data_folder=/home/sysadmin/kaldi/egs/timit/s5/data/train/
        lab_graph=/home/sysadmin/kaldi/egs/timit/s5/exp/tri3/graph
        

n_chunks = 5

[dataset2]
data_name = TIMIT_dev
fea = fea_name=mfcc
        fea_lst=/home/sysadmin/kaldi/egs/timit/s5/data/dev/feats.scp
        fea_opts=apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5/data/dev/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5/mfcc/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
        cw_left=5
        cw_right=5
        

lab = lab_name=lab_cd
        lab_folder=/home/sysadmin/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
        lab_opts=ali-to-pdf
        lab_count_file=auto
        lab_data_folder=/home/sysadmin/kaldi/egs/timit/s5/data/dev/
        lab_graph=/home/sysadmin/kaldi/egs/timit/s5/exp/tri3/graph
        

n_chunks = 1

[dataset3]
data_name = TIMIT_test
fea = fea_name=mfcc
        fea_lst=/home/sysadmin/kaldi/egs/timit/s5/data/test/feats.scp
        fea_opts=apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5/data/test/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
        cw_left=5
        cw_right=5
        

lab = lab_name=lab_cd
        lab_folder=/home/sysadmin/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
        lab_opts=ali-to-pdf
        lab_count_file=auto
        lab_data_folder=/home/sysadmin/kaldi/egs/timit/s5/data/test/
        lab_graph=/home/sysadmin/kaldi/egs/timit/s5/exp/tri3/graph
        

n_chunks = 1

hajime9652 avatar Sep 05 '19 06:09 hajime9652

data_name, data_end_index_fea, and data_end_index_lab are None, lab_dict is empty, and the entries of data_set_dict are None. In particular, why can't lab_dict be read?

shared list [None, None, None, {'mfcc': ['mfcc', 'exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst', 'apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5/data/test/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |', '5', '5']}, {}, {'MLP_layers1': ['architecture1', 'MLP_layers1', 0]}, {'input': None, 'ref': None}]

lab_folder

$ ls /home/sysadmin/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
ali.1.gz  ali.2.gz  ali.3.gz  ali.4.gz  final.mdl  log  num_jobs  phones.txt  tree

exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0.cfg

[cfg_proto]
cfg_proto = proto/global.proto
cfg_proto_chunk = proto/global_chunk.proto

[exp]
cmd = 
run_nn_script = run_nn
out_folder = exp/TIMIT_MLP_basic
seed = 1257
use_cuda = False
multi_gpu = False
save_gpumem = False
n_epochs_tr = 24
production = False
to_do = forward
out_info = exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0.info

[batches]
batch_size_train = 128
max_seq_length_train = 1000
batch_size_valid = 128
max_seq_length_valid = 1000

[architecture1]
arch_name = MLP_layers1
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = exp/TIMIT_MLP_basic/exp_files/train_TIMIT_tr_ep23_ck4_architecture1.pkl
arch_freeze = False
arch_seq_model = False
dnn_lay = 1024,1024,1024,1024,1896
dnn_drop = 0.15,0.15,0.15,0.15,0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = True,True,True,True,False
dnn_use_laynorm = False,False,False,False,False
dnn_act = relu,relu,relu,relu,softmax
arch_lr = 0.08
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = sgd
opt_momentum = 0.0
opt_weight_decay = 0.0
opt_dampening = 0.0
opt_nesterov = False

[model]
model_proto = proto/model.proto
model = out_dnn1=compute(MLP_layers1,mfcc)
        loss_final=cost_nll(out_dnn1,lab_cd)
        err_final=cost_err(out_dnn1,lab_cd)

[forward]
forward_out = out_dnn1
normalize_posteriors = True
normalize_with_counts_from = exp/TIMIT_MLP_basic/exp_files/forward_out_dnn1_lab_cd.count
save_out_file = False
require_decoding = True

[data_chunk]
fea = fea_name=mfcc
        fea_lst=exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst
        fea_opts=apply-cmvn --utt2spk=ark:/home/sysadmin/kaldi/egs/timit/s5/data/test/utt2spk  ark:/home/sysadmin/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
        cw_left=5
        cw_right=5
lab = lab_name=lab_cd
        lab_folder=/home/sysadmin/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
        lab_opts=ali-to-pdf
        lab_count_file=auto
        lab_data_folder=/home/sysadmin/kaldi/egs/timit/s5/data/test/
        lab_graph=/home/sysadmin/kaldi/egs/timit/s5/exp/tri3/graph

hajime9652 avatar Sep 05 '19 06:09 hajime9652

Did you find a solution to this? I am having the exact same issue. I double-checked all paths in my cfg file and the same error still occurs.

Note: I am using PyTorch-Kaldi on WSL without CUDA (there is still no CUDA support on WSL); not sure if this makes a difference.

spencerkirn avatar Oct 02 '19 13:10 spencerkirn

It looks like an error in reading features and labels with Kaldi. To debug, you can try to "manually" read the features this way:

1. Select one ark file listed in /mnt/mscteach_home/s1870525/dissertation/PruninNeuralNetworksSpeech/s5/data/test_dev93/feats.scp (e.g., quick_test/fbank/raw_fbank_dev.1.ark).
2. Run copy-feats ark:your_ark_file.ark ark,t:- . If everything works, you should see a lot of numbers in standard output. If it doesn't work, take a look at the error.
3. If it works, add the options and write: copy-feats ark:your_ark.ark ark:- | apply-cmvn --utt2spk=ark:/mnt/mscteach_home/s1870525/dissertation/PruninNeuralNetworksSpeech/s5/data/test_dev93/utt2spk ark:/mnt/mscteach_home/s1870525/dissertation/PruninNeuralNetworksSpeech/s5/data/test_dev93/data/cmvn_test_dev93.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark,t:- . If it doesn't work, take a look at the error message.

You can also take a look at the log.log file in the output folder.

Please, let me know if you are able to solve the data loading issue...
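
If the copy-feats pipeline works from the shell but run_exp.py still ends up with None, it can also help to read the chunk through the same Python path the toolkit uses. Below is a minimal sketch, assuming the kaldi_io module used by data_io.py is importable and the Kaldi binaries are on your PATH; the paths and the fea_opts string are placeholders to be replaced with the values from your chunk .cfg:

import kaldi_io

fea_lst = 'exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst'
fea_opts = ('apply-cmvn --utt2spk=ark:/path/to/s5/data/test/utt2spk '
            'ark:/path/to/s5/mfcc/cmvn_test.ark ark:- ark:- | '
            'add-deltas --delta-order=2 ark:- ark:- |')

# Pipe the features through the same Kaldi command the chunk config describes
cmd = 'ark:copy-feats scp:' + fea_lst + ' ark:- | ' + fea_opts
n_utt = 0
for key, mat in kaldi_io.read_mat_ark(cmd):
    n_utt += 1
    if n_utt == 1:
        print(key, mat.shape)  # first utterance id and its (frames, feat_dim)
print('utterances read:', n_utt)  # 0 means the Kaldi pipeline produced nothing

If this prints zero utterances, the problem is in the Kaldi command or the paths; if it prints sensible shapes, the problem is on the pytorch-kaldi side (see the workaround further down in the thread).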

mravanelli avatar Oct 02 '19 15:10 mravanelli

Thank you for the quick reply. I apologize if these are basic questions; I am new to Kaldi and this toolkit. I ran copy-feats ark:/home/spencer/kaldi/egs/timit/s5/mfcc/raw_mfcc_dev.1.ark ark,t:- and it ran just like you said it should, with a lot of numbers printed to the terminal. After that I ran copy-feats ark:/home/spencer/kaldi/egs/timit/s5/mfcc/raw_mfcc_dev.1.ark ark:- | apply-cmvn --utt2spk=ark:/home/spencer/kaldi/egs/timit/data/dev/utt2spk ark:/home/spencer/kaldi/egs/timit/s5/data/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark,t:- and got the attached error. One thing I noticed is that there is no cmvn_dev.ark in my data folder (no .ark files at all in that folder). Is that meant to be the output, or should there be a .ark file there? The error seems to be centered around that file.

[image: TIMITError]

spencerkirn avatar Oct 03 '19 13:10 spencerkirn

Does /home/spencer/kaldi/egs/timit/s5/data/cmvn_dev.ark exist?

Mirco

mravanelli avatar Oct 03 '19 13:10 mravanelli

No, like I said, there are no .ark files in that folder (or its subfolders). I thought this might be an output folder, but it looks like the issue is in the creation of those files.

spencerkirn avatar Oct 03 '19 13:10 spencerkirn

This cmvn file is created by Kaldi during the feature extraction phase and it performs mean and variance normalization. You should probably have the cmvn file somewhere else, like in data/dev/cmvn* or mfcc/cmvn*.

Mirco
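
To locate the cmvn archives quickly, a throwaway helper like the one below works (it is not part of the toolkit; the root path is taken from the commands quoted above and should be adjusted to your own Kaldi egs directory):

import glob

# Recursively list every cmvn file under the TIMIT s5 recipe
for path in sorted(glob.glob('/home/spencer/kaldi/egs/timit/s5/**/cmvn*', recursive=True)):
    print(path)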

mravanelli avatar Oct 03 '19 14:10 mravanelli

Yeah, I had the wrong path for the cmvn file, but when I run copy-feats ark:/home/spencer/kaldi/egs/timit/s5/mfcc/raw_mfcc_test.1.ark ark,t:- | apply-cmvn --utt2spk=ark:/home/spencer/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/spencer/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- I now get a Kaldi fatal error:

[image: TIMITError2]

spencerkirn avatar Oct 09 '19 19:10 spencerkirn

In case anyone else has this issue: I resolved it by bypassing the if statement on line 328 of run_exp.py. There was some issue in how the shared_list object was being created that I could not figure out, but the else branch runs the run_nn function in a similar fashion to the training and validation steps.

So I commented out line 328 and created another variable set to False to bypass that if statement:

test = False  # if _run_forwarding_in_subprocesses(config):
if test:
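
Pieced together from the description above, the edited region of run_exp.py would look roughly like this. It is only a sketch, not the exact upstream code; the branch bodies are paraphrased from the snippet and the discussion earlier in this thread:

# around line 328 of run_exp.py (the version used in this thread)
test = False  # was: if _run_forwarding_in_subprocesses(config):
if test:
    # original CPU path: read the chunk in a subprocess via shared_list and
    # call convert_numpy_to_torch(...) -- this is where the None shows up
    ...
else:
    # same in-process path used for training/validation: run_nn reads the
    # chunk config itself, so shared_list is never involved
    ...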

spencerkirn avatar Oct 25 '19 12:10 spencerkirn

This is weird, are you sure that you don't have a path problem only?

mravanelli avatar Oct 25 '19 14:10 mravanelli

Yes, I checked all the paths in the config file and they were all correct. Bypassing that if statement though gave a result that looked very similar to the one in the tutorial.

[image: TIMITResult]

spencerkirn avatar Oct 25 '19 15:10 spencerkirn

Interesting, we haven't encountered this issue on our side.

mravanelli avatar Oct 25 '19 15:10 mravanelli

There is still an error in the log.log file apparently (I had not checked that file when I got the correct result). It has something to do with decode_dnn.sh: it looks like the forward_TIMIT_test_ep*_ck*_out_dnn1_to_decode.ark files are not being created for some reason, though this does not seem to affect the outcome.

[image: TIMITError3]

spencerkirn avatar Oct 25 '19 16:10 spencerkirn

Maybe this file has not been created because there is a problem with the test data. Could you check the test data more carefully?

Mirco

mravanelli avatar Oct 25 '19 17:10 mravanelli

I am also having an error at the testing phase.

------------------------------ Epoch 23 / 23 ------------------------------
 
----- Summary epoch 23 / 23
Training on ['TIMIT_tr']
Loss = 0.916 | err = 0.290 
-----
Validating on TIMIT_dev
Loss = 1.674 | err = 0.450 
-----
Learning rate on architecture1 = 0.0025 
-----
Elapsed time (s) = 3338

 
Testing TIMIT_test chunk = 1 / 1
Traceback (most recent call last):
  File "run_exp.py", line 475, in <module>
    data_set_inp, data_set_ref = convert_numpy_to_torch(data_set_dict, save_gpumem, use_cuda)
  File "/home/dev_ds/pytorch-kaldi/core.py", line 53, in convert_numpy_to_torch
    data_set_inp = torch.from_numpy(data_set_dict["input"]).float()
TypeError: expected np.ndarray (got NoneType)

When I printed shared_list (print(shared_list)) in run_exp.py, it looked like this:

[None, None, None, {'mfcc': ['mfcc', '/home/dev_ds/kaldi_dnn/egs/timit/s5/exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst', 'apply-cmvn --utt2spk=ark:/home/dev_ds/kaldi_dnn/egs/timit/s5/data/test/utt2spk ark:/home/dev_ds/kaldi_dnn/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |', '5', '5']}, {}, {'MLP_layers1': ['architecture1', 'MLP_layers1', 0]}, {'input': None, 'ref': None}]

I used the same validation data (dev) as the test data; training and validation show no errors, but testing with the same data throws this error.

kumarh22 avatar Nov 25 '19 07:11 kumarh22

@kumarh22 I have the same problem as you. Have you solved it?

zhang7346 avatar Dec 27 '19 07:12 zhang7346

@mravanelli I also get the error in the test phase.

Testing TIMIT_test chunk = 1 / 1
info [None, None, None, {'mfcc': ['mfcc', 'exp/TIMIT_MLP_basic/exp_files/forward_TIMIT_test_ep23_ck0_mfcc.lst', 'apply-cmvn --utt2spk=ark:/home/zhang/code/kaldi_maked/egs/timit/s5/data/dev/utt2spk ark:/home/zhang/code/kaldi_maked/egs/timit/s5/mfcc/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |', '5', '5']}, {}, {'MLP_layers1': ['architecture1', 'MLP_layers1', 0]}, {'input': None, 'ref': None}]
Traceback (most recent call last):
  File "run_exp.py", line 476, in <module>
    data_set_inp, data_set_ref = convert_numpy_to_torch(data_set_dict, save_gpumem, use_cuda)
  File "/data00/home/zhang/code/pytorch-kaldi/core.py", line 53, in convert_numpy_to_torch
    data_set_inp = torch.from_numpy(data_set_dict["input"]).float()
TypeError: expected np.ndarray (got NoneType)

I had "manually" read the features to debug as you said above. It works in step2, and not came into error in step3(for step3, it runs for such a long time but without error, this is the same with eval file) and the log.log is just prov dopo prima ps. I am using python3.7, torch 1.0 cpu only version could you help me

zhang7346 avatar Dec 27 '19 07:12 zhang7346

Is the problem happening if you use the validation or training set as the test set?

TParcollet avatar Dec 27 '19 09:12 TParcollet

Is the problem happening if you use the validation or training set as the test set?

Yes. I use the validation set as the test set, but it still happens.

zhang7346 avatar Dec 27 '19 13:12 zhang7346

Is the problem happening if you use the validation or training set as the test set?

Yes. I use the validation set as the test set, but it still happens.

I find that when I use the GPU version, the problem does not appear anymore.

zhang7346 avatar Jan 06 '20 10:01 zhang7346

Had the same issue today. Here are some findings:

Why does it only happen when running on CPU?

Because when the CPU is used, the forward pass runs in a subprocess, and the method that runs the forward pass in a subprocess uses another version of the read_lab_fea method, read_lab_fea_refac01, while the in-process forward pass uses the original read_lab_fea method.

So why does it crash when using the read_lab_fea_refac01 method?

First of all, because it switches to production mode when reading fea_dict, lab_dict, and arch_dict. By removing this line I fixed the initial issue. But there is another problem: it also returns -1 as data_end_index, so run_nn will crash anyway.

How to fix:

You can update this method to return False. I tried to use read_lab instead of read_lab_fea_refac01, but it crashes anyway when trying to unpack the shared_list: the shared_list has 6 items, not 7, because there is only one item for the data_end_index data.
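
As a concrete illustration of the first suggestion, and assuming "this method" refers to _run_forwarding_in_subprocesses in run_exp.py (per the comment above, it decides to use a subprocess whenever CUDA is off), the change would be a sketch like:

def _run_forwarding_in_subprocesses(config):
    # Original behaviour (approximately, per this thread): forward runs in a
    # subprocess when CUDA is disabled, e.g. something like:
    #   return not strtobool(config['exp']['use_cuda'])
    # Workaround discussed here: always take the in-process run_nn path,
    # the same one used for training and validation.
    return False

Note that this only sidesteps the refactored reader (read_lab_fea_refac01); it does not fix the production-mode and data_end_index issues described above.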

Serhiy-Shekhovtsov avatar May 29 '20 10:05 Serhiy-Shekhovtsov