Can I use the fbank features already extracted by Kaldi to train with Icefall script?
For most of our speech datasets, we have already extracted fbank features with Kaldi's compute-fbank-feats. Is it possible to generate (dataset_name)_cuts_train.jsonl.gz directly from Kaldi's data-directory files (wav.scp, utt2spk, spk2utt, etc.) and the fbank features in ark format? The training scripts in icefall are tightly coupled to lhotse's feature processing, so otherwise we would end up extracting fbank features for the same dataset a second time.
Thanks!
Of course, you can. Please see https://lhotse.readthedocs.io/en/latest/kaldi.html
https://github.com/lhotse-speech/lhotse/blob/master/lhotse/kaldi.py
From https://lhotse.readthedocs.io/en/latest/kaldi.html
# Convert data/train to train_manifests/{recordings,supervisions}.json
lhotse kaldi import \
data/train \
16000 \
train_manifests
# Convert train_manifests/{recordings,supervisions}.json to data/train
lhotse kaldi export \
train_manifests/recordings.json \
train_manifests/supervisions.json \
data/train
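If you prefer to do this from Python instead of the CLI, the same conversion lives in lhotse/kaldi.py (linked above). A minimal sketch, assuming that load_kaldi_data_dir returns (recordings, supervisions, features) and that feats.scp is present in the Kaldi data dir; the paths and the 10 ms frame shift are placeholders:

from lhotse import CutSet
from lhotse.kaldi import load_kaldi_data_dir

# Import a Kaldi data dir. frame_shift is needed so that the pre-computed
# fbank matrices in feats.scp can be turned into a lhotse FeatureSet.
recordings, supervisions, features = load_kaldi_data_dir(
    "data/train",  # Kaldi data dir (placeholder)
    16000,         # sampling rate
    frame_shift=0.01,
)

# Combine the three manifests into cuts that the icefall dataloaders consume.
cuts = CutSet.from_manifests(
    recordings=recordings,
    supervisions=supervisions,
    features=features,
)
cuts.to_file("train_manifests/cuts_train.jsonl.gz")  # placeholder output path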
Thanks a lot~ I'll try it!
I have trained a transducer model with the hand-crafted cuts.jsonl.gz following the scripts in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2. But when I decode with the command
./pruned_transducer_stateless2/decode.py \
  --simulate-streaming 1 \
  --bpe-model ./data/lang_bpe_5000/bpe.model \
  --decode-chunk-size 16 \
  --causal-convolution 1 \
  --epoch 22 \
  --avg 10 \
  --exp-dir ./pruned_transducer_stateless2/exp \
  --max-sym-per-frame 1 \
  --max-duration 100 \
  --decoding-method greedy_search
the hypothesis is the same for all utterances. I checked that each utterance has different features as input, but they all produce the same probabilities over the vocabulary. What might cause that?
Here are the steps I use to generate the final cuts.jsonl.gz.
Step 1: import the Kaldi data dir to generate features.jsonl.gz, recordings.jsonl.gz and supervisions.jsonl.gz:
lhotse kaldi import \
  -f 0.01 \
  ${librispeech_data}/test-clean \
  16000 \
  data/manifests/${name}
Step 2: run this function to combine the imported manifests into cuts:
import logging
import os
from pathlib import Path

from lhotse import CutSet
from lhotse.recipes.utils import read_manifests_if_cached

from icefall.utils import get_executor


def compute_fbank_librispeech():
    src_dir = Path("data/manifests")
    output_dir = Path("data/fbank")
    num_jobs = min(15, os.cpu_count())  # unused here; kept from the original recipe
    num_mel_bins = 80  # unused here; kept from the original recipe

    dataset_parts = ("test-clean",)
    prefix = "librispeech"
    suffix = "jsonl.gz"
    manifests = read_manifests_if_cached(
        dataset_parts=dataset_parts,
        output_dir=src_dir,
        prefix=prefix,
        suffix=suffix,
        types=("recordings", "supervisions", "features"),
    )
    assert manifests is not None

    with get_executor() as ex:  # Initialize the executor only once.
        # No feature extraction happens here: the features manifest was
        # already created by `lhotse kaldi import`, so the executor is unused.
        for partition, m in manifests.items():
            cuts_filename = f"{prefix}_cuts_{partition}.{suffix}"
            if (output_dir / cuts_filename).is_file():
                logging.info(f"{partition} already exists - skipping.")
                continue
            logging.info(f"Processing {partition}")
            cut_set = CutSet.from_manifests(
                recordings=m["recordings"],
                features=m["features"],
                supervisions=m["supervisions"],
            )
            cut_set.to_file(output_dir / cuts_filename)
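Once the cuts file exists, a quick sanity check is to load one cut and confirm that the Kaldi-imported feature matrix can be read back. This is only a small sketch; the path below assumes the file names produced by the function above:

from lhotse import CutSet

# Load the cuts generated above and inspect the first one.
cuts = CutSet.from_file("data/fbank/librispeech_cuts_test-clean.jsonl.gz")
cut = next(iter(cuts))
feats = cut.load_features()  # numpy array of shape (num_frames, num_mel_bins)
print(cut.id, feats.shape, cut.duration)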
But it also seems wrong when I use the pretrained checkpoint https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless2_20220625/blob/main/exp/pretrained-epoch-24-avg-10.pt
that you provide to decode the test cuts generated as described above.
@Aurora-6
Are you using features generated by kaldi to train the model but using features generated by lhotse to test the trained model?
If that is the case, you won't get the expected recognition results.
Kaldi uses samples in the range [-32768, 32767) to extract the features, while lhotse uses samples in the range [-1, 1).
I suggest that you also use features from Kaldi to test decode.py.
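For reference, here is a rough sketch of how that sample-range difference shows up when extracting test features with kaldifeat (the scaling-to-int16 trick also mentioned later in this thread). It assumes torchaudio for reading the wav and the kaldifeat Python API; the wav path is a placeholder:

import kaldifeat
import torchaudio

# torchaudio.load() returns samples normalized to [-1, 1), whereas Kaldi's
# compute-fbank-feats reads raw int16 samples in [-32768, 32767).
wave, sample_rate = torchaudio.load("test.wav")  # placeholder path

opts = kaldifeat.FbankOptions()
opts.frame_opts.samp_freq = sample_rate
opts.frame_opts.dither = 0
opts.mel_opts.num_bins = 80
fbank = kaldifeat.Fbank(opts)

# To match a model trained on Kaldi features, scale to the int16 range first.
features_kaldi_like = fbank(wave[0] * 32768)

# To match a model trained on lhotse features, keep the [-1, 1) samples.
features_lhotse_like = fbank(wave[0])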
@csukuangfj I have a similar issue: my training corpus has both the wav files and Kaldi-extracted features (fbank80). I want the fbank80 features generated by lhotse to match Kaldi's. Apart from the different sample range, is there anything else I should consider?
Using unnormalized samples for Kaldi is the only thing that I can think of.
I can also confirm that I have had a successful experience using Kaldi features for the training dataset and kaldifeat with preliminary scaling to [-32768, 32767) for the test datasets (there was a very small ~1% relative degradation in WER compared to Kaldi features for the test datasets).
Thanks for the information.
Dither is enabled by default in kaldi. Do you also use dither with kaldifeat?
I didn't change the default behavior, so it's supposed to be enabled by default? https://github.com/csukuangfj/kaldifeat/blob/72aa5eab2b60ba1c3dc4b60be476eaf1d7816f71/kaldifeat/python/tests/test_fbank_options.py#L18
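In case it helps anyone reading later, the dither setting can also be checked and changed explicitly on the options object before building the extractor. A small sketch, assuming the kaldifeat Python API from the test file linked above:

import kaldifeat

# Print the default dither value; kaldifeat follows Kaldi's defaults,
# where dither is a non-zero value (i.e. enabled).
opts = kaldifeat.FbankOptions()
print("default dither:", opts.frame_opts.dither)

# Disable dither explicitly, e.g. for reproducible features:
opts.frame_opts.dither = 0
opts.mel_opts.num_bins = 80
fbank = kaldifeat.Fbank(opts)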