lhotse
[WIP] Add text normalization for the multi_cn recipes: thchs_30, tal_csasr, tal_asr, aishell, aishell2, etc.
In these recipes, I will remove some punctuation and convert full-width English characters into half-width English characters.
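To illustrate the kind of normalization described above, here is a minimal Python sketch. It is not the code from this pull request: the function names `to_half_width` and `normalize_text` and the `PUNCTUATION` set are hypothetical, chosen only for the example. It relies on the fact that the full-width ASCII variants occupy U+FF01 to U+FF5E, exactly 0xFEE0 above their half-width counterparts, and that the ideographic space is U+3000.

```python
# Full-width ASCII variants (U+FF01..U+FF5E) sit exactly 0xFEE0 above
# their half-width counterparts (U+0021..U+007E).
FULLWIDTH_OFFSET = 0xFEE0

# Assumption: an illustrative punctuation set, not the one used by the recipes.
PUNCTUATION = ',.!?;:"()' + "。、“”‘’《》【】"


def to_half_width(text: str) -> str:
    """Convert full-width ASCII characters and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:  # full-width ASCII variants
            out.append(chr(code - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def normalize_text(text: str) -> str:
    """Convert to half-width, drop punctuation, and collapse whitespace."""
    text = to_half_width(text)
    text = text.translate(str.maketrans("", "", PUNCTUATION))
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_text("你好，ＨＥＬＬＯ　ｗｏｒｌｄ！"))
```

Converting to half-width first means that full-width punctuation such as `，` and `！` becomes ASCII before the punctuation filter runs, so the filter only needs the ASCII forms plus the CJK marks that have no ASCII equivalent.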
LGTM. Can you first fix the formatting issues? Don't worry about the unit test; it was a randomness-related error.
@pzelasko, sorry for the late reply. I ran your test command, pytest test, but it runs very slowly. The log so far is as follows:
============================= test session starts ==============================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
rootdir: /mntnfs/lee_data1/maduo/k2-fsa/lhotse
plugins: hypothesis-5.41.2, anyio-3.6.1
collected 1434 items / 3 skipped
test/test_audio_reads.py ....FFFFF......FF......FF......FF......FFFFFxxF [ 3%]
FxxFFxxFFFF.. [ 4%]
test/test_feature_set.py ...........FFFFssss.......... [ 6%]
test/test_kaldi_dirs.py x.FF................ [ 7%]
test/test_lazy.py ..............................x..x [ 9%]
test/test_manipulation.py .............................................. [ 13%]
............ [ 14%]
test/test_multipexing_iterables.py ....... [ 14%]
test/test_parallel.py .... [ 14%]
test/test_qa.py .... [ 15%]
test/test_recording_set.py ....F...........FFFFF........................ [ 18%]
..........ssss... [ 19%]
test/test_resample_randomized.py . [ 19%]
test/test_serialization.py ............................................. [ 22%]
......................................................... [ 26%]
test/test_supervision_set.py ........................... [ 28%]
test/test_utils.py .................................................... [ 32%]
test/augmentation/test_torchaudio.py ................................. [ 34%]
test/cut/test_custom_attrs.py ................... [ 35%]
test/cut/test_custom_attrs_randomized.py . [ 35%]
test/cut/test_cut.py ............................ [ 37%]
test/cut/test_cut_augmentation.py ...................................... [ 40%]
.. [ 40%]
test/cut/test_cut_drop_attributes.py ............ [ 41%]
test/cut/test_cut_extend_by.py ................. [ 42%]
test/cut/test_cut_fill_supervision.py .............. [ 43%]
test/cut/test_cut_merge_supervisions.py ......... [ 44%]
test/cut/test_cut_mixing.py ............... [ 45%]
test/cut/test_cut_ops_preserve_id.py ................................... [ 47%]
..... [ 47%]
test/cut/test_cut_set.py ............s.........s....... [ 50%]
test/cut/test_cut_set_mix.py ......... [ 50%]
test/cut/test_cut_trim_to_supervisions.py ..... [ 51%]
test/cut/test_cut_truncate.py ........................................ [ 53%]
test/cut/test_cut_with_in_memory_data.py ........... [ 54%]
test/cut/test_feature_extraction.py ............ssss....sss........sss. [ 57%]
test/cut/test_invariants_randomized.py .. [ 57%]
test/cut/test_masks.py ................ [ 58%]
test/cut/test_padding_cut.py ........................................... [ 61%]
...... [ 61%]
test/dataset/test_batch_io.py ............
It has been running for more than 12 hours and still has not finished. I don't know how to proceed.
As far as I remember, the unit tests failed on a test that uses random numbers and can crash very rarely; I will fix that separately at some point. Can you resolve the conflicts and then run black lhotse test on your code? That should be good enough.
Thanks, can you also merge master and resolve the conflicts?
@pzelasko, if there are no conflicts or other problems, please merge it. I will open another pull request to add normalization for the aishell2 recipe.
Thanks, merging!