lhotse [WIP] Add normalization for text of mulit_cn recipe: thchs_30, tal_csasr, tal_asr, aishell, aishell2, etc.

In this recipe, I will clean up some punctuation and convert full-width English characters into their half-width equivalents.
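Here is a minimal sketch of this kind of text normalization. It assumes the recipe works on plain Python strings. It is not the code from this PR: the function name `normalize_text` and the punctuation set are hypothetical, because the PR does not list which punctuation marks it removes. Full-width ASCII variants (U+FF01 to U+FF5E) map to their half-width counterparts by subtracting 0xFEE0, and the ideographic space U+3000 maps to an ordinary space.

```python
# Hypothetical sketch, not the code from this PR.

# Full-width ASCII variants U+FF01..U+FF5E map to U+0021..U+007E by subtracting
# 0xFEE0; the ideographic space U+3000 maps to an ordinary space U+0020.
_FULL_TO_HALF = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
_FULL_TO_HALF[0x3000] = 0x20

# Hypothetical punctuation set (CJK and ASCII); adjust to the recipe's needs.
# Full-width ASCII punctuation is converted first, so only its half-width form is listed.
_PUNCTUATION = "。、“”‘’《》「」,.!?;:\"'()<>"
_DELETE_PUNCT = {ord(ch): None for ch in _PUNCTUATION}


def normalize_text(text: str) -> str:
    """Convert full-width characters to half-width, drop punctuation, collapse spaces."""
    text = text.translate(_FULL_TO_HALF)  # e.g. "ＡＢＣ，" -> "ABC,"
    text = text.translate(_DELETE_PUNCT)  # remove the listed punctuation marks
    return " ".join(text.split())  # collapse and trim whitespace
```

Doing the full-width conversion first means each punctuation mark only needs to appear once, in its half-width form, in the deletion table. Whether English words should also be upper- or lower-cased is left out, since the thread does not say.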

Jun 28 '22 03:06 shanguanma

LGTM. Can you first fix the formatting issues? Don't worry about the unit test; it failed because of a randomness-related error.

Jul 01 '22 21:07 pzelasko

@pzelasko, sorry for the late reply. I ran your test command (pytest test), but it is very slow. Here is the log so far:

============================= test session starts ==============================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
rootdir: /mntnfs/lee_data1/maduo/k2-fsa/lhotse
plugins: hypothesis-5.41.2, anyio-3.6.1
collected 1434 items / 3 skipped

test/test_audio_reads.py ....FFFFF......FF......FF......FF......FFFFFxxF [  3%]
FxxFFxxFFFF..                                                            [  4%]
test/test_feature_set.py ...........FFFFssss..........                   [  6%]
test/test_kaldi_dirs.py x.FF................                             [  7%]
test/test_lazy.py ..............................x..x                     [  9%]
test/test_manipulation.py .............................................. [ 13%]
............                                                             [ 14%]
test/test_multipexing_iterables.py .......                               [ 14%]
test/test_parallel.py ....                                               [ 14%]
test/test_qa.py ....                                                     [ 15%]
test/test_recording_set.py ....F...........FFFFF........................ [ 18%]
..........ssss...                                                        [ 19%]
test/test_resample_randomized.py .                                       [ 19%]
test/test_serialization.py ............................................. [ 22%]
.........................................................                [ 26%]
test/test_supervision_set.py ...........................                 [ 28%]
test/test_utils.py ....................................................  [ 32%]
test/augmentation/test_torchaudio.py .................................   [ 34%]
test/cut/test_custom_attrs.py ...................                        [ 35%]
test/cut/test_custom_attrs_randomized.py .                               [ 35%]
test/cut/test_cut.py ............................                        [ 37%]
test/cut/test_cut_augmentation.py ...................................... [ 40%]
..                                                                       [ 40%]
test/cut/test_cut_drop_attributes.py ............                        [ 41%]
test/cut/test_cut_extend_by.py .................                         [ 42%]
test/cut/test_cut_fill_supervision.py ..............                     [ 43%]
test/cut/test_cut_merge_supervisions.py .........                        [ 44%]
test/cut/test_cut_mixing.py ...............                              [ 45%]
test/cut/test_cut_ops_preserve_id.py ................................... [ 47%]
.....                                                                    [ 47%]
test/cut/test_cut_set.py ............s.........s.......                  [ 50%]
test/cut/test_cut_set_mix.py .........                                   [ 50%]
test/cut/test_cut_trim_to_supervisions.py .....                          [ 51%]
test/cut/test_cut_truncate.py ........................................   [ 53%]
test/cut/test_cut_with_in_memory_data.py ...........                     [ 54%]
test/cut/test_feature_extraction.py ............ssss....sss........sss.  [ 57%]
test/cut/test_invariants_randomized.py ..                                [ 57%]
test/cut/test_masks.py ................                                  [ 58%]
test/cut/test_padding_cut.py ........................................... [ 61%]
......                                                                   [ 61%]
test/dataset/test_batch_io.py ............

Aug 18 '22 01:08 shanguanma

It has been running for more than 12 hours and still has not finished. I'm not sure how to proceed.

Aug 18 '22 02:08 shanguanma

As far as I remember, the unit tests failed on a test that uses random numbers and can, very rarely, crash; I will fix that separately at some point. Can you resolve the conflicts and then run black lhotse test on your code? That should be good enough.

Aug 18 '22 12:08 pzelasko

Thanks, can you also merge master and resolve the conflicts?

Aug 22 '22 12:08 pzelasko

@pzelasko, if there are no conflicts or other problems, please merge it. I will open another pull request to add normalization for the aishell2 recipe.

Aug 23 '22 02:08 shanguanma

Thanks, merging!

Aug 23 '22 12:08 pzelasko
