[WIP] Add text normalization for the multi_cn recipe: thchs_30, tal_csasr, tal_asr, aishell, aishell2, etc.

Open shanguanma opened this issue 2 years ago • 1 comments

I will clean up some punctuation and convert full-width English characters into half-width English characters in this recipe.

shanguanma avatar Jun 28 '22 03:06 shanguanma
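For readers unfamiliar with this kind of normalization, here is a minimal sketch of what it could look like. It is not the code from this pull request: the function names `to_half_width`, `remove_punctuation`, and `normalize_text` are hypothetical, and the choice to drop every character in a Unicode punctuation category is an assumption. The conversion relies on the fact that the full-width ASCII variants (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from their half-width counterparts, and that the ideographic space (U+3000) corresponds to an ordinary space.

```python
import unicodedata

# Full-width ASCII variants (U+FF01..U+FF5E) are offset by 0xFEE0
# from the ordinary ASCII range (U+0021..U+007E).
FULLWIDTH_OFFSET = 0xFEE0


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:  # full-width ASCII variants
            out.append(chr(code - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def remove_punctuation(text: str) -> str:
    """Drop every character whose Unicode category starts with 'P' (assumed policy)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def normalize_text(text: str) -> str:
    """Hypothetical pipeline: half-width conversion, punctuation removal, whitespace cleanup."""
    return " ".join(remove_punctuation(to_half_width(text)).split())
```

Converting before stripping punctuation means full-width commas and exclamation marks are handled by the same rule as their ASCII forms. A real recipe might want to keep some symbols (for example apostrophes in English words), so the punctuation policy would need adjusting per corpus.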

LGTM. Can you first fix the formatting issues? Don't worry about the unit test failure; it was a randomness-related error.

pzelasko avatar Jul 01 '22 21:07 pzelasko

@pzelasko, sorry for the late reply. I ran the test command you suggested (pytest test), but it is very slow. The log so far is as follows:

============================= test session starts ==============================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
rootdir: /mntnfs/lee_data1/maduo/k2-fsa/lhotse
plugins: hypothesis-5.41.2, anyio-3.6.1
collected 1434 items / 3 skipped

test/test_audio_reads.py ....FFFFF......FF......FF......FF......FFFFFxxF [  3%]
FxxFFxxFFFF..                                                            [  4%]
test/test_feature_set.py ...........FFFFssss..........                   [  6%]
test/test_kaldi_dirs.py x.FF................                             [  7%]
test/test_lazy.py ..............................x..x                     [  9%]
test/test_manipulation.py .............................................. [ 13%]
............                                                             [ 14%]
test/test_multipexing_iterables.py .......                               [ 14%]
test/test_parallel.py ....                                               [ 14%]
test/test_qa.py ....                                                     [ 15%]
test/test_recording_set.py ....F...........FFFFF........................ [ 18%]
..........ssss...                                                        [ 19%]
test/test_resample_randomized.py .                                       [ 19%]
test/test_serialization.py ............................................. [ 22%]
.........................................................                [ 26%]
test/test_supervision_set.py ...........................                 [ 28%]
test/test_utils.py ....................................................  [ 32%]
test/augmentation/test_torchaudio.py .................................   [ 34%]
test/cut/test_custom_attrs.py ...................                        [ 35%]
test/cut/test_custom_attrs_randomized.py .                               [ 35%]
test/cut/test_cut.py ............................                        [ 37%]
test/cut/test_cut_augmentation.py ...................................... [ 40%]
..                                                                       [ 40%]
test/cut/test_cut_drop_attributes.py ............                        [ 41%]
test/cut/test_cut_extend_by.py .................                         [ 42%]
test/cut/test_cut_fill_supervision.py ..............                     [ 43%]
test/cut/test_cut_merge_supervisions.py .........                        [ 44%]
test/cut/test_cut_mixing.py ...............                              [ 45%]
test/cut/test_cut_ops_preserve_id.py ................................... [ 47%]
.....                                                                    [ 47%]
test/cut/test_cut_set.py ............s.........s.......                  [ 50%]
test/cut/test_cut_set_mix.py .........                                   [ 50%]
test/cut/test_cut_trim_to_supervisions.py .....                          [ 51%]
test/cut/test_cut_truncate.py ........................................   [ 53%]
test/cut/test_cut_with_in_memory_data.py ...........                     [ 54%]
test/cut/test_feature_extraction.py ............ssss....sss........sss.  [ 57%]
test/cut/test_invariants_randomized.py ..                                [ 57%]
test/cut/test_masks.py ................                                  [ 58%]
test/cut/test_padding_cut.py ........................................... [ 61%]
......                                                                   [ 61%]
test/dataset/test_batch_io.py ............

shanguanma avatar Aug 18 '22 01:08 shanguanma

It has been running for more than 12 hours and still has not finished. I don't know what to do.

shanguanma avatar Aug 18 '22 02:08 shanguanma

As far as I remember, the unit tests failed on a test that uses random numbers and crashes only very rarely; I will fix that separately at some point. Can you resolve the conflicts and then run black lhotse test on your code? That should be good enough.

pzelasko avatar Aug 18 '22 12:08 pzelasko

Thanks, can you also merge master and resolve the conflicts?

pzelasko avatar Aug 22 '22 12:08 pzelasko

@pzelasko, if there are no conflicts or other problems, please merge it. I will open another pull request to add normalization for the aishell2 recipe.

shanguanma avatar Aug 23 '22 02:08 shanguanma

Thanks, merging!

pzelasko avatar Aug 23 '22 12:08 pzelasko