
transcription

Open dutchsing009 opened this issue 1 year ago • 5 comments

Can you please give an example of a transcriptions.csv file with the name, ph_seq, ph_dur and ph_num columns in it?

I want to see a reference file.

dutchsing009 avatar Nov 19 '23 21:11 dutchsing009

If you have ever made DiffSinger datasets, you should be familiar with transcriptions.csv. If you haven't done that before and want to learn more details, see https://github.com/openvpi/MakeDiffSinger. There is also a link to this SOME repository in https://github.com/openvpi/MakeDiffSinger/tree/main/variance-temp-solution, and you can understand everything once you reach that step.
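For readers who have not seen one, here is a minimal sketch of what such a file might look like and how it could be sanity-checked. It assumes only the four columns named in this issue (name, ph_seq, ph_dur, ph_num), with space-separated values inside each cell and durations in seconds. The row itself (item name, phonemes, durations) is invented for illustration and is not taken from any real dataset; the actual column set may differ between pipelines.

```python
import csv
import io

# Hypothetical sample file: every value below is made up for illustration.
SAMPLE = """name,ph_seq,ph_dur,ph_num
item_0001,SP n i h ao SP,0.30 0.08 0.25 0.07 0.40 0.20,2 2 1 1
"""


def check_row(row):
    """Return True if the three phoneme columns are mutually consistent."""
    phones = row["ph_seq"].split()
    durs = [float(d) for d in row["ph_dur"].split()]
    nums = [int(n) for n in row["ph_num"].split()]
    # One duration per phoneme, and the ph_num groups must cover every phoneme.
    return len(phones) == len(durs) and sum(nums) == len(phones)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
results = {row["name"]: check_row(row) for row in rows}
print(results)
```

The check only verifies internal consistency of a row; it says nothing about whether the labels match the audio.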

yqzhishen avatar Nov 20 '23 04:11 yqzhishen

1- Does this variance-temp-solution link work for English or French datasets?

OK, thanks. So if I understand this correctly, if I have ph_seq, ph_dur and ph_num, I can use SOME to get the MIDI sequence and MIDI duration sequence? If yes, I have 2 questions:

1- How can I obtain those three (ph_seq, ph_dur and ph_num)? I saw 2 tools, but I'm not sure whether they produce all three: https://github.com/wolfgitpr/LyricFA and https://github.com/Anjiurine/fast-phasr-next. Is there any other tool that will automatically generate the phoneme sequence, phoneme duration sequence and phoneme number for me?

2- How accurate are the generated MIDI sequence and MIDI duration sequence going to be? Like 100%? (I'm asking because if it isn't 100%, I think it will make the model hallucinate during SVS inference.)

dutchsing009 avatar Nov 20 '23 09:11 dutchsing009

  1. ph_seq and ph_dur should be available once you have finished making your DiffSinger acoustic dataset. Many tools and pipelines can do this. But as far as I know, ph_num can only be obtained by the method described in the MakeDiffSinger repository, and unfortunately, there is no proper method of automatic ph_num inference for polysyllabic languages like English and French yet. However, I already have an idea for how to do this, as described in https://github.com/openvpi/MakeDiffSinger/issues/11. If you have suggestions, you can comment on that issue.
  2. The pretrained model of SOME is trained on purely Chinese datasets. Though SOME is language-agnostic, it may not produce results as good as on its "native" language. But we do benefit from it by reducing the time cost of manual MIDI labeling, because of its ability to recognize slur notes and generate cent-level MIDI values.

yqzhishen avatar Nov 20 '23 13:11 yqzhishen

Does this help? https://github.com/colstone/ENG_dur_num

dutchsing009 avatar Nov 27 '23 15:11 dutchsing009

Yes, this can help to some degree. But I doubt whether simply specifying all vowels is sufficient or appropriate for polysyllabic languages. A more detailed discussion was raised here: https://github.com/openvpi/MakeDiffSinger/discussions/12
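To make the "specify all vowels" idea concrete, here is a rough sketch of a vowel-based ph_num split. It is not the code from MakeDiffSinger or ENG_dur_num; the function name `infer_ph_num`, the vowel sets and the phoneme sequences are all hypothetical. It assumes that a new group starts at every vowel-like phoneme (with SP/AP treated as vowel-like) and that consonants are counted with the group that precedes them.

```python
def infer_ph_num(ph_seq, vowels):
    """Split a phoneme sequence into groups that each start at a vowel.

    Illustrative only: `vowels` is a user-supplied set, and SP/AP are
    assumed to be listed in it so that the sequence can start a group.
    """
    starts = [i for i, ph in enumerate(ph_seq) if ph in vowels]
    if not starts or starts[0] != 0:
        raise ValueError("sequence must begin with a vowel-like phoneme")
    # Group sizes are the gaps between consecutive group starts.
    bounds = starts + [len(ph_seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical vowel sets and sequences, for illustration only.
mono = infer_ph_num("SP n i h ao SP".split(), {"SP", "AP", "i", "ao"})
poly = infer_ph_num("SP h eh l ow".split(), {"SP", "AP", "eh", "ow"})
print(mono, poly)
```

Under these assumptions, the second sequence (one two-syllable word) is spread across several groups because a boundary is placed at every vowel, which is the kind of behavior the doubt above is about.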

yqzhishen avatar Nov 28 '23 18:11 yqzhishen

I think an examples/transcription.csv is a no-brainer...

The format itself seems to vary based on the method and pipeline.

godofecht avatar Oct 11 '24 12:10 godofecht