Amphion
[BUG]: 'NS2Trainer' object has no attribute '_count_parameters'
https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_trainer.py#L134-L136
Traceback (most recent call last):
File "E:\00\Amphion-main_old\bins\tts\train.py", line 130, in
https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/egs/tts/NaturalSpeech2/exp_config.json#L11 I think it is false.
I changed __count_parameters(model) in the TTSTrainer class to _count_parameters(model). @a897456
Yes, _dump_cfg needs the same change.
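For anyone wondering why the rename helps: a method whose name starts with two underscores is name-mangled by Python, so a subclass cannot reach it as self._count_parameters. The sketch below reproduces the error with hypothetical stand-in classes (BaseTrainer, ChildTrainer, and _count_parameters_fixed are illustrative names, not Amphion code).

```python
class BaseTrainer:
    # Two leading underscores trigger name mangling: this method is stored
    # on the class as _BaseTrainer__count_parameters.
    def __count_parameters(self, params):
        return sum(params)

    # A single leading underscore is only a naming convention, so subclasses
    # can call this method under its plain name.
    def _count_parameters_fixed(self, params):
        return sum(params)


class ChildTrainer(BaseTrainer):
    def broken(self, params):
        # Fails: no attribute called _count_parameters exists anywhere.
        return self._count_parameters(params)

    def working(self, params):
        return self._count_parameters_fixed(params)


trainer = ChildTrainer()
try:
    trainer.broken([1, 2, 3])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

total = trainer.working([1, 2, 3])
mangled_exists = hasattr(trainer, "_BaseTrainer__count_parameters")
```

Here error_message holds the same kind of "object has no attribute '_count_parameters'" text as in the issue title, while the mangled name is still present on the instance.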
And have you ever run into this: FileNotFoundError: [Errno 2] No such file or directory: 'data\libritts\code\19\train-clean-100#19#198#19_198_000004_000000.npy' here:
https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L187-L196
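One way to see how many feature files are missing before training starts is to check every expected path up front. This is only a minimal sketch: find_missing_features is a hypothetical helper, and it assumes you already have a dict that maps each utterance to its .npy path (similar in spirit to utt2code_path in the dataset class).

```python
from pathlib import Path


def find_missing_features(utt2path):
    """Return {utt: path} for every feature file that does not exist on disk."""
    return {
        utt: path
        for utt, path in utt2path.items()
        if not Path(path).is_file()
    }
```

If the returned dict is not empty, the listed features still have to be extracted before the dataset can load them with np.load.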
Yeah. @a897456 I changed self.cfg.preprocess.read_metadata to False and used acoustic_extractor to create these files:
# code
code = np.load(self.utt2code_path[utt])
# frame_nums
frame_nums = code.shape[1]
# pitch
pitch = np.load(self.utt2pitch_path[utt])
# duration
duration = np.load(self.utt2duration_path[utt])
# phone_id
phone_id = np.array(
    [
        *map(
            self.phone2id.get,
            self.utt2phone[utt].replace("{", "").replace("}", "").split(),
        )
    ]
)
You used acoustic_extractor to create these files? How?
This is the code in the branch where self.cfg.preprocess.read_metadata is false, so can you show the code for how you use acoustic_extractor to create these files?
code:
if cfg.preprocess.extract_acoustic_token:
    print('extract_acoustic_token')
    if cfg.preprocess.acoustic_token_extractor == "Encodec":
        codes = extract_encodec_token(wav_path)
        save_feature(
            dataset_output, cfg.preprocess.acoustic_token_dir, uid, codes
        )
pitch:
if cfg.preprocess.extract_pitch:
    pitch = f0.get_f0(wav, cfg.preprocess)
    save_feature(dataset_output, cfg.preprocess.pitch_dir, uid, pitch)
    if cfg.preprocess.extract_uv:
        assert isinstance(pitch, np.ndarray)
        uv = pitch != 0
        save_feature(dataset_output, cfg.preprocess.uv_dir, uid, uv)
phones:
from g2p_en import G2p
preprocess_english(res["Text"], lexicon, g2p)
@a897456
Thanks, but now I get: AttributeError: 'list' object has no attribute 'replace' at https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L224-L230
Set use_phone to True in the cfg.
@a897456
assert cfg.preprocess.use_phone == True
if cfg.preprocess.use_phone:
    self.utt2phone = {}
    for utt_info in self.metadata:
        dataset = utt_info["Dataset"]
        uid = utt_info["Uid"]
        utt = "{}_{}".format(dataset, uid)
        self.utt2phone[utt] = utt_info["phones"]
Yes, and I changed the phone_id =...
phone_id = np.array(
    [
        *map(
            self.phone2id.get,
            self.utt2phone[utt].replace("{", "").replace("}", "").split(),
        )
    ]
)
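The 'list' object has no attribute 'replace' error suggests the "phones" metadata entry can be a list rather than a string. A possible way to handle both cases is sketched below; phones_to_ids is a hypothetical helper, not part of Amphion, and the bracketed string format is assumed from the replace("{", "") calls in the snippet above.

```python
def phones_to_ids(phones, phone2id):
    """Map a phone sequence to integer ids.

    `phones` may be a string such as "{HH AH0} {L OW1}" or an already
    tokenised list such as ["HH", "AH0", "L", "OW1"].
    """
    if isinstance(phones, str):
        tokens = phones.replace("{", "").replace("}", "").split()
    else:
        # Strip stray braces from each item and drop items that were only braces.
        tokens = [p.strip("{}") for p in phones]
        tokens = [t for t in tokens if t]

    unknown = [t for t in tokens if t not in phone2id]
    if unknown:
        raise KeyError(f"phones missing from phone2id: {unknown}")
    return [phone2id[t] for t in tokens]
```

Unlike phone2id.get, which silently returns None for an unknown phone, this version raises, so a bad lexicon entry shows up immediately instead of later inside np.array.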
https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L308-L313
In this code:
phone_nums = len(phone_id), and phone_id has shape (1, X), so phone_nums is always 1,
because phone_id = torch.from_numpy(phone_id).unsqueeze(0).
So clip_phone_nums = 1,
but the code asserts clip_phone_nums < phone_nums and clip_phone_nums >= 1, which then fails.
How can I solve this, please?
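The len() behaviour described here can be reproduced without the dataset. The sketch below uses NumPy instead of torch (an assumption made to keep it dependency-light; len() on a torch tensor also returns the size of the first dimension) and shows that reading the last dimension gives the real phone count. The variable names are illustrative only.

```python
import numpy as np

phone_id = np.array([5, 12, 7, 3, 9])   # five phone ids, shape (5,)
batched = phone_id[np.newaxis, :]       # shape (1, 5), mirrors unsqueeze(0)

wrong_phone_nums = len(batched)         # size of dim 0, which is 1
right_phone_nums = batched.shape[-1]    # number of phones, which is 5
```

So one possible fix is to take the count from the last dimension (or to compute it before the unsqueeze), rather than calling len() on the batched tensor.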
Hi @a897456, I've hit the same problem and can't fix it. Have you solved it? Any advice would be appreciated!
I saw that you asked in the group. The author probably hasn't fixed this bug yet.
If it's convenient, could we add each other on WeChat through the group? I'd like to exchange ideas and learn.
@CreepJoye and @a897456 Have you fixed all these bugs?
No, I made some changes, but there are still some issues. I'm working on finding a solution. Do you have any thoughts?
@chazo1994 I have been modifying the code, but new issues keep arising. If it's convenient, could we exchange contact information to discuss NS2 training?
@CreepJoye Not yet. I have fixed a lot of bugs, but there is still an error in the code extraction (Encodec), which may not be implemented. I pushed my code to this fork: https://github.com/chazo1994/Amphion
You can contact me by email at [email protected], on LinkedIn: https://www.linkedin.com/in/thinh-nguyen-a06658133/, or on any platform you use, such as Discord. I would be honored if we could discuss NaturalSpeech 2, NaturalSpeech 3, or any SOTA speech generation model.