[BUG]: 'NS2Trainer' object has no attribute '_count_parameters'

Open a897456 opened this issue 10 months ago • 18 comments

https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_trainer.py#L134-L136

    Traceback (most recent call last):
      File "E:\00\Amphion-main_old\bins\tts\train.py", line 130, in <module>
        main()
      File "E:\00\Amphion-main_old\bins\tts\train.py", line 104, in main
        trainer = build_trainer(args, cfg)
      File "E:\00\Amphion-main_old\bins\tts\train.py", line 26, in build_trainer
        trainer = trainer_class(args, cfg)  # NS2Trainer
      File "E:\00\Amphion-main_old\models\tts\naturalspeech2\ns2_trainer.py", line 135, in __init__
        f"Model parameters: {self._count_parameters(self.model)/1e6:.2f}M"
    AttributeError: 'NS2Trainer' object has no attribute '_count_parameters'

a897456 avatar Apr 13 '24 11:04 a897456

https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/egs/tts/NaturalSpeech2/exp_config.json#L11 I think it is false.

a897456 avatar Apr 13 '24 12:04 a897456

I changed __count_parameters(model) in the TTSTrainer class to _count_parameters(model) @a897456
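A minimal sketch of why this rename helps, assuming the failure comes from Python's name mangling of double-underscore methods. The class names and the `weights` stand-in below are hypothetical and are not taken from Amphion:

```python
class BaseTrainer:
    # Two leading underscores: Python stores this method as
    # _BaseTrainer__count_parameters (name mangling).
    def __count_parameters(self, weights):
        return sum(len(w) for w in weights)


class BrokenTrainer(BaseTrainer):
    def report(self, weights):
        # Looks up the single-underscore name, which the base class never defined.
        return self._count_parameters(weights)


class FixedBase:
    # One leading underscore: no mangling, so subclasses can call it directly.
    def _count_parameters(self, weights):
        return sum(len(w) for w in weights)


class FixedTrainer(FixedBase):
    def report(self, weights):
        return self._count_parameters(weights)


weights = [[0.1, 0.2], [0.3]]  # hypothetical stand-in for model parameters

try:
    BrokenTrainer().report(weights)
    broken_error = None
except AttributeError as exc:
    broken_error = str(exc)

fixed_count = FixedTrainer().report(weights)
```

The broken variant fails with the same kind of AttributeError as in the traceback, while the single-underscore variant resolves normally from the subclass.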

netagl avatar Apr 14 '24 10:04 netagl

I changed __count_parameters(model) in the TTSTrainer class to _count_parameters(model) @a897456

Yes, _dump_cfg has the same problem. Have you also run into this error: FileNotFoundError: [Errno 2] No such file or directory: 'data\libritts\code\19\train-clean-100#19#198#19_198_000004_000000.npy'? It is raised here: https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L187-L196
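One way to see how many feature files are absent before training starts is a small pre-flight check. This is only a sketch: `find_missing_features` is a hypothetical helper, and it assumes a dict that maps utterance ids to file paths, like the `utt2code_path` attribute used in the dataset code.

```python
from pathlib import Path


def find_missing_features(utt2path):
    """Return {utt: path} for every feature file that does not exist on disk."""
    return {utt: p for utt, p in utt2path.items() if not Path(p).is_file()}
```

Calling it on each of the dataset's path dicts (code, pitch, duration) would show whether a single file is missing or the whole feature directory was never extracted.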

a897456 avatar Apr 14 '24 11:04 a897456

Yes, _dump_cfg too.

Yeah. @a897456 I set self.cfg.preprocess.read_metadata to False and used acoustic_extractor to create these files:

            # code
            code = np.load(self.utt2code_path[utt])
            # frame_nums
            frame_nums = code.shape[1]
            # pitch
            pitch = np.load(self.utt2pitch_path[utt])
            # duration
            duration = np.load(self.utt2duration_path[utt])
            # phone_id
            phone_id = np.array(
                [
                    *map(
                        self.phone2id.get,
                        self.utt2phone[utt].replace("{", "").replace("}", "").split(),
                    )
                ]
            )

netagl avatar Apr 14 '24 11:04 netagl

Yeah. @a897456 I set self.cfg.preprocess.read_metadata to False and used acoustic_extractor to create these files:

You used acoustic_extractor to create these files? How?

            # code
            code = np.load(self.utt2code_path[utt])
            # frame_nums
            frame_nums = code.shape[1]
            # pitch
            pitch = np.load(self.utt2pitch_path[utt])
            # duration
            duration = np.load(self.utt2duration_path[utt])
            # phone_id
            phone_id = np.array(
                [
                    *map(
                        self.phone2id.get,
                        self.utt2phone[utt].replace("{", "").replace("}", "").split(),
                    )
                ]
            )

This is the code that runs when self.cfg.preprocess.read_metadata is False. Can you show the code you used with acoustic_extractor to create these files?

a897456 avatar Apr 15 '24 01:04 a897456

code:

        if cfg.preprocess.extract_acoustic_token:
            print('extract_acoustic_token')
            if cfg.preprocess.acoustic_token_extractor == "Encodec":
                codes = extract_encodec_token(wav_path)
                save_feature(
                    dataset_output, cfg.preprocess.acoustic_token_dir, uid, codes
                )

pitch:

        if cfg.preprocess.extract_pitch:
            pitch = f0.get_f0(wav, cfg.preprocess)
            save_feature(dataset_output, cfg.preprocess.pitch_dir, uid, pitch)

            if cfg.preprocess.extract_uv:
                assert isinstance(pitch, np.ndarray)
                uv = pitch != 0
                save_feature(dataset_output, cfg.preprocess.uv_dir, uid, uv)

phones:

from g2p_en import G2p
preprocess_english(res["Text"], lexicon, g2p)

@a897456

netagl avatar Apr 15 '24 10:04 netagl

Thanks, but now I get: AttributeError: 'list' object has no attribute 'replace' at https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L224-L230
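This error suggests that the stored phones are already a list of tokens, while the dataset code expects a single string. A hedged sketch of a conversion that accepts both forms follows. `phones_to_ids` and the small `phone2id` vocabulary are hypothetical and not part of Amphion:

```python
import numpy as np


def phones_to_ids(phones, phone2id):
    """Map phones to ids, accepting either a string or a list of tokens."""
    if isinstance(phones, str):
        tokens = phones.replace("{", "").replace("}", "").split()
    else:
        # Already tokenized: strip any stray braces from each token.
        tokens = [str(p).strip("{}") for p in phones]
    tokens = [t for t in tokens if t]
    # Index with [] so an unknown phone raises KeyError instead of becoming None.
    return np.array([phone2id[t] for t in tokens])


phone2id = {"HH": 0, "AH0": 1, "L": 2, "OW1": 3}  # hypothetical vocabulary
from_string = phones_to_ids("{HH AH0 L OW1}", phone2id)
from_list = phones_to_ids(["HH", "AH0", "L", "OW1"], phone2id)
```

Indexing with `phone2id[t]` rather than `phone2id.get` is a deliberate choice here: a phone missing from the vocabulary fails loudly instead of silently producing an array that contains None.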

a897456 avatar Apr 15 '24 11:04 a897456

Set cfg.preprocess.use_phone to True in the config, @a897456:

        assert cfg.preprocess.use_phone == True
        if cfg.preprocess.use_phone:
            self.utt2phone = {}
            for utt_info in self.metadata:
                dataset = utt_info["Dataset"]
                uid = utt_info["Uid"]
                utt = "{}_{}".format(dataset, uid)
                self.utt2phone[utt] = utt_info["phones"]

netaglazer avatar Apr 15 '24 13:04 netaglazer

Set cfg.preprocess.use_phone to True in the config, @a897456:

        assert cfg.preprocess.use_phone == True
        if cfg.preprocess.use_phone:
            self.utt2phone = {}
            for utt_info in self.metadata:
                dataset = utt_info["Dataset"]
                uid = utt_info["Uid"]
                utt = "{}_{}".format(dataset, uid)
                self.utt2phone[utt] = utt_info["phones"]

Yes, and I changed the phone_id = ... assignment:

        phone_id = np.array(
            [
                *map(
                    self.phone2id.get,
                    self.utt2phone[utt].replace("{", "").replace("}", "").split(),
                )
            ]
        )

a897456 avatar Apr 15 '24 13:04 a897456

In this code: https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L308-L313

phone_nums = len(phone_id) = len(tensor of shape (1, X)) = 1, so phone_nums is always 1, because phone_id = torch.from_numpy(phone_id).unsqueeze(0). That gives clip_phone_nums = 1, but the code asserts clip_phone_nums < phone_nums and clip_phone_nums >= 1, which then can never hold. How can I solve this, please?
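The length problem described here can be reproduced in isolation. The sketch below uses NumPy instead of torch so that it runs without a GPU stack; `len()` on a torch tensor likewise returns the size of the first dimension. The phone ids and the `valid_clip` helper are hypothetical:

```python
import numpy as np

phone_id = np.array([12, 7, 33, 5])   # hypothetical phone ids for one utterance
batched = phone_id[None, :]           # shape (1, 4), same effect as unsqueeze(0)

len_after_batching = len(batched)     # size of the first dimension
phone_nums = batched.shape[-1]        # size of the phone axis


def valid_clip(clip_phone_nums, phone_nums):
    """Mirror of the assertion quoted above."""
    return 1 <= clip_phone_nums < phone_nums
```

Under these assumptions, reading the count from the last axis (or taking `len()` before the batch dimension is added) yields the real number of phones, so the assertion can be satisfied.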

a897456 avatar Apr 15 '24 13:04 a897456

Hi @a897456, I've hit the same problem and can't fix it. Have you solved it? Any advice would be appreciated!

CreepJoye avatar May 27 '24 08:05 CreepJoye

Hi @a897456, I've hit the same problem and can't fix it. Have you solved it? Any advice would be appreciated!

I saw you ask about this in the group chat. The author probably hasn't fixed this bug yet.

a897456 avatar May 27 '24 08:05 a897456

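Hi @a897456, I've hit the same problem and can't fix it. Have you solved it? Any advice would be appreciated!

I saw you ask about this in the group chat. The author probably hasn't fixed this bug yet.

If it's convenient, could we connect on WeChat through the group? I'd like to exchange ideas and learn from you.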

CreepJoye avatar May 28 '24 07:05 CreepJoye

@CreepJoye and @a897456 Have you fixed all these bugs?

chazo1994 avatar May 31 '24 16:05 chazo1994

Have you fixed all these bugs?

No, I made some changes, but there are still some issues. I'm working on finding a solution. Do you have any thoughts?

CreepJoye avatar Jun 01 '24 08:06 CreepJoye

@CreepJoye and @a897456 Have you fixed all these bugs?

@chazo1994 Have you fixed all these bugs? I have been modifying the code, but new issues keep arising. If it's convenient, could we exchange contact information to discuss NS2 training?

CreepJoye avatar Jun 14 '24 13:06 CreepJoye

@CreepJoye Not yet. I have fixed a lot of bugs, but there is still an error in the code extraction (Encodec) step, which may not be implemented. I pushed my code to this fork: https://github.com/chazo1994/Amphion

You can contact me by email at [email protected], on LinkedIn: https://www.linkedin.com/in/thinh-nguyen-a06658133/, or on any platform you use, such as Discord. I would be honored if we could discuss NaturalSpeech2, NaturalSpeech3, or any SOTA speech generation model.

chazo1994 avatar Jun 17 '24 03:06 chazo1994