RAD-NeRF

Using a different audio feature extractor

Open pegahs1993 opened this issue 2 years ago • 16 comments

During testing, I plan to use a different audio feature extractor whose features have shape (x, 16, 80), but this is incompatible with the convolutional model:

RuntimeError: Given groups=1, weight of size [32, 44, 3], expected input[8, 80, 16] to have 44 channels, but got 80 channels instead

I changed self.audio_in_dim in ./nerf/network.py, but that did not resolve the problem:

if 'esperanto' in self.opt.asr_model:
      self.audio_in_dim = 44

Could you tell me which part I need to change?

pegahs1993 avatar Feb 10 '23 09:02 pegahs1993

@tylersky1993 You could just fix audio_in_dim to 80 and remove the if condition? (assuming your asr_model's name doesn't contain 'esperanto').

ashawkey avatar Feb 10 '23 10:02 ashawkey
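
For readers following along, the RuntimeError above can be read straight off the two shapes it prints. A minimal sketch in plain Python (the helper name `conv1d_in_channels_ok` is hypothetical, not part of RAD-NeRF), assuming the usual Conv1d convention of a weight shaped (out_channels, in_channels / groups, kernel_size) and an input shaped (batch, channels, length):

```python
def conv1d_in_channels_ok(weight_shape, input_shape, groups=1):
    """Compare the channels a Conv1d weight expects with what the input has.

    Returns a tuple (ok, expected_channels, actual_channels).
    """
    out_channels, in_per_group, kernel_size = weight_shape
    batch, channels, length = input_shape
    expected = in_per_group * groups
    return channels == expected, expected, channels


# Shapes taken from the error message: weight [32, 44, 3], input [8, 80, 16].
print(conv1d_in_channels_ok((32, 44, 3), (8, 80, 16)))  # (False, 44, 80)

# With audio_in_dim set to 80, the first conv weight would be [32, 80, 3].
print(conv1d_in_channels_ok((32, 80, 3), (8, 80, 16)))  # (True, 80, 80)
```

The second call only illustrates a freshly built layer. A checkpoint trained with 44 input channels still carries the old weight shape.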

Thank you very much for responding so quickly. I did that, but I now get a similar error:

RuntimeError: Error(s) in loading state_dict for NeRFNetwork:
	size mismatch for audio_net.encoder_conv.0.weight: copying a param with shape torch.Size([32, 44, 3]) from checkpoint, the shape in current model is torch.Size([32, 80, 3]).

pegahs1993 avatar Feb 10 '23 10:02 pegahs1993

You'll have to train from scratch, instead of loading a pretrained model. You could delete the workspace and try again.

ashawkey avatar Feb 11 '23 02:02 ashawkey
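
The size mismatch arises because the checkpoint still stores weights built for 44 input channels. A small sketch, assuming parameter shapes have already been collected into plain dictionaries (the helper `find_shape_mismatches` is hypothetical), lists every parameter whose shape differs before a load is attempted:

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters that differ.

    Both arguments map parameter names to shapes. With PyTorch, such a mapping
    can be built as {k: tuple(v.shape) for k, v in state_dict.items()}.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Parameters missing from the model are ignored in this sketch.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes taken from the error message in this thread.
checkpoint = {"audio_net.encoder_conv.0.weight": (32, 44, 3)}
current_model = {"audio_net.encoder_conv.0.weight": (32, 80, 3)}

for name, (old, new) in find_shape_mismatches(checkpoint, current_model).items():
    print(f"{name}: checkpoint {old} vs model {new}")
```

Any non-empty result means the pretrained weights cannot be reused for that layer, which is why training from scratch is suggested.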

Both the wav2vec and deepspeech methods are used in the files required for training, but only the wav2vec method is used during testing. What is the reason behind this?

pegahs1993 avatar Feb 11 '23 08:02 pegahs1993

You can also use deepspeech in testing? Just specify --asr_model deepspeech and use the corresponding audio features.

ashawkey avatar Feb 11 '23 09:02 ashawkey

Thanks a lot @ashawkey !

pegahs1993 avatar Feb 11 '23 16:02 pegahs1993

If deepspeech is to be used in tests, what changes need to be made?

I changed the default name and used the corresponding audio features, but it did not work:

parser.add_argument('--asr_model', type=str, default='deepspeech ')

The README says: if the model is <ID>.pth, it uses deepspeech features.

I used obama.pth, but I get this error: TypeError: object of type 'NoneType' has no len()

pegahs1993 avatar Feb 12 '23 11:02 pegahs1993

Could you provide the full error log?

ashawkey avatar Feb 12 '23 11:02 ashawkey

Traceback (most recent call last):
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2484, in _get_value
    result = type_func(arg_string)
ValueError: invalid literal for int() with base 10: '{Pose_start}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1859, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2068, in _parse_known_args
    start_index = consume_optional(start_index)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2008, in consume_optional
    take_action(action, args, option_string)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1920, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2469, in _get_values
    value = [self._get_value(action, v) for v in arg_strings]
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2469, in <listcomp>
    value = [self._get_value(action, v) for v in arg_strings]
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2497, in _get_value
    raise ArgumentError(action, msg % args)
argparse.ArgumentError: argument --data_range: invalid int value: '{Pose_start}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py", line 2777, in safe_execfile
    py3compat.execfile(
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\utils\py3compat.py", line 168, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "D:\PhD\Imp\RAD-NeRF\test.py", line 110, in <module>
    opt = parser.parse_args()
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1826, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 1862, in parse_known_args
    self.error(str(err))
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2583, in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\argparse.py", line 2570, in exit
    _sys.exit(status)
SystemExit: 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py", line 1101, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py", line 248, in wrapped
    return f(*args, **kwargs)
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py", line 281, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "C:\Users\---\anaconda3\envs\rad-nerf\lib\inspect.py", line 1670, in getinnerframes
    frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
AttributeError: 'tuple' object has no attribute 'tb_frame'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\anaconda3\envs\rad-nerf\lib\argparse.py in _get_value(self, action, arg_string)
   2483         try:
-> 2484             result = type_func(arg_string)
   2485 

ValueError: invalid literal for int() with base 10: '{Pose_start}'

During handling of the above exception, another exception occurred:

ArgumentError                             Traceback (most recent call last)
~\anaconda3\envs\rad-nerf\lib\argparse.py in parse_known_args(self, args, namespace)
   1858             try:
-> 1859                 namespace, args = self._parse_known_args(args, namespace)
   1860             except ArgumentError:

~\anaconda3\envs\rad-nerf\lib\argparse.py in _parse_known_args(self, arg_strings, namespace)
   2067             # consume the next optional and any arguments for it
-> 2068             start_index = consume_optional(start_index)
   2069 

~\anaconda3\envs\rad-nerf\lib\argparse.py in consume_optional(start_index)
   2007             for action, args, option_string in action_tuples:
-> 2008                 take_action(action, args, option_string)
   2009             return stop

~\anaconda3\envs\rad-nerf\lib\argparse.py in take_action(action, argument_strings, option_string)
   1919             seen_actions.add(action)
-> 1920             argument_values = self._get_values(action, argument_strings)
   1921 

~\anaconda3\envs\rad-nerf\lib\argparse.py in _get_values(self, action, arg_strings)
   2468         else:
-> 2469             value = [self._get_value(action, v) for v in arg_strings]
   2470             for v in value:

~\anaconda3\envs\rad-nerf\lib\argparse.py in <listcomp>(.0)
   2468         else:
-> 2469             value = [self._get_value(action, v) for v in arg_strings]
   2470             for v in value:

~\anaconda3\envs\rad-nerf\lib\argparse.py in _get_value(self, action, arg_string)
   2496             msg = _('invalid %(type)s value: %(value)r')
-> 2497             raise ArgumentError(action, msg % args)
   2498 

ArgumentError: argument --data_range: invalid int value: '{Pose_start}'

During handling of the above exception, another exception occurred:

SystemExit                                Traceback (most recent call last)
~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in safe_execfile(self, fname, exit_ignore, raise_exceptions, shell_futures, *where)
   2776                 glob, loc = (where + (None, ))[:2]
-> 2777                 py3compat.execfile(
   2778                     fname, glob, loc,

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\utils\py3compat.py in execfile(fname, glob, loc, compiler)
    167         compiler = compiler or compile
--> 168         exec(compiler(f.read(), fname, 'exec'), glob, loc)
    169 

D:\PhD\Imp\RAD-NeRF\test.py in <module>
    109 
--> 110     opt = parser.parse_args()
    111 

~\anaconda3\envs\rad-nerf\lib\argparse.py in parse_args(self, args, namespace)
   1825     def parse_args(self, args=None, namespace=None):
-> 1826         args, argv = self.parse_known_args(args, namespace)
   1827         if argv:

~\anaconda3\envs\rad-nerf\lib\argparse.py in parse_known_args(self, args, namespace)
   1861                 err = _sys.exc_info()[1]
-> 1862                 self.error(str(err))
   1863         else:

~\anaconda3\envs\rad-nerf\lib\argparse.py in error(self, message)
   2582         args = {'prog': self.prog, 'message': message}
-> 2583         self.exit(2, _('%(prog)s: error: %(message)s\n') % args)

~\anaconda3\envs\rad-nerf\lib\argparse.py in exit(self, status, message)
   2569             self._print_message(message, _sys.stderr)
-> 2570         _sys.exit(status)
   2571 

SystemExit: 2

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10432\2001812536.py in <module>
      3 
      4 #@title Run Inference
----> 5 get_ipython().run_line_magic('run', 'test.py -O --torso      --pose data/pose.json      --data_range {Pose_start} {Pose_end}      --ckpt pretrained/model.pth      --aud data/sff.npy      --bg_img data/{BG}      --workspace trial')
      6 
      7 Video = get_latest_file(os.path.join('trial', 'results', '*.mp4'))

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2362                 kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2363             with self.builtin_trap:
-> 2364                 result = fn(*args, **kwargs)
   2365             return result
   2366 

~\anaconda3\envs\rad-nerf\lib\site-packages\decorator.py in fun(*args, **kw)
    230             if not kwsyntax:
    231                 args, kw = fix(args, kw, sig)
--> 232             return caller(func, *(extras + args), **kw)
    233     fun.__name__ = func.__name__
    234     fun.__doc__ = func.__doc__

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\magics\execution.py in run(self, parameter_s, runner, file_finder)
    845                     else:
    846                         # regular execution
--> 847                         run()
    848 
    849             if 'i' in opts:

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\magics\execution.py in run()
    830 
    831                         def run():
--> 832                             runner(filename, prog_ns, prog_ns,
    833                                     exit_ignore=exit_ignore)
    834 

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in safe_execfile(self, fname, exit_ignore, raise_exceptions, shell_futures, *where)
   2792                         raise
   2793                     if not exit_ignore:
-> 2794                         self.showtraceback(exception_only=True)
   2795             except:
   2796                 if raise_exceptions:

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\interactiveshell.py in showtraceback(self, exc_tuple, filename, tb_offset, exception_only, running_compiled_code)
   2068                     stb = ['An exception has occurred, use %tb to see '
   2069                            'the full traceback.\n']
-> 2070                     stb.extend(self.InteractiveTB.get_exception_only(etype,
   2071                                                                      value))
   2072                 else:

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in get_exception_only(self, etype, value)
    752         value : exception value
    753         """
--> 754         return ListTB.structured_traceback(self, etype, value)
    755 
    756     def show_exception_only(self, etype, evalue):

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, evalue, etb, tb_offset, context)
    627             chained_exceptions_tb_offset = 0
    628             out_list = (
--> 629                 self.structured_traceback(
    630                     etype, evalue, (etb, chained_exc_ids),
    631                     chained_exceptions_tb_offset, context)

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1365         else:
   1366             self.tb = tb
-> 1367         return FormattedTB.structured_traceback(
   1368             self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1369 

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, value, tb, tb_offset, number_of_lines_of_context)
   1265         if mode in self.verbose_modes:
   1266             # Verbose modes need a full traceback
-> 1267             return VerboseTB.structured_traceback(
   1268                 self, etype, value, tb, tb_offset, number_of_lines_of_context
   1269             )

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in structured_traceback(self, etype, evalue, etb, tb_offset, number_of_lines_of_context)
   1122         """Return a nice text document describing the traceback."""
   1123 
-> 1124         formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
   1125                                                                tb_offset)
   1126 

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in format_exception_as_a_whole(self, etype, evalue, etb, number_of_lines_of_context, tb_offset)
   1080 
   1081 
-> 1082         last_unique, recursion_repeat = find_recursion(orig_etype, evalue, records)
   1083 
   1084         frames = self.format_records(records, last_unique, recursion_repeat)

~\anaconda3\envs\rad-nerf\lib\site-packages\IPython\core\ultratb.py in find_recursion(etype, value, records)
    380     # first frame (from in to out) that looks different.
    381     if not is_recursion_error(etype, value, records):
--> 382         return len(records), 0
    383 
    384     # Select filename, lineno, func_name to track frames with

TypeError: object of type 'NoneType' has no len()

pegahs1993 avatar Feb 12 '23 12:02 pegahs1993

It says argument --data_range: invalid int value: '{Pose_start}', what's the command line you are running?

ashawkey avatar Feb 12 '23 23:02 ashawkey

%run test.py -O --torso \
    --pose data/pose.json \
    --data_range {Pose_start} {Pose_end} \
    --ckpt pretrained/model.pth \
    --aud data/speech.npy \
    --bg_img data/{BG} \
    --workspace trial

pegahs1993 avatar Feb 12 '23 23:02 pegahs1993

Oh, you should first define Pose_start and the other {} variables. I guess you got this snippet from Colab? You need to check those definitions, or use the full command from the README.

ashawkey avatar Feb 13 '23 01:02 ashawkey
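
The failure can be reproduced outside the notebook. A minimal sketch, assuming `--data_range` is declared as a list of integers (the parser below is a hypothetical stand-in for the one in test.py, and the values 0 and 100 are made up for illustration):

```python
import argparse


def parse_data_range(argv):
    """Parse --data_range from argv; return None if argparse rejects it."""
    # Hypothetical stand-in for the parser in test.py (assumed declaration).
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_range', type=int, nargs='*', default=[0, -1])
    try:
        return parser.parse_args(argv).data_range
    except SystemExit:
        # argparse reports "invalid int value" on stderr and exits with status 2.
        return None


# Unexpanded placeholders reach argparse as literal strings and are rejected.
print(parse_data_range(['--data_range', '{Pose_start}', '{Pose_end}']))  # None

# Once the variables exist, the same template expands to valid integers.
Pose_start, Pose_end = 0, 100  # illustrative values only
argv = f'--data_range {Pose_start} {Pose_end}'.split()
print(parse_data_range(argv))  # [0, 100]
```

The later TypeError in the log comes from IPython failing to format the SystemExit, so the argparse message is the one to act on.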

Hi @ashawkey, does model.pth relate to the deepspeech and wav2vec models generated during training? My assumption is that the file named ngp is for deepspeech. One more question: don't we need to change the default --asr_model (in main.py) to train each of them?

parser.add_argument('--asr_model', type=str, default='cpierse/wav2vec2-large-xlsr-53-esperanto')

pegahs1993 avatar Feb 19 '23 10:02 pegahs1993

No, they are downloaded automatically (from GitHub or Hugging Face). Usually the esperanto model works well for most languages.

ashawkey avatar Feb 20 '23 01:02 ashawkey

Thanks a lot @ashawkey for your response!

My question is whether a trained model is generated for both (wav2vec and deepspeech) during training, or whether it should be trained separately for each.

pegahs1993 avatar Feb 20 '23 08:02 pegahs1993

It should be trained separately, e.g., a model trained on wav2vec features can only use wav2vec features.

ashawkey avatar Feb 20 '23 08:02 ashawkey