
Can I test with custom data?

Open 589hero opened this issue 3 years ago • 6 comments

Hi, I am interested in this exciting project and I am trying to test it with our custom dataset by reproducing the format of the original data, but I have some difficulties and questions, listed below.

  1. Is there really no way to use custom datasets at all?
  2. Is there any code to calculate the dataset elements below?
  • I want to know how to get "note_active_velocities", "note_active_frame_indices", "power_db", "note_onsets", and "note_offsets", but I could not find any code for them in the repository.

Thank you for reading!

589hero avatar May 12 '22 03:05 589hero

Huge thanks for your interest in our work!

First, sorry that training on custom datasets is still hacky. For your questions:

  1. Yes, you can refer to https://github.com/magenta/midi-ddsp/issues/46 for how to hack the pipeline to train on your own dataset.
  2. The code for the URMP dataloader is in https://github.com/magenta/ddsp/blob/main/ddsp/training/data.py#L495. "note_active_velocities", "note_active_frame_indices", and "power_db" are not used in training MIDI-DDSP (you can search the codebase; those keys are never used), so you can create a new dataloader class that excludes them.
  3. "note_onsets" and "note_offsets" are the indices of the frames where a note turns on and off. In the original tfrecord they are binary tensors of shape [T_frame, 128], indicating at which frame each of the 128 pitches turns on/off. But since MIDI-DDSP deals with monophonic playing, I convert them into binary tensors of shape [T_frame], indicating at which frame the note turns on/off. You can look into https://github.com/magenta/ddsp/blob/main/ddsp/training/data.py#L495 for more details. The tensors that matter in the end are the ones returned by _reshape_tensors(), under the keys that MIDI-DDSP actually uses (see the sketch after this list).

Hope this might help.

I will probably work on updating this codebase to support arbitrary datasets, but I don't know exactly when I will have the time.

lukewys avatar May 12 '22 13:05 lukewys

Thank you for the detailed reply!

589hero avatar May 16 '22 07:05 589hero


Hi! I think "note_active_frame_indices" is actually used in model training, because this feature is used to calculate data['midi'] (https://github.com/magenta/ddsp/blob/7cb3c37f96a3e5b4a2b7e94fdcc801bfd556021b/ddsp/training/data.py#L540), which is a necessary feature when training the synthesis generator. I tried training a synthesis generator without "note_active_frame_indices" (more specifically, I assigned random numbers to data['midi'] when creating the dataset), and the resulting model doesn't work. Could you double-check whether "note_active_frame_indices" is needed? I was also wondering what this feature means; could you explain it? Thanks!

adagio715 avatar Dec 16 '22 11:12 adagio715

Hi, note_active_frame_indices is a binary tensor containing the note-activity information. It has shape [num_frames, 128], and if note_active_frame_indices[i, j] is 1, it means that pitch j is on at the i-th frame. Applying argmax along the last dimension turns note_active_frame_indices into data['midi'].

data['midi'] is a 1D integer tensor of shape [num_frames] where each entry is the MIDI pitch number (an integer) at that frame. MIDI-DDSP relies on data['midi'] for the MIDI input and for the note boundaries, so it is crucial for training the model.
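As a small illustration of that argmax step (the array below is made up; the real tensor comes from the URMP tfrecords):

```python
import numpy as np

# note_active_frame_indices: [num_frames, 128], nonzero where a pitch is active.
note_active_frame_indices = np.zeros((4, 128))
note_active_frame_indices[:3, 60] = 1  # MIDI pitch 60 active for the first 3 frames

# data['midi']: per-frame MIDI pitch, from argmax over the pitch dimension.
midi = np.argmax(note_active_frame_indices, axis=-1)
print(midi)  # [60 60 60 0] -> pitch 60 while the note sounds, 0 in the silent frame
```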

Hope this helps.

Best

lukewys avatar Dec 17 '22 16:12 lukewys


Hi! Thanks a lot for your reply. This makes sense then. So the difference between note_active_frame_indices and note_onsets is that note_active_frame_indices indicates that the note "is being played", while note_onsets indicates that the note "starts". Did I get that right?

Just a little follow-up question about the content of note_active_frame_indices: from the provided URMP tfrecords, I checked the values of this feature before it is reshaped into [num_frames]. I found that for each frame, the 128-d array contains 127 zeros and one integer value (something like 86, 87, 427, 428...), instead of 127 zeros and a "1". I don't think this affects data['midi'], but I was wondering why there are those integer values instead of a simple "1" :)) Maybe you could explain, if you know why?

Thanks!

adagio715 avatar Dec 18 '22 03:12 adagio715

I don't remember very clearly, since a long time has passed. I think the value you are referring to is the velocity of that frame/note.
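Either way, whatever that nonzero value is, it should not change data['midi']: argmax only looks at which of the 128 entries is largest, so a row with a single nonzero entry gives the same pitch index whether that entry is 1 or, say, 86. A quick check with made-up values:

```python
import numpy as np

row_binary = np.zeros(128)
row_binary[72] = 1    # binary encoding of an active pitch
row_value = np.zeros(128)
row_value[72] = 86    # same frame, but encoded with some other nonzero value

# Both rows map to the same per-frame pitch under argmax.
assert np.argmax(row_binary) == np.argmax(row_value) == 72
```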

Best, Yusong


lukewys avatar Dec 18 '22 15:12 lukewys