Nathan Fradet
> I'll consider tokenizing on the fly, that would use PyTorch dataset multiprocessing, right?

If the tokenization is handled by the `Dataset` with a `DataLoader`, yes! Now I just realised...
I just realised that I mixed up the collator and the data loader in my last comment. 😅 I'll chalk that up to it being late. The `DataLoader` has multiple workers...
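To illustrate what I mean, here is a minimal sketch (not a final implementation): a `Dataset` that tokenizes MIDI files lazily in `__getitem__`, so the `DataLoader`'s worker processes run the tokenization in parallel. It assumes the tokenizer can be called directly on a MIDI path and returns a token sequence with an `ids` attribute.

```python
# Minimal sketch: tokenization happens in __getitem__, so the DataLoader's
# workers (not the collator) parallelize it across processes.
from pathlib import Path

import torch
from torch.utils.data import DataLoader, Dataset


class MidiDatasetOnTheFly(Dataset):
    def __init__(self, midi_paths: list[Path], tokenizer, max_seq_len: int = 1024):
        self.midi_paths = midi_paths
        self.tokenizer = tokenizer  # assumed: callable on a MIDI path, returns a token sequence
        self.max_seq_len = max_seq_len

    def __len__(self) -> int:
        return len(self.midi_paths)

    def __getitem__(self, idx: int) -> torch.LongTensor:
        # Tokenization happens here, inside the worker process.
        tok_seq = self.tokenizer(self.midi_paths[idx])
        if isinstance(tok_seq, list):  # some tokenizers return one sequence per track
            tok_seq = tok_seq[0]
        return torch.LongTensor(tok_seq.ids[: self.max_seq_len])


# Usage sketch: the workers run __getitem__ in parallel, the collator only batches.
# loader = DataLoader(dataset, batch_size=16, num_workers=4, collate_fn=my_collator)
```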
While thinking about how to implement a good `Dataset` class that tokenizes MIDIs on the fly, I realised that splitting token sequences on the fly wouldn't be possible, as what's...
We could also do the MIDI splitting in the `Dataset` initialization and save the MIDIs in a permanent directory (as in 1.) with a config file, which would allow us to not...
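Something like this rough sketch (none of this is an existing MidiTok API, and the actual split function would be passed in): split the MIDIs once at `Dataset` init, save the chunks to a permanent directory next to a small config file, and skip the work on later runs when the config matches.

```python
# Rough sketch of the cached-split idea: split once, cache to disk with a config
# file, and reuse the cache when the same split configuration is requested again.
import json
from pathlib import Path


def prepare_split_dir(midi_paths, split_dir: Path, split_fn, split_config: dict) -> list:
    """Return paths of the split chunks, reusing the cache when the config matches."""
    config_path = split_dir / "split_config.json"

    # Cache hit: the split was already done with the same parameters.
    if config_path.is_file() and json.loads(config_path.read_text()) == split_config:
        return sorted(split_dir.glob("*.mid"))

    split_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    for midi_path in midi_paths:
        # split_fn is supplied by the caller and yields (chunk_name, chunk_bytes) pairs.
        for chunk_name, chunk_bytes in split_fn(midi_path, **split_config):
            chunk_path = split_dir / f"{Path(midi_path).stem}_{chunk_name}.mid"
            chunk_path.write_bytes(chunk_bytes)
            chunk_paths.append(chunk_path)

    config_path.write_text(json.dumps(split_config))
    return chunk_paths
```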
@Kinyugo in #148 I added the `get_num_tokens_per_beat_distribution` and `get_num_beats_for_token_seq_len` methods, which should address your start/end segment problem by finding a number of beats to split a MIDI into...
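To give an idea of the logic (this is not the code from #148, just a standalone sketch of the idea): count how many tokens each beat produces, then find how many beats fit within a target token sequence length.

```python
# Sketch: per-beat token counts, then the number of beats fitting a token budget.
import numpy as np


def num_tokens_per_beat(token_beat_indices: list[int]) -> np.ndarray:
    """Count the tokens falling in each beat, given the beat index of every token."""
    return np.bincount(token_beat_indices)


def num_beats_for_seq_len(tokens_per_beat: np.ndarray, target_seq_len: int) -> int:
    """Number of beats whose cumulative token count stays within target_seq_len."""
    cumulative = np.cumsum(tokens_per_beat)
    return int(np.searchsorted(cumulative, target_seq_len, side="right"))


# Example: beats producing 30, 42, 25 and 38 tokens; a 100-token budget
# covers the first 3 beats (97 tokens).
print(num_beats_for_seq_len(np.array([30, 42, 25, 38]), 100))  # -> 3
```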
> Do you mean that I won't have to pretrain the tokenizer before starting training?

No, I just meant that when training the tokenizer, the training data (MIDIs) is tokenized...
> I am also not sure how we will teach the model to generate full samples.

About full samples: I am currently experimenting with a `TSD` tokenizer, trained with BPE...
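For reference, the kind of setup I mean looks roughly like this. Method and parameter names are from memory and may differ between MidiTok versions (BPE training is exposed as `learn_bpe` in older versions and `train` in newer ones), so check the docs of the version you use.

```python
# Hedged sketch: a TSD tokenizer whose vocabulary is then learned with BPE.
# The MIDIs are tokenized internally as part of the BPE training step.
from pathlib import Path

from miditok import TSD, TokenizerConfig

midi_paths = list(Path("dataset/midis").glob("**/*.mid"))  # placeholder path

tokenizer = TSD(TokenizerConfig(use_tempos=True, use_time_signatures=True))

# Newer MidiTok versions: tokenizer.train(...); older ones: tokenizer.learn_bpe(...).
tokenizer.train(vocab_size=10_000, files_paths=midi_paths)
```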
> I now understand why splitting at MIDI level makes sense. In that case it might make sense to split dynamically during training, that way we can also easily figure...
Hi @Kinyugo 👋 I finally got some time to get back to the task :) I ended up making a "dynamic" splitting solution based on the note densities of each...
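Roughly, the idea is the following (a simplified sketch, not the merged code): estimate each chunk's token count from the bars' note densities and cut whenever the running estimate would exceed the target sequence length.

```python
# Sketch of density-based splitting: group consecutive bars into chunks so that
# each chunk's estimated token count stays below a target sequence length.


def split_bars_by_density(
    notes_per_bar: list[int],
    tokens_per_note: float,
    max_seq_len: int,
) -> list[list[int]]:
    """Return lists of bar indices, each inner list forming one chunk."""
    chunks, current, budget = [], [], 0.0
    for bar_idx, num_notes in enumerate(notes_per_bar):
        cost = num_notes * tokens_per_note
        if current and budget + cost > max_seq_len:
            chunks.append(current)
            current, budget = [], 0.0
        current.append(bar_idx)
        budget += cost
    if current:
        chunks.append(current)
    return chunks


# Dense bars shorten the chunks, sparse bars lengthen them. Bar 2 exceeds the
# budget on its own, so it forms its own chunk (a bar can't be split further here).
print(split_bars_by_density([8, 8, 40, 4, 4, 4], tokens_per_note=3.0, max_seq_len=100))
# -> [[0, 1], [2], [3, 4, 5]]
```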
Thanks for taking the time to test it, and for reporting this bug! The error comes from the `bi` index exceeding the number of bars; I'm working on a...