tensorflow-ctc-speech-recognition
                                
                        batch generation
Curious as to why you didn't use a generator for batch creation, and why you capped the number of iterations per epoch at 10. I updated the code to this:
import numpy as np

def batch_generator(items, batch_size=16):
    # items is a list of (key, value) pairs from the audio cache
    items = np.array(items)
    idx = 0
    while True:
        # reshuffle and start over once we run past the end of the dataset
        if idx + batch_size > len(items):
            np.random.shuffle(items)
            idx = 0
        items_batch = dict(items[idx:idx + batch_size])

        x_batch = []
        y_batch = []
        seq_len_batch = []
        original_batch = []
        for k, v in items_batch.items():
            target_text = v['target']
            audio_buffer = v['audio']
            x, y, seq_len, original = convert_inputs_to_ctc_format(
                audio_buffer, sample_rate, target_text, num_features)

            x_batch.append(x)
            y_batch.append(y)
            seq_len_batch.append(seq_len)
            original_batch.append(original)

        # targets go to TensorFlow as a sparse tuple
        y_batch = sparse_tuple_from(y_batch)
        seq_len_batch = np.array(seq_len_batch)[:, 0]
        # zero-pad every example in the batch up to the longest sequence
        for i, pad in enumerate(np.max(seq_len_batch) - seq_len_batch):
            x_batch[i] = np.pad(x_batch[i], ((0, 0), (0, pad), (0, 0)),
                                mode='constant', constant_values=0)
        x_batch = np.concatenate(x_batch, axis=0)

        idx += batch_size

        yield x_batch, y_batch, seq_len_batch, original_batch
Then we can use it like this:
data = list(audio.cache.items())  # could shuffle here as well
split = int(len(data) * 0.8)
train_data = data[:split]
valid_data = data[split:]
num_batches_per_epoch = len(train_data) // batch_size
train_gen = batch_generator(train_data, batch_size)
valid_gen = batch_generator(valid_data, batch_size)

# ...training code...
for batch_num in range(num_batches_per_epoch):
    train_inputs, train_targets, train_seq_len, original = next(train_gen)
    feed = {inputs: train_inputs,
            targets: train_targets,
            seq_len: train_seq_len,
            keep_prob: 0.8}
This makes sure we cycle through the entire dataset, reshuffling it on every pass. It also keeps the batch generator agnostic to the train/validation split, which in my opinion is a good thing. Just thought I would leave this here unsolicited.
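For completeness, here's a rough sketch of how valid_gen could be consumed at the end of each epoch; the placeholders are the same as above, while session and cost are illustrative names assumed to exist in the training script:

num_valid_batches = len(valid_data) // batch_size
val_costs = []
for _ in range(num_valid_batches):
    valid_inputs, valid_targets, valid_seq_len, _ = next(valid_gen)
    val_feed = {inputs: valid_inputs,
                targets: valid_targets,
                seq_len: valid_seq_len,
                keep_prob: 1.0}  # no dropout when evaluating
    val_costs.append(session.run(cost, feed_dict=val_feed))
print('validation cost: %.3f' % np.mean(val_costs))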
Also, this is working just fine on tensorflow==1.9.0.
Thanks for building out this architecture!
@mattc-eostar thanks a lot for this comment. If you could put together a quick PR, it would be very useful. There's no particular reason why I didn't use a generator; I just wanted to keep the code as simple as possible. But your code looks better!