Multiple buffer resize calls made in ParametricBuffer

Open shekkizh opened this issue 3 years ago • 4 comments

Hi, I'm working with ReplayBuffers and noticed that the buffer update methods are extremely slow. Digging into the code, I noticed that the update function in this class calls resize 3 times for a single update, and since this involves going through the entire data, it ends up with a much longer runtime. The fix should be fairly simple.

Relevant Code snippets: lines 355 - 373 of storage_policy.py

# update buffers with new data
for group_id, new_data_g in new_groups.items():
    ll = group_to_len[group_id]
    if group_id in self.buffer_groups:
        old_buffer_g = self.buffer_groups[group_id]
        old_buffer_g.update_from_dataset(strategy, new_data_g)  ### Call 1
        old_buffer_g.resize(strategy, ll)  ### Call 2
    else:
        new_buffer = _ParametricSingleBuffer(
            ll, self.selection_strategy
        )
        new_buffer.update_from_dataset(strategy, new_data_g)
        self.buffer_groups[group_id] = new_buffer

# resize buffers
for group_id, class_buf in self.buffer_groups.items():
    self.buffer_groups[group_id].resize(
        strategy, group_to_len[group_id]
    )  ### Call 3

The update_from_dataset function itself calls resize internally (lines 440-442):

def update_from_dataset(self, strategy, new_data):
    self.buffer = AvalancheConcatDataset([self.buffer, new_data])
    self.resize(strategy, self.max_size)
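To make the count concrete, here is a minimal sketch that mirrors the control flow of the two snippets above using a stand-in buffer. `CountingBuffer`, `update_groups` and `count_resizes` are hypothetical names for illustration, not Avalanche classes; the stub only counts calls, drops the `strategy` argument, and does no real selection.

```python
class CountingBuffer:
    """Hypothetical stand-in for _ParametricSingleBuffer; only counts resize calls."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = []
        self.resize_calls = 0

    def update_from_dataset(self, new_data):
        # Mirrors the snippet above: concatenate, then resize internally.
        self.buffer = self.buffer + list(new_data)
        self.resize(self.max_size)

    def resize(self, new_size):
        self.resize_calls += 1
        self.max_size = new_size
        self.buffer = self.buffer[:new_size]  # placeholder for a real selection


def update_groups(buffer_groups, new_groups, group_to_len):
    """Same control flow as the update loop quoted above."""
    for group_id, new_data_g in new_groups.items():
        ll = group_to_len[group_id]
        if group_id in buffer_groups:
            old_buffer_g = buffer_groups[group_id]
            old_buffer_g.update_from_dataset(new_data_g)  # call 1 (internal resize)
            old_buffer_g.resize(ll)  # call 2
        else:
            new_buffer = CountingBuffer(ll)
            new_buffer.update_from_dataset(new_data_g)
            buffer_groups[group_id] = new_buffer
    for group_id in buffer_groups:
        buffer_groups[group_id].resize(group_to_len[group_id])  # call 3


def count_resizes():
    """Run one update with an existing group (0) and a new group (1)."""
    groups = {0: CountingBuffer(10)}
    update_groups(groups, {0: range(5), 1: range(5)}, {0: 5, 1: 5})
    return {gid: buf.resize_calls for gid, buf in groups.items()}
```

In this sketch an already existing group is resized three times per update and a newly created group twice.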

shekkizh avatar Feb 21 '22 19:02 shekkizh

Can you explain what is wrong here? Given a fixed memory size, each time that you add new samples you also have to resize the buffer. If your memory is split into groups, you have to resize each group.

"since this involves going through the entire data"

This is incorrect. Concatenating datasets does not require iterating over the original data.
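As an illustration of why concatenation can be cheap, here is a minimal sketch of a lazy concatenation that only stores references and cumulative lengths. `LazyConcat` and `CountingList` are hypothetical names; this is a common pattern for concatenated datasets and is not claimed to be how AvalancheConcatDataset is implemented.

```python
import bisect


class LazyConcat:
    """Hypothetical lazy concatenation: keeps references, copies no samples."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Cumulative lengths, e.g. [3, 5] for parts of length 3 and 2.
        self.cumulative = []
        total = 0
        for d in self.datasets:
            total += len(d)
            self.cumulative.append(total)

    def __len__(self):
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find which part holds idx, then the offset inside that part.
        part = bisect.bisect_right(self.cumulative, idx)
        offset = idx - (self.cumulative[part - 1] if part > 0 else 0)
        return self.datasets[part][offset]


class CountingList:
    """List wrapper that counts how many samples are actually read."""

    def __init__(self, items):
        self.items = list(items)
        self.reads = 0

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        self.reads += 1
        return self.items[idx]
```

Building `LazyConcat([old, new])` from two `CountingList` objects leaves both `reads` counters at zero; a sample is only touched when it is indexed.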

Can you give more details about the runtime? Did you measure it or compare it against something? Are you sure that it's the resize operation that is slow and not something else (maybe data loading)?
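One way to answer these questions is to time the suspected phases separately. The sketch below uses only the standard library; `timed` and `timings` are hypothetical helper names, and where to place the `with` blocks in a training script is left as an assumption.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label.
timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the elapsed time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start
```

Wrapping the buffer update in `with timed("resize"):` and the batch iteration in `with timed("dataloading"):` and then comparing the two totals would show which phase dominates.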

AntonioCarta avatar Feb 22 '22 10:02 AntonioCarta

@shekkizh thanks for reporting the issue. Are you using the Beta version (0.1.0) of Avalanche? If yes, I guess the "extremely slow" running time is due to the way replay data loaders are created for each buffer task. The replay plugin has changed since the beta release. Now you can set task_balanced_dataloader = False in ReplayPlugin to avoid "long" parallel replay data loaders when the number of tasks increases.

This is something that I noticed before in task-incremental scenarios. To check whether this is the reason, see if the number of training iterations per epoch increases with every new task.

HamedHemati avatar Feb 22 '22 10:02 HamedHemati

@AntonioCarta - The issue I was describing is that the resize operation for a single addition of new samples is done thrice - as pointed out in the code snippet. Ideally, one would resize only once (per group or otherwise) after a sample is added. I think this needs to be updated irrespective of runtime considerations.

The update operation only concatenates, but the resize operation does a sort (based on a selection strategy) and selects indices, going through all data points in the buffer. My initial assumption was that this was the reason for the slow runtime of the CL strategy, but I think it might be something else, as Hamed mentions. My comparisons are with respect to other strategies available in the library (I understand this might not be a fair comparison): the replay strategy takes 4-5x more runtime than the iCaRL or AGEM methods (tested on a single P100 GPU for a 10-experience problem with PermutedMNIST).
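A minimal sketch of the kind of resize described here, assuming the selection strategy reduces to a per-sample score: every sample is scored once, the indices are sorted by score, and the top `max_size` are kept. `resize_by_score` and `score` are hypothetical names, not the library's selection strategy API.

```python
def resize_by_score(buffer, max_size, score):
    """Keep the max_size highest-scoring samples.

    Each sample is scored exactly once, then the indices are sorted, so one
    call costs n score evaluations plus an O(n log n) sort.
    """
    ranked = sorted(range(len(buffer)), key=lambda i: score(buffer[i]), reverse=True)
    keep = sorted(ranked[:max_size])  # restore original order among kept samples
    return [buffer[i] for i in keep]
```

If `score` is expensive (for example, a forward pass per sample), repeating this three times per update multiplies that cost by three.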

@HamedHemati Yes, I'm using the beta version. I'll see if the updated plugin helps with runtime. Thanks.

shekkizh avatar Feb 22 '22 21:02 shekkizh

I understand now. You are correct, and we can probably avoid this and do the resize only once (call 3). It shouldn't make a large difference in runtime, but the extra calls are still useless work, which matters if you have an expensive selection strategy.
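A minimal sketch of what "resize only once" could look like, under the assumption that the update step may append without resizing. `SingleResizeBuffer`, `append_dataset` and `update_groups_once` are hypothetical names, and this is not the change that was merged into Avalanche.

```python
class SingleResizeBuffer:
    """Hypothetical buffer whose update only appends; the caller resizes once."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = []
        self.resize_calls = 0

    def append_dataset(self, new_data):
        # No internal resize here, unlike update_from_dataset in the issue.
        self.buffer = self.buffer + list(new_data)

    def resize(self, new_size):
        self.resize_calls += 1
        self.max_size = new_size
        self.buffer = self.buffer[:new_size]  # placeholder for a real selection


def update_groups_once(buffer_groups, new_groups, group_to_len):
    """Append new data to every group, then run a single resize pass."""
    for group_id, new_data_g in new_groups.items():
        if group_id not in buffer_groups:
            buffer_groups[group_id] = SingleResizeBuffer(group_to_len[group_id])
        buffer_groups[group_id].append_dataset(new_data_g)
    # Single resize per group (the "call 3" of the original loop).
    for group_id, buf in buffer_groups.items():
        buf.resize(group_to_len[group_id])
    return buffer_groups
```

The trade-off in this sketch is that a buffer briefly holds more than `max_size` samples between the append and the final resize.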

AntonioCarta avatar Feb 23 '22 11:02 AntonioCarta