
`CUDA out of memory` for larger datasets during attribution

Open lsickert opened this issue 1 year ago • 9 comments

🐛 Bug Report

When loading inseq with a larger dataset on a CUDA device, an out-of-memory error occurs regardless of the defined batch_size. I believe this is caused by the calls to self.encode in attribution_model.py (lines 345 and 347), which operate on the full inputs instead of a single batch and move all inputs to the CUDA device after encoding.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Load any model without pre-generated targets
  2. Load a larger dataset with at least 1000 samples
  3. Call the .attribute() method with any batch_size parameter

Code sample
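A hypothetical reproduction sketch (not from the original report). The model name, dataset, and sample count are placeholders chosen only to illustrate the steps above; any model without pre-generated targets and any dataset with at least ~1000 source texts should behave the same way.

```python
import inseq
from datasets import load_dataset

# Placeholder model and attribution method; any combination should do.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "saliency")

# Placeholder dataset with >= 1000 source texts.
data = load_dataset("wmt16", "de-en", split="validation")
input_texts = [ex["en"] for ex in data["translation"]][:1000]

# No pre-generated targets, so the model generates them internally.
# The reported OOM occurs on GPU regardless of the batch_size value.
out = model.attribute(input_texts, batch_size=8)
```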

Environment

  • OS: macOS

  • Python version: 3.10

  • Inseq version: 0.4.0

Expected behavior

The input texts should ideally only be encoded or moved to the GPU once they are actually processed.

Additional context

lsickert avatar Jun 03 '23 16:06 lsickert

@gsarti Maybe you could try to confirm this behavior since I only seem to come across this when I run inseq on GPU.

lsickert avatar Jun 03 '23 16:06 lsickert

Hi @lsickert, could you confirm that you still face this bug when installing from main? There was a bug, fixed recently, where batching was not applied to model.generate, causing this same issue. Let me know!

gsarti avatar Jun 03 '23 16:06 gsarti

> Hi @lsickert, could you confirm that you still face this bug when installing from main? There was a bug, fixed recently, where batching was not applied to model.generate, causing this same issue. Let me know!

You are correct. Since I can only reproduce this on CUDA (on the Habrok cluster), I had the latest published release installed rather than the version from main. It seems to be the same issue as #183 and is mostly fixed when I run with the current version.

I do still encounter some issues, though, when I try to run my full dataset (1M samples) through it at once (which did not happen in my previous experiments, since I was running inseq in a batched loop). So I think the calls to self.encode(), which already move all the inputs to the GPU, might still be a bit of an issue with large datasets.

Maybe it would make sense to only move them to the GPU inside the generate function, and later inside the attribution function once they are batched?
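A rough sketch of this idea, not actual inseq internals: keep the full input list on CPU and only encode and move one batch at a time. `model.encode` and `model.attribute_batch` are assumed helpers standing in for the real encoding and per-batch attribution steps.

```python
import torch

def attribute_in_batches(model, input_texts, batch_size, device="cuda"):
    outputs = []
    for start in range(0, len(input_texts), batch_size):
        batch_texts = input_texts[start : start + batch_size]
        # Encode only the current batch and move it to the GPU here,
        # instead of encoding the full dataset up front.
        batch = model.encode(batch_texts)                 # assumed helper
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs.append(model.attribute_batch(batch))      # assumed helper
        # Free the GPU memory held by this batch before the next iteration.
        del batch
        torch.cuda.empty_cache()
    return outputs
```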

lsickert avatar Jun 03 '23 17:06 lsickert

Documenting here another issue reported by @g8a9: when attributing a large set of examples at once, an example that is too large to fit into GPU memory can crash the entire attribution run halfway through.

A possible solution would be to add an option in model.attribute to sort inputs by length in number of tokens (true by default). In this way, the top N (batch_size) longest examples would be attributed first, ensuring the process either crashes right away or runs to completion. Importantly, if sorting is activated, it should apply to all position-dependent fields (all parameters whose argument is a list of strings, except step_scores).
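A sketch of the length-sorting idea, not actual inseq code: attribute the longest examples first so an OOM surfaces immediately, then restore the caller's original order before returning. Whitespace splitting stands in for the model's tokenizer, and the `sequence_attributions` attribute on the output is assumed here.

```python
def attribute_sorted_by_length(model, input_texts, batch_size=8):
    # Sort indices by token count, longest first (whitespace split as a
    # stand-in for the real tokenizer length).
    order = sorted(
        range(len(input_texts)),
        key=lambda i: len(input_texts[i].split()),
        reverse=True,
    )
    sorted_texts = [input_texts[i] for i in order]

    # The first batch now contains the batch_size longest inputs, so the
    # run either crashes here or is very likely to finish.
    out = model.attribute(sorted_texts, batch_size=batch_size)

    # Map results back to the original positions for the caller.
    restored = [None] * len(input_texts)
    for sorted_pos, original_pos in enumerate(order):
        restored[original_pos] = out.sequence_attributions[sorted_pos]
    return restored
```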

gsarti avatar Jul 19 '23 12:07 gsarti

I'm not super keen on the idea of internal sorting by length, as it could break some outer user-defined logic. But I'm not sure what the best option is here (maybe finding the maximum-length batch and running that first?).

Also, keep in mind that this dry run might take quite some time if sentences are very long, so you might not want to repeat it if it succeeds.

g8a9 avatar Jul 19 '23 13:07 g8a9

Yes, the idea of sorting by length was aimed precisely at avoiding the recomputation for the first big batch. As long as results are returned in the original order, there shouldn't be a problem for user logic, right?

gsarti avatar Jul 19 '23 13:07 gsarti

Yeah, if the original order is preserved, I guess that works.

g8a9 avatar Jul 19 '23 13:07 g8a9

Might it also be an option to catch these errors and either skip the relevant example (with a logged warning, of course) or move just this one to the CPU for processing and then continue with the next batch on GPU?

lsickert avatar Jul 21 '23 16:07 lsickert

Moving to CPU seems the better of the two options, but I'm still not sure whether it should be preferred over raising an error at the start to signal that a lower batch size is required.
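A rough sketch of the CPU-fallback idea discussed above, not actual inseq code: retry a batch on CPU if it does not fit in GPU memory, then move the model back to the GPU for the remaining batches. The `model.device = ...` assignment is a placeholder for however the library actually moves the model between devices.

```python
import logging
import torch

logger = logging.getLogger(__name__)

def attribute_with_cpu_fallback(model, batch_texts, batch_size):
    try:
        return model.attribute(batch_texts, batch_size=batch_size)
    except RuntimeError as e:
        # Only handle CUDA OOM errors; re-raise anything else.
        if "out of memory" not in str(e).lower():
            raise
        logger.warning("Batch too large for GPU, retrying on CPU: %s", e)
        torch.cuda.empty_cache()
        model.device = "cpu"       # placeholder for moving the model to CPU
        try:
            return model.attribute(batch_texts, batch_size=batch_size)
        finally:
            model.device = "cuda"  # move back for the subsequent batches
```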

gsarti avatar Jul 24 '23 07:07 gsarti