`CUDA out of memory` for larger datasets during attribution
🐛 Bug Report
When running inseq on a CUDA device with a larger dataset, an out-of-memory error occurs regardless of the defined `batch_size`. I believe this is caused by the calls to `self.encode` in `attribution_model.py` (lines 345 and 347), which operate on the full inputs instead of a single batch and move all inputs to the CUDA device right after encoding.
🔬 How To Reproduce
Steps to reproduce the behavior:
- Load any model without pre-generated targets
- Load a larger dataset with at least 1000 samples
- Call the `.attribute()` method with any `batch_size` parameter
Code sample
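A rough sketch of the setup that triggers this for me (model, attribution method, and dataset names are placeholders rather than my exact setup):

```python
import inseq
from datasets import load_dataset

# Placeholder model and attribution method: any model without pre-generated targets
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "saliency")

# Placeholder dataset: anything with at least 1000 samples
dataset = load_dataset("wmt16", "de-en", split="validation")
texts = [ex["en"] for ex in dataset["translation"]]

# The OOM error happens on CUDA regardless of the batch_size passed here
out = model.attribute(texts, batch_size=8)
```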
Environment
- OS: macOS
- Python version: 3.10
- Inseq version: 0.4.0
Expected behavior
The input texts should ideally only be encoded or moved to the GPU once they are actually processed.
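Roughly, I would expect a per-batch flow like the one below instead of a single global encode. This is only a sketch: `attribute_batch` is a stand-in for the per-batch attribution step, and the `encode` / `.to(device)` calls are simplified assumptions about the internals.

```python
def attribute_in_batches(model, input_texts, batch_size, attribute_batch):
    """Sketch: encode and move each slice to the GPU only when its batch is processed,
    instead of encoding the full input list and moving everything to CUDA up front."""
    outputs = []
    for i in range(0, len(input_texts), batch_size):
        batch_texts = input_texts[i : i + batch_size]
        encoded = model.encode(batch_texts)       # encode only this slice
        encoded = encoded.to(model.device)        # move only this slice to the device
        outputs.append(attribute_batch(encoded))  # stand-in for the per-batch attribution
    return outputs
```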
Additional context
@gsarti Maybe you could try to confirm this behavior since I only seem to come across this when I run inseq on GPU.
Hi @lsickert, could you confirm that you face this bug when installing `main`? There was a recently fixed bug where batching was not applied to `model.generate`, causing this same issue. Let me know!
You are correct. Since I can only reproduce this on CUDA (Habrok cluster), I did not have the version from `main` installed but the latest published version. It seems to be the same issue as #183 and is mostly fixed when I run it with the current version.
I do still encounter some issues, though, when I try to run my full dataset (1M samples) through it at once (which I did not have in previous experiments, since I was running inseq in a batched loop). So I think the calls to `self.encode()` that already move all the inputs to the GPU might still be a bit of an issue with large datasets.
Maybe it makes sense to only move them to the GPU during the `generate` function and later on during the attribution function, once they are batched?
Documenting here another issue reported by @g8a9: when attributing a large set of examples at once, a single example that is too large to fit in GPU memory can crash the entire attribution halfway through.
A possible solution would be to add an option to `model.attribute` to sort inputs by length in number of tokens (`True` by default). In this way, the top `N` (= `batch_size`) longest examples would be attributed first, ensuring that the process either crashes right away or terminates successfully. Importantly, if sorting is activated, it should apply to all position-dependent fields (all parameters where the argument is a list of strings, except `step_scores`).
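Something along these lines, sketched with a plain tokenizer call and a hypothetical `attribute_sorted` helper standing in for the actual batched attribution:

```python
def attribute_longest_first(input_texts, tokenizer, attribute_sorted):
    """Sketch: attribute the longest examples first, then restore the original order."""
    # Sort indices by token length, longest first, so the first batch is the worst case.
    lengths = [len(tokenizer(text)["input_ids"]) for text in input_texts]
    order = sorted(range(len(input_texts)), key=lambda i: lengths[i], reverse=True)

    # Any position-dependent field (e.g. pre-generated targets) must be reordered the same way.
    sorted_texts = [input_texts[i] for i in order]
    sorted_outputs = attribute_sorted(sorted_texts)

    # Return the results in the user's original input order.
    restored = [None] * len(input_texts)
    for sorted_pos, original_idx in enumerate(order):
        restored[original_idx] = sorted_outputs[sorted_pos]
    return restored
```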
I'm not super keen on the idea of internal sorting by length, as it could break some outer user-defined logic. But I don't really know what the best option is here (maybe looking for the maximum-length batch and running that first?).
Also, keep in mind that this dry run might take quite some time if sentences are very long -- so you might not want to repeat it if it succeeds.
Yes, the idea of sorting by length was precisely aimed at avoiding the recomputation for the first big batch. As long as results are returned in the original order, there shouldn't be a problem for user logic, right?
Yeah, if the original order is preserved I guess that works.
Might it also be an option to catch these errors and either skip the relevant example (with a logged warning, of course) or move just this one to the CPU for processing and then continue with the next batch on GPU?
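Roughly what I have in mind, as a sketch (`attribute_batch` is a stand-in, and I'm not sure how cleanly a batch can be moved back and forth in practice):

```python
import logging

import torch

logger = logging.getLogger(__name__)

def attribute_with_cpu_fallback(batch, attribute_batch):
    """Sketch: try a batch on the GPU and fall back to the CPU if it does not fit."""
    try:
        return attribute_batch(batch.to("cuda"))
    except torch.cuda.OutOfMemoryError:
        logger.warning("Batch does not fit into GPU memory, retrying it on CPU.")
        torch.cuda.empty_cache()  # release the partially allocated GPU memory
        return attribute_batch(batch.to("cpu"))
```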
Moving to CPU seems the better option of the two, but I'm still not sure whether this should be preferable to raising an error at the start to signal that a lower batch size is required.