deallocate must not throw
Standard containers call deallocate on their allocator from within their destructor. A conforming implementation must call std::terminate if a destructor would exit by throwing a new exception. So one cannot handle an error from freeing the CUDA memory by throwing anything, because the exception can never reach a handler. Normal practice is to add a comment noting that such an error cannot be handled in a way that would make the allocator safe to use generically.
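To make the point concrete, here is a minimal sketch, assuming C++11 or later, where destructors are implicitly noexcept. The names throwing_deallocate and owner_sketch are hypothetical and only stand in for an allocator's deallocate and for a container that calls it on destruction.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a deallocate() that reports failure by throwing.
inline void throwing_deallocate(void*) {
  throw std::runtime_error("free failed");
}

// A minimal owner that releases its resource in its destructor,
// much as a standard container does through its allocator.
struct owner_sketch {
  void* p = nullptr;
  ~owner_sketch() { throwing_deallocate(p); }  // implicitly noexcept since C++11
};

// The destructor is noexcept even though its body can throw, so an exception
// escaping it results in std::terminate rather than reaching a catch block.
static_assert(std::is_nothrow_destructible<owner_sketch>::value,
              "destructors are noexcept by default");
```

Wrapping the destruction of an owner_sketch in try/catch would not help: the exception is stopped at the destructor boundary before any enclosing handler is considered.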
Thanks for the feedback. The issue is that the CUDA error coming out of cudaFree must be reported somehow, and throwing an exception containing the error string is one way to do this kind of reporting. It's never been clear to me why immediately calling std::terminate is preferable to propagating the error via exception, given that containers are free to catch such an exception at the point at which they call deallocate.
Why must it be reported? The design of the allocator's intended client (std containers) does not permit propagating an error to the container's own caller. That's partly because libc free() reports no errors. It's an error to use a std container from code that cannot tolerate an unreported error in deallocation. An allocator that does not conform to that deallocation behaviour risks violating the normal expectations that coders have, i.e. that letting a std::vector go out of scope will not halt the program.
The error must be reported because it's an enormous productivity sink to have to track down silently ignored CUDA errors.
In practice, the difference between immediately calling std::terminate inside deallocate versus waiting until the exception eventually causes the system to call .what() + std::terminate seems small to me. If you'd like to submit a pull request to replace the throw with a printf + std::terminate, I'll merge it.
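For reference, the replacement could look roughly like the sketch below. It is not the library's actual allocator: fake_free and fake_error_string are hypothetical stand-ins for cudaFree and cudaGetErrorString so that the sketch compiles without the CUDA toolkit, and terminating_allocator_sketch is an invented name.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>
#include <vector>

// Hypothetical stand-ins for cudaError_t, cudaFree and cudaGetErrorString.
enum fake_error { fake_success = 0, fake_failure = 1 };
inline fake_error fake_free(void* p) { std::free(p); return fake_success; }
inline const char* fake_error_string(fake_error e) {
  return e == fake_success ? "no error" : "free failed";
}

template <typename T>
struct terminating_allocator_sketch {
  using value_type = T;

  terminating_allocator_sketch() = default;
  template <typename U>
  terminating_allocator_sketch(const terminating_allocator_sketch<U>&) noexcept {}

  T* allocate(std::size_t n) {
    void* p = std::malloc(n * sizeof(T));
    if (!p) throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  // Never throws: on failure, print the error string and terminate.
  void deallocate(T* p, std::size_t) noexcept {
    fake_error e = fake_free(p);
    if (e != fake_success) {
      std::fprintf(stderr, "deallocate: %s\n", fake_error_string(e));
      std::terminate();
    }
  }
};

template <typename T, typename U>
bool operator==(const terminating_allocator_sketch<T>&,
                const terminating_allocator_sketch<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const terminating_allocator_sketch<T>&,
                const terminating_allocator_sketch<U>&) noexcept { return false; }

// Usage: the allocator plugs into a standard container as usual.
inline std::size_t sketch_usage() {
  std::vector<int, terminating_allocator_sketch<int>> v(4, 7);
  return v.size();
}
```

The error is still reported, but the message is printed at the point of failure instead of depending on whatever the runtime does with an uncaught exception.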
I'm not recommending against catching CUDA errors in general. I'm recommending against using a std container with an allocator that does not conform to expected behaviour. If handling errors from cudaFree is a productivity gain, then one needs a container that has a free() method that must be called before destruction. Then the caller can make a sensible choice after catching an exception from that free(), rather than the allocator forcing the behaviour.
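A minimal sketch of what I mean, with hypothetical names throughout: explicit_free_buffer is an invented owner type, and ok_free / failing_free stand in for a cudaFree that succeeds or fails (returning 0 or a nonzero code).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical release function: returns 0 on success, nonzero on failure.
using free_fn = int (*)(void*);
inline int ok_free(void* p) { std::free(p); return 0; }
inline int failing_free(void* p) { std::free(p); return 1; }

class explicit_free_buffer {
 public:
  explicit_free_buffer(std::size_t bytes, free_fn f)
      : ptr_(std::malloc(bytes)), free_(f) {
    if (!ptr_) throw std::bad_alloc();
  }
  explicit_free_buffer(const explicit_free_buffer&) = delete;
  explicit_free_buffer& operator=(const explicit_free_buffer&) = delete;

  // May throw: the caller is in a position to catch and decide what to do.
  void free() {
    if (!ptr_) return;
    void* p = ptr_;
    ptr_ = nullptr;  // give up ownership first so the destructor will not retry
    if (int err = free_(p)) {
      throw std::runtime_error("free failed with code " + std::to_string(err));
    }
  }

  // Never throws: if the caller skipped free(), any error is dropped here.
  ~explicit_free_buffer() {
    if (ptr_) (void)free_(ptr_);
  }

  bool released() const noexcept { return ptr_ == nullptr; }

 private:
  void* ptr_;
  free_fn free_;
};
```

Code that cares about the error calls free() inside a try block before the buffer goes out of scope; code that does not care simply lets the destructor run, and the program is never halted by the owner itself.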