taming-transformers
inference time
Are there ways to reduce inference time? It currently takes about 13 minutes on a K80 for the Norway example at 432x288.
Hi. Yes, one way is to cache the key/value states that have already been computed for earlier positions when generating a sequence, so each new step only processes the newly sampled token. See for example https://huggingface.co/transformers/quickstart.html#using-the-past. Note that this is not currently implemented for our models, as we wanted to stick to the very hackable minGPT implementation, but it would definitely be nice to look at.
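
For reference, here is a minimal sketch of what such caching could look like in a minGPT-style attention block. The module name `CachedSelfAttention` and the `layer_past`/`present` convention are illustrative (borrowed from the Hugging Face API linked above), not part of this repository:

```python
# Minimal sketch of key/value caching for autoregressive sampling.
# CachedSelfAttention and layer_past are hypothetical names, not repo code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.key = nn.Linear(n_embd, n_embd)
        self.query = nn.Linear(n_embd, n_embd)
        self.value = nn.Linear(n_embd, n_embd)
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x, layer_past=None):
        # x: (B, T, C); with a warm cache, T is 1 (only the newest token).
        B, T, C = x.size()
        hs = C // self.n_head
        k = self.key(x).view(B, T, self.n_head, hs).transpose(1, 2)    # (B, nh, T, hs)
        q = self.query(x).view(B, T, self.n_head, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)

        if layer_past is not None:
            past_k, past_v = layer_past
            k = torch.cat([past_k, k], dim=2)  # reuse keys from earlier steps
            v = torch.cat([past_v, v], dim=2)  # reuse values from earlier steps
        present = (k, v)  # hand this back as layer_past for the next step

        # Attend over all cached positions. When generating one token at a
        # time no causal mask is needed, since only past positions exist;
        # the initial multi-token prompt pass would still need the usual mask.
        att = F.softmax((q @ k.transpose(-2, -1)) / math.sqrt(hs), dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y), present
```

With such a cache, each sampling step runs the model on a single token and attends over the stored keys/values, so the per-step cost grows linearly with the sequence length instead of re-running the whole sequence through every layer at every step.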