spotlight
CPU Inference After GPU Training
Hi - I trained an Implicit Sequence Model and loaded it into my Flask API for serving locally on my machine, but I cannot get CPU inference working.
The model works correctly when a GPU is available.
Steps to recreate:

- Run the Flask server locally, loading the model onto the CPU, e.g. `model = torch.load('./my_model_v0.13.pt', map_location='cpu')`
- POST a JSON payload with sequence values. I've already tested that the server correctly parses the request.
- The server errors when the model attempts to predict:

```
preds = model.predict(arr)
RuntimeError: torch.cuda.LongTensor is not enabled.
```
More of the trace below:

```
Traceback (most recent call last):
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "main.py", line 77, in predict
    preds = model.predict(arr)
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/spotlight/sequence/implicit.py", line 323, in predict
    sequence_var = gpu(sequences, self._use_cuda)
  File "/Users/aldrinclement/anaconda/lib/python2.7/site-packages/spotlight/torch_utils.py", line 9, in gpu
    return tensor.cuda()
RuntimeError: torch.cuda.LongTensor is not enabled.
```
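The last two frames show where this goes wrong: Spotlight's `gpu()` helper calls `tensor.cuda()` whenever the model's `_use_cuda` flag is set, regardless of whether a CUDA runtime is actually present. A defensive variant (a sketch, not Spotlight's actual implementation) would also check `torch.cuda.is_available()`:

```python
import torch

def gpu(tensor, use_cuda=False):
    # Move to the GPU only when requested AND a CUDA runtime is actually
    # present; otherwise hand the tensor back unchanged. On a CPU-only
    # build, the unconditional tensor.cuda() call is what raises
    # "RuntimeError: torch.cuda.LongTensor is not enabled."
    if use_cuda and torch.cuda.is_available():
        return tensor.cuda()
    return tensor
```

With a check like this, `gpu(sequences, self._use_cuda)` would degrade gracefully on CPU-only machines even when the flag is left on.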
```python
def load_model():
    """Load the pre-trained model; you can use your own model just as easily."""
    global model
    model = torch.load('./justlook_v0.13.pt', map_location='cpu')
```
You also need to turn off the `model._use_cuda` flag. Otherwise the input will still be converted to CUDA tensors here: `sequence_var = gpu(sequences, self._use_cuda)`
That's correct. There really should be a better way of doing this, but I'm short on time and GPU testing runs.