Compressed model inference speed is slower than original pytorch pix2pix
Hello, I am just curious. I have adapted test.py to do real-time inference on a webcam, made the exact same modifications to https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix, and set both to run inference on CPU only, since my goal is to run this on a mobile phone.
While the memory usage of the compressed model is much smaller, only 5 MB vs 200 MB, inference actually seems to be slower with the compressed model. I am getting 4 FPS with test_compressed.sh for edges2shoes_r, while in the original pytorch-CycleGAN-and-pix2pix code I am getting 8 FPS. Is this normal? I am sure the modifications I made to the inference code are the same in both repos, and it looks like actual model latency.
Going from 200 MB to 5 MB of memory usage is a dramatic decrease, but is there also supposed to be a dramatic speedup?
Update: One thing I missed was the PyTorch version; the other repo had a newer one. I upgraded this one to 1.7.1, and inference speed is now identical at 8 FPS for the compressed model. So it is at least not slower, but there is no speed benefit. Still a 40x reduction in memory use, though!
Hi! I am wondering which model you used? There should be some latency reduction, as suggested in our paper. We've also released the code for measuring latency in our repo. Could you double-check whether there are any differences between your latency script and ours?
I am simply timing the inference within test.py like this:
```python
start_time = time.time()
model.test()  # run inference
print("test FPS: ", 1 / (time.time() - start_time))  # FPS
```
I also overlooked that I was using my own custom model in the original PyTorch code, but since that model is significantly larger than the compressed edges2shoes_r, I still expected some speedup even between different models.
I am also finding Pix2Pix particularly difficult to convert to TFLite or PyTorch Mobile for inference on Android, so I might be forced to run it in Python on Android anyway, which can't be multithreaded to take advantage of the lower memory footprint.
It would be great on an embedded device such as a Jetson Nano, but unfortunately I am targeting Android only.
@mpottinger Could you upload the original and compressed models, along with some test images?
Ok, thank you. I am currently working more on my app that will make use of the models. I have figured out how to convert the model for mobile via ONNX and have run inference successfully, so if I keep seeing the same results as I do more tests, I will upload the models.
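In case it's useful, the export I did was roughly along these lines (just a sketch; the output filename, input size, and opset are my own choices, and `model.netG` is the generator loaded as in the test code):

```python
import torch

# Export the pix2pix generator to ONNX so it can run in mobile/inference runtimes.
netG = model.netG.eval()
dummy_input = torch.randn(1, 3, 256, 256)  # the model expects a 256x256 RGB input

torch.onnx.export(
    netG,
    dummy_input,
    "generator.onnx",        # output path (my choice)
    input_names=["input"],
    output_names=["output"],
    opset_version=11,        # the opset that worked for me; may need adjusting
)
```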
How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size? Also, please share your email so I can contact you; I am also working on utilizing this repository.
ONNX models are definitely faster than running in PyTorch; that comes from being able to use frameworks optimized purely for inference. Model size is about the same after conversion.
I am able to use ONNX Runtime, which is much faster for CPU inference. I have also successfully run inference with OpenCV DNN, which is slower but easy to implement on multiple platforms, including Android.
I have also tried Alibaba MNN, which is supposed to be very fast on mobile; speeds on the phone are comparable to CPU speeds on my fast desktop PC, around 5-10 FPS with uncompressed models.
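For reference, the ONNX Runtime CPU inference I am doing looks roughly like this (a sketch assuming the generator was exported as generator.onnx with the input/output names above):

```python
import numpy as np
import onnxruntime as ort

# Load the exported generator and run a single 256x256 RGB frame through it on CPU.
session = ort.InferenceSession("generator.onnx", providers=["CPUExecutionProvider"])

frame = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)  # stand-in webcam frame
x = frame.astype(np.float32) / 255.0
x = (x - 0.5) / 0.5                # same (x - 0.5) / 0.5 normalization as the PyTorch transforms
x = x.transpose(2, 0, 1)[None]     # HWC -> NCHW, shape (1, 3, 256, 256)

output = session.run(["output"], {"input": x})[0]
```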
I have found that this issue can probably be closed. My mistake was comparing a custom-trained model in one repo to the edges2shoes model in the other repo. I thought inference time should be constant across models, but apparently not; it seems the specific trained model makes a difference.
I modified the Jupyter notebook in this repo to do webcam inference on CPU only, comparing only the full and compressed edges2shoes models from this repo. There I was able to see the speed difference on a live webcam stream.
Approximately 3 FPS for the uncompressed full model and ~6 FPS for the compressed model, so I am assuming I will get a similar 2x speedup on my own custom models. My initial comparison was flawed.
Here is the inference code I am using to test on the webcam:
```python
#!/usr/bin/env python
import pickle
import time

import cv2
import numpy as np
import torch
import torchvision.transforms as transforms

from utils.util import tensor2im
from models import create_model

# Get our model (switch between the compressed and full options to compare)
filename = 'opts/opt_compressed.pkl'
# filename = 'opts/opt_full.pkl'
with open(filename, 'rb') as f:
    opt = pickle.load(f)

opt.gpu_ids = []  # CPU only
model = create_model(opt, verbose=False)
model.setup(opt, verbose=False)

transform_list = [transforms.ToTensor(),
                  transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
transform = transforms.Compose(transform_list)

cap = cv2.VideoCapture(0)
start_time = time.time()
frameCount = 0

while True:
    ret, frame = cap.read()
    frameCount += 1
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (256, 256))
    input = transform(frame).to('cpu:0')
    input = input.reshape([1, 3, 256, 256])

    model_start = time.time()
    output_ours = model.netG(input).cpu()
    image_numpy = tensor2im(output_ours)
    print("FPS: ", 1 / (time.time() - model_start))              # per-frame FPS
    print("Avg FPS: ", frameCount / (time.time() - start_time))  # average FPS so far

    if len(image_numpy.shape) == 4:
        image_numpy = image_numpy[0]
    if len(image_numpy.shape) == 2:
        image_numpy = np.expand_dims(image_numpy, axis=2)
    if image_numpy.shape[2] == 1:
        image_numpy = np.repeat(image_numpy, 3, 2)

    cv2.imshow("input", frame)
    cv2.imshow("result", image_numpy)
    cv2.waitKey(1)
```
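One thing I would probably still change in the loop above: wrapping the forward pass in torch.no_grad() skips building the autograd graph, which tends to save some memory and CPU time during pure inference, e.g.:

```python
with torch.no_grad():
    output_ours = model.netG(input).cpu()
```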