Compressed model inference speed is slower than original pytorch pix2pix
Hello, I am just curious. I have adapted test.py to do real-time inference on a webcam, made the exact same modifications to https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix, and set both to run inference on CPU only, since my goal is to run this on a mobile phone.
While the memory usage of the compressed model is much smaller, only 5 MB vs 200 MB, inference actually seems to be slower with the compressed model. I am getting 4 FPS with test_compressed.sh for edges2shoes_r, while in the original pytorch-CycleGAN-and-pix2pix code I am getting 8 FPS. Is this normal? I am sure the modifications I made to the inference code are the same in both repos, and it looks like actual model latency.
Going from 200 MB to 5 MB of memory usage is a dramatic decrease, but is there also supposed to be a dramatic speedup?
Update: One thing I missed was the PyTorch version; the other repo had a newer one. I upgraded this one to 1.7.1, and inference speed is now identical at 8 FPS for the compressed model. So it is at least not slower, but there is no speed benefit. Still a 40x reduction in memory use, though!
Hi! I am wondering which model you used? There should be some latency reduction, as suggested in our paper. We've also released the code for measuring latency in our repo. Could you double-check whether there are any differences between your latency script and ours?
I am simply timing the inference within test.py like this:
```python
start_time = time.time()
model.test()  # run inference
print("test FPS: ", 1 / (time.time() - start_time))  # FPS
```
I also overlooked that I was using my own custom model in the original PyTorch code, but since that model is significantly larger than the compressed edges2shoes_r, I still expected some speedup even between different models.
I am also finding Pix2Pix particularly difficult to convert to TFLite or PyTorch Mobile for inference on Android, so I might be forced to run it in Python on Android anyway, which can't be multithreaded to take advantage of the lower memory footprint.
It would be great on an embedded device such as a Jetson Nano, but unfortunately I am targeting Android only.
@mpottinger Could you upload the original and compressed models, along with some test images?
Ok, thank you. I am currently working more on my app that will make use of the models. I have figured out how to convert the model for mobile via ONNX and have run inference successfully, so if I keep seeing the same results as I do more tests, I will upload the models.
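In case it's useful, the export I did was roughly along these lines (just a sketch; the output filename, input size, and opset are my own choices, and `model.netG` is the generator loaded as in the test code):

```python
import torch

# Export the pix2pix generator to ONNX so it can run in mobile/inference runtimes.
netG = model.netG.eval()
dummy_input = torch.randn(1, 3, 256, 256)  # the model expects a 256x256 RGB input

torch.onnx.export(
    netG,
    dummy_input,
    "generator.onnx",        # output path (my choice)
    input_names=["input"],
    output_names=["output"],
    opset_version=11,        # the opset that worked for me; may need adjusting
)
```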
How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size? Also, please share your email so I can contact you; I am also working on utilizing this repository.
ONNX models are definitely faster than running in PyTorch; that comes from being able to use frameworks optimized purely for inference. Model size is about the same after conversion.
I am able to use ONNX Runtime, which is much faster for CPU inference. I have also successfully run inference with OpenCV DNN, which is slower but easy to implement on multiple platforms, including Android.
I have also tried Alibaba MNN, which is supposed to be very fast on mobile; speeds on the phone are comparable to CPU speeds on my fast desktop PC, around 5-10 FPS with uncompressed models.
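For reference, the ONNX Runtime CPU inference I am doing looks roughly like this (a sketch assuming the generator was exported as generator.onnx with the input/output names above):

```python
import numpy as np
import onnxruntime as ort

# Load the exported generator and run a single 256x256 RGB frame through it on CPU.
session = ort.InferenceSession("generator.onnx", providers=["CPUExecutionProvider"])

frame = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)  # stand-in webcam frame
x = frame.astype(np.float32) / 255.0
x = (x - 0.5) / 0.5                # same (x - 0.5) / 0.5 normalization as the PyTorch transforms
x = x.transpose(2, 0, 1)[None]     # HWC -> NCHW, shape (1, 3, 256, 256)

output = session.run(["output"], {"input": x})[0]
```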
I have found that this issue can probably be closed. My mistake was comparing a custom-trained model in one repo to the edges2shoes model in the other repo. I thought inference time should be constant across models, but apparently not; it seems the specific trained model makes a difference.
I modified the Jupyter notebook in this repo to do webcam inference on CPU only, comparing only the full and compressed edges2shoes models from this repo. There I was able to see the speed difference on a live webcam stream.
Approximately 3 FPS for the uncompressed full model and ~6 FPS for the compressed model, so I am assuming I will get a similar 2x speedup on my own custom models. My initial comparison was flawed.
Here is the inference code I am using to test on the webcam:
```python
#!/usr/bin/env python
import pickle
import time

import cv2
import numpy as np
import torch
import torchvision.transforms as transforms

from utils.util import tensor2im
from models import create_model

# Get our model (switch between the compressed and full options to compare)
filename = 'opts/opt_compressed.pkl'
# filename = 'opts/opt_full.pkl'
with open(filename, 'rb') as f:
    opt = pickle.load(f)

opt.gpu_ids = []  # CPU only
model = create_model(opt, verbose=False)
model.setup(opt, verbose=False)

transform_list = [transforms.ToTensor(),
                  transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
transform = transforms.Compose(transform_list)

cap = cv2.VideoCapture(0)
start_time = time.time()
frameCount = 0

while True:
    ret, frame = cap.read()
    frameCount += 1
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (256, 256))
    input = transform(frame).to('cpu:0')
    input = input.reshape([1, 3, 256, 256])

    model_start = time.time()
    output_ours = model.netG(input).cpu()
    image_numpy = tensor2im(output_ours)
    print("FPS: ", 1 / (time.time() - model_start))              # per-frame FPS
    print("Avg FPS: ", frameCount / (time.time() - start_time))  # average FPS so far

    if len(image_numpy.shape) == 4:
        image_numpy = image_numpy[0]
    if len(image_numpy.shape) == 2:
        image_numpy = np.expand_dims(image_numpy, axis=2)
    if image_numpy.shape[2] == 1:
        image_numpy = np.repeat(image_numpy, 3, 2)

    cv2.imshow("input", frame)
    cv2.imshow("result", image_numpy)
    cv2.waitKey(1)
```
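One thing I would probably still change in the loop above: wrapping the forward pass in torch.no_grad() skips building the autograd graph, which tends to save some memory and CPU time during pure inference, e.g.:

```python
with torch.no_grad():
    output_ours = model.netG(input).cpu()
```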