Robust Screenshots

Open kevinhughes27 opened this issue 7 years ago • 15 comments

The current screenshot engine is not very robust. It works on a recent install of Ubuntu or Linux Mint but numerous issues have been reported on other systems:

  • On macOS, people have reported that the screenshots are always black: #15
  • Someone reported that the screenshots are repeated on Ubuntu 16.04 with an NVIDIA card: #27
  • I opened an issue about exploring GTK for screenshots (#33) in the hope that it might be more robust and remove the wx dependency
  • Numerous requests for Windows and cross-platform support, which as far as I know does not work with the current setup

SerpentAI could be a good reference for how to do this. They have a nicer, more fully fleshed-out architecture for screen capture and claim to be cross-platform:

For the actual screenshots they use mss

kevinhughes27 avatar Sep 28 '17 17:09 kevinhughes27

FYI, I just tested grabbing screenshots using wxPython versus mss. My machine is ancient and relatively slow, but the comparison between the two should be somewhat valid. I am running Ubuntu 16.04.

For 10,000 iterations, grabbing a screen of 640x480,

     wx: 178.41421008110046 seconds
    mss: 153.49231600761414 seconds

mss shaved about 14% off the time taken by wx (put the other way, wx took about 16% longer than mss). Whether that is significant could be up for debate, but an improvement is an improvement.

Besides the screenshots, wx is currently being used for the record.py GUI, which, as we've discussed, may not be needed (#30). If we could eliminate it, the bigger impact here in my opinion is the size of that dependency. Looking in my /usr/local/lib/python2.7/dist-packages directory, it looks like wx takes up ~210M, whereas mss only uses ~124K. Now that's significant! Plus, as we've both seen, wx can be a pain to get installed and working correctly (#41, #52, gym #18, etc).

Not only that, but the code for mss is only a couple lines long. wx requires more lines to accomplish the same thing (returning a numpy array). It may be worth noting that mss is returning 4 channels by default, presumably with the 4th being an alpha channel. Haven't looked into tweaking that yet, or exactly what it would impact in our projects. It may be a simple switch to turn off, or it could be simple to drop with numpy.

Here is my test:


import timeit
import numpy as np

# Setup WX test:
import wx, array
wxApp = wx.App()
wxScreen = wx.ScreenDC()
pixel_array = array.array('B', [0] * (640 * 480 * 3))
def wxShot():
  bmp = wx.Bitmap(640, 480)
  wx.MemoryDC(bmp).Blit(0, 0, 640, 480, wxScreen, 0, 0)
  bmp.CopyToBuffer(pixel_array)
  numpy_array = np.frombuffer(pixel_array, dtype=np.uint8)
  numpy_array = numpy_array.reshape(480, 640, 3)
  return numpy_array

# Setup MSS test:
import mss
mssGrabber = mss.mss()
def mssShot():
  mssArray = np.array(mssGrabber.grab({"top":0,"left":0,"width":640,"height":480}), dtype=np.uint8)
  return mssArray

# Run tests:
wxDuration = timeit.timeit(wxShot, number=10000)
mssDuration = timeit.timeit(mssShot, number=10000)
# %-formatting keeps the output identical on Python 2 and Python 3
print("wx Duration: %s" % wxDuration)
print("mss Duration: %s" % mssDuration)

It would be good to test this on other machines and operating systems as well, to confirm the cross-platform support, as well as checking performance. If you get the chance to try the script, let me know what your results look like.
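To make results from different machines easier to compare, the timings could be reported per call together with a short platform string. The following is only a sketch using the standard library; `seconds_per_call` and `describe_machine` are hypothetical helper names, and the lambda is a stand-in workload to be swapped for `wxShot` or `mssShot`.

```python
import platform
import sys
import timeit


def seconds_per_call(func, number=1000, repeat=3):
    """Return the best-of-`repeat` average time of one call to `func`, in seconds."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


def describe_machine():
    """Short platform string to post alongside the timings."""
    return "%s %s, Python %s" % (
        platform.system(), platform.release(), sys.version.split()[0])


if __name__ == "__main__":
    # Stand-in workload; replace with wxShot or mssShot from the script above.
    per_call = seconds_per_call(lambda: sum(range(1000)), number=200)
    print(describe_machine())
    print("%.6f s per call" % per_call)
```

Taking the minimum over a few repeats is a common way to reduce the noise from other processes on the machine.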

bzier avatar Oct 01 '17 03:10 bzier

Turns out that mss has the color channels as BGRA, whereas we've been using RGB. This may be usable, or if anyone knows a quick transformation (as in computationally efficient) we could attempt to convert it.

bzier avatar Oct 01 '17 17:10 bzier

Well, if I had read the documentation: it turns out there is a property on the mss screenshot object called pixels that is the list of RGB tuples. However, it is extremely slow (still waiting). If we want to use mss instead of wx, I think it would make more sense to adjust our code to handle the 'natural' BGRA format rather than attempt to use the pixels property. Again, unless someone has a quick numpy conversion from BGRA->RGB.

bzier avatar Oct 01 '17 18:10 bzier

So 10,000 passes ran for close to an hour before I killed it. Just ran each through 100 times, so I could actually see a rough comparison. The average could be skewed with such a small sample due to other processes on my machine, etc. However, the results:

wx Duration: 4.0597550869
mss Duration: 61.0876660347

Extrapolating to 10,000, it looks like it could have taken ~1h 41m... Ok, that's all I've got for now. I'll stop spamming this issue.

bzier avatar Oct 01 '17 19:10 bzier

I totally agree - the real win here is dropping a heavy dependency.

I think we'd be fine to just use the BGRA array as is. It is interesting that it's slower to grab the RGB pixels, but I guess they weren't worried about performance for that method. I know OpenCV has fast color conversion methods, but we probably don't want to add it as a dependency. If someone gets around to this before me, I think we should look at scikit-image for the conversion if it's really required.

kevinhughes27 avatar Oct 02 '17 05:10 kevinhughes27

Yeah, I saw OpenCV as an option for doing that conversion, but I also didn't think it was worth taking on that dependency simply for that. It seems like the model could just be updated with that added dimension, although any trained agents/models would obviously become invalid as a result. It would be interesting to see how an agent trained on RGB would perform when fed BGR data 😉 (I don't think it would be compatible with the added dimension of the alpha channel, but reversed colors could be interesting).

bzier avatar Oct 02 '17 19:10 bzier

Hello,

Author of the MSS module here :)

I was wondering if I could help. I tried some benchmark using MSS and Numpy (not wx), and I think you can find some clues:

  • To convert a NumPy array from BGRA to RGB, I think this may work: np.array(mssGrabber.grab({...}), dtype=np.uint8)[..., [2, 1, 0]]. It is quite fast.
  • Do not use mss.pixels; if needed, use mss.rgb instead, though it is a little slower than the line above.
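For anyone who needs plain bytes rather than a NumPy array, a BGRA to RGB conversion can also be sketched with extended slices on a bytearray. This is only an illustration and not a claim about how MSS implements it; `bgra_to_rgb_bytes` is a made-up helper name, and the two-pixel input is synthetic.

```python
def bgra_to_rgb_bytes(bgra, width, height):
    """Convert raw BGRA bytes to raw RGB bytes, dropping the alpha channel."""
    if len(bgra) != width * height * 4:
        raise ValueError("expected %d bytes of BGRA data" % (width * height * 4))
    rgb = bytearray(width * height * 3)
    rgb[0::3] = bgra[2::4]  # red comes from the third byte of each BGRA pixel
    rgb[1::3] = bgra[1::4]  # green stays in the middle
    rgb[2::3] = bgra[0::4]  # blue comes from the first byte
    return bytes(rgb)


# Two synthetic pixels: pure blue and pure red, both fully opaque.
raw = bytes(bytearray([255, 0, 0, 255, 0, 0, 255, 255]))
rgb = bgra_to_rgb_bytes(raw, width=2, height=1)
print(list(bytearray(rgb)))  # [0, 0, 255, 255, 0, 0]
```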

Also, I am open to suggestions to improve the module, I would be glad to help :)

BoboTiG avatar Dec 06 '17 22:12 BoboTiG

@BoboTiG, That's great, thank you for reaching out to us. What I ended up doing for the BGRA to RGB conversion was this:

# drop the alpha channel and flip red and blue channels (BGRA -> RGB)
self.numpy_array = np.flip(image_array[:, :, :3], 2)

In the next few days I'll attempt to compare the method you suggested with what I've done. It is working fairly well at this point, but the more performance I can squeeze out, the better. When I get around to it, I'll be sure to post the results here.
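As a quick sanity check before comparing speed, the two conversions can be confirmed to produce the same pixels on a small synthetic array. This sketch assumes only NumPy; the 2x2 array stands in for the result of np.array(sct.grab(...)) and is not real capture data.

```python
import numpy as np

# Synthetic 2x2 BGRA frame; each pixel holds four consecutive values.
bgra = np.arange(2 * 2 * 4, dtype=np.uint8).reshape(2, 2, 4)

rgb_index = bgra[..., [2, 1, 0]]       # pick channels R, G, B by index
rgb_flip = np.flip(bgra[:, :, :3], 2)  # drop alpha, then reverse the channel axis

assert np.array_equal(rgb_index, rgb_flip)
print(rgb_flip[0, 0])  # [2 1 0]
```

Both results have shape (height, width, 3), so either can be fed to code that expects RGB frames.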

bzier avatar Dec 07 '17 04:12 bzier

I realized that I never did this comparison a few months ago, but it looks like you've already captured it here.

For whatever it is worth, I did finally do the comparison using 3.2.0 and found that for 10,000 iterations on my machine, the slice method took 103.99s whereas the flip method took 85.93s.

bzier avatar Mar 23 '18 01:03 bzier

@kevinhughes27 What are your thoughts around this issue now? I'm a little hesitant to close it without verifying that the changes truly resolve the problems described in the initial comment. Having never experienced #15 or #27 I'm not sure how to prove they will work now. Also, the last point is about cross-platform support, which I still haven't tested either.
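One possible way to check for the symptoms of #15 and #27 without reproducing them by eye is to test captured frames programmatically. The sketch below is an assumption about what such a check could look like, not something that exists in the project; `is_black` and `count_repeats` are hypothetical helpers, and the frames are synthetic stand-ins for real captures.

```python
import numpy as np


def is_black(frame):
    """True if every value in the frame is zero (the symptom reported in #15)."""
    return not frame.any()


def count_repeats(frames):
    """Count frames identical to the one captured just before (the symptom in #27)."""
    return sum(
        1 for prev, cur in zip(frames, frames[1:]) if np.array_equal(prev, cur)
    )


# Synthetic frames standing in for real captures.
black = np.zeros((480, 640, 3), dtype=np.uint8)
noise = np.random.randint(1, 256, size=(480, 640, 3), dtype=np.uint8)

print(is_black(black), is_black(noise))             # True False
print(count_repeats([noise, noise.copy(), black]))  # 1
```

A static game screen can legitimately produce identical consecutive frames, so a repeat count is only meaningful while the picture is known to be changing.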

bzier avatar Mar 23 '18 01:03 bzier

Could you share the benchmark script? FTR you can find some benchmark results of different methods to convert BGRA to RGB here: https://github.com/BoboTiG/python-mss/blob/master/tests/bench_bgra2rgb.py ;) (of course, if you have a different approach, I am open to extend the script)

BoboTiG avatar Mar 23 '18 09:03 BoboTiG

I just tried the previous snippet you shared, replacing wx.Bitmap(640, 480) with wx.EmptyBitmap(640, 480), and I got these results (10,000 iterations, Ubuntu 16.04, good machine):

wx Duration: 29.0888140202
mss Duration: 5.21338796616

I needed to use wx.EmptyBitmap() because I got this error:

	bmp = wx.Bitmap(640, 480)
  File "/usr/lib/python2.7/dist-packages/wx-3.0-gtk2/wx/_gdi.py", line 639, in __init__
	_gdi_.Bitmap_swiginit(self,_gdi_.new_Bitmap(*args, **kwargs))
TypeError: String or Unicode type required

BoboTiG avatar Mar 23 '18 09:03 BoboTiG

@BoboTiG I realized my numbers above included the actual screen capture in the timings, so it wasn't just the transformation of BGRA->RGB. However, the capture was identical for both, so the timing difference was just due to the transformation.

I updated it now to just capture the image up-front and then run the transformation in the timings. Now the results (again for 10,000 iterations): the slice method took 25.64s; the flip method took 7.93s. This again shows a 17-18s difference between the two as it had before.

Here's the script I used (after updating to pull the image grab up-front):

import timeit
import numpy
import mss

sct = mss.mss()
im = sct.grab({"top":0,"left":0,"width":640,"height":480})

def numpy_slice():
    return numpy.array(im, dtype=numpy.uint8)[..., [2, 1, 0]]

def numpy_flip():
    return numpy.flip(numpy.array(im)[:, :, :3], 2)

# print the totals so the script also reports them when run as a file
print(timeit.timeit(numpy_slice, number=10000))
print(timeit.timeit(numpy_flip, number=10000))

It is worth mentioning that I am interested in the numpy array, not the actual bytes. I noticed that the tobytes() method adds significant overhead. Here's a comparison of with/without:

import timeit
import numpy
import mss

sct = mss.mss()
im = sct.grab({"top":0,"left":0,"width":640,"height":480})

def numpy_flip():
    return numpy.flip(numpy.array(im)[:, :, :3], 2)

def numpy_flip_tobytes():
    return numpy.flip(numpy.array(im)[:, :, :3], 2).tobytes()

# print the totals so the script also reports them when run as a file
print(timeit.timeit(numpy_flip, number=10000))
print(timeit.timeit(numpy_flip_tobytes, number=10000))

## Results:
#   8.847s
# 127.527s

bzier avatar Mar 23 '18 20:03 bzier

OK, without .tobytes(), the Numpy flip method is incredibly quick :open_mouth: I will update the documentation to reflect that. Thanks :)
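A likely explanation for the gap, worth double-checking on your own NumPy version: numpy.flip returns a view with a reversed stride and moves no pixel data, whereas indexing with [2, 1, 0] makes a copy, and tobytes() has to walk the non-contiguous view to pack it. The sketch below uses a synthetic zero-filled frame in place of numpy.array(im).

```python
import numpy as np

# Synthetic 480x640 BGRA frame standing in for numpy.array(im).
bgra = np.zeros((480, 640, 4), dtype=np.uint8)

flipped = np.flip(bgra[:, :, :3], 2)
indexed = bgra[..., [2, 1, 0]]

print(np.shares_memory(bgra, flipped))  # True: flip only builds a view
print(np.shares_memory(bgra, indexed))  # False: fancy indexing copies the data
print(flipped.flags["C_CONTIGUOUS"])    # False: the channel axis runs backwards

# Pay for the copy once, and only when a packed buffer is actually needed.
packed = np.ascontiguousarray(flipped)
print(len(packed.tobytes()))            # 921600
```

If the frame is going straight into code that accepts strided arrays, the view can be used as is and the copy avoided entirely.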

BoboTiG avatar Mar 23 '18 20:03 BoboTiG

Also, since the PIL option was so fast, I tried converting it to a numpy array, but it is slow:

numpy.array(Image.frombytes('RGB', im.size, im.bgra, 'raw', 'BGRX'))

For now, given my need for the numpy array, I'll stick with the flip version :) I think having the different options in there is worthwhile, since people may have different needs (for the bytes, for a numpy array, rgb or bgra, etc).

bzier avatar Mar 23 '18 20:03 bzier