File Descriptor Leak
I originally opened an issue about a file descriptor leak and then closed it, but it looks like I should have left it open. When I instantiate a Ghost object, loop through a list of URLs, and have that object open each URL and take a screenshot, Ghost uses up file descriptors for the network requests and never closes them, or closes them VERY SLOWLY.
Pseudocode:
ghost_object = ghost.Ghost(wait_timeout=int(url_timeout), ignore_ssl_errors=True)
for url in url_list:
    page, extra_resources = ghost_object.open(url)
    ghost_object.capture_to('header.png', selector="header")
Is there a way to close the network requests, and destroy the object, or anything that would close the "handles" to the file descriptors?
Updated the pseudocode - but would love to know this...
Also seeing this issue in a similar scenario; in my case, I only invoke Ghost.open once, and then use JavaScript to navigate via pushState. After loading a couple hundred URLs in this way (each of which might load images, AJAX requests, etc.) I reach the file descriptor limit. Dropping the reference to Ghost on the floor every so often and recreating it didn't seem to help.
In the end I increased the number of open files the user running the job could have; an actual solution or better workaround would be great!
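For anyone taking the same route, here is a minimal sketch of raising the limit from inside the script itself, assuming a Unix system where Python's standard `resource` module is available. It only raises the soft limit for the current process, capped at the existing hard limit, so it postpones the crash rather than fixing the leak. The function name `raise_fd_soft_limit` is made up for illustration.

```python
import resource


def raise_fd_soft_limit(target):
    """Raise this process's soft open-file limit to `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    # An unprivileged process may not go above the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling `raise_fd_soft_limit(4096)` before creating the Ghost object should be enough to try it; going beyond the hard limit still needs a system-level change.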
I also encountered this problem. My code is:
from ghost import Ghost
ghost = Ghost()
while True:
    page, extra_resources = ghost.open("http://www.mobotix.com", wait=True)
    print(page)
While this code runs, I watch the open file descriptors. The count rises to 1024, and then Ghost.py stops working. I watch the open fds with: "ls -1 /proc/pid/fd"
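The same check can be done from inside the script, which makes it easier to log the count after every request. This is a minimal sketch assuming Linux (it reads the same /proc directory as the ls command above); `count_open_fds` is a hypothetical helper name.

```python
import os


def count_open_fds(pid="self"):
    """Return the number of open file descriptors of a process (Linux only).

    Each entry in /proc/<pid>/fd is a symlink named after one descriptor.
    When inspecting the current process, the count includes the descriptor
    that os.listdir() itself holds while reading the directory, so the value
    is consistently one higher than the "real" number.
    """
    return len(os.listdir("/proc/%s/fd" % pid))
```

Printing `count_open_fds()` after each call to open a page should show whether the number keeps climbing or levels off.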
I saw that deactivating the Ghost.py cache (passing "cache_dir=None" to the constructor) solves the fd leak. But then I lose access to the resources :(
The root cause of the problem may be: https://bugreports.qt-project.org/browse/QTBUG-36076
From my point of view, issue #175 is a duplicate of this issue.
BUT: I need some workaround....
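One possible workaround, which sidesteps the leak rather than fixing it: do the Ghost work in a short-lived child process, since the OS releases every descriptor a process holds when it exits. The sketch below assumes Linux and Python 3 (it uses the "fork" start method) and assumes Ghost is only ever created inside the child, never in the parent. `run_in_child` and `screenshot_batch` are hypothetical names, and I have not verified this against Ghost.py itself.

```python
import multiprocessing


def run_in_child(func, *args):
    """Run func(*args) in a forked child and return its (picklable) result.

    Descriptors leaked by func are released by the OS when the child exits.
    """
    ctx = multiprocessing.get_context("fork")  # Unix only
    receiver, sender = ctx.Pipe(duplex=False)

    def target():
        receiver.close()
        sender.send(func(*args))
        sender.close()

    child = ctx.Process(target=target)
    child.start()
    sender.close()  # the parent keeps only the receiving end
    try:
        result = receiver.recv()
    except EOFError:
        # The child died (for example, func raised) before sending anything.
        raise RuntimeError("child exited without returning a result")
    finally:
        receiver.close()
        child.join()
    return result


# Hypothetical usage, with Ghost created only inside the child:
#
# def screenshot_batch(urls):
#     from ghost import Ghost
#     ghost = Ghost()
#     for i, url in enumerate(urls):
#         ghost.open(url)
#         ghost.capture_to("%d.png" % i)
#     return len(urls)
#
# for start in range(0, len(url_list), 50):
#     run_in_child(screenshot_batch, url_list[start:start + 50])
```

The batch size of 50 is arbitrary; it just needs to stay well below the point where a single child would hit the descriptor limit.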
Good catch! I bet that is exactly the reason for the crash. It would be really sweet if it could get fixed! Any time I use ghost with a fairly large number of URLs, it's been a significant problem, and has caused ghost (and the script invoking it) to just crash.
Why did you close this issue? We should leave it open, so that someone looks for a solution :)
How can we reopen it? I can't find the button for this !??
Ah, didn't realize I closed this. Just meant to comment, not close and comment :)
Yes, we would definitely love to have this fixed! :)
Is there any way we could get a status update on this? This is, imo, fairly critical, as this bug can cause ghost (and the process using it) to just crash.
(I'm using Ghost to screenshot websites.)
I simply set cache_dir=None. I haven't tested on my VPS, but on my development PC (Windows) the pictures appear as good as they used to.
I'm still struggling with issue #164 though.
Does that actually stop open file descriptors?
@Vaskivo I just tested setting cache_dir to None, and I can still see the list of open file descriptors rising really quickly
I'll test it more thoroughly then. I ran it and checked whether the pictures appeared the same on my PC. Then I used the code on the VPS and stopped getting seg faults and "Too many open files" errors.
@ChrisTruncer @Vaskivo This should have been fixed here: https://github.com/jeanphix/Ghost.py/commit/6aa5674e921c6398582a9e71eac8de5e48911c55
Not released yet, but feel free to try the refacts branch.
@jeanphix I just cloned and installed the latest master build to test out the fix. I can confirm that the file descriptor leak is still present. The number of descriptors is consistently rising and eventually hits the limit causing ghost to crash.
As stated earlier in this issue thread, this is likely due to this leak: https://bugreports.qt-project.org/browse/QTBUG-36076
Unfortunately, it severely impacts ghost as it causes it to crash once it hits the file descriptor limit. I'm also running into a different issue with the new code base, but I will try to open a separate issue for it.
@ChrisTruncer can't reproduce it... just ran this little script:
from ghost import Ghost
g = Ghost()
for i in range(0, 200):
    g.open('http://www.apple.com')
    g.capture_to('apple/%s.png' % i)
no descriptors leak...
How it looks at the 51st iteration:
➜ Ghost.py git:(dev) ✗ ls -als /proc/18277/fd
total 0
0 dr-x------ 2 jeanphix jeanphix 0 Nov 17 19:46 .
0 dr-xr-xr-x 8 jeanphix jeanphix 0 Nov 17 19:46 ..
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 0 -> /dev/pts/0
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 1 -> /dev/pts/0
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 10 -> socket:[138338]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 11 -> anon_inode:[eventfd]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 12 -> anon_inode:[eventfd]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 13 -> socket:[138339]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 14 -> anon_inode:[eventfd]
0 l-wx------ 1 jeanphix jeanphix 64 Nov 17 19:46 15 -> /home/jeanphix/projects/Ghost.py/apple/50.png
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 16 -> socket:[135888]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 17 -> socket:[138370]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 18 -> socket:[138372]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 19 -> socket:[138374]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 2 -> /dev/pts/0
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 20 -> socket:[138376]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 21 -> socket:[138378]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 22 -> socket:[138380]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 23 -> socket:[135915]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 24 -> socket:[135917]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 25 -> socket:[135919]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 26 -> socket:[135921]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 27 -> socket:[135923]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 28 -> socket:[137512]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 3 -> anon_inode:[eventfd]
0 lr-x------ 1 jeanphix jeanphix 64 Nov 17 19:46 4 -> pipe:[137498]
0 l-wx------ 1 jeanphix jeanphix 64 Nov 17 19:46 5 -> pipe:[137498]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 6 -> socket:[137499]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 7 -> socket:[137500]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 8 -> socket:[137501]
0 lrwx------ 1 jeanphix jeanphix 64 Nov 17 19:46 9 -> socket:[138336]
Are you using PySide?
I use pyqt. This is a rough snippet of the code that I am running:
from os.path import join  # assumed source of join()

def ghost_capture(incoming_ghost_object, screen_url, rep_fold, screen_name,
                  ewitness_dir_path, local_platform):
    # Try to get our screenshot and source code of the page.
    # Write both out to disk if possible (if we can get one,
    # we can get the other).
    ghost_page, ghost_extra_resources = incoming_ghost_object.open(
        screen_url, auth=('none', 'none'), default_popup_response=True)

    if rep_fold.startswith("/") or rep_fold.startswith("C:"):
        capture_path = join(ewitness_dir_path, rep_fold, "screens", screen_name)
    else:
        capture_path = join(rep_fold, "screens", screen_name)

    incoming_ghost_object.capture_to(capture_path)
    return ghost_page, ghost_extra_resources
If I provide about 30 different URLs, the list of open descriptors gets larger with each request. Here is a pastebin of the output:
http://pastebin.com/7EnDu5vM
As the script continues through each URL, the list continues to grow. The listing above isn't complete; it is from when the script had gone through about 10 URLs.
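When comparing listings like these, it can help to see which kind of descriptor is growing (sockets, pipes, regular files) instead of reading the raw list. A minimal sketch, assuming Linux; `summarize_fds` is a hypothetical helper that groups the symlink targets in /proc/<pid>/fd by their prefix.

```python
import os
from collections import Counter


def summarize_fds(pid="self"):
    """Count a process's open descriptors by kind (Linux only)."""
    fd_dir = "/proc/%s/fd" % pid
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and reading it
            # (this includes the one os.listdir() used for the listing).
            continue
        if target.startswith("/"):
            kinds["file"] += 1
        else:
            # "socket:[138338]" -> "socket", "anon_inode:[eventfd]" -> "anon_inode"
            kinds[target.split(":", 1)[0]] += 1
    return kinds
```

Logging `summarize_fds()` every few URLs should show whether the growth is in the "socket" bucket, which is what a leak in the network layer would look like.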
@ChrisTruncer you should try pyside
pip install pyside
but just tried with pyqt and no leaks... maybe it's related to OSX?
This is running on linux, so not OSX.
FTR, to this day, PySide still triggers this problem on Linux while PyQt4 appears to have been fixed.