PGMA-Modernized
Posters for IAFD
IAFD has links to AEBN, GayHotMovies, GayDVDEmpire, CD Universe just like GEVI does
With this in mind, I have added code to scrape these external sites and fill in any data that is missing from IAFD, especially posters and background art.
Unfortunately, when I request the asp link that points to the shop, I get a 403 Forbidden result. In Chrome Developer Tools, when I pick the link, I can see a Location entry in the response header that points to the web page, as it does in GEVI. I need to find out how to access this.
One of you helped with the GEVI issues some months ago by setting up a referer header, which saved my bacon in more ways than one. Could you give some suggestions regarding this? The offending code is in utils.py, in the getFilmOnIAFD function.
Cheers
Jason xx
Cody will have the code in the next 10 minutes....
It looks like the blog sites (fagalicious) aren't pulling in posters now either, but it could be that our URL has expired.
(sorry for the duplicate post)
@fivedays555:
Hope this finds you well! @JPH71 wanted to once again say THANKS! Your quick solution above is going to allow him to add an enhancement to IAFD. For films, IAFD provides links to index sites that have the film's cover artwork, so Jason will be working on an enhancement that should allow the IAFD agent to crawl to the film covers, since IAFD itself doesn't contain artwork other than actor headshots.
THANKS!
Not a problem. Glad I can help. Let me know if you need any more information.
Here is the issue - this happens when using IAFD as the scraping Agent
IAFD has links to AEBN, GayHotMovies, GayDVDEmpire, CD Universe just like GEVI does
With this in mind, I have added code to scrape these external sites and fill in any data that is missing from IAFD, especially posters and background art.
This is the section of the log file:
2022-08-19 02:34:25,484 (21f8) : INFO (logkit:16) - IAFD - UTILS :: Access External Links in IAFD: Skip Current Agent Links: IAFD
2022-08-19 02:34:25,484 (21f8) : INFO (logkit:16) - IAFD - UTILS :: External Sites Found 1 - AdultEmpire - https://www.iafd.com/shopclick.asp?sku=22956990
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: 2 - HotMovies - https://www.iafd.com/shopclick.asp?sku=9344975
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: 3 - HotMovies - https://www.iafd.com/shopclick.asp?sku=8390429
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: 4 - AdultEmpire - https://www.iafd.com/shopclick.asp?sku=22956383
2022-08-19 02:34:25,500 (21f8) : INFO (logkit:16) - IAFD - UTILS :: Valid Sites Left 2 - ['AdultEmpire', 'HotMovies']
2022-08-19 02:34:25,516 (21f8) : DEBUG (networking:143) - Requesting ' https://www.iafd.com/shopclick.asp?sku=8390429'
2022-08-19 02:34:25,625 (21f8) : ERROR (networking:196) - Error opening URL 'https://www.iafd.com/shopclick.asp?sku=8390429'
2022-08-19 02:34:25,625 (21f8) : ERROR (logkit:22) - IAFD - UTILS :: Error reading External HotMovies URL Link: HTTP Error 403: Forbidden
2022-08-19 02:34:25,641 (21f8) : DEBUG (networking:143) - Requesting ' https://www.iafd.com/shopclick.asp?sku=22956383'
2022-08-19 02:34:25,755 (21f8) : ERROR (networking:196) - Error opening URL 'https://www.iafd.com/shopclick.asp?sku=22956383'
2022-08-19 02:34:25,755 (21f8) : ERROR (logkit:22) - IAFD - UTILS :: Error reading External AdultEmpire URL Link: HTTP Error 403: Forbidden
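The log suggests that the four links collapse to one link per store, and that the later link for each store is the one requested. A minimal sketch of that kind of de-duplication, assuming the links arrive as (site, url) pairs. The function name is hypothetical, and this is a guess at the behaviour, not the code in utils.py:

```python
def last_link_per_site(links):
    """Keep one URL per site name; a later link replaces an earlier one."""
    by_site = {}
    for site, url in links:
        by_site[site] = url
    return by_site


# The four links reported in the log above.
links = [
    ("AdultEmpire", "https://www.iafd.com/shopclick.asp?sku=22956990"),
    ("HotMovies", "https://www.iafd.com/shopclick.asp?sku=9344975"),
    ("HotMovies", "https://www.iafd.com/shopclick.asp?sku=8390429"),
    ("AdultEmpire", "https://www.iafd.com/shopclick.asp?sku=22956383"),
]
valid = last_link_per_site(links)
```

With the links from the log, this leaves the two URLs that the agent then tried to open (sku=8390429 and sku=22956383).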
I need to be able to get from https://www.iafd.com/shopclick.asp?sku=9344975 to the following:
1 is the link I entered into the address bar, which changes to Gay HotMovies,
and which shows up in 2 as the header. [image: image.png]
Inside utils.py, the code is within the function getFilmOnIAFD (line 210), and the error is caused by line 356. The function HTML.ElementFromURL(value, timeout=60, errors='ignore', sleep=DELAY)
is a built-in Plex function. I have some old documentation that explains it if you need it, but it works like the Python requests library.
If you look at the GEVI init.py file, you will see how we implemented your previous suggestion to get it working.
Many thanks
Jason
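The Location entry mentioned above is how an HTTP redirect names its destination. A minimal sketch of reading it, assuming the status code and headers of a response are already in hand. The function name and the store.example address are hypothetical, the import is the Python 3 one (the agent's own Python may need a different import path), and this does not by itself get around the 403:

```python
from urllib.parse import urljoin

# Status codes that carry a Location header naming the next URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute URL a redirect points to, or None."""
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if not location:
        return None
    # Location may be relative; resolve it against the requested URL.
    return urljoin(request_url, location)


shop_link = "https://www.iafd.com/shopclick.asp?sku=9344975"
# Hypothetical destination, standing in for the real store page.
target = redirect_target(shop_link, 302, {"Location": "https://store.example/film/1"})
```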
I tried the following, and it should work:
from lxml import html  # assuming lxml, which provides html.fromstring
url = 'https://www.iafd.com/shopclick.asp?sku=9344975'
response = get_scraper_request(url)
res = html.fromstring(response.text)
res.xpath('//*[@class="title"]')[0].text
>>> 'Fire Watch 2'
I think the direct request fails because IAFD uses Cloudflare to block unwanted requests.
You are a star!
Is the get_scraper_request code already in the Plex agent?
Cheers
Should be. Otherwise, you won't be able to scrape IAFD.
I just searched through the utils.py file, and there is no module or routine whose name starts with get_scraper_request.
Cheers, and sorry to be a nuisance.
No Problem. I will put the function call below.
import logging

import cloudscraper

# One shared session, so headers and cookies persist across requests.
scraper = cloudscraper.create_scraper()


def get_scraper_request(url, **kwargs):
    logging.info("Requesting: " + url)
    headers = kwargs.pop('headers', {})
    cookies = kwargs.pop('cookies', {})
    timeout = kwargs.pop('timeout', 30)
    proxies = {}

    global scraper
    if 'User-Agent' not in headers:
        # headers['User-Agent'] = (fake_useragent.UserAgent(fallback='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15')).random
        headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'

    scraper.headers.update(headers)
    scraper.cookies.update(cookies)

    # Start from None so the checks below are safe if the request raises.
    scraper_request = None
    try:
        scraper_request = scraper.request(
            'GET', url, timeout=timeout, proxies=proxies)
    except Exception as ex:
        logging.exception('CloudScraper Failed.')

    # Compare with None explicitly: a response with an error status is
    # falsy, so a bare truth test would skip this log message.
    if scraper_request is not None and not scraper_request.ok:
        msg = ('< CloudScraper Failed Request Status Code: ' +
               str(scraper_request.status_code) + '>')
        logging.error(msg)

    return scraper_request
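A minimal sketch of how a caller might use a helper like this to get from the shopclick link to the store page, assuming the helper returns a requests-style response (with ok, url and text attributes) or None. The name fetch_store_page is hypothetical, and the fetch function is passed in so the sketch can be tried without network access:

```python
def fetch_store_page(url, fetch):
    """Fetch url with the given callable; return (final_url, html) or None."""
    response = fetch(url)
    # Compare with None explicitly: a requests-style response is falsy
    # when its status is an error, so a bare truth test hides failures.
    if response is None or not response.ok:
        return None
    # Once redirects have been followed, response.url is the final address.
    return response.url, response.text
```

In the agent this would be called as fetch_store_page(url, get_scraper_request), and the returned HTML could then be parsed for the poster.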
Cheers Man.....
I have been up all night - sorting out duplicate cast entries....
Thanks for all the help!
Jason
Glad I can help. Cheers!
One last thing: Adult Film Database.
I don't know what they changed, but the scraping code now fails. If you have the time, send me a few pointers so I can get this agent working again.
Your help has been much appreciated.
I will implement the changes you sent in getFilmOnIAFD today and get back to you with the results as soon as possible.
I think I'd better have a date with Morpheus now... been up all night...
Jason xxx
Not sure what you need, but IAFD has a very sensitive request rate limit. To be safe, I add a random delay of 10 to 20 seconds before each IAFD request:
time.sleep(randint(100, 200)/10)
All IAFD requests also need the cloudscraper function.
Let me know if you need more information.
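A minimal sketch of the random delay described above, wrapped in a function so that it can be called before each IAFD request. The name polite_delay and its parameters are hypothetical, and the sleep function is injectable only so the sketch can be tried without waiting:

```python
import random
import time


def polite_delay(min_seconds=10.0, max_seconds=20.0, sleep=time.sleep):
    """Pause for a random time in [min_seconds, max_seconds]; return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling polite_delay() with no arguments pauses for between 10 and 20 seconds, the same range as randint(100, 200)/10.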
The last request has to do with another agent, Adult Film Database, not IAFD. Rather than just building a search string, one has to create formdata and headers and perform a pull request. A right pain in the nethers when it stops working....
I will put that random time sleep into the IAFD code, in the cloudscraper section.
Thanks once again...
Oh, I did not realize it was for Adult Film Database.
I have never touched or used the Adult Film Database agent, so I don't really know...
Mostly, I use Waybig, Fagalicious, Queerclick, and IAFD. They cover almost everything I need.
I took a look at Adult Film Database (https://www.adultfilmdatabase.com/), and I think there are very few gay titles there. Why bother?
@JPH71 Can this be closed?
Yes it can...
No, don't... I haven't sorted out IAFD posters yet.