CommunityScrapers

JavLibrary scrapers not working

Open theeda opened this issue 1 year ago • 8 comments

** Scraper name ** JavLibrary_python

** Scraper method ** Getting the following error when trying to scrape with JavLibrary_python:

error while fragment scraping with scraper JavLibrary_python: could not unmarshal json from script output: EOF

theeda avatar May 06 '24 15:05 theeda

Thank you for reporting this: I suspect this is failing for you because the Python scrapers need a little more setup than other scrapers as described in the README for this repo (we have not yet found a way to display this in the Stash UI)

But even after you've got that set up I'm afraid the JavLibrary scraper will still fail due to their recent application of a more aggressive Cloudflare strategy. We have reports from users who have used a VPN to get a Japanese IP address and that seems to work for scraping JavLibrary, but for right now there's no easy way to scrape their site.
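For context on the error message itself: as far as I understand, Stash passes a script scraper the fragment as JSON on stdin and expects JSON back on stdout, so "could not unmarshal json from script output: EOF" usually means the script printed nothing at all (for example, it crashed on a missing Python dependency before reaching its output). A minimal sketch of that contract, where `run_scraper` and the echoed `title` field are hypothetical and not taken from JavLibrary_python.py:

```python
import json


def run_scraper(raw_stdin: str) -> str:
    """Return the JSON text a script scraper would print for one fragment."""
    try:
        fragment = json.loads(raw_stdin)
    except json.JSONDecodeError:
        # Empty or malformed input: fall back to an empty fragment.
        fragment = {}
    if not isinstance(fragment, dict):
        fragment = {}
    # Hypothetical lookup; a real scraper would fetch and parse a page here.
    result = {"title": fragment["title"]} if fragment.get("title") else {}
    # Always emit valid JSON, even when nothing was found, so the caller
    # never sees an empty stdout.
    return json.dumps(result)
```

In a real script the returned string would be printed to stdout, e.g. `print(run_scraper(sys.stdin.read()))`.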

Maista6969 avatar May 06 '24 15:05 Maista6969

If you set the FlareSolverr URL in the JavLibrary Python file, you can get it going again even with a non-JP VPN. It's a bit more work and you need a Windows machine with FlareSolverr, but I'm getting consistently good results with Stash.

Net005 avatar Jul 07 '24 20:07 Net005

@Net005 did you have to do anything special apart from the standard Flaresolverr setup to get it to work? When I try to use Flaresolverr, I get the following errors:

The Cloudflare 'Verify you are human' button not found on the page. Waiting for title (attempt X): Just a moment... Timeout waiting for selector

It keeps retrying but it can't find the checkbox

theeda avatar Aug 13 '24 16:08 theeda

@Net005 did you have to do anything special apart from the standard Flaresolverr setup to get it to work?

I was able to get the scraper working for me with FlareSolverr in a separate Docker container: -> It seems to be an issue with FlareSolverr that it can't find the checkbox. There is a fork which doesn't have this issue, as I found out here: https://github.com/FlareSolverr/FlareSolverr/issues/1380#issuecomment-2390708414 (21hsmw/flaresolverr:nodriver)

-> I had to change line 445 in the .py from response_html.status_code = json_input['solution']['status'] to response_html.status_code = responseJson.status_code, as FlareSolverr always returned "None" in the JSON while the code expected a status code (like 200). Not sure if this is an issue with the FlareSolverr fork, as it's my first time using FlareSolverr. https://github.com/stashapp/CommunityScrapers/blob/0f56cb5c43baed3ea353e98217a166eacac2354d/scrapers/JavLibrary_python/JavLibrary_python.py#L445
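A gentler variant of that change would keep the status reported inside the `solution` object when it is present and only fall back to the HTTP status of the FlareSolverr call when it is missing or `None`. This is only a sketch: `effective_status` is a hypothetical helper, not code from the scraper, and it assumes `solution` is the parsed `solution` object from the FlareSolverr JSON.

```python
def effective_status(solution: dict, http_status: int) -> int:
    """Pick a usable status code for the scraped page.

    `solution` is assumed to be the parsed "solution" object from the
    FlareSolverr response; `http_status` is the status of the HTTP call
    made to FlareSolverr itself.
    """
    status = solution.get("status")
    # Some FlareSolverr builds reportedly return None here, so fall back.
    return status if isinstance(status, int) else http_status
```

With this, a fork that returns `None` behaves like the edit described above, while builds that do report a status keep their more precise value.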

Maybe this will help someone.

(Edited: Fixed wrong information)

Pheromir avatar Oct 03 '24 15:10 Pheromir

Line 445 in which file? docker-compose.yml has like 20 lines.

janemba avatar Oct 03 '24 16:10 janemba

Sorry, had a brainfart I guess. I meant the .py of the scraper and forgot the link: https://github.com/stashapp/CommunityScrapers/blob/0f56cb5c43baed3ea353e98217a166eacac2354d/scrapers/JavLibrary_python/JavLibrary_python.py#L445

Pheromir avatar Oct 03 '24 18:10 Pheromir

Reproduction steps for my problem: I am running Stash in a Linux container on a Windows 10 host.

docker-compose.yml

# APPNICENAME=Stash
# APPDESCRIPTION=An organizer for your porn, written in Go
services:
  stash:
    image: stashapp/stash:latest
    container_name: stash
    hostname: stash
    restart: unless-stopped
    ## the container's port must be the same with the STASH_PORT in the environment section
    ports:
      - "3003:9999"
    ## If you intend to use stash's DLNA functionality uncomment the below network mode and comment out the above ports section
    # network_mode: host
    logging:
      driver: "json-file"
      options:
        max-file: "10"
        max-size: "2m"
    environment:
      - TZ=Europe/Rome
      - STASH_STASH=/data/
      - STASH_GENERATED=/generated/
      - STASH_METADATA=/metadata/
      - STASH_CACHE=/cache/
      ## Adjust below to change default port (9999)
      - STASH_PORT=9999
    volumes:
      ## Adjust below paths (the left part) to your liking.
      ## E.g. you can change ./config:/root/.stash to ./stash:/root/.stash
      
      ## Keep configs, scrapers, and plugins here.
      - ./config:/root/.stash
      ## Point this at your collection.
      - D:/private/media:/data:ro
      ## This is where your stash's metadata lives
      - ./metadata:/metadata
      ## Any other cache content.
      - ./cache:/cache
      ## Where to store binary blob data (scene covers, images)
      - ./blobs:/blobs
      ## Where to store generated content (screenshots,previews,transcodes,sprites)
      - ./generated:/generated

  # ===== FlareSolverr =====
  flaresolverr:
    image: 21hsmw/flaresolverr:fixlooping
    container_name: flaresolverr-stash
    environment:
      - LOG_LEVEL=${LOG_LEVEL:-info}
      - LOG_HTML=${LOG_HTML:-false}
      - CAPTCHA_SOLVER=${CAPTCHA_SOLVER:-none}
      - TZ=Europe/Rome
      - LANG=fr-FR
    restart: unless-stopped

networks:
  default:
    name: caddy_net
    external: true

I installed the Javlibrary_python scraper from the settings, then edited JavLibrary_python.py

FLARESOLVERR_ENABLED = True
FLARESOLVERR_URL = "http://flaresolverr-stash:8191/v1"
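To check that this endpoint is reachable and answering independently of Stash, something like the sketch below can be run from inside the Stash container. The request body mirrors the one that appears in the FlareSolverr logs further down this thread (`cmd`, `url`, `maxTimeout`); the function names and the 90-second client timeout are my own assumptions, not part of the scraper.

```python
import json
import urllib.request

# Endpoint value taken from the config lines above.
FLARESOLVERR_URL = "http://flaresolverr-stash:8191/v1"


def build_payload(target_url: str, max_timeout_ms: int = 60000) -> dict:
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {"cmd": "request.get", "url": target_url, "maxTimeout": max_timeout_ms}


def fetch_via_flaresolverr(target_url: str, endpoint: str = FLARESOLVERR_URL) -> dict:
    """POST the command to FlareSolverr and return its parsed JSON reply."""
    body = json.dumps(build_payload(target_url)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on a 500 reply, which is what
    # FlareSolverr logs when it fails to solve the challenge.
    with urllib.request.urlopen(request, timeout=90) as response:
        return json.load(response)
```

Calling `fetch_via_flaresolverr("https://www.javlibrary.com/en/vl_searchbyid.php?keyword=BBAN-414")` and inspecting the returned dictionary should show whether the problem is in FlareSolverr or in the scraper's parsing.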

On the Tagger page I have a file named BBAN-414.mp4. When I query BBAN-414 and press search, I get "The search doesn't return any result."

I tried debugging the script on my own but couldn't get it working. It looked like the script might not be following redirects: when I entered the value of JAV_SEARCH_HTML.url in my browser, it redirected me to the correct entry. For example, JAV_SEARCH_HTML.url = https://www.javlibrary.com/en/vl_searchbyid.php?keyword=BBAN-414 redirects to https://www.javlibrary.com/en/?v=javmeebk4e in my browser.
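One way to test the redirect theory is to look at the final URL of the fetched page and see whether it is still the search URL or already a detail page. A rough sketch based only on the two URL shapes quoted above (`keyword=` for search, `v=` for a detail page); `classify_javlibrary_url` is a hypothetical helper, and where the final URL comes from (for example the `url` field of the FlareSolverr `solution`) is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def classify_javlibrary_url(url: str) -> str:
    """Return "detail", "search" or "unknown" based on the query string."""
    query = parse_qs(urlsplit(url).query)
    if "v" in query:
        # e.g. https://www.javlibrary.com/en/?v=javmeebk4e
        return "detail"
    if "keyword" in query:
        # e.g. .../vl_searchbyid.php?keyword=BBAN-414
        return "search"
    return "unknown"
```

If the final URL still classifies as "search" for an ID that redirects in a browser, the redirect was probably not followed.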

TecnoCreeper avatar Oct 24 '24 10:10 TecnoCreeper

I solved my problem. I had to use 21hsmw/flaresolverr:nodriver as the image instead of fixlooping. No other changes (other than enabling flaresolverr and configuring the url).

TecnoCreeper avatar Oct 24 '24 15:10 TecnoCreeper

As an extra to this, I tried to get it running as well.

With FlareSolverr in Docker I couldn't get it to run at all. I installed FlareSolverr on my Windows PC and routed through that. I now at least get logs, and FlareSolverr seems to be detecting the Cloudflare challenge but just times out.

2024-10-25 15:11:22 INFO     Challenge detected. Title found: Just a moment...
2024-10-25 15:12:24 ERROR    Error: Error solving the challenge. Timeout after 60.0 seconds.
2024-10-25 15:12:24 INFO     Response in 62.984 s
2024-10-25 15:12:24 INFO     192.168.1.54 POST http://192.168.1.10:8191/v1 500 Internal Server Error
2024-10-25 15:12:29 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.javlibrary.com/en/vl_searchbyid.php?keyword=SHYN-213', 'maxTimeout': 60000}
2024-10-25 15:12:30 INFO     Challenge detected. Title found: Just a moment...
2024-10-25 15:13:32 ERROR    Error: Error solving the challenge. Timeout after 60.0 seconds.
2024-10-25 15:13:32 INFO     Response in 63.026 s
2024-10-25 15:13:32 INFO     192.168.1.54 POST http://192.168.1.10:8191/v1 500 Internal Server Error

Gykes avatar Oct 25 '24 22:10 Gykes

Using this PR:

alexfozor/flaresolverr:pr-1300-experimental

I was able to get around the CF issue.

2024-10-26 00:30:07 INFO     Serving on http://0.0.0.0:8191
2024-10-26 00:34:09 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.javlibrary.com/en/vl_searchbyid.php?keyword=mdvr00324 1', 'maxTimeout': 60000}
2024-10-26 00:34:17 INFO     Challenge detected.
2024-10-26 00:34:24 INFO     Challenge solved!
2024-10-26 00:34:25 INFO     Response in 15.336 s
2024-10-26 00:34:25 INFO     172.17.0.1 POST http://192.168.1.54:8191/v1 200 OK
2024-10-26 00:34:43 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.javlibrary.com/en/vl_searchbyid.php?keyword=SHYN-213', 'maxTimeout': 60000}
2024-10-26 00:34:46 INFO     Challenge detected.
2024-10-26 00:34:49 INFO     Challenge solved!
2024-10-26 00:34:50 INFO     Response in 6.463 s
2024-10-26 00:34:50 INFO     172.17.0.1 POST http://192.168.1.54:8191/v1 200 OK 

It still returns no data for some reason but at least it isn't a CF block anymore. Maybe @Maista6969 can look into why that's happening
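To narrow down the "no data" case, it may help to confirm that the HTML handed back is the real page and not the interstitial. A small sketch that checks the page title against the "Just a moment..." title seen in the FlareSolverr logs earlier in this thread; both function names are hypothetical, and it assumes the HTML string is whatever FlareSolverr returned as the page body.

```python
import re

# Title of the Cloudflare interstitial, as it appears in the logs above.
CHALLENGE_TITLE = "Just a moment..."


def page_title(html: str) -> str:
    """Extract the text of the first <title> element, or "" if none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def looks_like_challenge(html: str) -> bool:
    """True when the page is still the Cloudflare challenge page."""
    return page_title(html) == CHALLENGE_TITLE
```

If this returns `False` and the scraper still finds nothing, the problem is more likely in how the result page is parsed than in the Cloudflare bypass.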

Gykes avatar Oct 26 '24 00:10 Gykes

I have the same issue. I run Stash in Docker on my NAS. The NAS can reach missav.com, but the MissAV (jp) scraper is not working: [screenshot QQ_1730218163645]

mgfjx avatar Oct 29 '24 16:10 mgfjx

Partially resolved by #2218; will reopen if there are JavLibrary-specific blocks.

feederbox826 avatar Jun 05 '25 04:06 feederbox826