Error On Run

Open LeeThompson opened this issue 1 year ago • 9 comments

WebComix Version: 3.11.1
OS: Windows 10 Enterprise Edition (10.0.19044.2006) (x64)
Python: 3.9.5

I'm trying to download a custom comic. The XPaths are correct and work in scrapy shell.

The error message is very unhelpful.

The comic in question is NSFW so I'm not comfortable putting the command line argument here. I will try some other comics and see if I get the same result though.

Update: Unfortunately, it worked fine on the non-NSFW site.

Traceback (most recent call last):
  File "C:\Python\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python\Scripts\webcomix.exe\__main__.py", line 7, in <module>
  File "C:\Python\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "C:\Python\lib\site-packages\click\core.py", line 1049, in main
    args = _expand_args(args)
  File "C:\Python\lib\site-packages\click\utils.py", line 572, in _expand_args
    matches = glob(arg, recursive=glob_recursive)
  File "C:\Python\lib\glob.py", line 21, in glob
    return list(iglob(pathname, recursive=recursive))
  File "C:\Python\lib\glob.py", line 73, in _iglob
    for dirname in dirs:
  File "C:\Python\lib\glob.py", line 74, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "C:\Python\lib\glob.py", line 85, in _glob1
    return fnmatch.filter(names, pattern)
  File "C:\Python\lib\fnmatch.py", line 58, in filter
    match = _compile_pattern(pat)
  File "C:\Python\lib\fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "C:\Python\lib\re.py", line 252, in compile
    return _compile(pattern, flags)
  File "C:\Python\lib\re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Python\lib\sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Python\lib\sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Python\lib\sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "C:\Python\lib\sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Python\lib\sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range v-b at position 22

LeeThompson, Jan 10 '25 07:01

Looking at the beginning of the traceback, I can see that the error comes from click, the library used to build the CLI. I think click isn't able to parse the full command into its arguments properly.

Make sure you enclose URLs and XPaths in double quotes and that there aren't any double quotes in the XPath itself (you can either escape them or use single quotes, which should also work).
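
To make the quoting advice concrete, here is a minimal sketch that uses Python's shlex module to mimic POSIX-style shell splitting. That is an assumption: cmd.exe and PowerShell follow their own rules. The comic name, URL and XPaths are placeholders, not a real site.

```python
import shlex

# Hypothetical command line; comic name, URL and XPath are placeholders.
quoted = (
    "webcomix custom mycomic "
    '--start-url="https://example.com/comic/1" '
    "--next-page-xpath=\"//a[@class='next']/@href\""
)

# With double quotes around the value, the single quotes inside the XPath
# survive and each option arrives as one argument.
quoted_args = shlex.split(quoted)

# Without the outer double quotes, a POSIX-style shell consumes the single
# quotes itself, so the XPath reaches the program without them.
unquoted_args = shlex.split("--next-page-xpath=//a[@class='next page']/@href")

for arg in quoted_args + unquoted_args:
    print(arg)
```

Note that in both cases the square brackets are passed through to the program unchanged.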

J-CPelletier, Jan 10 '25 16:01

Hmm that's odd, no double quotes.

Here are the XPATHs:

--next-page-xpath="//a[@class='comic-nav-base comic-nav-next']/@href" --image-xpath="//div[@id='comic']//img/@src"

They work in scrapy shell.

I'll try quoting the --start-url. The examples don't quote it, so it didn't even occur to me. If that fixes it, it will then be a documentation issue ;)

Update: Nope, same error.

LeeThompson, Jan 10 '25 20:01

I have a second case of this, this time it's a (relatively) SFW comic.

Command Line

webcomix custom zoophobia --start-url="https://zoophobia-comic.tumblr.com/post/127351123949" --next-page-xpath="//a[@class='next-button']/@href" --image-xpath="//figure[@class='photo-hires-item correct']//img/@src" --cbz

Notes

  • In this case there are two "next" buttons but they are duplicates of each other.
  • The error isn't 100% the same but close.

Scrapy Shell

scrapy shell https://zoophobia-comic.tumblr.com/post/127351123949
>>> response.xpath("//a[@class='next-button']/@href").get()
'https://zoophobia-comic.tumblr.com/post/127351131639'
>>> response.xpath("//figure[@class='photo-hires-item correct']//img/@src").get()
'https://64.media.tumblr.com/3a343a2dd3226b62b5ba286702b8949e/tumblr_ntidtra2XG1udrxz7o1_1280.png'

Error

Traceback (most recent call last):
  File "C:\Python\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python\Scripts\webcomix.exe\__main__.py", line 7, in <module>
  File "C:\Python\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "C:\Python\lib\site-packages\click\core.py", line 1049, in main
    args = _expand_args(args)
  File "C:\Python\lib\site-packages\click\utils.py", line 572, in _expand_args
    matches = glob(arg, recursive=glob_recursive)
  File "C:\Python\lib\glob.py", line 21, in glob
    return list(iglob(pathname, recursive=recursive))
  File "C:\Python\lib\glob.py", line 73, in _iglob
    for dirname in dirs:
  File "C:\Python\lib\glob.py", line 74, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "C:\Python\lib\glob.py", line 85, in _glob1
    return fnmatch.filter(names, pattern)
  File "C:\Python\lib\fnmatch.py", line 58, in filter
    match = _compile_pattern(pat)
  File "C:\Python\lib\fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "C:\Python\lib\re.py", line 252, in compile
    return _compile(pattern, flags)
  File "C:\Python\lib\re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Python\lib\sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Python\lib\sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Python\lib\sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "C:\Python\lib\sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Python\lib\sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range t-b at position 17

LeeThompson, Jan 11 '25 00:01

I tried to reproduce this issue on Linux, but couldn't. I assume it is related to click and its handling of regular expressions on Windows, as I've found similar reports on its issue board. One way to work around it is to use a Unix system (through dual-boot or Docker) to download the images.
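
One possible reading of the tracebacks, sketched under assumptions: on Windows, click's `_expand_args` passes each argument to `glob`, which treats `[...]` as a character class and hands it to `fnmatch` and then `re`. Inside such a class, a hyphenated name like `nav-base` or `next-button` contains a reversed range (`v-b`, `t-b`). The regex strings below are what fnmatch on Python 3.9 is assumed to generate for those XPath segments. They are written by hand here, so this is an illustration and not a confirmed diagnosis.

```python
import re

# Assumed regexes for the bracketed XPath segments posted in this thread.
# The "(?s:...)\Z" wrapper imitates what fnmatch is assumed to emit on 3.9.
assumed_regexes = {
    "first": r"(?s:a[@class='comic-nav-base comic-nav-next'])\Z",
    "second": r"(?s:a[@class='next-button'])\Z",
    "image": r"(?s:div[@id='comic'])\Z",  # no hyphen inside the brackets
}


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


for name, pattern in assumed_regexes.items():
    print(name, "->", compile_error(pattern))
```

If this reading holds, the first two patterns fail with the same messages as the two tracebacks, while a bracketed segment without a reversed range compiles. That would also explain why some comics work and others don't.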

While testing for this, I found another issue which is a bit puzzling: The tumblr example you gave me doesn't give me the same view in my spider vs. in scrapy shell. I'll investigate this issue further when I have some time.

J-CPelletier, Jan 14 '25 03:01

After exploring the second issue a bit more, it seems to be caused by the use of a fake user agent, since removing it solves the problem. I'll do a PR to test both settings at some point.

J-CPelletier, Jan 14 '25 03:01

UPDATE: I was able to rip a comic using WSL (Windows Subsystem for Linux), so there do seem to be some issues with the Windows port of the XPath parser (and possibly other bits).

LeeThompson, Jan 24 '25 02:01

For some reason, the comic doesn't work anymore on my end, whether with or without the user agent. With that said, I'll see what I can do to fix the issue related to Windows.

J-CPelletier, Mar 10 '25 17:03

The newest release 3.11.3 should help solve the issue you were having on Windows. If not, I'll try to investigate it a bit further.

J-CPelletier, Mar 11 '25 04:03

@LeeThompson I've also been testing it on my own Windows installation, and I haven't been able to reproduce your issue using the same Python and webcomix versions 🤔

J-CPelletier, Mar 14 '25 17:03