safaribooks

epub not downloaded (just title)

Open ciapecki opened this issue 6 years ago • 21 comments

I am trying to download the book by providing a cookie (I am logged in through my browser with my company's SSO):

$ safaribooks -c 'BrowserCookie=0eb1e1a9-2f0f-4034-874f-b72f39f59682;SessionID=18ka8abjrrhd3myc5zljpmpvguscj2e0' -b 9781449340124 download-epub
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-04 15:57:50 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-04 15:57:50 [SafariBooks] INFO: Using `/tmp/tmpo4v1aG` as temporary directory
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 15:57:50 [scrapy.core.engine] INFO: Spider opened
2018-12-04 15:57:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 15:57:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-04 15:57:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-04 15:57:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: None)
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-04 15:58:00 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-04 15:58:00 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-04 15:58:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16754,
 'downloader/request_count': 30,
 'downloader/request_method_count/GET': 30,
 'downloader/response_bytes': 214326,
 'downloader/response_count': 30,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 4, 14, 58, 0, 688254),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 31,
 'log_count/INFO': 34,
 'memusage/max': 62570496,
 'memusage/startup': 62570496,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 30,
 'scheduler/dequeued/memory': 30,
 'scheduler/enqueued': 30,
 'scheduler/enqueued/memory': 30,
 'start_time': datetime.datetime(2018, 12, 4, 14, 57, 50, 251804)}
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 12K
drwxr-xr-x 2 chris chris 4.0K Dec  4 15:58 .
drwxr-xr-x 5 chris chris 4.0K Dec  4 15:58 ..
-rw-r--r-- 1 chris chris 2.7K Dec  4 15:58 Head_First_JavaScript_Programming-9781449340124.epub

The downloaded epub is very small: 2.7 kB.

It seems that only some metadata is downloaded, without any content.
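
One way to check this might be to list what is actually inside the file. The log above shows the tool building a zip archive and renaming it to `.epub`, so the standard `zipfile` module should be able to open it. This is only a rough sketch for Python 3: the helper names `summarize_epub` and `has_chapter_content` are made up for illustration, and the rule "a non-empty .html/.xhtml entry counts as content" is an assumption, not something the tool itself defines.

```python
import zipfile


def summarize_epub(path):
    """Return (entry name, uncompressed size in bytes) for every file in the archive."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def has_chapter_content(entries):
    """Assumed heuristic: any non-empty .html/.xhtml entry counts as chapter content."""
    return any(
        name.lower().endswith((".html", ".xhtml")) and size > 0
        for name, size in entries
    )
```

Calling `summarize_epub("converted/Head_First_JavaScript_Programming-9781449340124.epub")` and passing the result to `has_chapter_content` would show whether any chapter files made it into the archive, or only the metadata files.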

Any hints?

thanks, Chris

ciapecki avatar Dec 04 '18 15:12 ciapecki

Same for me... not working. Only the title is downloaded.

rahulonmars avatar Dec 09 '18 15:12 rahulonmars

Same issue. Logged in using company SSO.

skeep avatar Dec 09 '18 17:12 skeep

same issue

owen800q avatar Dec 15 '18 09:12 owen800q

This issue was fixed in #60.

821wkli avatar Dec 15 '18 09:12 821wkli

I fetched that commit but see no change:

ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-15 18:19:36 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-15 18:19:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-15 18:19:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-15 18:19:36 [SafariBooks] INFO: Using `/tmp/tmpAH1dtL` as temporary directory
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-15 18:19:36 [scrapy.core.engine] INFO: Spider opened
2018-12-15 18:19:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-15 18:19:36 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-15 18:19:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-15 18:19:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-15 18:19:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:46 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:46 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:48 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-15 18:19:48 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-15 18:19:48 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-15 18:19:48 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-15 18:19:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 14221,
 'downloader/request_count': 32,
 'downloader/request_method_count/GET': 32,
 'downloader/response_bytes': 214999,
 'downloader/response_count': 32,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 4,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 15, 17, 19, 48, 121239),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 33,
 'log_count/INFO': 34,
 'memusage/max': 61190144,
 'memusage/startup': 61190144,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 32,
 'scheduler/dequeued/memory': 32,
 'scheduler/enqueued': 32,
 'scheduler/enqueued/memory': 32,
 'start_time': datetime.datetime(2018, 12, 15, 17, 19, 36, 819662)}
2018-12-15 18:19:48 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 16K
drwxr-xr-x 2 chris chris 4.0K Dec 15 18:19 .
drwxr-xr-x 5 chris chris 4.0K Dec 15 18:19 ..
-rw-r--r-- 1 chris chris 2.7K Dec 15 18:19 Head_First_JavaScript_Programming-9781449340124.epub

ciapecki avatar Dec 15 '18 17:12 ciapecki

I can confirm that the issue is still there.

hankbao avatar Dec 20 '18 04:12 hankbao

Hey guys, you can use my fix in #62 to download the epub for now.

hankbao avatar Dec 20 '18 18:12 hankbao

:(

ruby-2.5.1 [chris@t480cia safaribooks]$ git log -1
commit 1f9ccc9dcf55a74fe4ea4600cea0649311f7f0d8 (HEAD -> pr/62, origin/pr/62)
Author: Hank Bao <[email protected]>
Date:   Fri Dec 21 02:11:49 2018 +0800

    fix: update host in urls with usage text
ruby-2.5.1 [chris@t480cia safaribooks]$ 

ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-20 21:31:26 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-20 21:31:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.9-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-20 21:31:26 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-20 21:31:26 [SafariBooks] INFO: Using `/tmp/tmp28d5rb` as temporary directory
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-20 21:31:26 [scrapy.core.engine] INFO: Spider opened
2018-12-20 21:31:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-20 21:31:26 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-20 21:31:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-20 21:31:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-20 21:31:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-20 21:31:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-20 21:31:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-20 21:31:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:29 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-20 21:31:36 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-20 21:31:36 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-20 21:31:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 14221,
 'downloader/request_count': 32,
 'downloader/request_method_count/GET': 32,
 'downloader/response_bytes': 214969,
 'downloader/response_count': 32,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 4,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 20, 20, 31, 36, 568840),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 33,
 'log_count/INFO': 34,
 'memusage/max': 61202432,
 'memusage/startup': 61202432,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 32,
 'scheduler/dequeued/memory': 32,
 'scheduler/enqueued': 32,
 'scheduler/enqueued/memory': 32,
 'start_time': datetime.datetime(2018, 12, 20, 20, 31, 26, 613915)}
2018-12-20 21:31:36 [scrapy.core.engine] INFO: Spider closed (finished)

-rw-r--r-- 1 chris chris 2.7K Dec 20 21:31 Head_First_JavaScript_Programming-9781449340124.epub
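
The 2.7K size already suggests that almost nothing made it into the archive. Since an EPUB is a ZIP container, one way to see what is actually inside is to list its entries. This is a minimal sketch using only the Python 3 standard library; `list_entries` is a hypothetical helper, not part of safaribooks, and the demo archive with its entry names is made up so the snippet runs without a real download.

```python
import io
import zipfile


def list_entries(epub):
    """Return (name, uncompressed size) for every file in an EPUB/ZIP archive.

    `epub` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Demo on an in-memory archive (hypothetical contents, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/toc.ncx", "<ncx/>")
buffer.seek(0)

for name, size in list_entries(buffer):
    print(name, size)
```

Pointing `list_entries` at the path of the generated `.epub` instead of `buffer` would show whether any chapter files were written at all.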

ciapecki avatar Dec 20 '18 20:12 ciapecki

@ciapecki You were still using the old version. You need to uninstall the old version first and then reinstall with my fix.
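
One way to tell which version ran is to check which host the chapter requests go to: in the log above they still hit `www.safaribooksonline.com`. The sketch below tallies crawled responses per host and status code from Scrapy log lines. It assumes Python 3 and the `Crawled (NNN) <GET url>` line format seen in this thread; `CRAWLED` and `tally_by_host` are hypothetical names for illustration, not part of Scrapy or safaribooks.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches Scrapy engine lines such as:
#   ... [scrapy.core.engine] DEBUG: Crawled (404) <GET https://host/path> (referer: ...)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET (\S+)>")


def tally_by_host(log_lines):
    """Count crawled responses per (host, HTTP status) pair."""
    counts = Counter()
    for line in log_lines:
        match = CRAWLED.search(line)
        if match:
            status, url = match.groups()
            counts[(urlparse(url).netloc, int(status))] += 1
    return counts


# Two lines copied from the logs in this thread.
sample = [
    "2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) "
    "<GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> "
    "(referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)",
    "2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) "
    "<GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html> "
    "(referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)",
]

for (host, status), n in sorted(tally_by_host(sample).items()):
    print(host, status, n)
```

Feeding it a saved log file (for example `tally_by_host(open("run.log"))`, where `run.log` is a placeholder name) gives a quick summary of which host was contacted and how it answered.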

hankbao avatar Dec 21 '18 03:12 hankbao

@hankbao I have now uninstalled the old version first, but I still get a similarly empty file:

ruby-2.5.1 [chris@t480cia safaribooks]$ sudo pip2 uninstall safaribooks
[sudo] password for chris: 
Uninstalling safaribooks-0.1.1:
  Would remove:
    /usr/bin/safaribooks
    /usr/lib/python2.7/site-packages/safaribooks-0.1.1-py2.7.egg-info
    /usr/lib/python2.7/site-packages/safaribooks/*
Proceed (y/n)? y
  Successfully uninstalled safaribooks-0.1.1
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks
bash: /usr/bin/safaribooks: No such file or directory

Then I installed it again and ran:

Successfully installed safaribooks-0.1.1
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-21 08:14:49 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-21 08:14:49 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.9-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-21 08:14:49 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-21 08:14:49 [SafariBooks] INFO: Using `/tmp/tmpKwNTat` as temporary directory
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-21 08:14:49 [scrapy.core.engine] INFO: Spider opened
2018-12-21 08:14:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-21 08:14:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-21 08:14:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/>
2018-12-21 08:14:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-21 08:14:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://learning.oreilly.com/home/> from <GET https://learning.oreilly.com/home>
2018-12-21 08:14:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/home/>
2018-12-21 08:14:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-21 08:14:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch11.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch12.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch10.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch08.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch09.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch07.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch06.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch05.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch04.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch03.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr04.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr05.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr03.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/copyright.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/co02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/apa.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch13.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/dedication.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/library/cover/9781449340124/> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:59 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-21 08:14:59 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-21 08:14:59 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-21 08:14:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16440,
 'downloader/request_count': 30,
 'downloader/request_method_count/GET': 30,
 'downloader/response_bytes': 52402,
 'downloader/response_count': 30,
 'downloader/response_status_count/200': 4,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/401': 23,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 21, 7, 14, 59, 342137),
 'httperror/response_ignored_count': 23,
 'httperror/response_ignored_status_count/401': 23,
 'log_count/DEBUG': 31,
 'log_count/INFO': 33,
 'memusage/max': 61227008,
 'memusage/startup': 61227008,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 30,
 'scheduler/dequeued/memory': 30,
 'scheduler/enqueued': 30,
 'scheduler/enqueued/memory': 30,
 'start_time': datetime.datetime(2018, 12, 21, 7, 14, 49, 131657)}
2018-12-21 08:14:59 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 20K
drwxr-xr-x 2 chris chris 4.0K Dec 21 08:14 .
drwxr-xr-x 5 chris chris 4.0K Dec 21 08:14 ..
-rw-r--r-- 1 chris chris 9.4K Dec 21 08:14 Head_First_JavaScript_Programming-9781449340124.epub

The file is bigger than before (9.4kB instead of 2.7kB), but it still has no content.
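Since the log shows the epub being built as a zip archive and then renamed, one way to check what actually ended up inside it is to list the archive entries. This is a minimal sketch, not part of safaribooks; the function names and the assumption that chapters end in `.html` or `.xhtml` are mine.

```python
import io
import zipfile


def summarize_epub(source):
    """Return (name, size_in_bytes) for every entry in an EPUB, which is a ZIP archive."""
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def chapter_entries(entries, suffixes=(".html", ".xhtml")):
    """Keep only entries that look like chapter documents (the suffixes are an assumption)."""
    return [(name, size) for name, size in entries if name.lower().endswith(suffixes)]


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a real book.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
    buffer.seek(0)
    entries = summarize_epub(buffer)
    print(entries)
    print(chapter_entries(entries))  # an empty list means no chapter files were packed
```

Passing the path of the generated `.epub` instead of the in-memory buffer would list the real file. If no chapter entries come back, the archive only holds metadata.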

ciapecki avatar Dec 21 '18 07:12 ciapecki

@ciapecki A lot of 401 errors popped up. It seems the authentication credentials you provided were invalid.

Can you try downloading your book with username and password?
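To see at a glance how many requests were rejected, the `Crawled (NNN)` lines in a saved log can be tallied. A rough sketch, assuming the log format shown above; `count_statuses` is a hypothetical helper, and it only counts `Crawled` lines, so redirects that Scrapy reports separately will not appear.

```python
import re
from collections import Counter

# Matches the status code in lines such as
# "... [scrapy.core.engine] DEBUG: Crawled (401) <GET ...>"
CRAWLED = re.compile(r"\[scrapy\.core\.engine\] DEBUG: Crawled \((\d{3})\)")


def count_statuses(lines):
    """Count HTTP status codes reported on Scrapy 'Crawled' log lines."""
    counts = Counter()
    for line in lines:
        match = CRAWLED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html>",
        "2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/library/cover/9781449340124/>",
    ]
    print(count_statuses(sample))
```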

hankbao avatar Dec 21 '18 07:12 hankbao

@hankbao I am logged in through my company's SSO. We don't have a username/password. While I am logged in (I can see and read books), I copy the BrowserCookie and SessionID from the Chrome Inspect panel (F12). Maybe I am missing some other values from the cookie?
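One way to avoid dropping a cookie by hand is to copy the whole `Cookie` request header from the browser and convert it into the `name=value;name=value` form used with `-c` at the top of this issue. This is only a sketch: the helper names are made up, and which cookies the site actually requires is not known from this thread.

```python
def parse_cookie_header(header):
    """Split a raw Cookie header ("a=1; b=2") into an ordered name -> value dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def to_cli_cookie(cookies):
    """Render the dict in the name=value;name=value form passed to -c."""
    return ";".join(f"{name}={value}" for name, value in cookies.items())


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    raw = "BrowserCookie=aaaa; SessionID=bbbb; another_cookie=cccc"
    parsed = parse_cookie_header(raw)
    print(sorted(parsed))         # names present in the browser session
    print(to_cli_cookie(parsed))  # string to pass on the command line, in single quotes
```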

ciapecki avatar Dec 21 '18 07:12 ciapecki

I haven't looked into the cookie and session part of the code, so I'm not sure. However, with username and password, I can download my book now. Sometimes a few pages returned 503 errors, but you can always get the whole book by retrying.
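The "just retry" approach can be wrapped in a small loop. This is a generic sketch, not something safaribooks provides: `action` is a hypothetical callable that runs one download attempt and returns `True` when the result looks complete (for example, when the log contains no 503 lines).

```python
import time


def retry(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Returns the number of the successful attempt, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)  # short pause before trying again
    return None


if __name__ == "__main__":
    outcomes = iter([False, True])  # simulate one failed run, then a good one
    print(retry(lambda: next(outcomes), delay=0))
```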

hankbao avatar Dec 21 '18 07:12 hankbao

Thanks @hankbao It works for me with Docker and my company's SSO

sanmibuh avatar Dec 28 '18 14:12 sanmibuh

@hankbao I still have the same problem as @sanmibuh, with both Docker and the normal CLI, and with both user/pass and cookie. I'm including the log from the Docker and cookie run; the 401 errors are the same in the other three configurations. Log: https://www.dropbox.com/s/i3xmvcskwgt9yf1/safaribooks.log?dl=0

tofagerl avatar Jan 06 '19 15:01 tofagerl

If you got 401s with username/password, perhaps your password is indeed incorrect. I'm not familiar with the cookie part of this project. Maybe @sanmibuh could share his experience.

hankbao avatar Jan 06 '19 16:01 hankbao

@hankbao Yeah, I thought the same, but it's the exact same one I use to log in with. Copied straight out of my password manager. I'm gonna change it and see if that works.

tofagerl avatar Jan 06 '19 17:01 tofagerl

@hankbao Oh, ok. I changed my password, and that didn't work, but then I put it in quotes, and that worked. I use autogenerated passwords with lots of weird characters, so I should have thought of that earlier.
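For anyone hitting the same thing: characters such as `$`, `&`, `!` or spaces are interpreted by the shell unless the value is quoted. Python's `shlex.quote` shows what a safely quoted form looks like. A minimal sketch; the password and the `echo` command are placeholders.

```python
import shlex


def render_command(args):
    """Join arguments into one shell command line, quoting each argument as needed."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    password = "pa$$ w&rd!"  # made-up example of an autogenerated password
    line = render_command(["echo", password])
    print(line)
    # Splitting the rendered line again gives back the original arguments,
    # which is what the program would receive from the shell.
    print(shlex.split(line))
```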

tofagerl avatar Jan 06 '19 17:01 tofagerl

@hankbao or @tofagerl I'm a little lost. I keep getting either:

Traceback (most recent call last):
  File "/usr/local/bin/safaribooks", line 11, in <module>
    load_entry_point('safaribooks', 'console_scripts', 'safaribooks')()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'safaribooks.main'

or

docker: Error response from daemon: create $(pwd)/converted: "$(pwd)/converted" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path. See 'docker run --help'.

Thanks.
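On the Docker error: the message shows that Docker received the literal text `$(pwd)/converted`, so the shell in use did not expand `$(pwd)`, and the message itself asks for an absolute path. A sketch of building the volume argument with an absolute host path; the container path `/app/converted` is an assumption for illustration, not taken from this thread.

```python
import os


def volume_argument(host_dir, container_dir):
    """Return a host:container string for `docker run -v` using an absolute host path."""
    return f"{os.path.abspath(host_dir)}:{container_dir}"


if __name__ == "__main__":
    # "/app/converted" is a hypothetical mount point inside the container.
    print(volume_argument("converted", "/app/converted"))
```

Typing out the absolute path by hand in the `-v` option has the same effect.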

BrianBrinkley avatar Jan 11 '19 02:01 BrianBrinkley

Hey guys, you can use my fix in #62 to download epub for now.

I can confirm. This works, but I'm not able to open the epub.

rahulonmars avatar Jan 17 '19 08:01 rahulonmars

Having the same issue:

2019-01-21 12:11:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781457191350/chapter/04-ch1.xhtml>: HTTP status code is not handled or not allowed
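Note that the URL in this log line has a doubled slash before `api`. Whether that is what causes the 404 is not established in this thread; the sketch below only shows one way to join a base URL and a path without producing `//`. `join_url` is a hypothetical helper, not a function from safaribooks.

```python
def join_url(base, path):
    """Join a base URL and a path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(join_url("https://www.safaribooksonline.com/",
                   "/api/v1/book/9781457191350/chapter/04-ch1.xhtml"))
```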

JoeriBe avatar Jan 21 '19 11:01 JoeriBe