safaribooks
epub not downloaded (just title)
I am trying to download a book by passing my browser cookies (I am logged in to the site in my browser through my company's SSO):
$ safaribooks -c 'BrowserCookie=0eb1e1a9-2f0f-4034-874f-b72f39f59682;SessionID=18ka8abjrrhd3myc5zljpmpvguscj2e0' -b 9781449340124 download-epub
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-04 15:57:50 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-04 15:57:50 [SafariBooks] INFO: Using `/tmp/tmpo4v1aG` as temporary directory
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 15:57:50 [scrapy.core.engine] INFO: Spider opened
2018-12-04 15:57:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 15:57:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-04 15:57:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-04 15:57:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: None)
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-04 15:58:00 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-04 15:58:00 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-04 15:58:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16754,
'downloader/request_count': 30,
'downloader/request_method_count/GET': 30,
'downloader/response_bytes': 214326,
'downloader/response_count': 30,
'downloader/response_status_count/200': 3,
'downloader/response_status_count/301': 1,
'downloader/response_status_count/302': 2,
'downloader/response_status_count/404': 24,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 12, 4, 14, 58, 0, 688254),
'httperror/response_ignored_count': 24,
'httperror/response_ignored_status_count/404': 24,
'log_count/DEBUG': 31,
'log_count/INFO': 34,
'memusage/max': 62570496,
'memusage/startup': 62570496,
'request_depth_max': 3,
'response_received_count': 27,
'scheduler/dequeued': 30,
'scheduler/dequeued/memory': 30,
'scheduler/enqueued': 30,
'scheduler/enqueued/memory': 30,
'start_time': datetime.datetime(2018, 12, 4, 14, 57, 50, 251804)}
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 12K
drwxr-xr-x 2 chris chris 4.0K Dec 4 15:58 .
drwxr-xr-x 5 chris chris 4.0K Dec 4 15:58 ..
-rw-r--r-- 1 chris chris 2.7K Dec 4 15:58 Head_First_JavaScript_Programming-9781449340124.epub
The downloaded epub is very small (2.7 kB).
It seems that only some metadata is downloaded, without any content.
Any hints?
thanks, Chris
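One way to check what actually ended up in the 2.7 kB file is to list the archive members, since an epub is a zip container. The following is a minimal sketch using only the Python standard library; the function names `list_epub_entries` and `has_chapter_content` are hypothetical helpers and not part of safaribooks, and the "ends with .html or .xhtml" test is only a rough heuristic for chapter files.

```python
import zipfile


def list_epub_entries(path):
    """Return (member name, uncompressed size in bytes) for each archive member."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def has_chapter_content(entries):
    """Heuristic (an assumption): chapter files end in .html or .xhtml."""
    return any(name.lower().endswith((".html", ".xhtml")) for name, _ in entries)
```

Calling `list_epub_entries` on the file in `converted/` and passing the result to `has_chapter_content` should show whether any chapter files made it into the archive or only the metadata did.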
Same for me: not working. Only the title is downloaded.
Same issue. I am logged in using company SSO.
same issue
This issue was fixed in #60.
I fetched that commit but see no change:
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-15 18:19:36 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-15 18:19:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-15 18:19:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-15 18:19:36 [SafariBooks] INFO: Using `/tmp/tmpAH1dtL` as temporary directory
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-15 18:19:36 [scrapy.core.engine] INFO: Spider opened
2018-12-15 18:19:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-15 18:19:36 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-15 18:19:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-15 18:19:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-15 18:19:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:46 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:46 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:48 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-15 18:19:48 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-15 18:19:48 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-15 18:19:48 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-15 18:19:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 14221,
'downloader/request_count': 32,
'downloader/request_method_count/GET': 32,
'downloader/response_bytes': 214999,
'downloader/response_count': 32,
'downloader/response_status_count/200': 3,
'downloader/response_status_count/301': 1,
'downloader/response_status_count/302': 4,
'downloader/response_status_count/404': 24,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 12, 15, 17, 19, 48, 121239),
'httperror/response_ignored_count': 24,
'httperror/response_ignored_status_count/404': 24,
'log_count/DEBUG': 33,
'log_count/INFO': 34,
'memusage/max': 61190144,
'memusage/startup': 61190144,
'request_depth_max': 3,
'response_received_count': 27,
'scheduler/dequeued': 32,
'scheduler/dequeued/memory': 32,
'scheduler/enqueued': 32,
'scheduler/enqueued/memory': 32,
'start_time': datetime.datetime(2018, 12, 15, 17, 19, 36, 819662)}
2018-12-15 18:19:48 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 16K
drwxr-xr-x 2 chris chris 4.0K Dec 15 18:19 .
drwxr-xr-x 5 chris chris 4.0K Dec 15 18:19 ..
-rw-r--r-- 1 chris chris 2.7K Dec 15 18:19 Head_First_JavaScript_Programming-9781449340124.epub
I can confirm that the issue is still there.
Hey guys, you can use my fix in #62 to download the epub for now.
:(
ruby-2.5.1 [chris@t480cia safaribooks]$ git log -1
commit 1f9ccc9dcf55a74fe4ea4600cea0649311f7f0d8 (HEAD -> pr/62, origin/pr/62)
Author: Hank Bao <[email protected]>
Date: Fri Dec 21 02:11:49 2018 +0800
fix: update host in urls with usage text
ruby-2.5.1 [chris@t480cia safaribooks]$
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-20 21:31:26 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-20 21:31:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.9-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-20 21:31:26 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-20 21:31:26 [SafariBooks] INFO: Using `/tmp/tmp28d5rb` as temporary directory
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-20 21:31:26 [scrapy.core.engine] INFO: Spider opened
2018-12-20 21:31:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-20 21:31:26 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-20 21:31:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-20 21:31:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-20 21:31:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-20 21:31:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-20 21:31:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-20 21:31:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:29 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-20 21:31:36 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-20 21:31:36 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-20 21:31:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 14221,
'downloader/request_count': 32,
'downloader/request_method_count/GET': 32,
'downloader/response_bytes': 214969,
'downloader/response_count': 32,
'downloader/response_status_count/200': 3,
'downloader/response_status_count/301': 1,
'downloader/response_status_count/302': 4,
'downloader/response_status_count/404': 24,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 12, 20, 20, 31, 36, 568840),
'httperror/response_ignored_count': 24,
'httperror/response_ignored_status_count/404': 24,
'log_count/DEBUG': 33,
'log_count/INFO': 34,
'memusage/max': 61202432,
'memusage/startup': 61202432,
'request_depth_max': 3,
'response_received_count': 27,
'scheduler/dequeued': 32,
'scheduler/dequeued/memory': 32,
'scheduler/enqueued': 32,
'scheduler/enqueued/memory': 32,
'start_time': datetime.datetime(2018, 12, 20, 20, 31, 26, 613915)}
2018-12-20 21:31:36 [scrapy.core.engine] INFO: Spider closed (finished)
-rw-r--r-- 1 chris chris 2.7K Dec 20 21:31 Head_First_JavaScript_Programming-9781449340124.epub
@ciapecki You were still running the old version. You need to uninstall it first and then install again from my fix.
@hankbao I have now uninstalled the old version first, but I still get a similarly empty file:
ruby-2.5.1 [chris@t480cia safaribooks]$ sudo pip2 uninstall safaribooks
[sudo] password for chris:
Uninstalling safaribooks-0.1.1:
Would remove:
/usr/bin/safaribooks
/usr/lib/python2.7/site-packages/safaribooks-0.1.1-py2.7.egg-info
/usr/lib/python2.7/site-packages/safaribooks/*
Proceed (y/n)? y
Successfully uninstalled safaribooks-0.1.1
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks
bash: /usr/bin/safaribooks: No such file or directory
Then I installed it and ran:
Successfully installed safaribooks-0.1.1
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-21 08:14:49 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-21 08:14:49 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.9-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-21 08:14:49 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-21 08:14:49 [SafariBooks] INFO: Using `/tmp/tmpKwNTat` as temporary directory
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-21 08:14:49 [scrapy.core.engine] INFO: Spider opened
2018-12-21 08:14:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-21 08:14:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-21 08:14:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/>
2018-12-21 08:14:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-21 08:14:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://learning.oreilly.com/home/> from <GET https://learning.oreilly.com/home>
2018-12-21 08:14:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/home/>
2018-12-21 08:14:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-21 08:14:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch11.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch12.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch10.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch08.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch09.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch07.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch06.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch05.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch04.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch03.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr04.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr05.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr03.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/copyright.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/co02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/apa.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch13.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/dedication.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/library/cover/9781449340124/> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:59 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-21 08:14:59 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-21 08:14:59 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-21 08:14:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16440,
'downloader/request_count': 30,
'downloader/request_method_count/GET': 30,
'downloader/response_bytes': 52402,
'downloader/response_count': 30,
'downloader/response_status_count/200': 4,
'downloader/response_status_count/301': 1,
'downloader/response_status_count/302': 2,
'downloader/response_status_count/401': 23,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 12, 21, 7, 14, 59, 342137),
'httperror/response_ignored_count': 23,
'httperror/response_ignored_status_count/401': 23,
'log_count/DEBUG': 31,
'log_count/INFO': 33,
'memusage/max': 61227008,
'memusage/startup': 61227008,
'request_depth_max': 3,
'response_received_count': 27,
'scheduler/dequeued': 30,
'scheduler/dequeued/memory': 30,
'scheduler/enqueued': 30,
'scheduler/enqueued/memory': 30,
'start_time': datetime.datetime(2018, 12, 21, 7, 14, 49, 131657)}
2018-12-21 08:14:59 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 20K
drwxr-xr-x 2 chris chris 4.0K Dec 21 08:14 .
drwxr-xr-x 5 chris chris 4.0K Dec 21 08:14 ..
-rw-r--r-- 1 chris chris 9.4K Dec 21 08:14 Head_First_JavaScript_Programming-9781449340124.epub
The file is bigger than before (9.4 kB instead of 2.7 kB), but it still has no content.
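For reference, the stats dump above already hints at why the archive is nearly empty: 23 of the 27 received responses were 401s. Here is a minimal sketch of that check, using only numbers copied from the log. The helper name `unauthorized_share` is made up for illustration and is not part of safaribooks or Scrapy.

```python
def unauthorized_share(stats):
    """Return the fraction of received responses that were HTTP 401."""
    received = stats.get("response_received_count", 0)
    if received == 0:
        return 0.0
    # float() keeps the division exact on Python 2 as well as Python 3.
    return float(stats.get("downloader/response_status_count/401", 0)) / received


# Values copied from the "Dumping Scrapy stats" output above.
stats = {
    "downloader/response_status_count/200": 4,
    "downloader/response_status_count/401": 23,
    "response_received_count": 27,
}

share = unauthorized_share(stats)
print("{:.0%} of received responses were 401 Unauthorized".format(share))
```

A share this high suggests the chapter requests were not authenticated at all, rather than a few pages failing at random.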
@ciapecki A lot of 401 errors popped up. It seems the authentication credentials you provided were invalid.
Can you try downloading your book with username and password?
@hankbao I am logged in with my company's SSO. We don't have a username/password. While I am logged in (I can see and read books) I get the BrowserCookie and SessionID from the Chrome Inspect panel (F12). Maybe I am missing some more details from the cookie?
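One way to check whether anything is missing is to compare the full `Cookie` request header shown in the browser's network panel with what was passed to `-c`. The sketch below only lists the cookie names that were left out. It does not know which cookies the site actually requires, the helper names are made up for illustration, and the header values are placeholders (`other_cookie` is a hypothetical extra cookie).

```python
def parse_cookie_header(header):
    """Split a 'name=value; name2=value2' cookie string into a dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def missing_cookie_names(browser_header, passed_header):
    """Names present in the browser's Cookie header but not passed on."""
    browser = parse_cookie_header(browser_header)
    passed = parse_cookie_header(passed_header)
    return sorted(set(browser) - set(passed))


# Placeholder values; 'other_cookie' is a hypothetical name.
browser_header = "BrowserCookie=aaa; SessionID=bbb; other_cookie=ccc"
passed_header = "BrowserCookie=aaa;SessionID=bbb"

print(missing_cookie_names(browser_header, passed_header))
```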
I haven't looked into the cookie and session part of the code, so I'm not sure. However, with username and password I can download my book now. Some pages occasionally returned 503 errors, but you can always get the whole book by retrying.
Thanks @hankbao It works for me with Docker and my company's SSO
@hankbao I still have the same problem as @sanmibuh, with both docker and normal cli, both user/pass and cookie. Including log from using docker and cookie, but the 401 errors are the same in the other three configurations. Log: https://www.dropbox.com/s/i3xmvcskwgt9yf1/safaribooks.log?dl=0
If you got 401s with username/password, perhaps your password is indeed incorrect. I'm not familiar with the cookie part of this project. Maybe @sanmibuh could share his experience.
@hankbao Yeah, I thought the same, but it's the exact same one I use to log in with. Copied straight out of my password manager. I'm gonna change it and see if that works.
@hankbao Oh, ok. I changed my password, and that didn't work, but then I put it in quotes, and that worked. I use autogenerated passwords with lots of weird characters, so I should have thought of that earlier.
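For anyone hitting the same thing: characters such as `$`, `!`, `&` or spaces are interpreted by the shell before the program ever sees the password, so an unquoted password can arrive altered. Below is a small sketch of how to produce a safely quoted form, assuming Python 3 (`shlex.quote` is not available on Python 2.7) and a POSIX shell. The password is a made-up example.

```python
import shlex

# Made-up password with characters a POSIX shell treats specially.
password = "p@ss$word!&x"

# shlex.quote wraps the value so the shell passes it through unchanged.
quoted = shlex.quote(password)
print(quoted)

# Round trip: splitting the quoted form like a shell would gives back
# exactly one argument, identical to the original password.
assert shlex.split(quoted) == [password]
```

The printed value can be pasted on the command line in place of the raw password.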
@hankbao or @tofagerl I'm a little lost. I keep getting either:
Traceback (most recent call last):
File "/usr/local/bin/safaribooks", line 11, in
or
docker: Error response from daemon: create $(pwd)/converted: "$(pwd)/converted" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path. See 'docker run --help'.
Thanks.
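The Docker error above means the literal text `$(pwd)/converted` reached Docker, i.e. the shell did not expand `$(pwd)`, so Docker treated it as a volume name. Passing an absolute host path avoids that. Here is a minimal sketch for building the `-v` value. The helper name and the container path `/app/converted` are assumptions for illustration, not taken from the project's Docker setup.

```python
import os


def volume_argument(host_dir, container_dir):
    """Build a 'host:container' volume value with an absolute host path."""
    return "{}:{}".format(os.path.abspath(host_dir), container_dir)


# '/app/converted' is a hypothetical container path.
print(volume_argument("converted", "/app/converted"))
```

The printed string can be used as the argument to `docker run -v`.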
Hey guys, you can use my fix in #62 to download epub for now.
I can confirm. This works, but I'm not able to open the epub.
Having the same issue:
2019-01-21 12:11:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781457191350/chapter/04-ch1.xhtml>: HTTP status code is not handled or not allowed