leech
Fix for Issue #2
EDIT: Fix for Issue #2. Here's a snippet of the new README:
Image support
Leech creates EPUB 2.0.1 files, which means that Leech can only save images in the following formats:
- JPEG (JPG/JFIF)
- PNG
- GIF
See the Open Publication Structure (OPS) 2.0.1 for more information.
Leech cannot save SVG images because Pillow does not support SVG.
Leech uses Pillow for image manipulation and conversion. If you want to use a different image format, you can install the dependencies Pillow needs for that format, but you will probably also have to modify Leech. See the Pillow documentation for more information.
By default, Leech will try to save all non-animated images as JPEG. The only animated images that Leech will save are GIFs.
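A minimal sketch of that default rule, assuming Pillow is installed. The helper name convert_for_epub is hypothetical and this is not Leech's actual conversion code; it only illustrates "keep animated GIFs as they are, re-encode everything else to the configured format (JPEG by default)".

```python
from __future__ import annotations

import io

from PIL import Image


def convert_for_epub(data: bytes, image_format: str = "jpeg") -> tuple[bytes, str]:
    """Hypothetical helper: re-encode image bytes following the README's default rule.

    Animated GIFs are passed through untouched; everything else is
    re-encoded to image_format ("jpeg", "png" or "gif").
    """
    img = Image.open(io.BytesIO(data))
    # Animated GIFs are the only animated images kept, so return them unchanged.
    if img.format == "GIF" and getattr(img, "is_animated", False):
        return data, "image/gif"
    fmt = image_format.lower()
    if fmt == "jpeg":
        # JPEG has no alpha channel or palette, so flatten to plain RGB.
        # Note that this drops transparency.
        img = img.convert("RGB")
    out = io.BytesIO()
    img.save(out, format=fmt.upper())
    return out.getvalue(), f"image/{fmt}"
```

Re-encoding a large PNG screenshot or photo to JPEG this way is usually what keeps the EPUB size down.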
To configure image support, you will need to create a file called leech.json. See the Configuration section below for more information.
Configuration
A very small amount of configuration is possible by creating a file called leech.json in the project directory. Currently you can define login information for sites that support it, image options, some options for book covers, and output directories (globally or per site).
Example:
{
    "logins": {
        "QuestionableQuesting": ["username", "password"]
    },
    "images": true,
    "image_format": "png",
    "compress_images": true,
    "max_image_size": 100000,
    "cover": {
        "fontname": "Comic Sans MS",
        "fontsize": 30,
        "bgcolor": [20, 120, 20],
        "textcolor": [180, 20, 180],
        "cover_url": "https://website.com/image.png"
    },
    "output_dir": "/tmp/ebooks",
    "site_options": {
        "RoyalRoad": {
            "output_dir": "/tmp/litrpg_isekai_trash"
        }
    }
}
Note: The images key is a boolean and can only be true or false. Booleans in JSON are written in lowercase. If it is false, Leech will not download any images. Leech will also ignore the image_format key if images is false.
Note: If the image_format key does not exist, Leech will default to jpeg. The three supported image formats are jpeg, png, and gif. The image_format key is case-insensitive.
Note: The compress_images key tells Leech to compress images. This is only supported for jpeg and png images. It goes hand-in-hand with the max_image_size key, which is in bytes. If compress_images is true but there is no max_image_size key, Leech will compress the image to a size less than 1MB (1000000 bytes). If max_image_size is present, Leech will compress the image to a size less than its value. If compress_images is false, Leech will ignore the max_image_size key.
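The compression rules in the note above boil down to "which byte budget applies, if any". A minimal sketch; resolve_max_image_size and the constant names are hypothetical and only restate the README's rules.

```python
# The README's fallback when compress_images is true but max_image_size is absent.
DEFAULT_MAX_IMAGE_SIZE = 1_000_000  # bytes (1MB)

# Per the README, only these formats are compressed.
COMPRESSIBLE_FORMATS = {"jpeg", "png"}


def resolve_max_image_size(config: dict, image_format: str):
    """Hypothetical helper: byte budget for compression, or None to skip it."""
    if config.get("compress_images") is not True:
        return None  # max_image_size is ignored when compression is off
    if image_format.lower() not in COMPRESSIBLE_FORMATS:
        return None  # e.g. GIFs are never compressed
    return int(config.get("max_image_size", DEFAULT_MAX_IMAGE_SIZE))
```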
Warning: Compressing images might make Leech take a lot longer to download images.
Warning: Compressing images might make the image quality worse.
Warning: max_image_size is not a hard limit. Leech will try to compress the image below max_image_size, but it might not always be able to get it that small.
Warning: max_image_size should not be too small. For instance, if you set max_image_size to 1000, Leech will probably not be able to compress the image below 1000 bytes. If you set it to 1000000, Leech will probably be able to.
Warning: Leech will not compress GIFs, since that might damage the animation.
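One common way to get a "best effort, not a hard limit" size target is to re-encode at decreasing quality until the result fits, and fall back to the smallest attempt if nothing does. This is a sketch of that general technique, not necessarily what Leech does; compress_best_effort and pillow_jpeg_encoder are hypothetical names, and the quality steps are an arbitrary choice. It only covers lossy JPEG; PNG is lossless, so shrinking it would need a different approach (for example resizing).

```python
import io
from typing import Callable, Iterable


def compress_best_effort(encode: Callable[[int], bytes], max_size: int,
                         qualities: Iterable[int] = range(95, 4, -10)) -> bytes:
    """Re-encode at decreasing quality until the output fits in max_size bytes.

    If no quality level fits, the smallest attempt is returned anyway,
    which is why the size setting behaves as a target, not a hard limit.
    """
    smallest = b""
    for quality in qualities:
        data = encode(quality)  # each step is a full re-encode, hence the slowdown
        if not smallest or len(data) < len(smallest):
            smallest = data
        if len(data) <= max_size:
            return data
    return smallest


def pillow_jpeg_encoder(img):
    """Wrap a Pillow image as an encoder (assumes Pillow is installed)."""
    rgb = img.convert("RGB")  # JPEG cannot store alpha or palettes

    def encode(quality: int) -> bytes:
        buf = io.BytesIO()
        rgb.save(buf, format="JPEG", quality=quality, optimize=True)
        return buf.getvalue()

    return encode
```

With Pillow, a call might look like compress_best_effort(pillow_jpeg_encoder(Image.open(path)), 100_000). Very small budgets simply end at the lowest quality step, which also shows why quality degrades.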
Old:
Partial Fix for Issue #2
Thanks to @IdanDor for this pull request.
Specifically, added image_selector for arbitrary sites, which allows selecting img tags from chapters, downloading them and embedding them within the resulting epub. In the case of Pale, this means that the character banners and extra materials do not require an internet connection to view. Also made the two pale.json files more consistent (pale.json now correctly includes the title of the chapters). https://github.com/kemayo/leech/pull/84#issue-1436128961
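This doesn't work for other sites (like fiction.live), so I did this:
else:
    # Branch used when the image_selector path (the matching if, not shown) does not apply:
    # collect every <img> in the chapter instead.
    soup = BeautifulSoup(chapter.contents, 'html5lib')
    for count, img in enumerate(soup.find_all('img')):
        # fiction.live can emit a bare <img /> with no src; skip it instead of raising KeyError.
        if not img.has_attr('src'):
            continue
        img_contents = get_image_from_url(img['src']).read()
        chapter.images.append(Image(
            path=f"images/ch{i}_leechimage_{count}.png",
            contents=img_contents,
            content_type='image/png'
        ))
        # Point the chapter's HTML at the copy stored inside the EPUB.
        img['src'] = f"../images/ch{i}_leechimage_{count}.png"
        if not img.has_attr('alt'):
            img['alt'] = f"Image {count} from chapter {i}"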
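The snippet stores every image as .png with content_type 'image/png', whatever bytes the server actually returned. One way to label files correctly is to sniff the format from the file's leading "magic" bytes. A minimal, stdlib-only sketch; sniff_image_type is a hypothetical helper, not part of Leech.

```python
# Leading "magic" bytes for the three image formats discussed above.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png", "png"),
    (b"\xff\xd8\xff", "image/jpeg", "jpg"),
    (b"GIF87a", "image/gif", "gif"),
    (b"GIF89a", "image/gif", "gif"),
]


def sniff_image_type(data: bytes):
    """Hypothetical helper: return (content_type, extension), or None if unknown."""
    for magic, content_type, ext in _SIGNATURES:
        if data.startswith(magic):
            return content_type, ext
    return None  # e.g. SVG or WebP; convert with Pillow or skip
```

The returned extension could replace the hard-coded .png in the image path, and a None result could be routed through Pillow for conversion; both are suggestions, not how the PR wires it.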
It builds on @IdanDor's code as well, since it adds all the images it can find to the chapter.images list:
# Add all pictures on this chapter as well.
for image in chapter.images:
    # For/else syntax: check if the image path already exists; if it doesn't, add the image.
    # Duplicates are not allowed in the format.
    for other_file in chapters:
        if other_file.path == image.path:
            break
    else:
        chapters.append(EpubFile(path=image.path, contents=image.contents, filetype=image.content_type))
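The for/else scan walks the whole chapters list again for every image. Keeping a set of paths already used gives the same result in a single pass. Below is a sketch with minimal stand-in classes whose fields are taken from the keyword arguments in the snippet; they are assumptions, not Leech's actual types, and add_chapter_images is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class EpubFile:
    """Stand-in for Leech's EpubFile; only the fields the snippet uses."""
    path: str
    contents: bytes
    filetype: str


@dataclass
class Image:
    """Stand-in for the chapter image record used in the snippet."""
    path: str
    contents: bytes
    content_type: str


def add_chapter_images(chapters: list, images: list) -> None:
    """Append each image whose path is not already present, like the for/else loop."""
    seen = {f.path for f in chapters}
    for image in images:
        if image.path in seen:
            continue  # duplicates are not allowed in the format
        chapters.append(EpubFile(path=image.path, contents=image.contents,
                                 filetype=image.content_type))
        seen.add(image.path)  # also catches repeats within the same image list
```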
I only tested this with stories from fiction.live, but they all worked fine. I also ran the resulting epubs through epubcheck: there were no fatal errors, only minor ones.
Just like you wrote in the linked issue, I thought it should be something one can disable. And in my mind, the selector simply matches what the codebase does with every other "choice". https://github.com/kemayo/leech/pull/84#issuecomment-1318061676
I would not even know where to start with making images an option, which is why I called this a partial fix.
Ah, there's a problem with this. PNG files are huge, so in a story with many images you can end up with a massive epub file. So maybe some image compression is needed as well? And conversion to JPEG, which is a lot smaller?
EDIT: No really, I accidentally downloaded a story that was 1.5 GB in size, so be careful :laughing:
Turns out that on fiction.live you can have an empty image tag: just <img />, with no src. Crazy!
I tested this out on the Wandering Inn (https://wanderinginn.com/table-of-contents/), which has images in, for example, the cover page, title page, and chapter 1.02, but it fails to detect all but one of the images. I imagine this has to do with how images are embedded in the HTML on this WordPress site, but I'm still working it out.
Edit: Ah, the issue was that my JSON filter selector was preventing them from being read. All of the pictures are detected now, but most fail to load for some reason. It may be an issue with my ebook viewer (Calibre), however.
Edit2: Confirmed the images show up on my Kindle, so it's a Calibre issue that they are broken in their e-book viewer. Thank you and @IdanDor for your work on adding this feature!
There's this weird image-hosting site called filepicker.io that's causing problems when you try to download from it. This new commit should fix it. The fix: https://github.com/JimmXinu/FanFicFare/issues/933#issuecomment-1483848726
These new updates work for me, but I only tested them on one site (fiction.live).
This code doesn't download images in xenforo spoilers yet. This will be fixed soon.
EDIT: These xenforo spoiler images are weird. The images get downloaded twice for some reason.