
Chevereto Image Hosting Script Support

Open ghost opened this issue 8 years ago • 24 comments

Hi Guys,

It would be awesome if RipMe could fully support image hosting websites which utilize Chevereto (www.chevereto.com) for their hosting software/script.

Chevereto currently supports JPG, PNG, BMP & GIF

Chevereto does provide a list of all direct image links contained in an album via the "embed codes" tab when viewing an album; however, viewing the embed codes is typically limited to logged-in users.

As such, it would likely require RipMe to support the ability to authenticate with a website powered by Chevereto in order to grab the links via that method.

An alternative method would be to view the page with all the images listed and then alter the URLs, grabbing the original media as opposed to the thumbnails.

Example:

  • Image: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.png
  • Thumbnail: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.th.png
  • Medium Thumbnail: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.md.png

When viewing the content you would typically see the medium size by default; however, by changing the URL slightly you should be able to grab the actual full-size media.
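The suffix convention above can be captured in a tiny helper. A minimal sketch, assuming the `.th`/`.md` naming shown in the example links (the class and method names are illustrative, not part of RipMe):

```java
public class CheveretoUrl {
    // Strip Chevereto's thumbnail suffixes (".th" / ".md") from an image URL,
    // e.g. ".../KXUlyU.md.png" -> ".../KXUlyU.png"
    static String fullSizeUrl(String thumbUrl) {
        return thumbUrl.replace(".th.", ".").replace(".md.", ".");
    }

    public static void main(String[] args) {
        System.out.println(fullSizeUrl("https://is02.ezphotoshare.com/2017/01/14/KXUlyU.md.png"));
        // prints https://is02.ezphotoshare.com/2017/01/14/KXUlyU.png
    }
}
```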

Desired Support:

  • Download Entire Album (Album Link in RipMe)
  • Download All Albums for X User & Save each to their respective folders (User's Album Page in RipMe)

Example:

  • Album Page: https://www.ezphotoshare.com/album/TDta
  • User's Albums Page: https://www.ezphotoshare.com/doreentokamumzeo/albums

I do own Chevereto and have several sites running; happy to help any way I am able.

Thanks!

ghost avatar Jan 23 '17 23:01 ghost

Taking a quick look at the links you provided, ripping the sites should be easy (all the images have their own class and link), and detecting that a site runs Chevereto should also be easy (an identifying line appears in the pages).

However, atm ripme uses the domain name to tell if it can rip a site. I could bypass this by making a request to the URL when the user enters it, but I feel that would be poor behavior (no data should be sent or received until the user clicks rip).

A workaround might be to use a list of site names known to use Chevereto (like ChanRipper.java does), and I feel that this is the best way to do it (for now).
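The explicit-domain approach could be sketched like this; a minimal example only (the class name and hard-coded site list are illustrative, not RipMe's actual code), showing that no request is sent just to decide rippability:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class CheveretoDomainCheck {
    // Hypothetical hard-coded list of sites known to run Chevereto;
    // the real ripper would maintain its own list.
    static final List<String> KNOWN_CHEVERETO_SITES =
            Arrays.asList("www.ezphotoshare.com", "hushpix.com");

    // Decide from the host name alone -- no data is sent or received.
    static boolean canRip(URL url) {
        return KNOWN_CHEVERETO_SITES.contains(url.getHost());
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(canRip(new URL("https://hushpix.com/album/7BTZ")));  // true
        System.out.println(canRip(new URL("https://example.com/album/7BTZ"))); // false
    }
}
```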

cyian-1756 avatar Jan 24 '17 05:01 cyian-1756

Sounds good!

It would be great if we could start by adding hushpix.com (this would assist in my archiving project, allowing users to grab content before I have time to create archives).

ghost avatar Jan 24 '17 06:01 ghost

It seems that Chevereto is forcing redirects on its image URLs (aside from thumbnails), meaning that when ripme downloads a picture named whatever.png, it really downloads an HTML page named whatever.png

I'm looking for a workaround
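One way to detect that failure mode is to check the Content-Type header before saving the response as an image. A sketch using only the standard library (this is not RipMe's actual download path; the class and method names are made up for illustration):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImageResponseCheck {
    // Pure helper: does a Content-Type header value describe an image?
    static boolean isImageContentType(String contentType) {
        return contentType != null && contentType.startsWith("image/");
    }

    // Ask the server what it actually returns for a supposed image URL.
    static boolean urlServesImage(String urlString) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestMethod("HEAD");          // headers only, no body
        conn.setInstanceFollowRedirects(true);  // follow the redirect chain
        String contentType = conn.getContentType();
        conn.disconnect();
        // "text/html" here means we were redirected to a viewer page,
        // not handed the image bytes
        return isImageContentType(contentType);
    }
}
```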

cyian-1756 avatar Jan 24 '17 07:01 cyian-1756

It seems that the site is doing some JavaScript stuff to prevent downloading the pictures automatically and has IP-banned me; I doubt this is getting added to ripme anytime soon. But I'll get back to working on it once the IP ban has ended

cyian-1756 avatar Jan 24 '17 07:01 cyian-1756

Thank you for taking the time to investigate!

IP Bans - I am not seeing any showing up; are you able to access the site at this time?
JavaScript - I am "guessing" that maybe the "Consent Screen" was causing the issue; it has been disabled.

ghost avatar Jan 24 '17 08:01 ghost

IP Bans - I am not seeing any showing up, are you able to access the site at this time?

I can access hushpix.com no problem, but I can only access ezphotoshare with JavaScript on from Chrome, not with Firefox or ripme

cyian-1756 avatar Jan 24 '17 08:01 cyian-1756

The ripper seems to work on hushpix; could you link me to/make an album on there for me to test?

cyian-1756 avatar Jan 24 '17 08:01 cyian-1756

Certainly!

  • User Album Page (lists all the user's public albums) - http://hushpix.com/RedditGWGirls/albums
  • Album Page - http://hushpix.com/album/7BTZ

On the EzPhotoShare front, I am able to access the website without issue using FF, Chrome & Microsoft Edge.

ghost avatar Jan 24 '17 08:01 ghost

On the EzPhotoShare front, I am able to access the website without issue using FF, Chrome & Microsoft Edge.

That's odd, I still can't access it with ripme for some reason

However, I have managed to rip from hushpix.com! EzPhotoShare, however, still won't let me download the images

Give me a bit to polish up my code (it's hacky as all hell atm) and I'll push it to my fork

cyian-1756 avatar Jan 24 '17 08:01 cyian-1756

Excellent to hear!

Once you have it good to go with HushPix I can begin troubleshooting further on EzPhotoShare to see what is causing the failures.

With your implementation will it be possible to point RipMe to the users album page and have it download all the albums to separate directories (This would be amazing!)?

Thank you for all of your hard work!

ghost avatar Jan 24 '17 08:01 ghost

With your implementation will it be possible to point RipMe to the users album page and have it download all the albums to separate directories

Yea it should be possible but it will take a bit of work (I won't get it done today)

Thank you for all of your hard work!

No problem!

cyian-1756 avatar Jan 24 '17 08:01 cyian-1756

I've more or less gotten it working (the album naming is way too long still, and you still have to download user albums one at a time), but besides that it now works with hushpix

However ezphotoshare still just downloads the image page

cyian-1756 avatar Jan 24 '17 09:01 cyian-1756

Awesome!

I have tracked down the issue with EzPhotoShare; unfortunately it's not something that can be easily resolved at this moment, however it shouldn't affect any other Chevereto-powered site.

ghost avatar Jan 24 '17 17:01 ghost

@ihadp Great! I'll keep working on the ripper and it should end up in the main repo around February

cyian-1756 avatar Jan 25 '17 03:01 cyian-1756

@ihadp Sorry it took so long (work and my other projects got in the way), but with my latest commit you can now rip user pages (though all the images go into one folder atm). I'll work on getting each album in its own folder in the coming days ~~but I'm not sure it will be possible (I'll read the wiki and see if it has any examples on how to do it)~~ it should be possible, just a bit of a pain

cyian-1756 avatar Feb 02 '17 00:02 cyian-1756

Thank you very much!

ghost avatar Feb 02 '17 15:02 ghost

@ihadp Thanks for reporting and @cyian-1756 thanks for the work implementing.

metaprime avatar Feb 04 '17 12:02 metaprime

Hi Guys,

Any updates on this front?

Thanks!

ghost avatar Feb 21 '17 19:02 ghost

@ihadp I tried to implement a one-album-per-folder ripper and couldn't. I don't know Java or ripme's API well enough to do it.

That being said, everything else is implemented, and you could get one album per folder by using a simple wrapper script (if you want I can write one)

cyian-1756 avatar Feb 21 '17 19:02 cyian-1756

This is as close as I managed to get:

package com.rarchives.ripme.ripper.rippers;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import com.rarchives.ripme.ripper.AbstractHTMLRipper;
import com.rarchives.ripme.utils.Http;
import java.util.HashMap;
import java.util.Map;
import java.util.Iterator;


public class CheveretoRipper extends AbstractHTMLRipper {

    public static Map<String, List<String>> albumNameAndUrls = new HashMap<String, List<String>>();
    public static List<String> urlList = new ArrayList<String>();

    public CheveretoRipper(URL url) throws IOException {
        super(url);
    }

    public static List<String> explicit_domains_1 = Arrays.asList("www.ezphotoshare.com", "hushpix.com");
        @Override
        public String getHost() {
            String host = url.toExternalForm();
            return host;
        }

        @Override
        public String getDomain() {
            String host = url.toExternalForm();
            return host;
        }

        @Override
        public boolean canRip(URL url) {
            String url_name = url.toExternalForm();
            return explicit_domains_1.contains(url_name.split("/")[2]);
        }

        @Override
        public String getGID(URL url) throws MalformedURLException {
            Pattern p = Pattern.compile("(?:https?://)?(?:www\\.)?[a-z1-9]*\\.[a-z1-9]*/album/([a-zA-Z1-9]*)/?$");
            Matcher m = p.matcher(url.toExternalForm());
            if (m.matches()) {
                return m.group(1);
            }
            else {
                Pattern pa = Pattern.compile("(?:https?://)?(?:www\\.)?[a-z1-9]*\\.[a-z1-9]*/([a-zA-Z1-9_-]*)/albums/?$");
                Matcher ma = pa.matcher(url.toExternalForm());
                if (ma.matches()) {
                    return ma.group(1);
                }
            }
            throw new MalformedURLException("Expected chevereto URL format: " +
                            "site.domain/album/albumName or site.domain/username/albums- got " + url + " instead");
        }

        @Override
        public Document getFirstPage() throws IOException {
            // "url" is an instance field of the superclass
            return Http.url(url).get();
        }
        @Override
        public Document getNextPage(Document doc) throws IOException {
            // Find the link to the next page, if any
            Element elem = doc.select("li.pagination-next > a").first();
            if (elem == null || elem.attr("href").isEmpty()) {
                throw new IOException("No more pages");
            }
            String nextUrl = elem.attr("href");
            // Sleep for half a second to avoid getting IP banned
            sleep(500);
            return Http.url(nextUrl).get();
        }

        @Override
        public List<String> getURLsFromPage(Document doc) {
            List<String> result = new ArrayList<String>();
            Document userpage_doc;
            // We check for the following string to see if this is a user page or not
            if (doc.toString().contains("content=\"gallery\"")) {
                for (Element elem : doc.select("a.image-container")) {
                    String link = elem.attr("href");
                    logger.info("Grabbing album " + link);
                    try {
                        userpage_doc = Http.url(link).get();
                    } catch (IOException e) {
                        logger.warn("Failed to load album page " + link);
                        e.printStackTrace();
                        // Skip this album rather than crash on a null document
                        continue;
                    }
                    for (Element element : userpage_doc.select("a.image-container > img")) {
                        String imageSource = element.attr("src");
                        logger.info("Found image " + imageSource);
                        // We remove the .md from images so we download the full size image
                        // not the medium ones
                        imageSource = imageSource.replace(".md", "");
                        result.add(imageSource);
                        urlList.add(imageSource);
                    }
                    for (Element albumNameDoc : userpage_doc.select("meta[property=og:url]")) {
                        String albumName = albumNameDoc.attr("content");
                        albumName = albumName.split("/")[4];
                        albumNameAndUrls.put(albumName, urlList);
                    }
                }
            }
            else {
                for (Element el : doc.select("a.image-container > img")) {
                    String imageSource = el.attr("src");
                    // We remove the .md from images so we download the full size image
                    // not the medium ones
                    imageSource = imageSource.replace(".md", "");
                    result.add(imageSource);
                }
            }
            return result;
        }

        public URL convertUrl(String url) throws MalformedURLException {
            // Let the caller handle a malformed URL instead of returning null,
            // which would cause a NullPointerException downstream
            return new URL(url);
        }

        @Override
        public void downloadURL(URL url, int index) {
            logger.info(url);
            for (Map.Entry<String, List<String>> entry : albumNameAndUrls.entrySet()) {
                String key = entry.getKey();
                List<String> values = entry.getValue();
                for (String urlToConvert : values) {
                    try {
                        logger.info("Downloading " + urlToConvert);
                        logger.info(convertUrl(urlToConvert));
                        addURLToDownload(convertUrl(urlToConvert));
                    }
                    catch(MalformedURLException e){
                        e.printStackTrace();
                    }
                }
            }
        }
    }

It almost works but for some reason it fails to download anything.

The only output from the script is

Downloading https://i.hushpix.com/j4P1N.jpg
https://i.hushpix.com/j4P1N.jpg
url: https://i.hushpix.com/j4P1N.jpg, prefix: , subdirectory, referrer: null, cookies: null
Downloading https://i.hushpix.com/j4P1N.jpg to /home/USER/ripme/rips/https__hushpix.com_RedditGWGirls_albums_RedditGWGirls/j4P1N.jpg
[!] Skipping https://i.hushpix.com/j4P1N.jpg -- already attempted: ./rips/https__hushpix.com_RedditGWGirls_albums_RedditGWGirls/j4P1N.jpg

But confusingly it does sometimes download the image.
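One likely contributor to the "already attempted" skips above: downloadURL walks the entire albumNameAndUrls map every time it is called, so each image URL gets queued once per image index. A guard that hands each URL to the downloader at most once might look like this (a sketch; the class and method names are hypothetical, not part of RipMe's API):

```java
import java.util.HashSet;
import java.util.Set;

public class DownloadDeduper {
    // URLs that have already been handed to the downloader
    private final Set<String> attempted = new HashSet<String>();

    // Returns true only the first time a given URL is offered;
    // Set.add already reports whether the element was new.
    boolean shouldDownload(String url) {
        return attempted.add(url);
    }
}
```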

If anyone else wants to take a crack at it that would be great

cyian-1756 avatar Feb 22 '17 01:02 cyian-1756

@cyian-1756 This was part of your PR before, right? In that case, are you planning to hold off on submitting the PR for this ripper now?

metaprime avatar Feb 22 '17 06:02 metaprime

@metaprime

This was part of your PR before, right?

Kinda. This is a heavily modified/rewritten version of it

In that case, are you planning to hold off on submitting the PR for this ripper now?

I'm planning on making a few changes to the original ripper (naming output, etc.), removing the broken user page downloading, and making a pull for it

cyian-1756 avatar Feb 22 '17 12:02 cyian-1756

Hi Everyone,

Following up on this issue to see where we stand.

I launched a new Chevereto-based website a few months back (https://gwarchives.com) and would love it if users could easily download the albums of their choice (or an entire account) using RipMe.

Let me know if I can be of any assistance.

ghost avatar Jan 06 '18 13:01 ghost

@ihadp

I launched a new Chevereto based website a few months back (https://gwarchives.com)

I'll add it to the CheveretoRipper

and would love it users could easily download the albums of their choice (or entire account) using RipMe.

The CheveretoRipper has album support at this point, I still have to add account support

Also this is the old and unmaintained repo, the new repo is https://github.com/RipMeApp/ripme

cyian-1756 avatar Jan 06 '18 17:01 cyian-1756