Chevereto Image Hosting Script Support
Hi Guys,
It would be awesome if RipMe could fully support image hosting websites which utilize Chevereto (www.chevereto.com) for their hosting software/script.
Chevereto currently supports JPG, PNG, BMP & GIF
Chevereto does provide a list of all direct image links contained in an album via the "embed codes" tab when viewing an album; however, viewing the embed codes is typically limited to logged-in users.
As such, it would likely require RipMe to support the ability to authenticate with a website powered by Chevereto in order to grab the links via that method.
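For illustration, authenticating might look something like the jsoup sketch below; the `/login` path and the form field names are assumptions, since they can vary between Chevereto installs:

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class CheveretoLogin {
    // Posts the login form and returns the session cookies, which could
    // then be attached to later requests for the "embed codes" tab.
    public static Map<String, String> login(String site, String user, String pass) throws IOException {
        Connection.Response resp = Jsoup.connect(site + "/login") // assumed path
                .data("username", user)   // assumed field name
                .data("password", pass)   // assumed field name
                .method(Connection.Method.POST)
                .execute();
        return resp.cookies();
    }
}
```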
An alternative method would be to view the page with all the images listed and then alter the URL, grabbing the original media as opposed to the thumbnail.
Example:
Image: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.png
Thumbnail: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.th.png
Medium Thumbnail: https://is02.ezphotoshare.com/2017/01/14/KXUlyU.md.png
When viewing the content you would typically see the medium size by default; however, by changing the URL slightly you should be able to grab the actual full-size media.
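To illustrate, the rewrite can be as simple as stripping the size marker from the filename (a sketch; `toFullSizeUrl` is a made-up helper, and the `.th`/`.md` markers are taken from the example links above):

```java
// Strips Chevereto's thumbnail size markers from an image URL.
// "KXUlyU.th.png" and "KXUlyU.md.png" both become "KXUlyU.png".
public static String toFullSizeUrl(String thumbnailUrl) {
    return thumbnailUrl.replace(".th.", ".").replace(".md.", ".");
}
```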
Desired Support:
- Download Entire Album (Album Link in RipMe)
- Download All Albums for X User & Save each to their respective folders (User's Album Page in RipMe)
Example:
Album Page: https://www.ezphotoshare.com/album/TDta
User's Albums Page: https://www.ezphotoshare.com/doreentokamumzeo/albums
I do own Chevereto and have several sites running, happy to help any way I am able.
Thanks!
Taking a quick look at the links you provided, ripping the sites should be easy (all the images have their own class and link) and detecting that it is a chevereto site should be easy too (the line appears in them)
However, atm ripme uses the domain name to tell if it can rip a site. I could bypass this by making a request to the url when the user enters it, but I feel that this is poor behavior (no data should be sent or received until the user clicks rip)
A workaround might be to use a list of site names known to use chevereto (like ChanRipper.java does) and I feel that this is the best way to do it (for now)
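A minimal sketch of that allow-list approach, modeled on the hostname check ChanRipper.java uses (the domain list here is just the sites mentioned in this thread):

```java
import java.net.URL;
import java.util.Arrays;
import java.util.List;

// Inside the ripper class: sites known to run Chevereto
private static final List<String> CHEVERETO_DOMAINS = Arrays.asList(
        "hushpix.com", "www.ezphotoshare.com");

@Override
public boolean canRip(URL url) {
    // Purely a string check; nothing is sent until the user clicks rip
    return CHEVERETO_DOMAINS.contains(url.getHost());
}
```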
Sounds good!
It would be great if we could start by adding hushpix.com (this would assist in my archiving project, allowing users to grab content before I have time to create archives).
It seems that Chevereto is forcing redirects on its image urls (aside from thumbnails), meaning that when ripme downloads a picture named whatever.png, it really downloads an HTML page named whatever.png
I'm looking for a workaround
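One possible workaround is to check what the server actually returns before saving it; a sketch with plain jsoup (`looksLikeImage` is a made-up helper):

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class RedirectCheck {
    // Returns false if a supposed image URL actually serves an HTML page
    public static boolean looksLikeImage(String imageUrl) throws IOException {
        Connection.Response resp = Jsoup.connect(imageUrl)
                .ignoreContentType(true) // don't throw on non-HTML responses
                .execute();
        return !resp.contentType().startsWith("text/html");
    }
}
```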
It seems that the site is doing some JavaScript stuff to prevent downloading the pictures automatically and has IP banned me, so I doubt this is getting added to ripme anytime soon. But I'll get back to working on it once the IP ban has ended
Thank you for taking the time to investigate!
IP Bans - I am not seeing any showing up, are you able to access the site at this time?
JavaScript - I am "guessing" that maybe the "Consent Screen" was causing the issue; it has been disabled.
IP Bans - I am not seeing any showing up, are you able to access the site at this time?
I can access hushpix.com no problem, but I can only access ezphotoshare with JavaScript on from Chrome, not with Firefox or ripme
The ripper seems to work on hushpix. Could you link me to/make an album on there for me to test?
Certainly!
User Album Page (Lists all the user's public albums) - http://hushpix.com/RedditGWGirls/albums
Album Page - http://hushpix.com/album/7BTZ
On the EzPhotoShare front, I am able to access the website without issue using FF, Chrome & Microsoft Edge.
On the EzPhotoShare front, I am able to access the website without issue using FF, Chrome & Microsoft Edge.
That's odd, I still can't access it with ripme for some reason
However, I have managed to rip from hushpix.com! EzPhotoShare still won't let me download the images, though
Give me a bit to polish up my code (it's hacky as all hell atm) and I'll push it to my fork
Excellent to hear!
Once you have it good to go with HushPix I can begin troubleshooting further on EzPhotoShare to see what is causing the failures.
With your implementation will it be possible to point RipMe to the user's album page and have it download all the albums to separate directories (This would be amazing!)?
Thank you for all of your hard work!
With your implementation will it be possible to point RipMe to the user's album page and have it download all the albums to separate directories
Yea it should be possible but it will take a bit of work (I won't get it done today)
Thank you for all of your hard work!
No problem!
I've more or less gotten it working (the album naming is way too long still and you still have to download user albums one at a time) but besides that it now works with hushpix
However ezphotoshare still just downloads the image page
Awesome!
I have tracked down the issue with EzPhotoShare. Unfortunately it's not something that can be easily resolved at this moment; however, it shouldn't affect any other Chevereto powered site.
@ihadp Great! I'll keep working on the ripper and it should end up in the main repo around February
@ihadp Sorry it took so long (work and my other projects got in the way) but with my latest commit you can now rip userpages (but all the images go in one folder atm), I'll work on getting each album in its own folder in the coming days ~~but I'm not sure it will be possible (I'll read the wiki and see if it has any examples on how to do it)~~ it should be possible, just a bit of a pain
Thank you very much!
@ihadp Thanks for reporting and @cyian-1756 thanks for the work implementing.
Hi Guys,
Any updates on this front?
Thanks!
@ihadp I tried to implement a one-album-per-folder ripper and couldn't. I don't know Java or ripme's API well enough to do it.
That being said, everything else is implemented, and you could get one album per folder by using a simple wrapper script (if you want I can write one)
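For what it's worth, such a wrapper could be as simple as the hypothetical sketch below, which shells out to ripme once per album so each rip lands in its own folder (the `-u` flag and the album list are assumptions):

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class AlbumWrapper {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical list of album URLs to rip one at a time
        List<String> albums = Arrays.asList("http://hushpix.com/album/7BTZ");
        for (String album : albums) {
            // Each invocation rips one album into its own folder under ./rips
            new ProcessBuilder("java", "-jar", "ripme.jar", "-u", album)
                    .inheritIO().start().waitFor();
        }
    }
}
```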
This is as close as I managed to get with the ripper itself:
```java
package com.rarchives.ripme.ripper.rippers;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import com.rarchives.ripme.ripper.AbstractHTMLRipper;
import com.rarchives.ripme.utils.Http;

public class CheveretoRipper extends AbstractHTMLRipper {
    // Maps each album name to the image URLs found in it.
    // NOTE: every album ends up mapped to the same shared urlList object
    public static Map<String, List<String>> albumNameAndUrls = new HashMap<String, List<String>>();
    public static List<String> urlList = new ArrayList<String>();

    public static List<String> explicit_domains_1 = Arrays.asList("www.ezphotoshare.com", "hushpix.com");

    public CheveretoRipper(URL url) throws IOException {
        super(url);
    }

    // NOTE: returning the full URL instead of just the hostname is what
    // produces the long "https__hushpix.com_..." folder name in the log below
    @Override
    public String getHost() {
        return url.toExternalForm();
    }

    @Override
    public String getDomain() {
        return url.toExternalForm();
    }

    @Override
    public boolean canRip(URL url) {
        // Splitting on "/" makes index 2 the hostname
        String url_name = url.toExternalForm();
        return explicit_domains_1.contains(url_name.split("/")[2]);
    }

    @Override
    public String getGID(URL url) throws MalformedURLException {
        // Album pages: site.domain/album/albumName
        Pattern p = Pattern.compile("(?:https?://)?(?:www\\.)?[a-z1-9]*\\.[a-z1-9]*/album/([a-zA-Z1-9]*)/?$");
        Matcher m = p.matcher(url.toExternalForm());
        if (m.matches()) {
            return m.group(1);
        }
        // User album pages: site.domain/username/albums
        Pattern pa = Pattern.compile("(?:https?://)?(?:www\\.)?[a-z1-9]*\\.[a-z1-9]*/([a-zA-Z1-9_-]*)/albums/?$");
        Matcher ma = pa.matcher(url.toExternalForm());
        if (ma.matches()) {
            return ma.group(1);
        }
        throw new MalformedURLException("Expected chevereto URL format: "
                + "site.domain/album/albumName or site.domain/username/albums - got " + url + " instead");
    }

    @Override
    public Document getFirstPage() throws IOException {
        // "url" is an instance field of the superclass
        return Http.url(url).get();
    }

    @Override
    public Document getNextPage(Document doc) throws IOException {
        // Find the next page link, if any
        Element elem = doc.select("li.pagination-next > a").first();
        String nextUrl = (elem == null) ? "" : elem.attr("href");
        if (nextUrl.isEmpty()) {
            throw new IOException("No more pages");
        }
        // Sleep for half a sec to avoid getting IP banned
        sleep(500);
        return Http.url(nextUrl).get();
    }

    @Override
    public List<String> getURLsFromPage(Document doc) {
        List<String> result = new ArrayList<String>();
        // We check for the following string to see if this is a user page or not
        if (doc.toString().contains("content=\"gallery\"")) {
            for (Element elem : doc.select("a.image-container")) {
                String link = elem.attr("href");
                logger.info("Grabbing album " + link);
                Document userpage_doc;
                try {
                    userpage_doc = Http.url(link).get();
                } catch (IOException e) {
                    logger.warn("Failed to load " + link + " in Jsoup");
                    e.printStackTrace();
                    continue;
                }
                for (Element element : userpage_doc.select("a.image-container > img")) {
                    String imageSource = element.attr("src");
                    logger.info("Found image " + imageSource);
                    // We remove the .md from images so we download the full size
                    // image, not the medium one
                    imageSource = imageSource.replace(".md", "");
                    result.add(imageSource);
                    urlList.add(imageSource);
                }
                for (Element albumNameDoc : userpage_doc.select("meta[property=og:url]")) {
                    String albumName = albumNameDoc.attr("content");
                    albumName = albumName.split("/")[4];
                    albumNameAndUrls.put(albumName, urlList);
                }
            }
        } else {
            for (Element el : doc.select("a.image-container > img")) {
                String imageSource = el.attr("src");
                // We remove the .md from images so we download the full size
                // image, not the medium one
                imageSource = imageSource.replace(".md", "");
                result.add(imageSource);
            }
        }
        return result;
    }

    public URL convertUrl(String url) throws MalformedURLException {
        return new URL(url);
    }

    // NOTE: this re-submits every URL in the map on every call, which is the
    // likely cause of the "already attempted" skips in the log below
    @Override
    public void downloadURL(URL url, int index) {
        logger.info(url);
        for (Map.Entry<String, List<String>> entry : albumNameAndUrls.entrySet()) {
            for (String urlToConvert : entry.getValue()) {
                try {
                    logger.info("Downloading " + urlToConvert);
                    addURLToDownload(convertUrl(urlToConvert));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
```
It almost works but for some reason it fails to download anything.
The only output of the script is:
```
Downloading https://i.hushpix.com/j4P1N.jpg
https://i.hushpix.com/j4P1N.jpg
url: https://i.hushpix.com/j4P1N.jpg, prefix: , subdirectory, referrer: null, cookies: null
Downloading https://i.hushpix.com/j4P1N.jpg to /home/USER/ripme/rips/https__hushpix.com_RedditGWGirls_albums_RedditGWGirls/j4P1N.jpg
[!] Skipping https://i.hushpix.com/j4P1N.jpg -- already attempted: ./rips/https__hushpix.com_RedditGWGirls_albums_RedditGWGirls/j4P1N.jpg
```
But confusingly it does sometimes download the image.
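For comparison, other AbstractHTMLRipper subclasses implement downloadURL by queuing just the single URL they were handed (a sketch, assuming the superclass's getPrefix helper); the loop over albumNameAndUrls above re-queues everything on every call, which would explain the "already attempted" skips:

```java
@Override
public void downloadURL(URL url, int index) {
    // Queue exactly the URL this call received; the download
    // threads handle everything else
    addURLToDownload(url, getPrefix(index));
}
```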
If anyone else wants to take a crack at it that would be great
@cyian-1756 This was part of your PR before, right? In that case, are you planning to hold off on submitting the PR for this ripper now?
@metaprime
This was part of your PR before, right?
Kinda. This is a heavily modified/rewritten version of it
In that case, are you planning to hold off on submitting the PR for this ripper now?
I'm planning on making a few changes to the original ripper (naming output, etc.), removing the broken user page downloading, and making a pull request for it
Hi Everyone,
Following up on this issue to see where we stand.
I launched a new Chevereto based website a few months back (https://gwarchives.com) and would love it if users could easily download the albums of their choice (or entire account) using RipMe.
Let me know if I can be of any assistance.
@ihadp
I launched a new Chevereto based website a few months back (https://gwarchives.com)
I'll add it to the CheveretoRipper
and would love it users could easily download the albums of their choice (or entire account) using RipMe.
The CheveretoRipper has album support at this point; I still have to add account support
Also this is the old and unmaintained repo, the new repo is https://github.com/RipMeApp/ripme