comicsrss.com
Add more comic strips
Before you write a scraper for comicsrss, please know that I don't want comicsrss to have some types of comic strips.
I don't want comicsrss to have sexually-suggestive comics. For example, I've considered killing the RSS feed for 9 Chickweed Lane, and I still might kill it someday. I'm going to avoid adding anything to comicsrss that's more suggestive than that.
I might kill off political comics. I haven't yet, but I've been strongly considering it for a while now. Internet politics discussions tend to be tribal and echo-chambery, but political comics step that up a few notches.
List of comic strips/websites that folks have requested, and who requested them
Planned:
- [x] Dilbert http://dilbert.com/
- [x] Arcamax https://www.arcamax.com/comics
- [x] Comics Kingdom https://www.comicskingdom.com/
- [ ] Creators.com https://www.creators.com/categories/comics
  - Mike P (email) - Spectickles
- [ ] The Far Side https://www.thefarside.com
  - Brian W (email)
  - Coleman (email)
- [ ] Ctrl+Alt+Delete https://cad-comic.com/feed/
  - Milan A (email)
I'd love to see some Comics Kingdom strips added, if possible. (For me, personally, mainly Bizarro, Rhymes with Orange, and Darrin Bell.)
Arcamax has Bizarro, Dilbert, Rhymes with Orange, and Darrin Bell.
Both Comics Kingdom and Arcamax look like they will be much more difficult to scrape than gocomics.
Added Dilbert today.
I don't remember why I thought Arcamax would be particularly difficult. It doesn't look like it will be that hard...
```html
<a class="prev" href="/thefunnies/brilliantmindofedisonlee/s-2160999" title="Brilliant Mind of Edison Lee 1/3/2019"><span class="entypo-left-open"></span></a>
<span class="cur">January 4</span>
<a class="next-off" href="#"><span class="entypo-right-open"></span></a>
<!-- ... -->
<figure class="comic">
<img id="comic-zoom" data-zoom-image="/newspics/168/16885/1688589.gif" src="/newspics/168/16885/1688589.gif" data-width="600" data-height="187" alt="" class="img-responsive the-comic" title="click or tap to zoom" />
<cite class="comic-copyright">(c) 2019 John Hambrock. Dist. by King Features Syndicate, Inc.</cite>
</figure>
```
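For anyone picking this up, here is a rough sketch of pulling the useful fields out of markup like the Arcamax snippet above. It assumes Arcamax keeps the `id="comic-zoom"`, `class="cur"` and `class="prev"` markers shown there. It uses regular expressions only to stay dependency-free; a real scraper would be sturdier with an HTML parser. `parseArcamaxPage` and its return fields are hypothetical names, not comicsrss code.

```javascript
// Hypothetical helper: extracts the comic image, the date label, and the
// "previous strip" link from an Arcamax strip page. Assumes the markup
// matches the sample snippet; a real scraper should use an HTML parser.
function parseArcamaxPage(html, baseUrl = "https://www.arcamax.com") {
  // The strip image carries id="comic-zoom"; its src is site-relative.
  const img = html.match(/<img[^>]*id="comic-zoom"[^>]*\ssrc="([^"]+)"/);
  // The current date label, e.g. "January 4" (this element has no year).
  const cur = html.match(/<span class="cur">([^<]*)<\/span>/);
  // Link to the previous strip; its title attribute includes a full date.
  const prev = html.match(/<a class="prev" href="([^"]+)" title="([^"]*)"/);

  return {
    comicImageUrl: img ? new URL(img[1], baseUrl).href : null,
    dateLabel: cur ? cur[1].trim() : null,
    prevUrl: prev ? new URL(prev[1], baseUrl).href : null,
    prevTitle: prev ? prev[2] : null,
  };
}
```

Note that the `cur` label has no year, so a scraper would need to infer it, for example from the previous strip's `title`.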
Hopefully I'll get around to it within a few weeks.
Could I request Sherman's Lagoon and Freefall (the latter is a webcomic found at freefall.purrsia.com)?
Sherman's Lagoon is on Comics Kingdom. If/when I add Comics Kingdom, I can @ you in this thread.
I doubt I'll add Freefall unless it is part of a larger site like Comics Kingdom or Arcamax. If there's enough demand for it, I might add it.
Or you could look into adding it the same way Dilbert was added (https://github.com/ArtskydJ/comicsrss.com/blob/gh-pages/_generator/scraper-dilbert/index.js). There isn't really an API for making a scraper... :frowning_face:
This is what I did for Dilbert (the process would be similar for Freefall):
- Grab a page that shows multiple comics, including the latest one:
  - For Dilbert, it was https://dilbert.com
  - For Freefall, it might be http://freefall.purrsia.com/lastthree.htm
- Parse the HTML to turn it into an array like this:
  ```json
  [
    {
      "titleAuthorDate": "Freefall by Tugrik for Wednesday 6/12/2019",
      "url": "http://freefall.purrsia.com/ff3300/fc03290.htm",
      "date": "2019-06-12",
      "comicImageUrl": "http://freefall.purrsia.com/ff3300/fc03290.png"
    },
    ...
  ]
  ```
- Open the cached version of that array and merge the two. (If the cached array doesn't have the latest comic, push it onto the array.)
- Write the cached file to disk.
- Integrate it with the rest of the system. (If you do everything else I would be more than happy to integrate your scraper.)
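The cache steps (open, merge, write) could look roughly like this in Node.js. This is a sketch, not the actual comicsrss code: it assumes the cache is a single JSON file holding an array of objects shaped like the Freefall example, treats `url` as each strip's unique key, and sorts newest first. `loadCache`, `mergeStrips` and `saveCache` are hypothetical names.

```javascript
const fs = require("fs");

// Hypothetical helper: read the cached strips, or start empty on the first run.
function loadCache(cachePath) {
  if (!fs.existsSync(cachePath)) return [];
  return JSON.parse(fs.readFileSync(cachePath, "utf8"));
}

// Merge freshly scraped strips into the cached list. Each strip's `url` is
// treated as its unique key, and scraped data replaces stale cached copies.
function mergeStrips(cached, scraped) {
  const byUrl = new Map(cached.map((strip) => [strip.url, strip]));
  for (const strip of scraped) byUrl.set(strip.url, strip);
  // ISO dates (YYYY-MM-DD) compare correctly as plain strings; newest first.
  return [...byUrl.values()].sort((a, b) =>
    a.date < b.date ? 1 : a.date > b.date ? -1 : 0
  );
}

// Write the merged list back to disk as pretty-printed JSON.
function saveCache(cachePath, strips) {
  fs.writeFileSync(cachePath, JSON.stringify(strips, null, 2) + "\n");
}
```

Keying on `url` rather than `date` avoids losing a strip if a site ever publishes two on the same day.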
Thanks for the information! I'll look into it and see what I can do!
I made an API and published it in the README.
Any progress on this? I've looked into scraping Comics Kingdom in the past year myself, and it's pretty difficult. Lots of the page gets loaded dynamically when first visited in a web browser. The publishers are clearly trying their best to prevent scraping, but my scraping knowledge is fairly limited when it comes to dynamic data. Maybe the arcamax website would be easier?
@jgbishop Very little progress. In _generator/site-scrapers/ you can see two work-in-progress folders, but I haven't done anything since then.
Getting a functional scraper working is probably around 2-10 hours of work, depending on how smoothly it goes and whether you run into issues like rate-limiting. I haven't made another site scraper not because of a technical blocker, but because I haven't made it a priority.
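On the rate-limiting point: one simple precaution is to fetch pages one at a time with a pause between requests. This sketch assumes nothing about any particular site's limits; the one-second default is an arbitrary guess, and `fetchPolitely` is a hypothetical helper. The fetch function is passed in so the helper can be tested without the network.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetch pages sequentially, pausing between requests,
// so a scraper is less likely to trip a site's rate limits.
// `fetchText` is injected: any async function mapping a URL to page text.
async function fetchPolitely(urls, fetchText, delayMs = 1000) {
  const pages = [];
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(delayMs); // no wait before the first request
    pages.push(await fetchText(url));
  }
  return pages;
}
```

For example, with Node 18+ (which has a global `fetch`): `fetchPolitely(urls, async (url) => (await fetch(url)).text(), 2000)`.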
I personally don't have much incentive to expand comicsrss, since it does everything I need. That said, I still want to scrape more sites.
If there's a specific comic strip you want, you could try making a scraper just for it instead of the entire Arcamax or Comics Kingdom site. That might be a nice starting point for me to expand to the whole site.
One more thing to note: if/when Arcamax or Comics Kingdom is added, the site generator will have to avoid creating two entries for a comic that is on both gocomics.com and the added site.
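A minimal sketch of that deduplication, assuming each site's scraper produces objects with a `title` field (a hypothetical shape) and that titles match once punctuation and case are ignored. Sources are passed in priority order, so listing gocomics.com first keeps its entry when a strip appears on both sites. `slugify` and `dedupeComics` are hypothetical names.

```javascript
// Normalize a title so small punctuation/case differences still collide,
// e.g. "Sherman's Lagoon" and "Shermans Lagoon" -> "shermans-lagoon".
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/['’]/g, "") // drop straight and curly apostrophes
    .replace(/[^a-z0-9]+/g, "-") // any other run of symbols becomes a hyphen
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

// Merge comic lists from several sites, keeping one entry per strip.
// The first source that lists a strip wins.
function dedupeComics(...sources) {
  const bySlug = new Map();
  for (const comics of sources) {
    for (const comic of comics) {
      const key = slugify(comic.title);
      if (!bySlug.has(key)) bySlug.set(key, comic);
    }
  }
  return [...bySlug.values()];
}
```

Title matching is a heuristic; a hand-maintained alias table would be safer for strips whose names differ between sites.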
@jgbishop I finally added Arcamax comics.
Woo-hoo! Thanks! 👏 🍰
Beetle Bailey and Hagar the Horrible, at last!
Well, I may have figured out something for Comics Kingdom: https://jsfiddle.net/p0tojns1/1/
Not a full scraper, and only for Sherman's Lagoon, but it may help.
Interesting...
Earlier, I'd decided not to write a scraper for Comics Kingdom, because I remembered Comics Kingdom being very dynamic. But it looks quite doable to scrape that site now?
So I'm now planning to write a scraper for Comics Kingdom. I'm not promising anything. 😁 Difficulties might come up where I change my mind again, and abandon Comics Kingdom again. But I hope to get it working!
I would like to suggest https://workchronicles.com
They already have an RSS feed: https://workchronicles.com/feed/
Totally missed it, thanks
Can't wait for The Far Side to be added. Thanks for this awesome resource. :)
Hey, looks like Sherman's Lagoon is now on GoComics so it's being scraped!
I added Comics Kingdom strips to https://www.comicsrss.com/
@infinitytec
Would it be difficult to add support for comics hosted on https://tinyview.com/ and https://www.webtoons.com/?
Webtoons has an RSS feed, but it usually only shows the first panel of the comic.
Thanks!
I tried to add additional details for tinyview: #141.
I just updated the original post.
Webtoons has some "mature"-rated comics, which I don't want on comicsrss. The "young adult"-rated comics varied a lot in their suggestiveness. Webtoons, by nature of its user-generated content, is difficult to categorize. If someone wrote a scraper for webtoons, even with the "mature"-rated comics filtered out, I'm not sure if I'd merge it into comicsrss.
I'd probably merge a scraper for tinyview. Most seemed fine. Maybe I'd filter out "Eggs n' Ben", IDK.
Makes sense... For the record, I was interested in some of the family friendly cartoons for each, and I totally respect your desire to keep things clean. (I've sent my teen son to your site to find comics to read.)