Add more comic strips

Open • ArtskydJ opened this issue 5 years ago • 25 comments

Before you write a scraper for comicsrss, please know that there are some types of comic strips I don't want on comicsrss.

I don't want comicsrss to have sexually suggestive comics. For example, I've considered killing the RSS feed for 9 Chickweed Lane, and I still might someday. I'm going to avoid adding anything to comicsrss that's more suggestive than that.

I might kill off political comics. I haven't yet, but I've been strongly considering it for a while now. Internet politics discussions tend to be tribal and echo-chambery, but political comics step that up a few notches.


List of comic strips/websites that folks have requested, and who requested them

Planned:

  • [x] Dilbert http://dilbert.com/
  • [x] Arcamax https://www.arcamax.com/comics
  • [x] Comics Kingdom https://www.comicskingdom.com/
  • [ ] Creators.com https://www.creators.com/categories/comics
    • Mike P (email) - Spectickles
  • [ ] The Far Side https://www.thefarside.com
    • Brian W (email)
    • Coleman (email)
  • [ ] Ctrl+Alt+Delete https://cad-comic.com/feed/
    • Milan A (email)

ArtskydJ, Aug 12 '18

I'd love to see some Comics Kingdom strips added, if possible. (For me, personally, mainly Bizarro, Rhymes with Orange, and Darrin Bell.)

ghost, Nov 15 '18

Arcamax has Bizarro, Dilbert, Rhymes with Orange, and Darrin Bell.

Both Comics Kingdom and Arcamax look like they will be much more difficult to scrape than GoComics.

ArtskydJ, Nov 24 '18

Added Dilbert today.

ArtskydJ, Nov 24 '18

I don't remember why I thought Arcamax would be particularly difficult. It doesn't look like it will be that hard. Here's the relevant markup from one of its comic pages:

<a class="prev" href="/thefunnies/brilliantmindofedisonlee/s-2160999" title="Brilliant Mind of Edison Lee 1/3/2019"><span class="entypo-left-open"></span></a>
  <span class="cur">January  4</span>
<a class="next-off" href="#"><span class="entypo-right-open"></span></a>

<!-- ... -->

<figure class="comic">
  <img id="comic-zoom" data-zoom-image="/newspics/168/16885/1688589.gif" src="/newspics/168/16885/1688589.gif"  data-width="600" data-height="187" alt="" class="img-responsive the-comic" title="click or tap to zoom" />
  <cite class="comic-copyright">(c) 2019 John Hambrock.  Dist. by King Features Syndicate, Inc.</cite>
</figure>

Hopefully I'll get around to it within a few weeks.
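To make the markup above concrete, here's a rough sketch (in JavaScript, like the existing scrapers) of pulling the strip out of a page shaped like that. It's regex-based only to stay dependency-free; the `parseArcamaxPage` name is hypothetical, and the `https://www.arcamax.com` base URL is an assumption based on the page's links being relative. A real scraper would more likely use a proper HTML parser.

```javascript
// Rough sketch: extract the strip from an Arcamax page shaped like the snippet above.
// Regexes are brittle against layout changes; an HTML parser would be sturdier.
const BASE_URL = 'https://www.arcamax.com' // assumption: page links are relative to this host

// Collapse runs of whitespace ("January  4" -> "January 4").
function squish(text) {
  return text.replace(/\s+/g, ' ').trim()
}

// Hypothetical helper; returns null when the page has no strip image.
function parseArcamaxPage(html) {
  // The strip image is the <img> with id="comic-zoom".
  const img = html.match(/<img[^>]*id="comic-zoom"[^>]*\ssrc="([^"]+)"/)
  if (!img) return null

  // The "previous" arrow links to the prior strip and names it in its title.
  const prev = html.match(/<a class="prev" href="([^"]+)" title="([^"]*)"/)
  const cur = html.match(/<span class="cur">([^<]*)<\/span>/)
  const cite = html.match(/<cite class="comic-copyright">([^<]*)<\/cite>/)

  return {
    comicImageUrl: new URL(img[1], BASE_URL).href,
    dateLabel: cur ? squish(cur[1]) : null,
    prevUrl: prev ? new URL(prev[1], BASE_URL).href : null,
    prevTitle: prev ? prev[2] : null,
    copyright: cite ? squish(cite[1]) : null,
  }
}
```

Following `prevUrl` page by page is one way to backfill older strips, though it costs one request per strip, so rate-limiting may come into play.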

ArtskydJ, Jan 04 '19

Could I request Sherman's Lagoon and Freefall (the latter is a webcomic found at freefall.purrsia.com)?

infinitytec, Jun 12 '19

Sherman's Lagoon is on Comics Kingdom. If/when I add Comics Kingdom, I can @ you in this thread.

I doubt I'll add Freefall unless it is part of a larger site like Comics Kingdom or Arcamax. If there's enough demand for it, I might add it.

Or you could look into adding it the same way Dilbert was added (see https://github.com/ArtskydJ/comicsrss.com/blob/gh-pages/_generator/scraper-dilbert/index.js). There isn't really an API for making a scraper... ☹️


This is what I did for Dilbert (the process would be similar for Freefall):

  1. Grab a page that shows multiple comics, including the latest comic.
     a. For Dilbert it was https://dilbert.com
     b. For Freefall it might be http://freefall.purrsia.com/lastthree.htm
  2. Parse the HTML to turn it into an array like this:
[
    {
        "titleAuthorDate": "Freefall by Tugrik for Wednesday 6/12/2019",
        "url": "http://freefall.purrsia.com/ff3300/fc03290.htm",
        "date": "2019-06-12",
        "comicImageUrl": "http://freefall.purrsia.com/ff3300/fc03290.png"
    },
    ...
]
  3. Open the cached version of that array and merge the two together. (If I don't have the latest comic in the cached array, then I need to push it onto the array.)
  4. Write the merged array back to the cache file on disk.
  5. Integrate it with the rest of the system. (If you do everything else, I would be more than happy to integrate your scraper.)
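The merge and write steps can be sketched in JavaScript roughly like this. The entry shape matches the example array above; treating `date` as the unique key, keeping the array oldest-first, and the `mergeEntries`/`updateCache` names are assumptions for illustration, not necessarily how the Dilbert scraper does it.

```javascript
const fs = require('fs')

// Merge freshly scraped entries into the cached array.
// Assumption: one strip per date, so `date` ("YYYY-MM-DD") is the unique key.
function mergeEntries(cached, scraped) {
  const merged = cached.slice()
  const seen = new Set(cached.map(entry => entry.date))
  for (const entry of scraped) {
    if (!seen.has(entry.date)) {
      merged.push(entry) // a strip that isn't cached yet
      seen.add(entry.date)
    }
  }
  // Oldest first, so the latest strip ends up last; ISO dates compare correctly as strings.
  merged.sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0))
  return merged
}

// Read the cache file (if it exists), merge, and write the result back to disk.
function updateCache(cachePath, scraped) {
  const cached = fs.existsSync(cachePath)
    ? JSON.parse(fs.readFileSync(cachePath, 'utf8'))
    : []
  const merged = mergeEntries(cached, scraped)
  fs.writeFileSync(cachePath, JSON.stringify(merged, null, 2))
  return merged
}
```

With this approach, re-running the scraper is harmless: entries already on disk are skipped, so only genuinely new strips get pushed.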

ArtskydJ, Jun 12 '19

Thanks for the information! I'll look into it and see what I can do!

infinitytec, Jun 12 '19

I made an API and published it in the README.

ArtskydJ, Jun 24 '19

Any progress on this? I've looked into scraping Comics Kingdom myself over the past year, and it's pretty difficult. Much of the page is loaded dynamically when first visited in a web browser. The publishers are clearly trying their best to prevent scraping, but my scraping knowledge is fairly limited when it comes to dynamic data. Maybe the Arcamax website would be easier?

jgbishop, Jan 01 '20

@jgbishop Very little progress. You can see in _generator/site-scrapers/ that there are 2 Work In Progress folders. I haven't done anything since then.

Getting a functional scraper working is probably 2 to 10 hours of work, depending on how smoothly it goes and whether you run into issues like rate-limiting. The reason I haven't made another site scraper isn't a technical issue blocking the way. I just haven't made it a priority.

And I personally don't have a ton of incentive to expand comicsrss since it does all that I need. I still want to scrape more sites.

If there's a specific comic strip you want, you could try making a scraper just for it, instead of for the entire Arcamax or Comics Kingdom site. That might be a nice starting point for me to expand it to the whole site.

One more thing to note: if/when Arcamax or Comics Kingdom is added, the site generator will have to avoid making two entries when a comic is on both gocomics.com and the added site.
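One way the generator could avoid duplicate entries is to walk the series lists from each site in order of preference and drop any title it has already seen. This is only a sketch: the `title`/`source` fields, the `normalizeTitle`/`dedupeSeries` names, and matching on normalized titles are all assumptions.

```javascript
// Normalize a strip title so "Sherman's Lagoon" and "Shermans Lagoon" compare equal.
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/&/g, 'and')
    .replace(/[^a-z0-9]+/g, '') // drop spaces, punctuation, and apostrophes
}

// seriesLists is ordered by preference, e.g. [gocomicsSeries, arcamaxSeries].
// The first occurrence of each normalized title wins; later duplicates are dropped.
function dedupeSeries(seriesLists) {
  const byKey = new Map()
  for (const list of seriesLists) {
    for (const series of list) {
      const key = normalizeTitle(series.title)
      if (!byKey.has(key)) byKey.set(key, series)
    }
  }
  return Array.from(byKey.values())
}
```

Title matching can miss strips that two sites name differently, so a small hand-maintained alias table might still be needed on top of this.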

ArtskydJ, Jan 08 '20

@jgbishop I finally added Arcamax comics.

ArtskydJ, Jun 23 '20

Woo-hoo! Thanks! 👏 🍰

jgbishop, Jun 23 '20

Beetle Bailey and Hagar the Horrible, at last!

ghost, Jul 05 '20

Well, I may have figured out something for Comics Kingdom: https://jsfiddle.net/p0tojns1/1/

Not a full scraper, and only for Sherman's Lagoon, but it may help.

infinitytec, Oct 14 '21

Interesting...

Earlier, I'd decided not to write a scraper for Comics Kingdom because I remembered it being very dynamic. But it looks quite doable to scrape that site now?

So I'm now planning to write a scraper for Comics Kingdom. I'm not promising anything. 😁 Difficulties might come up that make me change my mind and abandon Comics Kingdom again. But I hope to get it working!

ArtskydJ, Oct 14 '21

I would like to suggest https://workchronicles.com

jalberto, Nov 15 '21

> I would like to suggest workchronicles.com

They already have an RSS feed: https://workchronicles.com/feed/

ArtskydJ, Nov 16 '21

Totally missed it, thanks

jalberto, Nov 16 '21

Can't wait for The Far Side to be added. Thanks for this awesome resource. :)

twizzayy, Apr 07 '22

Hey, looks like Sherman's Lagoon is now on GoComics so it's being scraped!

infinitytec, Jul 02 '22

I added Comics Kingdom strips to https://www.comicsrss.com/

@infinitytec

ArtskydJ, Aug 09 '22

Would it be difficult to add support for comics hosted on https://tinyview.com/ and https://www.webtoons.com/?

Webtoons has an RSS feed, but it usually only shows the first panel of the comic.

Thanks!

tylerbenson, May 25 '23

I tried to add additional details for tinyview: #141.

tylerbenson, Oct 27 '23

I just updated the original post.

Webtoons has some "mature"-rated comics, which I don't want on comicsrss. The "young adult"-rated comics varied a lot in their suggestiveness. Webtoons, by the nature of its user-generated content, is difficult to categorize. If someone wrote a scraper for Webtoons, even with the "mature"-rated comics filtered out, I'm not sure if I'd merge it into comicsrss.

I'd probably merge a scraper for tinyview. Most seemed fine. Maybe I'd filter out "Eggs n' Ben", IDK.
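If a Webtoons or tinyview scraper were written, this kind of filtering could happen before feeds are generated. A minimal sketch, assuming each scraped series carries a `rating` string and a `title` (both hypothetical fields; neither site's actual data format is confirmed here), plus a per-title blocklist for cases like the one mentioned above:

```javascript
// Drop series with a blocked rating or an explicitly blocked title.
// `rating` and `title` are hypothetical fields a scraper would have to fill in.
function filterSeries(seriesList, { blockedRatings = ['mature'], blockedTitles = [] } = {}) {
  const ratings = new Set(blockedRatings.map(rating => rating.toLowerCase()))
  const titles = new Set(blockedTitles)
  return seriesList.filter(series => {
    // Missing ratings become '' and are allowed through.
    const rating = String(series.rating || '').toLowerCase()
    return !ratings.has(rating) && !titles.has(series.title)
  })
}
```

Unrated series pass through here; whether unknown ratings should be allowed or blocked by default is a policy choice.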

ArtskydJ, Oct 27 '23

Makes sense... For the record, I was interested in some of the family friendly cartoons for each, and I totally respect your desire to keep things clean. (I've sent my teen son to your site to find comics to read.)

tylerbenson, Oct 27 '23