MediaElch
Scraper for fernsehserien.de
Hello,
Do you think it's possible to add a scraper for fernsehserien.de to MediaElch?
Michael
Hi,
yes, that should be possible. But I have other topics that have a higher priority and then I have a few university projects that I need to take care of. So I won't be able to add this in the next ~3 months.
Do you happen to know how often their website (design) changes?
I thought I'd give this a little push... A scraper for fernsehserien.de would be awesome. There are so many shows that aren't listed properly on TMDB and other sites.
It's been a long time. I plan this for the next version.
Notes to self:
- Search URL: https://www.fernsehserien.de/suche/simpsons
- TV Show URL: https://www.fernsehserien.de/die-simpsons
- Episode URL: https://www.fernsehserien.de/die-simpsons/folgen/2x01-der-musterschueler-62047
- Season URL: https://www.fernsehserien.de/die-simpsons/episodenguide/staffel-2/6882 (ID is relevant)
- Overview URL: https://www.fernsehserien.de/die-simpsons/episodenguide
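The URL patterns above can be sketched as a few helper functions. This is only an illustration in Python (the actual scraper lives in MediaElch's C++ code base); the function names are hypothetical, and treating the trailing number of an episode URL as an episode ID is an assumption based on the single example above.

```python
import re
from urllib.parse import quote

BASE_URL = "https://www.fernsehserien.de"

def search_url(query: str) -> str:
    # Assumption: the search term is placed in the path and percent-encoded.
    return f"{BASE_URL}/suche/{quote(query)}"

def show_url(slug: str) -> str:
    return f"{BASE_URL}/{slug}"

def overview_url(slug: str) -> str:
    return f"{BASE_URL}/{slug}/episodenguide"

def season_url(slug: str, season: int, season_id: int) -> str:
    # The numeric ID at the end is relevant, so it must be known beforehand.
    return f"{BASE_URL}/{slug}/episodenguide/staffel-{season}/{season_id}"

# Matches e.g. ".../folgen/2x01-der-musterschueler-62047"
_EPISODE_RE = re.compile(r"/folgen/(\d+)x(\d+)-(.+)-(\d+)$")

def parse_episode_url(url: str):
    """Return (season, episode, trailing_id), or None if the URL does not match."""
    match = _EPISODE_RE.search(url)
    if match is None:
        return None
    season, episode, _title_slug, trailing_id = match.groups()
    return int(season), int(episode), int(trailing_id)
```

For example, `parse_episode_url` applied to the episode URL above yields season 2, episode 1 and the trailing number 62047.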
Episode Details:
- Runtime
- Title
- Original Title
- Season No. / Episode No.
- German Date / Original Date ("Premiere")
- Overview
- Cast / Crew (+ images)
- no thumbnail
- no rating
Season Details:
- episodes
- no season image
TV Show Details:
- year
- country
- overview
- actors
- genres
- no rating
- no images
See also:
- https://github.com/ermshiperete/scraper-fernsehserien/blob/master/src/main/java/org/tinymediamanager/scraper/fernsehserien/FernsehserienTvShowParser.java
- https://github.com/ermshiperete/scraper-fernsehserien/blob/master/src/main/java/org/tinymediamanager/scraper/fernsehserien/FernsehserienMetadataProvider.java
There is also "fast search". We need to remove non-breaking spaces.
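Removing non-breaking spaces could look roughly like the following. It is a minimal Python sketch, not the MediaElch implementation; the function name `normalize_text` is hypothetical, and whether the site uses the `&nbsp;` entity, the raw U+00A0 character, or both is an assumption.

```python
import html

NBSP = "\u00a0"

def normalize_text(text: str) -> str:
    # Decode HTML entities first, so that "&nbsp;" becomes U+00A0.
    text = html.unescape(text)
    # Replace non-breaking spaces with regular spaces.
    text = text.replace(NBSP, " ")
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())
```

Applied to scraped titles and overviews, this keeps comparisons (e.g. in tests) independent of how the page encodes its spaces.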
TODO
- [x] base classes
- [x] meta data ("Datenschutz URL", ..)
- [x] episode scraper
- [x] season scraper
- [x] TV show scraper
- [x] automated tests
- [ ] manual test (UI integration; custom TV show scraper?)
That's great news! Thank you so much for the time and dedication you put into this awesome tool!
Further notes: Auto-redirection from https://www.fernsehserien.de/suche/Scrubs to http://www.fernsehserien.de/scrubs (without https!):
❯ wget https://www.fernsehserien.de/suche/Scrubs
--2023-05-30 21:07:34-- https://www.fernsehserien.de/suche/Scrubs
Resolving www.fernsehserien.de (www.fernsehserien.de)... 54.93.170.188, 3.69.236.91, 52.58.22.94
Connecting to www.fernsehserien.de (www.fernsehserien.de)|54.93.170.188|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.fernsehserien.de/scrubs [following]
--2023-05-30 21:07:35-- http://www.fernsehserien.de/scrubs
Connecting to www.fernsehserien.de (www.fernsehserien.de)|54.93.170.188|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.fernsehserien.de:443/scrubs [following]
--2023-05-30 21:07:35-- https://www.fernsehserien.de/scrubs
Connecting to www.fernsehserien.de (www.fernsehserien.de)|54.93.170.188|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘Scrubs’
Notes to self:
Searching for "Scrubs" needs more work. But that should be easy to do once the TV show page scraping works.
Search works so far:
Basic TV show loading works as well. More details to come, but I only wanted to test that it works at all (see preview window):
@TotoCon Are you familiar with fernsehserien.de? Do you happen to know some TV shows that are out of the ordinary / special? I'm looking for edge cases that I could be missing. For example:
- TV show with only one sentence as overview or no overview at all
- TV show with special characters (non-ASCII, e.g. äöüß)
- TV show whose page looks different from others, e.g. Simpsons
I've tested it with "Die Simpsons" and "Scrubs".
A special case I found is, for example, that searching for "Scrubs" redirects to the TV show page. I found that by accident, but if I hadn't, searching for "Scrubs" would have returned 0 results. That's why I want to make sure that I'm handling "special" TV shows.
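One way to handle this redirect is to look at the final URL after all redirects have been followed and decide whether it is still a search page or already a TV show page. The Python sketch below illustrates the idea; the function name and the rule "a single path segment means show page" are assumptions derived from the two example URLs, not the logic MediaElch actually uses.

```python
from urllib.parse import urlsplit

def classify_search_response(final_url: str) -> str:
    """Classify the URL reached after following all redirects."""
    path = urlsplit(final_url).path.strip("/")
    if path == "suche" or path.startswith("suche/"):
        # Still on the search page: parse the result list.
        return "results"
    if path and "/" not in path:
        # Redirected to something like "/scrubs": treat as exactly one result.
        return "show"
    return "unknown"
```

With this check, a search for "Scrubs" that ends at `https://www.fernsehserien.de/scrubs` would be reported as a single hit instead of 0 results.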
There is still a lot to do. I mostly work an hour or so on MediaElch once or twice a week. So I can't give an estimate on when this will be implemented.
If you want, you can also answer in German.
Greetings from Heidelberg, Andre
Everything is working so far (episode/season/TV show scraping). Currently, I only load title and overview, but the basic structure is working. I have tests for:
- Simpsons: my default choice for TV show scraping and episode scraping (via "id"/URL and episode/season number)
- Scrubs: Search redirects to TV show page
- Black Mirror: I test the "load all seasons" and "load specific season" feature
I've opened a pull request where you can track the state of this feature: #1578
Hi Andre, I'm not that familiar with fernsehserien.de. In the past I just copied the data from there to create my own .nfo files, but only for German shows like Löwenzahn, Checker Tobi, etc. For foreign shows I use MediaElch's built-in scrapers.
Meh, just posted a second after you. :) Thank you for keeping me and all the other fans of your tool up to date. There is no need to hurry, btw. :)
Got it. In that case I'll look for some edge cases myself. :-)
What data did you copy from there? Just to know which details I must not forget to include.
Edit:// I'm not in a hurry, but had time today. :smile:
Well, I took the episode descriptions and names from there, just to have something displayed for my kids so they can find their favs. Nothing special. But if you could also get the cast and crew, that would be nice.
If you're up for some beta testing, let us know. I am sure that a lot of people would love to test the results of your efforts.
The scraper is implemented and works. I found a few things that are not so nice in MediaElch, such as a missing progress bar when scraping all episodes of a TV show. Currently, it looks as if MediaElch never finishes. Also, in #1580 I reported that actor thumbnails are missing. fernsehserien.de provides them, but MediaElch does not download them yet.
The next Nightly version (~1h) will contain "fernsehserien.de". Feel free to test and report issues.
Regards, Andre
Hi Andre, thanks a lot for this! Looks like I'll have some MediaElch fun on Saturday/Sunday. :)
Snapshot versions for macOS and Windows are now available: https://mediaelch-downloads.ameyering.de/snapshots/
Linux AppImage snapshots will come later.
If you are still looking for edge cases I think I found one:
It doesn't seem possible to scrape shows that use years instead of seasons, e.g. Erlebnis Erde. If I remove the season (as described in the Kodi wiki), ME doesn't pick up the episodes, and setting it to S00, S01, or S2023 doesn't scrape anything from fernsehserien.de :thinking:
It seems like shows organized by year (documentaries, soaps, …) don't get assigned any season on their side at all; the URL format is completely different.
Thanks!
Moved the new issue to #1637
I'll work on it after my vacation (~ 2 weeks).