dailyblink
Daily Blink Page Layout has changed - IndexError: list index out of range
The layout and URL of the free Daily page have changed.
New URL: https://www.blinkist.com/en/content/daily
The locator attribute values for BeautifulSoup have to be updated accordingly; the previous values are no longer valid and cause an IndexError:
    def _create_blink_info(response_text):
        soup = BeautifulSoup(response_text, "html.parser")
>       daily_book_href = soup.find_all("a", {"class": "daily-book__cta"})[0]["href"]
E       IndexError: list index out of range
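Until the locators are updated, a defensive lookup would at least fail with a clear message instead of a bare IndexError. A minimal sketch (get_daily_book_href is just an illustrative helper name, not part of dailyblink):

from bs4 import BeautifulSoup

def get_daily_book_href(response_text):
    soup = BeautifulSoup(response_text, "html.parser")
    # find() returns None instead of raising when the element is missing,
    # so we can report the layout change explicitly.
    cta = soup.find("a", class_="daily-book__cta")
    if cta is None:
        raise RuntimeError("'daily-book__cta' not found; the page layout has probably changed")
    return cta["href"]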
Confirmed, I've been having this since 22.05.2022 as well, because the last folder I have in my library is:
'2022-05-21 - Finde den Weg zu deiner inneren Mitte'/
root@banane:~# python3 -m dailyblink
dailyblink v1.2.1, Python 3.9.2, Linux armv7l 32bit ELF
Downloading the free daily Blinks on 2022-06-04 22:47:32...
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.9/site-packages/dailyblink/__main__.py", line 67, in <module>
main()
File "/root/.local/lib/python3.9/site-packages/dailyblink/__main__.py", line 63, in main
blinkist_scraper.download_daily_blinks(args.language, base_path)
File "/root/.local/lib/python3.9/site-packages/dailyblink/core.py", line 37, in download_daily_blinks
self._attempt_daily_blinks_download(languages, base_path)
File "/root/.local/lib/python3.9/site-packages/dailyblink/core.py", line 56, in _attempt_daily_blinks_download
self._download_daily_blinks(language_code, base_path)
File "/root/.local/lib/python3.9/site-packages/dailyblink/core.py", line 63, in _download_daily_blinks
blink_info = self._get_daily_blink_info(language=language_code)
File "/root/.local/lib/python3.9/site-packages/dailyblink/core.py", line 126, in _get_daily_blink_info
return _create_blink_info(response.text)
File "/root/.local/lib/python3.9/site-packages/dailyblink/core.py", line 171, in _create_blink_info
daily_book_href = soup.find_all("a", {"class": "daily-book__cta"})[0]["href"]
IndexError: list index out of range
root@banane:~#
Yep, same here. How can this be fixed?
I was able to retrieve audio and text content for the free daily by calling Blinkist's API the way the frontend does. I prefer this over BeautifulSoup because it's more direct and the new DOM lacks descriptive classes/IDs. However, I haven't integrated my approach with this codebase, and I'm not sure if it works the same for arbitrary books on Blinkist Premium. If anyone's interested, I'll post my code tomorrow. :)
> If anyone's interested, I'll post my code tomorrow. :)
Perfect, please let me know!
Here you go. :)
⚠️ Update: I've created a repo with updated code here
Again, I haven't tried other values for `User-Agent` yet, and I can't check whether this approach will work for Premium content.
import cloudscraper
from datetime import datetime
from pathlib import Path
import requests
from rich import print
from rich.progress import track

BASE_URL = 'https://www.blinkist.com/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0',
    'x-requested-with': 'XMLHttpRequest',
}
LOCALES = ['en', 'de']
DOWNLOAD_DIR = Path.home() / 'Musik' / 'Blinkist'

scraper = cloudscraper.create_scraper()


def get_book_dir(book):
    return DOWNLOAD_DIR / f"{datetime.today().strftime('%Y-%m-%d')} – {book['slug']}"


def get_free_daily(locale):
    # see also: https://www.blinkist.com/en/content/daily
    response = scraper.get(
        BASE_URL + 'api/free_daily',
        params={'locale': locale},
    )
    return response.json()


def get_chapters(book_slug):
    url = f"{BASE_URL}/api/books/{book_slug}/chapters"
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()
    return response.json()['chapters']


def get_chapter(book_id, chapter_id):
    url = f"{BASE_URL}/api/books/{book_id}/chapters/{chapter_id}"
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()
    return response.json()


def download_chapter_audio(book, chapter_data):
    book_dir = get_book_dir(book)
    # parents=True so the download dir is created on a fresh machine, too
    book_dir.mkdir(parents=True, exist_ok=True)
    file_path = book_dir / f"chapter_{chapter_data['order_no']}.m4a"
    if file_path.exists():
        print(f"Skipping existing file: {file_path}")
        return
    assert 'm4a' in chapter_data['signed_audio_url']
    response = scraper.get(chapter_data['signed_audio_url'])
    assert response.status_code == 200
    file_path.write_bytes(response.content)
    print(f"Downloaded chapter {chapter_data['order_no']}")


for locale in LOCALES:
    free_daily = get_free_daily(locale=locale)
    book = free_daily['book']
    print(f"Today's free daily in {locale} is: “{book['title']}”")

    # list of chapters without their content
    chapter_list = get_chapters(book['slug'])

    # fetch chapter content
    chapters = [
        get_chapter(book['id'], chapter['id'])
        for chapter in track(chapter_list, description='Fetching chapters…')
    ]

    # download audio
    for chapter in track(chapters, description='Downloading audio…'):
        download_chapter_audio(book, chapter)

    # write markdown
    # excluded for brevity – just access chapter['text'] etc.
    # markdown_text = download_book_md(book, chapters)
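For completeness, the elided markdown step could look roughly like this. download_book_md is only the placeholder name from the comment above; the chapter 'title' field is an assumption, and chapter['text'] is likely HTML rather than plain markdown:

def download_book_md(book, chapters):
    # Hypothetical sketch: write all chapter texts into one markdown file.
    # 'title' on a chapter is an assumed field; 'text' is the field
    # mentioned in the comment above.
    lines = [f"# {book['title']}", ""]
    for chapter in chapters:
        title = chapter.get('title') or f"Chapter {chapter['order_no']}"
        lines.append(f"## {title}")
        lines.append("")
        lines.append(chapter['text'])
        lines.append("")
    markdown_text = "\n".join(lines)
    (get_book_dir(book) / 'book.md').write_text(markdown_text)
    return markdown_text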
@NicoWeio does your code work straight out of the box, or does it have to be integrated into core.py?
Would this approach work on a Windows machine?
> @NicoWeio does your code work straight out of the box, or does it have to be integrated into core.py?
See my earlier comment:
> However, I haven't integrated my approach with this codebase, and I'm not sure if it works the same for arbitrary books on Blinkist Premium.
Assuming you have `cloudscraper` installed, my script works out of the box, and it should download the audio just fine. However, it does not generate a text or cover image file, does not set the audio's metadata, and does not precisely follow `dailyblink`'s naming conventions.
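If you need the metadata, something like the following sketch with mutagen could be bolted onto download_chapter_audio. It is not part of my script, and the book's 'author' field name is an assumption:

from mutagen.mp4 import MP4

def tag_chapter_audio(file_path, book, chapter_data):
    # Sketch only: set basic iTunes-style MP4 atoms on the downloaded .m4a.
    # '\xa9nam' = title, '\xa9alb' = album, '\xa9ART' = artist.
    audio = MP4(str(file_path))
    audio['\xa9nam'] = [f"Chapter {chapter_data['order_no']}"]
    audio['\xa9alb'] = [book['title']]
    audio['\xa9ART'] = [book.get('author', 'Unknown')]  # 'author' field assumed
    audio.save()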
> Would this approach work on a Windows machine?
If `dailyblink` worked on Windows before, yes. That goes both for my approach using Blinkist's API and for the current approach using BeautifulSoup.
@ptrstn Is there a fix/update coming? You said by Sunday, and then you removed your answer.
> @ptrstn Is there a fix/update coming? You said by Sunday, and then you removed your answer.
This change requires some refactoring and a little more time than initially expected. I'll see what I can do. I can't guarantee when, though, since I've got other things in life to take care of first.
> This change requires some refactoring and a little more time than initially expected. I'll see what I can do. I can't guarantee when, though, since I've got other things in life to take care of first.
Sure, you're right about that.
Executing this code on Google Colab, I am getting a 403 Forbidden error on line 70 when calling get_chapters. After troubleshooting, I found that response.raise_for_status() is what raises the error, since the URL can't be accessed. How can I resolve this?
@NicoWeio
@rajeshbhavikatti I just published my code here, so we can keep this issue clean from further discussions. Notice the double slash in the URL? That might be the cause, although it didn't cause issues for me. Maybe because of a different requests version? Anyway, I fixed the double slashes in my code. Plus, I've added CI to my repo, and it works just fine there, too.
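To spell out the double-slash point for anyone following along (just a sketch; the actual fix lives in my repo):

BASE_URL = 'https://www.blinkist.com/'
book_slug = 'some-book'  # example value

# BASE_URL already ends with '/', so the extra slash in the f-string
# produced 'https://www.blinkist.com//api/books/...':
broken_url = f"{BASE_URL}/api/books/{book_slug}/chapters"

# Dropping that slash yields the intended URL:
fixed_url = f"{BASE_URL}api/books/{book_slug}/chapters"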
> This change requires some refactoring and a little more time than initially expected. I'll see what I can do. I can't guarantee when, though, since I've got other things in life to take care of first.
Hi Peter @ptrstn, do you have any updates on this?
> Hi Peter @ptrstn, do you have any updates on this?
I'll be able to work on it starting at the beginning of October, since I'm still busy with private matters.
> I'll be able to work on it starting at the beginning of October, since I'm still busy with private matters.
Any news for us?
Hi, I have made some updates based on this repo. Feel free to reach out to me about any changes or updates, and check out my notebook here.
@rajeshbhavikatti Nice work, but you don't fetch the mp3 files.
@Erik262 Yes, that's because the Notion API doesn't support it yet.