television-news-analyser
TV news analyser | https://observatoire.climatmedias.org/ 📺 🔬 🛢️
Scrape France 2, France 3, and TF1 TV news to analyse humanity's biggest challenge: fossil energies and climate change, and explore the data on a website.
Data sources - HTML pages:
- TF1 : https://www.tf1info.fr/emission/le-20h-11001/extraits/
- France 2 : https://www.francetvinfo.fr/replay-jt/france-2/20-heures/jt-de-20h-du-jeudi-30-decembre-2021_4876025.html
- France 3 : https://www.francetvinfo.fr/replay-jt/france-3/19-20/jt-de-19-20-du-vendredi-15-avril-2022_5045866.html
Data sinks
- JSON ➡️ https://github.com/polomarcus/television-news-analyser/tree/main/data-news-json/
- CSV, or more precisely compressed Tab-Separated Values (TSV) ➡️ https://github.com/polomarcus/television-news-analyser/tree/main/data-news-csv/ (see the sketch below if you don't know how to uncompress these files)
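If you are unsure how to read the compressed TSV files, here is a minimal Scala sketch; the file name data-news-csv/news.tsv.gz is hypothetical, pick any file from the folder above:

```scala
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.util.zip.GZIPInputStream
import scala.jdk.CollectionConverters._

object ReadCompressedTsv extends App {
  // hypothetical file name: use any file from data-news-csv/
  val path = "data-news-csv/news.tsv.gz"
  val reader = new BufferedReader(
    new InputStreamReader(new GZIPInputStream(new FileInputStream(path)), "UTF-8"))
  reader
    .lines().iterator().asScala
    .map(_.split("\t", -1)) // values are tab separated
    .take(5)                // print the first rows only
    .foreach(columns => println(columns.mkString(" | ")))
  reader.close()
}
```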
JSON data can be stored inside Postgres and displayed on a Metabase dashboard (read "Run" in this README), or explored on this website: https://observatoire.climatmedias.org/
Run
Requirements
- docker compose
- Optional: Scala build tool (SBT), if you want to work on the code
Spin up Postgres, Metabase and nginx, and load data into PG
Docker Compose without SBT (Scala build tool)
# with docker compose - no need for sbt
./init-stack-with-data.sh
# under the hood, this script runs: docker-compose -f src/test/docker/docker-compose.yml up -d --build app
Init Metabase to explore with SQL
After running the project with docker compose, you can open Metabase at http://localhost:3000 and follow a few steps:
- configure an account
- configure the PostgreSQL data source: user/password, host: postgres, database name: metabase (see the docker-compose file for details)
- You're good to go: "Ask a simple question", then select your data source and the "Aa_News" table
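Once the stack is up you can also query Postgres directly from Scala. A sketch using plain JDBC, assuming the container exposes port 5432 locally, the credentials from the docker-compose file, and that the Postgres JDBC driver is on the classpath (the exact column quoting may differ in your schema):

```scala
import java.sql.DriverManager

object QueryNews extends App {
  // assumptions: local port 5432, database "metabase", and user/password
  // as configured in src/test/docker/docker-compose.yml
  val url  = "jdbc:postgresql://localhost:5432/metabase"
  val conn = DriverManager.getConnection(url, "user", "password")
  try {
    val rs = conn.createStatement().executeQuery(
      """SELECT COUNT(*) FROM "Aa_News" WHERE "containsWordGlobalWarming" = true"""
    )
    while (rs.next()) println(s"News flagged as global warming: ${rs.getInt(1)}")
  } finally conn.close()
}
```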
Jupyter Notebook
Some examples are inside example.ipynb, but I preferred the Metabase dashboard and SQL-based visualisation.
To scrape data from 3 pages of the France 2 website
sbt "runMain com.github.polomarcus.main.TelevisionNewsAnalyser 3"
To store the JSON data in PG and explore it with Metabase
sbt "runMain com.github.polomarcus.main.SaveTVNewsToPostgres"
To update data for the website alone
sbt "runMain com.github.polomarcus.main.UpdateNews"
How does it run automatically every day?
The latest replays of France 2, France 3 and TF1 are scraped with a GitHub Action, then the news are stored inside this folder, partitioned by media and by date.
If a news title or description contains a "global warming keyword", it is marked as such with containsWordGlobalWarming: Boolean.
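A sketch of how such a flag can be computed - the keyword list below is purely illustrative, the real list lives in the project sources:

```scala
object GlobalWarmingFlag {
  // illustrative keywords only; the project defines the real list
  val keywords = List("réchauffement climatique", "dérèglement climatique", "GIEC")

  def containsWordGlobalWarming(title: String, description: String): Boolean = {
    val text = s"$title $description".toLowerCase
    keywords.exists(keyword => text.contains(keyword.toLowerCase))
  }
}

// usage:
// GlobalWarmingFlag.containsWordGlobalWarming("JT de 20h", "Rapport du GIEC") // true
```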
Some results can be found on this repo's website : https://polomarcus.github.io/television-news-analyser/ | https://observatoire.climatmedias.org/
To check the GitHub Action
- Click here : https://github.com/polomarcus/television-news-analyser/actions/workflows/save-data.yml
- Click on the last workflow run, called "Get news from websites", then on "click-here-to-see-data"
- Click on "List France 2 news urls containing global warming (see end)" to see France 2's urls
- Click on "List TF1 news urls containing global warming (see end)" to see TF1's urls :
Check out the project website locally (https://observatoire.climatmedias.org/)
Go to http://localhost:8080
The sources are inside the docs folder.
Test
# first, be sure to have docker compose up with ./init-stack-with-data.sh
sbt test # it will parse some localhost pages from test/resources/
Test only one method
sbt> testOnly ParserTest -- -z parseFranceTelevisionHome
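For reference, -z is ScalaTest's runner argument to select tests whose name contains the given substring. A hypothetical test shape that the command above would match - the real ParserTest lives in the project sources:

```scala
import org.scalatest.funsuite.AnyFunSuite

// any test whose name contains "parseFranceTelevisionHome" is selected
// by `testOnly ParserTest -- -z parseFranceTelevisionHome`
class ParserTest extends AnyFunSuite {
  test("parseFranceTelevisionHome should extract news from a local fixture") {
    val parsed = List("some news") // replace with a call to the real parser
    assert(parsed.nonEmpty)
  }
}
```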
Libraries documentation
- https://github.com/ruippeixotog/scala-scraper
- https://circe.github.io/circe/parsing.html
- Using multiple threads to handle Scala Futures (see the sketch below)
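On the last point, a sketch of running scraping work on multiple threads with a dedicated ExecutionContext:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object FuturesSketch extends App {
  // a fixed pool so several pages can be fetched concurrently
  val pool = Executors.newFixedThreadPool(4)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  val pages = (1 to 3).map(i => Future { s"scraped page $i" })
  Await.result(Future.sequence(pages), Duration.Inf).foreach(println)

  pool.shutdown() // let the JVM exit once the work is done
}
```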