data issues

[feature] Crawl all non-Medium websites to fetch all articles


### TODO
- Update [src/populate_csv_files/get_article_content/crawl_non_medium_websites.py](https://github.com/mev-fyi/src/populate_csv_files/get_article_content/crawl_non_medium_websites.py) to crawl all posts (URLs) from every website listed on data.mev.fyi, available at [`data/links/websites.csv`](https://github.com/mev-fyi/data/blob/main/data/links/websites.csv). The websites can be browsed on the _Websites_ tab of data.mev.fyi.
- Input: [website URLs](https://github.com/mev-fyi/data/blob/main/data/links/websites.csv)...
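A minimal sketch of the per-site crawl, assuming a site's posts can be reached by following same-host `<a href>` links starting from its homepage. `extract_same_site_links`, `crawl_site`, and the `max_pages` cap are hypothetical names, not taken from `crawl_non_medium_websites.py`. It uses only the standard library and takes the page-fetching function as a parameter, so the crawl logic can be tested offline.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_same_site_links(page_url: str, html: str) -> list[str]:
    """Return absolute, de-duplicated http(s) links that stay on page_url's host."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    links: list[str] = []
    for href in parser.hrefs:
        # Resolve relative hrefs and drop "#fragment" so anchors are not counted twice.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in links:
            links.append(absolute)
    return links


def crawl_site(start_url: str, fetch: Callable[[str], str], max_pages: int = 200) -> list[str]:
    """Breadth-first crawl of a single website; `fetch` returns a page's HTML."""
    queue = deque([start_url])
    seen = {start_url}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load instead of aborting the whole site
        visited.append(url)
        for link in extract_same_site_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For a live run, `fetch` could be something like `lambda u: urllib.request.urlopen(u, timeout=10).read().decode("utf-8", "replace")`. The sketch returns every reachable page (including tag, archive, and about pages); deciding which of those are actual posts is site-specific and is not addressed here.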

vmeylan

[feature] spin off the whole YouTube data ingestion as micro-service

The YouTube data ingestion and indexing pipeline is large. It could be worthwhile to spin it off as a micro-service / package to make it more maintainable, have people fork it to build...

vmeylan

Girotomas


Getting the author links from the article links

girotomas

[feature] index all URLs and ingest all "awesome <topic>" GitHub repos

There are many `awesome` GitHub repos we can index/scrape and add to the database. We can surely also automatically generate an "awesome of awesome" repo for MEV, DeFi,...
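A minimal sketch of the indexing step, assuming an "awesome" list's entries are inline Markdown links of the form `[text](https://...)` in its README. `extract_markdown_links` is a hypothetical helper; reference-style links and relative paths are not handled, and a real pipeline may be better served by a full Markdown parser.

```python
import re

# Inline Markdown links with absolute http(s) targets; the lookbehind skips images "![alt](...)".
_MD_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_markdown_links(readme: str) -> list[tuple[str, str]]:
    """Return (link text, URL) pairs in order of appearance, de-duplicated by URL."""
    seen: set[str] = set()
    links: list[tuple[str, str]] = []
    for text, url in _MD_LINK.findall(readme):
        if url not in seen:
            seen.add(url)
            links.append((text.strip(), url))
    return links
```

Each extracted URL could then be fed to the same crawling/ingestion path as other links, and the link text kept as a provisional title.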

vmeylan

[feature] extract all unique blog websites from articles

### TODO
- Create a script in https://github.com/mev-fyi/data/blob/main/src/populate_csv_files/get_article_content/get_websites_from_articles.py that extracts each author's unique blog link from all the articles in https://github.com/mev-fyi/data/blob/main/data/links/articles_updated.csv (`article` header).
- Create a second script to...
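A minimal sketch of the first script, assuming an author's blog can be approximated by the origin (scheme plus host) of each URL in the `article` column. That assumption breaks for shared platforms that host many authors under one domain, which would need per-platform rules. `unique_websites` is a hypothetical name, not the contents of `get_websites_from_articles.py`.

```python
import csv
from typing import TextIO
from urllib.parse import urlparse


def unique_websites(csv_file: TextIO, column: str = "article") -> list[str]:
    """Return the sorted set of site origins (scheme://host) found in `column`."""
    origins: set[str] = set()
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        parsed = urlparse(url)
        # Skip empty cells and anything that is not an absolute http(s) URL.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            origins.add(f"{parsed.scheme}://{parsed.netloc.lower()}")
    return sorted(origins)
```

Usage: `with open("data/links/articles_updated.csv", newline="", encoding="utf-8") as f: websites = unique_websites(f)`. Lowercasing the host merges URLs that differ only in letter case.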

vmeylan


Metadata

[feature] Crawl all non-Medium websites to fetch all articles

[feature] spin off the whole YouTube data ingestion as micro-service

Girotomas

[feature] index all URLs and ingest all "awesome <topic>" GitHub repos

[feature] extract all unique blog websites from articles

[feature] automatically scrape the referenced websites to fetch the latest published articles

[refactor] Articles processing

[feature] implement autodoc to automatically generate docs from github code repositories

[feature] add safeguards when re-fetching PDF articles to guarantee content integrity

[feature] Create a React front-end for data.mev.fyi Google Sheet
