redditflow

Few suggestions

Open monk1337 opened this issue 3 years ago • 1 comment

The project is fantastic; here are a few suggestions:

  1. It would be good to have separate repos for the redditflow data API and the redditflow model API. Sometimes developers only want to extract data and use their own model, and sometimes they want to use the models with different data. Combining the two makes the repo larger, and if I only want to scrape data, I still need to install torch, sentence-transformers, sentencepiece, etc. (Hugging Face's dataset API and model API could serve as a reference.)

  2. Update the redditflow docs to cover how to extract data based on a single keyword and how to extract all comments and posts from a single subreddit.

  3. Organize the nfflow repo around a set of base functions that can be reused for other platform APIs, such as Twitter.

  4. Add ML intelligence to data fetching and scraping (example: OpenAI's CLIP).

  5. It could also include Elasticsearch to query the downloaded archive faster.
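To make suggestions 1 and 3 more concrete, here is a rough sketch of how a platform-independent base could look, with the heavy model dependency imported only on demand. None of these names exist in redditflow or nfflow today: `Record`, `BaseSource`, `InMemorySource`, and `load_encoder` are all hypothetical, and `InMemorySource` is just a stand-in for a real Reddit or Twitter client.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Record:
    """One scraped item (post, comment, tweet, ...). Hypothetical type."""
    source: str
    text: str


class BaseSource(ABC):
    """Hypothetical platform-independent scraping interface."""

    name = "base"

    @abstractmethod
    def fetch(self, query: str) -> Iterable[str]:
        """Return raw text items matching the query."""

    def records(self, query: str) -> Iterator[Record]:
        # Shared clean-up lives here, so each platform only implements fetch().
        for text in self.fetch(query):
            cleaned = " ".join(text.split())  # collapse runs of whitespace
            if cleaned:
                yield Record(source=self.name, text=cleaned)


class InMemorySource(BaseSource):
    """Stand-in for a real platform client; does no network access."""

    name = "memory"

    def __init__(self, items: List[str]):
        self._items = items

    def fetch(self, query: str) -> Iterable[str]:
        # Case-insensitive substring match on the stored items.
        return [t for t in self._items if query.lower() in t.lower()]


def load_encoder(model_name: str):
    """Import the heavy ML dependency only when a model is requested."""
    try:
        from sentence_transformers import SentenceTransformer
    except ImportError as exc:
        raise RuntimeError(
            "Model features need the optional 'sentence-transformers' package."
        ) from exc
    return SentenceTransformer(model_name)
```

With a layout like this, someone who only wants data could call `InMemorySource([...]).records("keyword")` (or a real Reddit subclass) without torch installed, and only `load_encoder` would pull in the model stack.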

Here is a simple overview of integrating OpenAI's CLIP project into nfflow:

  • Download image data from different sources.
  • Use Colab to load the data and train OpenAI's CLIP model to convert the images into vectors.
  • Save the vectors to the user's Google Drive.
  • Run evaluation (search queries) over the downloaded data.

It can be automated end to end if training on Colab and fetching the vectors from Drive can be automated.
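The save-and-search half of that overview could be sketched roughly as below. This is only an illustration under assumptions: the vectors are taken as already computed (in the proposal they would come from CLIP, which is not called here), a local JSON file stands in for Google Drive, and `cosine`, `save_index`, `load_index`, and `search` are hypothetical helper names, not part of nfflow.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def save_index(index: Dict[str, Vector], path: Path) -> None:
    """Persist {item name: vector} as JSON (stand-in for saving to Drive)."""
    path.write_text(json.dumps(index))


def load_index(path: Path) -> Dict[str, Vector]:
    """Load a previously saved {item name: vector} mapping."""
    return json.loads(path.read_text())


def search(
    index: Dict[str, Vector], query_vec: Vector, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k items most similar to the query vector."""
    scored = [(name, cosine(vec, query_vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A brute-force scan like this is fine for a small archive; for larger ones, a vector index or the Elasticsearch idea from suggestion 5 would be the natural replacement for `search`.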

monk1337 • Jun 08 '22 04:06

Awesome! Will look into this soon.

abhijithneilabraham • Jun 08 '22 05:06