hntitlenator
Getting data
Hello, I maintain a dataset on Kaggle with HN posts and their points, per category: https://www.kaggle.com/santiagobasulto/all-hacker-news-posts-stories-askshow-hn-polls
It might be useful. The source is available here.
That's awesome. As soon as I can, I'll try to retrain the NN with the new data.
👍 Great! I'll update it this afternoon; I run a script periodically to keep it up to date with the latest data.
Boy, my computer is having a hard time processing this much data. I don't think I'll be able to train the NN with such a huge amount of data.
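One way around the memory problem is to stream the CSV row by row and keep only the two columns needed, instead of loading the whole file at once. Below is a minimal sketch using Python's standard `csv` module. It assumes the file has `Title` and `Points` columns (as in the trimmed file shared later in this thread); the function name `iter_title_points` and the file name in the comment are hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_title_points(
    f: TextIO, title_col: str = "Title", points_col: str = "Points"
) -> Iterator[Tuple[str, int]]:
    """Yield (title, points) pairs one row at a time instead of loading the whole file.

    Column names are assumptions; adjust them to match the actual CSV header.
    """
    reader = csv.DictReader(f)
    for row in reader:
        title = row.get(title_col)
        points = row.get(points_col)
        # Skip rows with a missing title or a missing/non-integer score
        if not title or points is None:
            continue
        try:
            yield title, int(points)
        except ValueError:
            continue


if __name__ == "__main__":
    # Small in-memory demo; for the real data you would use something like
    # open("hn_posts.csv", newline="", encoding="utf-8")  (hypothetical file name)
    demo = io.StringIO("Title,Post Type,Points\nShow HN: foo,3,10\nPlain story,0,5\n")
    for title, points in iter_title_points(demo):
        print(points, title)
```

Because it is a generator, only one row is held in memory at a time, so the file size mostly affects run time rather than RAM.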
😂 You can use Colab or another platform with a GPU/CPU. What do you need to extract from it?
I only need the title and the score. Then I have to tokenize the words, turn them into vectors, and only then feed the new data to the NN. I'll take a look at it after I leave work.
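The tokenize-then-vectorize step can be sketched as follows. This is not necessarily how hntitlenator's NN consumes its input; it is a minimal bag-of-words example, assuming lowercase word tokens and a vocabulary capped at the most frequent words, with id 0 reserved for unknown words. All function names here are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Word tokens: lowercase letters, digits and apostrophes
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(title: str) -> List[str]:
    """Lowercase a title and split it into word tokens."""
    return TOKEN_RE.findall(title.lower())


def build_vocab(titles: List[str], max_size: int = 10000) -> Dict[str, int]:
    """Map the most frequent tokens to integer ids; id 0 is reserved for unknown words."""
    counts = Counter(tok for t in titles for tok in tokenize(t))
    vocab = {"<unk>": 0}
    for tok, _ in counts.most_common(max_size - 1):
        vocab[tok] = len(vocab)
    return vocab


def vectorize(title: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a title into a bag-of-words count vector of length len(vocab)."""
    vec = [0] * len(vocab)
    for tok in tokenize(title):
        # Words not in the vocabulary are counted in the <unk> slot
        vec[vocab.get(tok, 0)] += 1
    return vec
```

The resulting fixed-length vectors, paired with each post's score, could then be fed to the network. An embedding layer over token ids is a common alternative to raw count vectors.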
Alright, I'll get that ready for you soon.
Just created a small version containing only Title, Post Type and Points: https://drive.google.com/file/d/1sZx3zidIwezFx4gNEWZIJ7KpN4V-eBEE/view?usp=sharing
Post Type is encoded: 0 for regular stories, 1 for Ask HN, 2 for Polls and 3 for Show HN.
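When reading the trimmed file, the numeric `Post Type` column can be mapped back to a readable label. A minimal sketch of that mapping follows; the codes come from the encoding above, while the label strings and function name are illustrative assumptions, not part of the dataset.

```python
from typing import Union

# Encoding described above: 0 = regular story, 1 = Ask HN, 2 = Poll, 3 = Show HN.
# The label strings themselves are hypothetical, chosen only for readability.
POST_TYPES = {0: "story", 1: "ask_hn", 2: "poll", 3: "show_hn"}
POST_TYPE_CODES = {name: code for code, name in POST_TYPES.items()}


def decode_post_type(value: Union[int, str]) -> str:
    """Convert a Post Type value (int, or string as read from a CSV) to its label."""
    code = int(value)
    if code not in POST_TYPES:
        raise ValueError(f"unknown post type code: {value!r}")
    return POST_TYPES[code]
```

Values read through `csv` arrive as strings, which is why the function converts with `int()` before looking up the label.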
