
reproducing ghindex

Open gg4u opened this issue 10 years ago • 3 comments

hello Anvaka,

I also commented on your reddit post (how to enjoy a weekend with a discovery project :) ); please read my questions there. Would you also mind including a guide on how to set up gazer to fetch data from the backend, and how to set up the backend on S3? You use Redis, but it is not clear what you put on S3: I understood that you placed there the JSON files generated with the proximity generator in ghindex. Could you please include a guide on how to set up the web app and connect it to the db? I am learning and can get a prototype running on local servers (I used neo for another project, and still have to check Redis), but I haven't put anything on the cloud yet. Also, how did you index the URLs in the input field? Redis? Thank you!

gg4u avatar Jan 17 '15 22:01 gg4u

Uhm... That's too many questions and would take me a while to answer :)! What are you trying to achieve?

anvaka avatar Jan 17 '15 23:01 anvaka

Hi, I would like to:

  1. Replicate ghindex to explore possibilities to improve recommendations (I've picked up your reddit post). Where could I find a description of the data in github_timeline?
  • E.g. see BigQuery, Table Details: github_timeline: the preview displays only a few rows, and Type='WatchEvent' is missing from the sampled ones (there are 'DownloadEvent' and others, so I wouldn't know that 'WatchEvent' is present). So I wonder where I could find a description of the values of Type and of the other fields.
  2. I tried to replicate ghindex with Sorensen-Dice, against a sample limited to 1M rows. I obtained quite different results: maybe I've made a mistake somewhere, or is the sample too small?
  3. Replicate your web app! I want to learn to set up a 'minimum viable web app' :), so as to:
  • full-text index the nodes to be searched in the input field (e.g. the git repos in gazer);
  • fetch recommendations / compute recommendations on demand (I left a note on reddit).
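For reference, a minimal sketch of the Sorensen-Dice comparison mentioned in the list above. It assumes the input is a list of (user, repo) pairs taken from WatchEvent rows; the function name `dice_similarity` and the sample data are made up for illustration, and this is not necessarily how ghindex computes its index.

```python
from collections import defaultdict
from itertools import combinations


def dice_similarity(watch_events):
    """Return {(repo_a, repo_b): score} for repo pairs sharing at least one stargazer.

    watch_events: iterable of (user, repo) pairs (hypothetical input shape).
    Sorensen-Dice coefficient: 2 * |A & B| / (|A| + |B|),
    where A and B are the sets of users who starred each repo.
    """
    stargazers = defaultdict(set)
    for user, repo in watch_events:
        stargazers[repo].add(user)

    scores = {}
    for a, b in combinations(sorted(stargazers), 2):
        shared = len(stargazers[a] & stargazers[b])
        if shared:
            scores[(a, b)] = 2 * shared / (len(stargazers[a]) + len(stargazers[b]))
    return scores


# Toy data, invented for this example.
events = [
    ("alice", "repo/x"), ("bob", "repo/x"), ("carol", "repo/x"),
    ("alice", "repo/y"), ("bob", "repo/y"),
    ("dave", "repo/z"),
]
scores = dice_similarity(events)
print(scores)
```

With a sample truncated to a fixed number of rows, each repo sees only part of its stargazers, so both the shared counts and the set sizes shrink unevenly. That alone could plausibly explain scores that differ from those computed on the full table.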

In your structure, did you use Redis to store the repo names as keys and the S3 URIs as values, and then store the proximity correlations as JSON files in S3? Or did you put all the proximity correlations in Redis, with the repo name as key and the list of correlations as value?
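To make the two options concrete, here is a rough sketch that uses plain Python dicts as stand-ins for the key-value store and the object store. The bucket name, URIs, and function names are hypothetical, and neither layout is confirmed to be what gazer actually uses.

```python
import json

# Hypothetical recommendations for one repo.
related = [{"repo": "repo/y", "score": 0.8}]

# Layout A: the key-value store maps repo name -> URI,
# and the object store holds one JSON file per repo.
kv_store_a = {"repo/x": "s3://example-bucket/repo/x.json"}
object_store = {"s3://example-bucket/repo/x.json": json.dumps(related)}


def lookup_via_uri(repo):
    """Two hops: resolve the URI first, then fetch and parse the JSON file."""
    uri = kv_store_a[repo]
    return json.loads(object_store[uri])


# Layout B: the key-value store holds the serialized list directly.
kv_store_b = {"repo/x": json.dumps(related)}


def lookup_direct(repo):
    """One hop: the value already is the list of correlations."""
    return json.loads(kv_store_b[repo])


print(lookup_via_uri("repo/x"))
print(lookup_direct("repo/x"))
```

Both lookups return the same data. The difference is where the bulk of the data lives and how many round trips a lookup costs.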

thank you!

gg4u avatar Jan 18 '15 12:01 gg4u

OK, about the first question, the event types are described here: https://developer.github.com/v3/activity/events/types/

gg4u avatar Jan 18 '15 15:01 gg4u