Duplicate URLs
In the 'projects' table, different 'id' values can map to the same url and therefore point to the same GitHub project. In other words, the table contains duplicate projects. As of the Nov 2016 version, we found about 10k duplicates for C++ projects, 27k for Java projects, and 22k for Python projects. For example:
id=102747 and id=424108 have the same url https://api.github.com/repos/zmeadows/cybernetic-banana
id=43736 and id=1307727 have the same url https://api.github.com/repos/indutny/defer-tick
id=1881530 and id=2502246 have the same url https://api.github.com/repos/persistentsnail/AOI
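One way to surface such duplicates is to group the rows by url and keep only the urls that occur more than once. The sketch below is an illustration under stated assumptions, not the procedure used to produce the counts above: it builds a small in-memory SQLite table that has only the 'id' and 'url' columns, loads the three example pairs plus one made-up unique row, and uses a hypothetical helper named find_duplicate_urls. Against the real database the same query would need to be adapted to the actual schema and engine.

```python
import sqlite3


def find_duplicate_urls(conn):
    """Return {url: [ids in ascending order]} for every url shared by more than one id.

    Hypothetical helper; assumes a table named 'projects' with 'id' and 'url' columns.
    """
    rows = conn.execute(
        """
        SELECT url, id
        FROM projects
        WHERE url IN (
            SELECT url FROM projects GROUP BY url HAVING COUNT(*) > 1
        )
        ORDER BY url, id
        """
    ).fetchall()

    duplicates = {}
    for url, project_id in rows:
        duplicates.setdefault(url, []).append(project_id)
    return duplicates


# Toy stand-in for the real table: only the two columns discussed in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO projects (id, url) VALUES (?, ?)",
    [
        (102747, "https://api.github.com/repos/zmeadows/cybernetic-banana"),
        (424108, "https://api.github.com/repos/zmeadows/cybernetic-banana"),
        (43736, "https://api.github.com/repos/indutny/defer-tick"),
        (1307727, "https://api.github.com/repos/indutny/defer-tick"),
        (1881530, "https://api.github.com/repos/persistentsnail/AOI"),
        (2502246, "https://api.github.com/repos/persistentsnail/AOI"),
        # Made-up row (not from the dataset) to show that unique urls are ignored.
        (1, "https://api.github.com/repos/example/unique-project"),
    ],
)

duplicates = find_duplicate_urls(conn)
for url, ids in duplicates.items():
    print(url, ids)
```

When deduplicating, one simple convention is to keep the lowest id for each url as the canonical project and remap the others to it. That choice is an assumption here; whether the lowest id is the right one to keep depends on how the other tables reference 'projects'.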