CVPR2019
Displays all of the CVPR 2019 accepted papers in a format that is easy to parse.
Updated repository at https://github.com/mattdeitke/CVPR-Accepted-Papers-Viewer
CVPR 2019 Accepted Papers
The main goal of these scripts is to build a page that displays the accepted papers for CVPR 2019 in a way that is easier for humans to parse (see https://mattdeitke.com/CVPR-2019). Below is an example of the page this repository generates, followed by what the CVPR open access site currently shows.
[Image: example of the page generated by this repository]
[Image: the current CVPR open access listing]
Installation
- Clone this repository:
  git clone https://github.com/mattdeitke/CVPR2019
- Save the HTML of the page where the accepted papers are listed. For CVPR 2019, that is http://openaccess.thecvf.com/CVPR2019.py.
- Install ImageMagick, for example with
  sudo apt-get install imagemagick
  or with another supported method such as
  brew install imagemagick
- Run pdftowordcloud.py to generate the top words for each paper. The output is saved in topwords.p.
- Run pdftothumbs.py to generate tiny thumbnails for all papers. The outputs are saved in the thumbs/ folder.
- Run scrape.py to generate the paper ID, title, and author list for each paper by scraping the cvpr2019oar.html page.
- Run makecorpus.py to create the allpapers.txt file, which contains all papers (one per row).
- Run
  python lda.py -f allpapers.txt -k 7 --alpha=0.5 --beta=0.5 -i 100
  This generates a pickle file called ldaphi.p that contains the LDA word distribution matrix. Thanks to @shuyo for the LDA code, which requires the nltk and numpy libraries. This example uses 7 categories; to try a different number of categories, you would need to adjust the cvprnice_template.html file.
- Generate the abstract files inside the abstracts/ folder using getabstracts.py.
- Finally, run generatenicelda.py to create the index.html page.
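The script steps above, from pdftowordcloud.py onward, could be chained with a small driver. This is only a sketch, not part of the repository: the script names and the lda.py flags are copied from the list above, while PIPELINE, build_commands, run_pipeline, and the --run flag are hypothetical names introduced here. It assumes every script runs under the same Python interpreter, from the repository root, after the accepted-papers HTML has been saved and ImageMagick installed.

```python
import subprocess
import sys

# Steps in the order listed above. Script names and lda.py flags come from
# this README; chaining them through one driver is an assumption.
PIPELINE = [
    ["pdftowordcloud.py"],
    ["pdftothumbs.py"],
    ["scrape.py"],
    ["makecorpus.py"],
    ["lda.py", "-f", "allpapers.txt", "-k", "7",
     "--alpha=0.5", "--beta=0.5", "-i", "100"],
    ["getabstracts.py"],
    ["generatenicelda.py"],
]


def build_commands(python=sys.executable):
    """Return one argv list per pipeline step (hypothetical helper)."""
    return [[python, *step] for step in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the scripts.
    run_pipeline(dry_run="--run" not in sys.argv)
```

By default the driver only prints the commands it would run. Passing --run executes them in order and stops at the first failure, which is useful because later steps read files (topwords.p, allpapers.txt, ldaphi.p) produced by earlier ones.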
Acknowledgements
Big thanks to @karpathy for his NeurIPS preview and Arxiv Sanity Preserver, which this repository builds on! Thanks also to @tholman for creating a more modern GitHub Corners and to @shuyo for the LDA code! Finally, thanks to the people at CVPR for openly publishing all of their accepted papers!
License
MIT License