Run Distributed Release Audit Tool (DRAT) on all codefest generated code and report out on license statistics

Open chrismattmann opened this issue 9 years ago • 6 comments

DRAT (https://github.com/chrismattmann/drat/) is a release audit tool that turns Apache RAT into a MapReduce-style system for large, heterogeneous code bases where RAT falls flat on its face. RAT cannot easily differentiate between file MIME types, so it tries to perform license analysis on, e.g., binary files unless told otherwise through complex whitelists and blacklists. DRAT improves on RAT by using Apache Tika to partition the code base by MIME type, constructing a Solr4 catalog of the code, and then farming out a large MapReduce-style job. Each mapper is an N-sized set of files of the same MIME type (N is configurable and initially set to 100), partitioned across machines using Apache OODT; the reducer is the RAT log aggregator, which combines the intermediate RAT log output of each mapped RAT job.
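The partition, map, and reduce steps described above can be illustrated with a minimal single-process sketch. This is not DRAT's code: the standard-library mimetypes module stands in for Apache Tika, the hypothetical fake_audit function stands in for a RAT run, and neither the Solr catalog nor the distribution across machines via OODT is modeled. All function names here are assumptions made for illustration.

```python
import mimetypes
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Iterator, List


def partition_by_mime(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by guessed MIME type (a stand-in for Tika detection)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        groups[mime or "application/octet-stream"].append(path)
    return dict(groups)


def chunk(items: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_audit(
    paths: Iterable[str],
    audit_chunk: Callable[[str, List[str]], Counter],
    chunk_size: int = 100,
) -> Counter:
    """Map `audit_chunk` over same-MIME chunks, then reduce by summing tallies."""
    total: Counter = Counter()
    for mime, files in partition_by_mime(paths).items():
        for batch in chunk(files, chunk_size):
            # Reduce step: merge each mapper's intermediate result.
            total.update(audit_chunk(mime, batch))
    return total


def fake_audit(mime: str, batch: List[str]) -> Counter:
    """Hypothetical mapper: reports every file in the batch as 'unknown'."""
    return Counter({"unknown": len(batch)})
```

With 250 .py files and 2 .css files, the sketch produces three Python chunks (100, 100, 50) and one CSS chunk, and run_audit(paths, fake_audit) returns a single merged tally for all 252 files.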

DRAT has been run on the DARPA XDATA code base (~50K files, 10s of M of lines of code) and the Computational Infrastructure for Geodynamics (CIG) code base (~500K files, 100s of M of lines of code). It scales well and is easy to use, and the software can be run on a single machine with an existing OS or as a virtual machine using Vagrant and vagrant up.

This task will involve deploying DRAT, and then running it across the code bases to perform a license analysis and to report out on the results at the end of the hackathon. Patches and improvements to DRAT are welcomed as well.

Continuation of issue from Open Science Codefest: https://github.com/NCEAS/open-science-codefest/issues/27

chrismattmann avatar Oct 07 '14 12:10 chrismattmann

The following codebases were analyzed using DRAT:

https://github.com/NSF-Polar-Cyberinfrastructure/datavis-hackathon.git
https://github.com/NSF-Polar-Cyberinfrastructure/issue-79.git
https://github.com/NSF-Polar-Cyberinfrastructure/issue-15.git
https://github.com/NSF-Polar-Cyberinfrastructure/issue-3.git
https://github.com/NSF-Polar-Cyberinfrastructure/issue-43.git
https://github.com/alb0/3d-greenland.git

Reports are available in the following ZIP archive:

https://www.dropbox.com/s/w5pn6fp8tms9m60/nsf_polar_hackathon.zip?dl=0

Advice on how to interpret the report can be found at https://github.com/chrismattmann/drat/wiki/Interacting-with-DRAT

Notes  Binaries  Archives  Standards  Apache  Generated  Unknown
    0         0         0        109      17          0       92

lewismc avatar Apr 23 '15 19:04 lewismc

Unapproved licenses

27 Javascript
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/Detector.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/KML.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/Tween.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/bootstrap.js_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/issues.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.fileupload-process.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.fileupload-ui.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.fileupload.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.flexslider.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.js_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.main.eng.min.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.min.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.ui.widget.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/jquery.xdr-transport.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/kml-layer.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/leaflet-0.6.4_leaflet.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/leaflet.filelayer.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/leaflet.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/main.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/map.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/ol.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/pictures.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/side_menu.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/three.min.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/togeojson.js
  /usr/local/drat/deploy/data/jobs/rat/1429817469424/input/update-script.js
14 CSS
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/OpenLayer_style.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/bootstrap-glyphicons.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/bootstrap.css_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/flexslider.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/jquery.fileupload-ui.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/jquery.fileupload.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/leaflet-0.6.4_leaflet.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/leaflet-0.6.4_leaflet.ie.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/leaflet.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/ol.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/promo-min.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/side_menu.css
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/side_menu.css_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817469787/input/style.css
26 Python
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/ajax.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/clip_geotiff_by_shp.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/config.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/conversion.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/data_management.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/extract_shp_table.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/extract_shp_table.py_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/forms.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/gtif_to_tile.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/manage.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/metadata.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/models.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/nc_dict.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/netcdf_dict.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/open_file.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/opendap.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/re-project.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/settings.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/shp_dict.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/spatial_analysis.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/tif_dict.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/urls.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/urls.py_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/views.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/what_file.py
  /usr/local/drat/deploy/data/jobs/rat/1429817471306/input/wsgi.py
23 HTML
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/buffer_shp.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/clip_geotiff_by_shp.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/color_table_geotiff.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/coord_to_point_shp.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/extract_shp_table.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/footer.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/geotiff_resolution.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/geotiff_to_kml.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/header.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/index.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/index.html_04232015_1231
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/ncdump_header.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/ncdump_whole_netcdf.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/netcdf_to_geojson.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/netcdf_to_geotiff.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/point_inside_shapefile.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/re_project_geotiff.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/re_project_shapefile.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/shp_to_json.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/shp_to_kml.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/shp_to_tif.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/tif_to_point_json.html
  /usr/local/drat/deploy/data/jobs/rat/1429817469678/input/tif_to_point_shp.html
1 x-text
  /usr/local/drat/deploy/data/jobs/rat/1429817470196/input/report.tex
1 x-sh
  /usr/local/drat/deploy/data/jobs/rat/1429817469962/input/parse.sh
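The per-type counts in this listing (27 + 14 + 26 + 23 + 1 + 1) sum to 92, the figure reported in the Unknown column of the summary. A small sketch for checking a listing like this mechanically follows. It assumes the layout seen above, a "<count> <type>" header followed by one absolute path per line, which is inferred from this comment and not from any documented DRAT output format. The function name parse_listing is hypothetical.

```python
from typing import Dict, List, Optional


def parse_listing(text: str) -> Dict[str, List[str]]:
    """Parse '<count> <type>' headers followed by file paths into {type: [paths]}.

    Raises ValueError if a header's count disagrees with the number of
    paths listed under it, or if a path appears before any header.
    """
    groups: Dict[str, List[str]] = {}
    declared: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between a header and its paths are tolerated
        if line.startswith("/"):
            if current is None:
                raise ValueError(f"path before any header: {line}")
            groups[current].append(line)
        else:
            count, _, name = line.partition(" ")
            current = name
            declared[name] = int(count)  # non-numeric headers raise ValueError
            groups[name] = []
    for name, files in groups.items():
        if declared[name] != len(files):
            raise ValueError(
                f"{name}: header says {declared[name]}, found {len(files)}"
            )
    return groups


sample = """\
1 x-text
  /usr/local/drat/deploy/data/jobs/rat/1429817470196/input/report.tex
1 x-sh
  /usr/local/drat/deploy/data/jobs/rat/1429817469962/input/parse.sh
"""
groups = parse_listing(sample)
total_files = sum(len(files) for files in groups.values())
```

Feeding the function only the header and path lines (without the "Unapproved licenses" title) and summing the group sizes gives the total to compare against the summary table.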

lewismc avatar Apr 23 '15 19:04 lewismc

@r4space @chrismattmann check this out

lewismc avatar Apr 23 '15 19:04 lewismc

@chrismattmann feel free to close this off if you feel the task has been completed. The remaining work is to get ALv2.0 (Apache License, Version 2.0) headers onto everything else.
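One way to approach that remaining work for the Python files is sketched below. It is an assumption about how one might do it, not something DRAT or RAT provides: the hypothetical add_header function prepends the standard Apache License 2.0 source notice as # comments, keeps any shebang line first, and leaves files that already carry the notice untouched. The copyright line is deliberately left out because the owner is not stated here, and the notice should only be added to files the project authors actually own, not to bundled third-party libraries.

```python
ALV2_NOTICE = """\
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""


def add_header(source: str, notice: str = ALV2_NOTICE, prefix: str = "#") -> str:
    """Return `source` with `notice` prepended as line comments.

    A leading shebang line stays first, and a source that already
    contains the notice's opening phrase is returned unchanged.
    """
    if "Licensed under the Apache License" in source:
        return source
    commented = "\n".join(
        f"{prefix} {line}".rstrip() for line in notice.splitlines()
    ) + "\n\n"
    if source.startswith("#!"):
        shebang, _, rest = source.partition("\n")
        return shebang + "\n" + commented + rest
    return commented + source
```

Applying it to a file is then a matter of reading the text, calling add_header, and writing the result back. Because the function is idempotent, re-running it over an already processed tree changes nothing.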

lewismc avatar Apr 23 '15 19:04 lewismc

Some more stats, folks. Using cloc, I obtained the following, which is rather nice:

lmcgibbn@LMC-032857 ~/Desktop $ ./cloc-1.62.pl nsf_polar_hackathon.zip
     143 text files.
     135 unique files.
      25 files ignored.

http://cloc.sourceforge.net v 1.62  T=0.72 s (164.3 files/s, 44677.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSS                             15           2135            134          12528
Javascript                      32           1151           1291           8739
Python                          24            265             49           1245
HTML                            25            172             70           1116
XML                              4            411           1513            627
Teamcenter met                   8              0              0            488
Bourne Shell                    10             12              0            148
-------------------------------------------------------------------------------
SUM:                           118           4146           3057          24891
-------------------------------------------------------------------------------

lewismc avatar Apr 23 '15 20:04 lewismc

Thanks @lewismc, this is great! Let's leave this open, as I want all sessions to remain visible on the website. Thank you, this is perfect!

chrismattmann avatar Apr 26 '15 02:04 chrismattmann