A framework for the large-scale analysis of programming language usage.

Lupa 🔍

Lupa 🔍 is an extensible framework for analyzing fine-grained language usage, built on the IntelliJ Platform. It is a command-line tool that uses the IntelliJ Platform under the hood to perform code analysis with the same industry-level tools that are employed in IntelliJ-based IDEs, such as IntelliJ IDEA, PyCharm, or CLion.

Currently, the framework supports two languages: Python, a mature language that is most popular in data science and machine learning, and Kotlin, a relatively young but quickly growing language.

How it works

Lupa 🔍 is a platform for large-scale analysis of programming language usage. Specifically, Lupa 🔍 is implemented as a plugin for the IntelliJ Platform that reuses the platform's API to launch the IDE in the background (without a user interface) and run the necessary analysis on every project in the given dataset.

The main pipeline of Lupa 🔍 is shown below:

An operating pipeline of the tool
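
The "run the analysis on every project in the dataset" step can be pictured with a small Python sketch. This is only an illustration of the loop, not Lupa's implementation: the real tool launches a headless IntelliJ-based IDE, while this sketch assumes a dataset is simply a directory whose subdirectories are projects. The names `Analyzer`, `count_files_by_extension`, and `run_pipeline` are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical analyzer type: takes a project directory, returns named counts.
Analyzer = Callable[[Path], Dict[str, int]]


def count_files_by_extension(project: Path) -> Dict[str, int]:
    """Toy analyzer: count the Python and Kotlin source files in a project."""
    counts: Dict[str, int] = {}
    for path in project.rglob("*"):
        if path.is_file() and path.suffix in {".py", ".kt"}:
            counts[path.suffix] = counts.get(path.suffix, 0) + 1
    return counts


def run_pipeline(
    dataset: Path, analyzers: Dict[str, Analyzer]
) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Run every analyzer on every project (subdirectory) of the dataset."""
    results: Dict[str, Dict[str, Dict[str, int]]] = {}
    # Assumption: each immediate subdirectory of the dataset is one project.
    for project in sorted(p for p in dataset.iterdir() if p.is_dir()):
        results[project.name] = {
            name: analyzer(project) for name, analyzer in analyzers.items()
        }
    return results
```

Calling `run_pipeline(Path("dataset"), {"extensions": count_files_by_extension})` would return one result per project, keyed first by project name and then by analyzer name.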

To perform the analysis, the tool needs two components: a dataset and analyzers, i.e., sets of instructions that specify which PSI (Program Structure Interface) tree nodes to analyze and how. For more information about data collection, see the data_collection module.

The repository contains several core modules:

  • lupa-core - functions common to all modules and analyzers;
  • lupa-test - common test architecture for all modules;
  • lupa-runner - the module with runners for all analyzers;
  • scripts - common functionality for data gathering, processing and visualization (written in Python).

It also contains several example analyzers that we used for our own purposes:

  1. Kotlin analyzers:
    • clones - functionality related to clone analysis in Kotlin projects;
    • dependencies - functionality related to dependency analysis in Kotlin projects;
    • gradle - functionality related to the analysis of Gradle files in Kotlin projects;
    • statistic - functionality related to various code analyses in Kotlin projects, such as range analysis;
  2. Python analyzers:
    • callExpressions - functionality related to the analysis of call expressions (functions, classes, decorators) in Python projects;
    • imports - functionality related to import analysis in Python projects.

For more information, see these modules; each of them has its own README file.
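
To make the idea of an analyzer more concrete, here is a minimal sketch in the spirit of the imports analyzer. It is not Lupa's code: Lupa works on PSI trees provided by the IntelliJ Platform, whereas this sketch uses Python's standard `ast` module as a stand-in, and the function name `count_imports` is hypothetical. It shows the two parts of an analyzer: which tree nodes to visit (import statements) and what to do with them (count root package names).

```python
import ast
from collections import Counter


def count_imports(source: str) -> Counter:
    """Count the root package names imported anywhere in a Python source string."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b, c` contributes the roots `a` and `c`.
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` contributes the root `a`;
            # relative imports (level > 0) are skipped in this sketch.
            counts[node.module.split(".")[0]] += 1
    return counts
```

Skipping relative imports is a design choice of this sketch, made so that only external package names are counted; a real analyzer may treat them differently.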

Installation

Clone the repository by running git clone https://github.com/JetBrains-Research/Lupa.git.

The analyzer modules and the core architecture require Kotlin 1.5.21 or later. The data gathering, processing, and visualization functionality (the scripts module) requires Python 3+ and the following dependencies:

  • pip install -r scripts/requirements.txt
  • pip install -r scripts/requirements-test.txt - for tests (optional)
  • pip install -r scripts/requirements-code-style.txt - for code style checkers (optional)

Usage

  1. For analyzers:
    • For Kotlin analyzers go to the kotlin-analysers module and follow its README file.
    • For Python analyzers go to the python-analysers module and follow its README file.
  2. For data gathering, processing, and visualization:
    • Go to the scripts module and follow its README file.

Contribution

Please review the project's contributing guidelines to learn how to help the project.