
Potential data model change

Open CJ-Wright opened this issue 5 years ago • 5 comments

It might be good to change the depfinder data model to better reflect a data-report relationship. This could include:

  1. Express the import search as a dict whose keys are the imports and whose values contain metadata about each import: for instance, its line number, whether it is in a try/except block, and whether it is mutually exclusive/collectively exhaustive (MECE) with other imports
  2. Generate reports from that data set
    1. turn all the imports into conda package names (as we have now)
    2. turn all the imports into pypi names
    3. conda package names with version constraints (based on libcfgraph data)
    4. conda package names with delineations between MECE packages

thoughts @ericdill @ocefpaf @jkarp314

CJ-Wright avatar Oct 24 '20 13:10 CJ-Wright

My impression is that your (1) is the change to the data model and (2) - (6) are downstream functions that take in the new data model and return those things.

Regarding what that data model looks like, in my head it's something like this:

{
    "stdlib_list": {
        "occurances": {
            # key: (file path, line number) of the import statement
            ("depfinder/main.py", 41): {
                "try": False,
                "if": False,
                "class": False,
                "function": False,
                "exact_line": "from stdlib_list import stdlib_list",
            },
        },
        "conda": "stdlib-list",
        "pypi": "stdlib-list",
    },
    "requests": {
        "occurances": {
            ("depfinder/main.py", 44): {
                "try": False,
                "if": False,
                "class": False,
                "function": False,
                "exact_line": "import requests",
            },
        },
        "conda": "requests",
        "pypi": "requests",
    },
}

Then you can take that data structure and do whatever you want with it downstream after the code parsing has completed. I imagine that every time an import is encountered you could add it to the occurances dict with the file / line number tuple providing unique keys in that dict. Keeping track of all of the places that each import occurs inside of depfinder is something that I've wanted to do for a while, so this seems like a good opportunity to do that.
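A minimal sketch of that bookkeeping, assuming a plain `ast.walk` over the parsed source. The function name `collect_occurrences` is hypothetical and is not part of depfinder; it records only the exact line, not the try/if/class/function flags, and it skips relative imports.

```python
import ast
from collections import defaultdict


def collect_occurrences(source, filename, data=None):
    """Record every absolute import in `source`, keyed by (filename, line number)."""
    if data is None:
        # Each import name gets its own "occurances" dict on first use.
        data = defaultdict(lambda: {"occurances": {}})
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]  # "os.path" is reported as "os"
            data[top_level]["occurances"][(filename, node.lineno)] = {
                "exact_line": lines[node.lineno - 1].strip(),
            }
    return data
```

Passing the same `data` object back in for each file would accumulate occurrences across a whole project, since the file name is part of every key.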

Not sure if conda and pypi belong in this dict or not. Probably not, now that I think on it a bit more.

Anyway, thoughts?

ericdill avatar Oct 24 '20 17:10 ericdill

Yeah, sorry markdown didn't render the tabs properly.

CJ-Wright avatar Oct 24 '20 17:10 CJ-Wright

I would have the conda/pypi part done independently, since a 3rd party may want to use their own mappings.
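Roughly, that separation could look like the sketch below: the report takes the parsed import data plus whatever mapping the caller supplies. The function name `report_package_names` and the fallback to the bare import name for unmapped imports are assumptions for illustration, not existing depfinder behavior.

```python
def report_package_names(import_data, mapping):
    """Translate import names to package names using a caller-supplied mapping.

    Assumption: an import with no entry in `mapping` is reported under its
    own import name.
    """
    return sorted({mapping.get(name, name) for name in import_data})


# The data model carries no conda/pypi keys; the mapping lives outside it.
import_data = {"stdlib_list": {"occurances": {}}, "requests": {"occurances": {}}}
conda_names = {"stdlib_list": "stdlib-list"}
print(report_package_names(import_data, conda_names))
# ['requests', 'stdlib-list']
```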

One of my concerns is how to get enough detail into the questionable imports piece (try, if, etc.). I would want to know which libs are part of a "pick one" set (for instance try: import pyqt4; except ImportError: import pyqt5). This would enable us to make certain that you had at least some of the libs depfinder thought you needed. For tooling around generating the requirements from scratch, I'm not certain how you would include that but it would be good to have for other use cases.
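As a rough illustration of the "at least some of the libs" check, assuming the pick-one sets have already been extracted as tuples of top-level module names: the helper name `unsatisfied_groups` is hypothetical, and `importlib.util.find_spec` is used only to test whether a module could be imported in the current environment.

```python
from importlib.util import find_spec


def unsatisfied_groups(pick_one_groups):
    """Return the groups for which none of the alternatives is importable.

    Only top-level module names are expected here.
    """
    missing = []
    for group in pick_one_groups:
        if not any(find_spec(name) is not None for name in group):
            missing.append(group)
    return missing
```

An empty result would mean every pick-one set has at least one installed member.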

CJ-Wright avatar Oct 24 '20 17:10 CJ-Wright

Agreed on the conda/pypi part.

Regarding the "pick one" set stuff, I'm pretty sure you could do that via ast. Not exactly sure how, but I don't imagine it would take too much poking around. Would probably require some reworking of how the ImportFinder works. Might need to consider designing a state machine to track where we are in the hierarchy. On second thought, this is probably a bit trickier than I originally thought.
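One way the hierarchy tracking might be sketched is with an `ast.NodeVisitor` that keeps a stack of enclosing constructs. The class `ContextImportFinder` below is hypothetical and is not depfinder's actual ImportFinder; note that its "try" flag is also set for imports inside the `except`, `else`, and `finally` parts of the statement.

```python
import ast


class ContextImportFinder(ast.NodeVisitor):
    """Hypothetical visitor that records where each import sits in the tree."""

    def __init__(self):
        self.stack = []  # labels of enclosing constructs, outermost first
        self.found = []  # (top-level module, line number, flags)

    def _visit_scoped(self, node, label):
        # Push the label while visiting the children, then pop it again.
        self.stack.append(label)
        self.generic_visit(node)
        self.stack.pop()

    def visit_Try(self, node):
        self._visit_scoped(node, "try")

    def visit_If(self, node):
        self._visit_scoped(node, "if")

    def visit_ClassDef(self, node):
        self._visit_scoped(node, "class")

    def visit_FunctionDef(self, node):
        self._visit_scoped(node, "function")

    visit_AsyncFunctionDef = visit_FunctionDef

    def _record(self, node, names):
        flags = {k: k in self.stack for k in ("try", "if", "class", "function")}
        for name in names:
            self.found.append((name.split(".")[0], node.lineno, dict(flags)))

    def visit_Import(self, node):
        self._record(node, [alias.name for alias in node.names])

    def visit_ImportFrom(self, node):
        if node.module and node.level == 0:  # skip relative imports
            self._record(node, [node.module])
```

Calling `finder.visit(ast.parse(source))` fills `finder.found`. The stack only says that an import is inside a try block, not which branch it is in, so it does not by itself identify a "pick one" set.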

ericdill avatar Oct 24 '20 18:10 ericdill

Right, I'm hopeful that we could do it in the code, but I'm not certain what the data model would be that supports it. Maybe we associate shielded imports with an ID, so that all imports with (or within) that ID are associated together.
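A rough sketch of the ID idea, assuming one ID per try statement: the function name `group_shielded_imports` is hypothetical. It does not check that the handler catches ImportError, and imports in a nested try end up in both the inner and the outer group.

```python
import ast
from itertools import count


def group_shielded_imports(source):
    """Map a group ID to the top-level imports found within one try statement."""
    ids = count()
    groups = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        names = []
        # Collect imports from the body, handlers, else, and finally parts.
        for child in ast.walk(node):
            if isinstance(child, ast.Import):
                names.extend(alias.name.split(".")[0] for alias in child.names)
            elif isinstance(child, ast.ImportFrom) and child.module and child.level == 0:
                names.append(child.module.split(".")[0])
        if names:
            groups[next(ids)] = names
    return groups
```

For the pyqt4/pyqt5 example, both imports would land under the same ID, while an unshielded `import os` elsewhere in the file would belong to no group.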

CJ-Wright avatar Oct 24 '20 18:10 CJ-Wright