pkgnet
pkgnet copied to clipboard
Python Implementation
I've begun working on a Python implementation of pkgnet. 🎉
Branch is here: https://github.com/jayqi/pkgnet/tree/py-pkg/py-pkg
This issue will track progress and discussion on design decisions.
Some code standards decisions to make:
- [ ] Linter: Currently using flake8, link to config
- [ ] Formatter: Currently using black, link to config
- [ ] Line length: Currently using black's default of 88 characters
- [ ] Minimum Python version: Currently 3.6, because I'm using f-strings.
- [ ] Docstring style: Currently using Google
What I have implemented as of now:
- Working node and edge extraction for DependencyReporter
- Working node and edge extraction for InheritanceReporter
- Working node and edge extraction for a new reporter, ModuleReporter, since Python has the concept of modules (R does not)
- This currently gets the file tree network. Is this useful? Is imports a better network to extract?
- Working implementation of DirectedGraph that uses networkx backend
- Question: do we want to stick with networkx (which is more popular) or switch to igraph (which is what R package uses)?
Some open areas that either seem hard or I don't know much about:
- How to parse Python code for symbolic references for FunctionReporter
-
<obj>.__code__.co_names
gives strings for symbolic references, but split on dots. Looking up the object will be hard. - Use ast library?
-
- How to implement test packages for unit tests
- How do we want to generate a report? Some potential options:
- Simple HTML template, use jinja2
- Use Jupyter notebooks and nbconvert
- Which network viz library? Static image or interactive javascript? Some options I was looking at:
- pyvis uses vis.js which is what visNetwork in R is using
- plotly example doesn't let you drag nodes, may need to dig into options further
- py2cytoscape uses cytoscape.js
How do we want to generate a report? Some potential options: Simple HTML template, use jinja2 Use Jupyter notebooks and nbconvert
I personally prefer HTML + jinja2. jupyter
and nbconvert
are ubiquitous, but also kind of heavy for what we want to do.
I'm ok with all of the "standards to discuss" things. I personally prefer line length 100 but not something I'm going to fight hard on.
I think I could do "How to implement test packages for unit tests". I had to do that for doppel-cli
(https://github.com/jameslamb/doppel-cli/tree/master/integration_tests/test-packages/python)
I like pyvis
for consistency with the R package. Then we can live or die with vis.js
, and maybe even discover things in it that we can fix or open issues for.
On networkx vs igraph, does igraph have a very similar API in python and R? We don't do all too much with graph metrics at the moment, but if we ever do, maybe a similar API would make parity easier.
On Thu, Feb 20, 2020, 10:19 PM James Lamb [email protected] wrote:
I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/uptake/pkgnet/issues/267?email_source=notifications&email_token=AF3FDS4HO35WFRWS4OZ4A7DRD5I5DA5CNFSM4KYI5JXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMROUPA#issuecomment-589490748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3FDS7MPU3LY2VZBNGV7T3RD5I5DANCNFSM4KYI5JXA .
I'm ok with all of the "standards to discuss" things. I personally prefer line length 100 but not something I'm going to fight hard on.
100 length sounds good to me.
I personally prefer HTML + jinja2. jupyter and nbconvert are ubiquitous, but also kind of heavy for what we want to do.
I agree. I think this is right level to make that tradeoff for what we need.
I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.
I'm thinking about putting the viz backend in a separate class to make it easier to replace it later. (This was also something I was thinking about doing for R.) I'll start with pyvis.
On networkx vs igraph, does igraph have a very similar API in python and R? We don't do all too much with graph metrics at the moment, but if we ever do, maybe a similar API would make parity easier.
Sort of. It looks like the Python version has these enormous graph classes with a bajillion methods. The R implementation was completely functional. In that way the APIs are pretty different. The parameters are similar though, probably because everything uses the same C++ underneath. Here is an example:
- https://igraph.org/python/doc/igraph.GraphBase-class.html#betweenness
- https://igraph.org/r/doc/centr_betw.html
Something to consider:
- https://github.com/networkx/networkx 6.9k stars
- https://github.com/igraph/python-igraph 591 stars
So the really annoying thing, and maybe deal-breaker, with pyvis is that they don't actually output code you can easily embed in an HTML doc. They have their own HTML document template where they use jinja2 to inject data into the raw javascript code to draw the graph.
https://github.com/WestHealth/pyvis/blob/master/pyvis/templates/template.html
In order to use this, we'd have to copy-paste their javascript code block into our templates.
I'm thinking now that it may be most straightforward to just write the visualization code directly in javascript, and just inject the network data as json. This might actually be the most flexible, because we don't have to worry about existence or support of a Python interface. Which don't seen to exist in general anyways. Tt doesn't seem like there are as many good options to have Python generate the javascript code -- there isn't the same htmlwidget ecosystem that R has, as far as I can tell.
I'm thinking now that it may be most straightforward to just write the visualization code directly in javascript, and just inject the network data as json. This might actually be the most flexible, because we don't have to worry about existence or support of a Python interface. Which don't seen to exist in general anyways. Tt doesn't seem like there are as many good options to have Python generate the javascript code -- there isn't the same htmlwidget ecosystem that R has, as far as I can tell.
ok I like this! I think it makes a lot of sense, gives us a clearer path for "language 3", and (selfishly) is an opportunity for us to practice a little JS
Making some progress!
Not too much work to implement a different library. This is sigma.js.
So I've got a working implementation in place for the most part. Wondering if I should open a pull request. At some point it might be easier to start making issues for smaller pieces of stuff and tackling them individually.
So I've got a working implementation in place for the most part. Wondering if I should open a pull request. At some point it might be easier to start making issues for smaller pieces of stuff and tackling them individually.
@jayqi I definitely think you should open a PR! I propose that you open the PR and we only focus on resolving comments related to the API, then worry about which viz library, how to handle the weird "package name you install is different from the one you import" thing, and other issues later.
Hi @jayqi, thanks for your great work!
After successfully installing the package, I am trying to replicate your example but this does not work for me (Python 3.7.5 on Windows 10):
Python 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pkgnet import create_package_report
>>> create_package_report("jinja2")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Nicolas\Documents\home\workspace\pkgnet\py-pkg\pkgnet\package_report.py", line 172, in create_package_report
pkg_reporters = default_reporters()
File "C:\Users\Nicolas\Documents\home\workspace\pkgnet\py-pkg\pkgnet\package_report.py", line 33, in default_reporters
DependencyReporter(),
TypeError: Can't instantiate abstract class DependencyReporter with abstract methods network_measures
Any help would be greatly appreciated.
Thank you so much,
Nicolas
@nicolasfguillaume Thanks for trying it out! Sorry that you had an error—it looks like I had a bug sneak in there. #281 should fix it.
Now that CI is lower cost (in terms of time), I'd like to test what appetite there is to update the docs and publish to PyPy. I feel guilty we're let this sit for years.
LOE:
- Name it!
- New docs
- New CI
- Handful of new issues after release
How about it @jayqi @jameslamb?
I haven't touched my implementation nearly 4 years. I don't know if it's really ready for users (I vaguely recall some stuff was still kind of janky), and I'm not sure I want to commit to developing/maintaining it.
+1 I can't to that, I can't commit to supporting the Python version.
In my opinion, you or @jayqi should do one of the following:
- just delete that branch
- publish the code to a separate public repo with permissive license then immediately archive that repo