pkgnet icon indicating copy to clipboard operation
pkgnet copied to clipboard

Python Implementation

Open jayqi opened this issue 5 years ago • 18 comments

I've begun working on a Python implementation of pkgnet. 🎉

Branch is here: https://github.com/jayqi/pkgnet/tree/py-pkg/py-pkg

This issue will track progress and discussion on design decisions.

Some code standards decisions to make:

  • [ ] Linter: Currently using flake8, link to config
  • [ ] Formatter: Currently using black, link to config
  • [ ] Line length: Currently using black's default of 88 characters
  • [ ] Minimum Python version: Currently 3.6, because I'm using f-strings.
  • [ ] Docstring style: Currently using Google

jayqi avatar Feb 20 '20 07:02 jayqi

What I have implemented as of now:

  • Working node and edge extraction for DependencyReporter
  • Working node and edge extraction for InheritanceReporter
  • Working node and edge extraction for a new reporter, ModuleReporter, since Python has the concept of modules (R does not)
    • This currently gets the file tree network. Is this useful? Is imports a better network to extract?
  • Working implementation of DirectedGraph that uses networkx backend
    • Question: do we want to stick with networkx (which is more popular) or switch to igraph (which is what R package uses)?

Some open areas that either seem hard or I don't know much about:

  • How to parse Python code for symbolic references for FunctionReporter
    • <obj>.__code__.co_names gives strings for symbolic references, but split on dots. Looking up the object will be hard.
    • Use ast library?
  • How to implement test packages for unit tests
  • How do we want to generate a report? Some potential options:
    • Simple HTML template, use jinja2
    • Use Jupyter notebooks and nbconvert
  • Which network viz library? Static image or interactive javascript? Some options I was looking at:
    • pyvis uses vis.js which is what visNetwork in R is using
    • plotly example doesn't let you drag nodes, may need to dig into options further
    • py2cytoscape uses cytoscape.js

jayqi avatar Feb 20 '20 07:02 jayqi

How do we want to generate a report? Some potential options: Simple HTML template, use jinja2 Use Jupyter notebooks and nbconvert

I personally prefer HTML + jinja2. jupyter and nbconvert are ubiquitous, but also kind of heavy for what we want to do.

jameslamb avatar Feb 20 '20 17:02 jameslamb

I'm ok with all of the "standards to discuss" things. I personally prefer line length 100 but not something I'm going to fight hard on.

I think I could do "How to implement test packages for unit tests". I had to do that for doppel-cli (https://github.com/jameslamb/doppel-cli/tree/master/integration_tests/test-packages/python)

jameslamb avatar Feb 21 '20 04:02 jameslamb

I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.

jameslamb avatar Feb 21 '20 04:02 jameslamb

On networkx vs igraph, does igraph have a very similar API in python and R? We don't do all too much with graph metrics at the moment, but if we ever do, maybe a similar API would make parity easier.

On Thu, Feb 20, 2020, 10:19 PM James Lamb [email protected] wrote:

I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/uptake/pkgnet/issues/267?email_source=notifications&email_token=AF3FDS4HO35WFRWS4OZ4A7DRD5I5DA5CNFSM4KYI5JXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMROUPA#issuecomment-589490748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3FDS7MPU3LY2VZBNGV7T3RD5I5DANCNFSM4KYI5JXA .

bburns632 avatar Feb 21 '20 05:02 bburns632

I'm ok with all of the "standards to discuss" things. I personally prefer line length 100 but not something I'm going to fight hard on.

100 length sounds good to me.

I personally prefer HTML + jinja2. jupyter and nbconvert are ubiquitous, but also kind of heavy for what we want to do.

I agree. I think this is right level to make that tradeoff for what we need.

I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.

I'm thinking about putting the viz backend in a separate class to make it easier to replace it later. (This was also something I was thinking about doing for R.) I'll start with pyvis.

On networkx vs igraph, does igraph have a very similar API in python and R? We don't do all too much with graph metrics at the moment, but if we ever do, maybe a similar API would make parity easier.

Sort of. It looks like the Python version has these enormous graph classes with a bajillion methods. The R implementation was completely functional. In that way the APIs are pretty different. The parameters are similar though, probably because everything uses the same C++ underneath. Here is an example:

  • https://igraph.org/python/doc/igraph.GraphBase-class.html#betweenness
  • https://igraph.org/r/doc/centr_betw.html

Something to consider:

  • https://github.com/networkx/networkx 6.9k stars
  • https://github.com/igraph/python-igraph 591 stars

jayqi avatar Feb 21 '20 05:02 jayqi

So the really annoying thing, and maybe deal-breaker, with pyvis is that they don't actually output code you can easily embed in an HTML doc. They have their own HTML document template where they use jinja2 to inject data into the raw javascript code to draw the graph.

https://github.com/WestHealth/pyvis/blob/master/pyvis/templates/template.html

In order to use this, we'd have to copy-paste their javascript code block into our templates.

jayqi avatar Feb 24 '20 01:02 jayqi

I'm thinking now that it may be most straightforward to just write the visualization code directly in javascript, and just inject the network data as json. This might actually be the most flexible, because we don't have to worry about existence or support of a Python interface. Which don't seen to exist in general anyways. Tt doesn't seem like there are as many good options to have Python generate the javascript code -- there isn't the same htmlwidget ecosystem that R has, as far as I can tell.

jayqi avatar Feb 24 '20 04:02 jayqi

I'm thinking now that it may be most straightforward to just write the visualization code directly in javascript, and just inject the network data as json. This might actually be the most flexible, because we don't have to worry about existence or support of a Python interface. Which don't seen to exist in general anyways. Tt doesn't seem like there are as many good options to have Python generate the javascript code -- there isn't the same htmlwidget ecosystem that R has, as far as I can tell.

ok I like this! I think it makes a lot of sense, gives us a clearer path for "language 3", and (selfishly) is an opportunity for us to practice a little JS

jameslamb avatar Feb 24 '20 04:02 jameslamb

Making some progress!

screencapture-file-Users-Jay-Desktop-pkgnet-tmp-html-2020-02-23-22_21_52

jayqi avatar Feb 24 '20 06:02 jayqi

Not too much work to implement a different library. This is sigma.js. screencapture-file-Users-Jay-Desktop-pkgnet-tmp-html-2020-02-24-07_12_57

jayqi avatar Feb 24 '20 15:02 jayqi

So I've got a working implementation in place for the most part. Wondering if I should open a pull request. At some point it might be easier to start making issues for smaller pieces of stuff and tackling them individually.

jayqi avatar Mar 09 '20 05:03 jayqi

So I've got a working implementation in place for the most part. Wondering if I should open a pull request. At some point it might be easier to start making issues for smaller pieces of stuff and tackling them individually.

@jayqi I definitely think you should open a PR! I propose that you open the PR and we only focus on resolving comments related to the API, then worry about which viz library, how to handle the weird "package name you install is different from the one you import" thing, and other issues later.

jameslamb avatar Mar 10 '20 21:03 jameslamb

Hi @jayqi, thanks for your great work!

After successfully installing the package, I am trying to replicate your example but this does not work for me (Python 3.7.5 on Windows 10):

Python 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pkgnet import create_package_report
>>> create_package_report("jinja2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Nicolas\Documents\home\workspace\pkgnet\py-pkg\pkgnet\package_report.py", line 172, in create_package_report
    pkg_reporters = default_reporters()
  File "C:\Users\Nicolas\Documents\home\workspace\pkgnet\py-pkg\pkgnet\package_report.py", line 33, in default_reporters
    DependencyReporter(),
TypeError: Can't instantiate abstract class DependencyReporter with abstract methods network_measures

Any help would be greatly appreciated.

Thank you so much,

Nicolas

nicolasfguillaume avatar Sep 02 '20 08:09 nicolasfguillaume

@nicolasfguillaume Thanks for trying it out! Sorry that you had an error—it looks like I had a bug sneak in there. #281 should fix it.

jayqi avatar Sep 03 '20 04:09 jayqi

Now that CI is lower cost (in terms of time), I'd like to test what appetite there is to update the docs and publish to PyPy. I feel guilty we're let this sit for years.

LOE:

  • Name it!
  • New docs
  • New CI
  • Handful of new issues after release

How about it @jayqi @jameslamb?

bburns632 avatar Apr 19 '24 20:04 bburns632

I haven't touched my implementation nearly 4 years. I don't know if it's really ready for users (I vaguely recall some stuff was still kind of janky), and I'm not sure I want to commit to developing/maintaining it.

jayqi avatar May 03 '24 19:05 jayqi

+1 I can't to that, I can't commit to supporting the Python version.

In my opinion, you or @jayqi should do one of the following:

  • just delete that branch
  • publish the code to a separate public repo with permissive license then immediately archive that repo

jameslamb avatar May 03 '24 21:05 jameslamb