stata_kernel

Shout-out from Stata Corp.

Open amichuda opened this issue 4 years ago • 7 comments

Stata 17 has the pystata package which lets users run Stata from python. Guess who they acknowledged?!

https://www.stata.com/python/pystata/ack.html

I think the package is closed source, so they didn't really follow the spirit of your package, but still pretty cool!

Once again, really great job on this package, from what I've seen in my research and other institutions (at least in economic development work), the stata kernel has made a splash!

amichuda avatar Apr 21 '21 18:04 amichuda

Well, if it actually used any of the code here they'd have to publish it, right? I'm assuming they must have re-done it from scratch.

mcaceresb avatar Apr 21 '21 18:04 mcaceresb

Yes, I think that's why they used the word "inspired," because otherwise I think you'd have a lawsuit against them if they used your code in closed-source software, right? (Not a lawyer, so I have no idea.)

But not even sure how the software is being handled.

amichuda avatar Apr 21 '21 18:04 amichuda

This is interesting. Some thoughts:

  • Yeah probably not using any of our code. Don't think they'd be dumb enough to include a GPL3 dependency in their code...

  • It's hard to know much about their code because only their stata_setup module is public

  • Because it's not pip-installable, you need to tell Python users to change their PYTHONPATH every time so they can import the package. I'm sure there'll be a ton of support requests about Python not finding the pystata import

  • It doesn't define a Jupyter kernel. Instead it uses plain Python and just defines a few IPython magics. But the code is all running inside the regular Python kernel.

  • Since it doesn't define its own kernel, I'm curious if they're able to maintain data state on the Stata side. From stata_setup and their example, it looks like they maintain a running Stata session. Is it in a subprocess? How does Stata keep data in sync on the Python and Stata sides? Seems like it would be a pain in the butt for users to have to keep track of "is my Python data the same as my Stata data"

    sys.path.append(os.path.join(path, 'utilities'))
    from pystata import config 
    config.init(edition)
    

    I assume data isn't persisted on the Python side and sent to Stata every time a Stata command is called... That would be a horribly slow experience for large data.

  • It's curious that they're integrating so much with Python... I imagine (hope) there will be some users for whom this introduces them more to Python's huge data science ecosystem, and then say "why am I paying so much for Stata, when I see I can do everything I need here in Python". But clearly StataCorp thinks this integration will be positive for them 🤷‍♂️
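A minimal sketch of the import problem described in the PYTHONPATH bullet above, using only the standard library. The module name `fake_vendor_pkg` and the helper `import_from_directory` are hypothetical stand-ins, not part of pystata or stata_setup; the point is only that a package living outside Python's search path cannot be imported until its directory is added to `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(module_name: str, directory: str):
    """Import module_name after making sure directory is on sys.path."""
    if directory not in sys.path:
        sys.path.append(directory)
    # Make the import system notice files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a throwaway module in a directory Python does not search by default.
# "fake_vendor_pkg" is a made-up name standing in for a non-pip package.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "fake_vendor_pkg.py").write_text("EDITION = 'demo'\n")

try:
    importlib.import_module("fake_vendor_pkg")
    found_before = True
except ModuleNotFoundError:
    found_before = False

module = import_from_directory("fake_vendor_pkg", demo_dir)
print(found_before, module.EDITION)
```

The change to `sys.path` only lasts for the current interpreter, which is why every new session or notebook would need the same setup step again.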

kylebarron avatar Apr 21 '21 19:04 kylebarron

@kylebarron I would assume it's using the existing python interface they introduced in Stata 16? My assumption is that the python data and Stata data are separate; I don't see how else it could work, at least from skimming the docs. My assumption is:

  • There is a persistent Stata session.
  • Data created in Stata stays there until imported to python.
  • Data created in python stays there until imported to Stata.

They might use frames to cache some data, but I can't imagine that by default they copy every dataset created in python into Stata and vice versa (i.e. without the user telling the kernel to do it).
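The three assumptions above can be sketched as a toy model. `ToySession` is a hypothetical class written for illustration only; it is not pystata's API and says nothing about how StataCorp actually implemented things. It just shows what "separate data until explicitly transferred" would mean in practice.

```python
import copy


class ToySession:
    """Hypothetical stand-in for a long-lived Stata session (not pystata's API)."""

    def __init__(self):
        self.data = None  # the session's own dataset, kept between calls

    def load(self, rows):
        # Explicit transfer in: store an independent copy of the Python data.
        self.data = copy.deepcopy(rows)

    def export(self):
        # Explicit transfer out: hand back an independent copy.
        return copy.deepcopy(self.data)


session = ToySession()
py_rows = [{"id": 1, "wage": 10.0}]

session.load(py_rows)       # copy Python -> session
py_rows[0]["wage"] = 99.0   # later change on the Python side only

session_rows = session.export()
print(session_rows[0]["wage"])  # 10.0
print(py_rows[0]["wage"])       # 99.0
```

Under this model nothing is kept in sync automatically, so the cost of moving data is only paid when the user asks for a transfer.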

mcaceresb avatar Apr 21 '21 22:04 mcaceresb

  • It's curious that they're integrating so much with Python... I imagine (hope) there will be some users for whom this introduces them more to Python's huge data science ecosystem, and then say "why am I paying so much for Stata, when I see I can do everything I need here in Python". But clearly StataCorp thinks this integration will be positive for them 🤷‍♂️

For their core demographic, which is social scientists with high switching costs, I don't know that this will make such a big difference either way. I assume Stata is betting that this will encourage enough newcomers to stick around. At least the ones that don't might, as you say, get exposed to Python instead of unhappily languishing in Stata.

mcaceresb avatar Apr 21 '21 23:04 mcaceresb

I have this on order and will report back on how things are happening. Two comments unrelated to the inner workings of the Stata Corp python module:

  1. The new pricing model which requires annual subscriptions is ridiculously expensive
  2. In my own research I never use Stata (I use tensorflow and jax), but for the models I teach in an upper level econometrics class, statsmodels isn't there yet and R is too clunky: every model we cover has a different API, which isn't going to work for my students. Since all of my colleagues use and teach with Stata, it would be unfair to students for me to force another workflow on them, although with the new prices I am having a difficult time seeing how my university can pay for all of these subscriptions.

roblem avatar Apr 23 '21 13:04 roblem

Have been testing this out this morning (on linux) having just upgraded to Stata 17. Observations:

  1. No syntax highlighting (although fenced stata codeblocks in markdown cells are highlighted)
  2. No completions
  3. Stata must be running as a background process since variables and the dataset exist across codeblocks, although a ps -ef | grep stata doesn't show anything.
  4. Copying python data into stata using %%stata -d some_dataframe_from_python creates a static copy of the python object/data that is not updated if the underlying python data changes.

The only advantage of the stata corp way is the mixing of stata and python in a single notebook, which I don't believe is possible with stata_kernel.
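One possible reading of observation 3, offered as an assumption that nothing in this thread confirms: if Stata were loaded as a shared library inside the Python process rather than started as a separate program, there would be no separate stata entry for ps to list. The sketch below does not touch Stata at all. It uses the C library as a stand-in to show that code loaded through ctypes runs under the same process ID as Python itself.

```python
import ctypes
import os

# POSIX only: CDLL(None) opens the symbols already linked into this process,
# which includes the C library. The C library stands in for "some shared
# library loaded into Python"; it is not Stata.
libc = ctypes.CDLL(None)
libc.getpid.restype = ctypes.c_int

pid_seen_by_library = libc.getpid()
pid_seen_by_python = os.getpid()

print(pid_seen_by_library == pid_seen_by_python)  # True
```

If that assumption holds, state would persist across code blocks simply because the Python kernel process stays alive between cells.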

roblem avatar Apr 30 '21 08:04 roblem