check my first version of rotten-tomatoes analysis

Open behrica opened this issue 5 years ago • 9 comments

I added a first version, just to see how it goes. It's in my fork here: https://github.com/behrica/tutorials/tree/master/src/drafts

I also added the data files it needs.

So you should be able to "run" it with "jupyter notebook". Please try it.

All dependencies are added dynamically at the beginning of the notebook. I was not sure whether to commit it with all result cells "empty" or not.

I am not sure how, in detail, GitHub handles IPython notebook files with images and embedded HTML... and indeed, the Vega-Lite plots don't appear.

So I thought we should also publish an HTML version. It looks better, as the plots are shown. It needs to live on a GitHub Pages site, so for the time being I put it on my personal one:

https://behrica.github.io/tutorials/rotten-tomatoes-sentimen-analysis/sentiment-analysis-rotten-tomatoes.html

Please give me your comments.

behrica, Mar 03 '19 11:03

I just saw that github provides a little button inside the rendered notebook file, which says "render via external nbviewer" and that renders it in full, including the vega plots. That's great, so we don't need the html version.

behrica, Mar 03 '19 11:03

Wonderful, @behrica! That is so nice.

It works on my machine after installing the latest version of clojupyter ("0.2.1-SNAPSHOT").

Imho, we should avoid putting large data files (or any large files) in the repo. They make the whole git experience slower, and deleting the files does not remove them from the history. It would probably be better to add a script (or Clojure function) that fetches the data from its source, like @cnuernber did here: https://github.com/cnuernber/ames-house-prices/tree/master/scripts For safety, we can add the data subdirectory to .gitignore. Does that seem reasonable?
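A minimal sketch of the "download script plus .gitignore" idea, written in Python only for illustration (the thread itself suggests a script or a Clojure function). The URL and file names below are placeholders, not the dataset's real location, and the helper names `fetch_dataset` and `ensure_gitignored` are hypothetical.

```python
"""Hypothetical helper: fetch the dataset into data/ instead of committing it."""
import urllib.request
from pathlib import Path

# Placeholder values: replace with the dataset's real source and file name.
DATA_URL = "https://example.com/rotten-tomatoes/train.tsv"
DATA_DIR = Path("data")


def fetch_dataset(url: str, dest: Path) -> Path:
    """Download url to dest unless dest already exists; return dest."""
    if dest.exists():
        return dest  # already downloaded: skip the network call
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def ensure_gitignored(gitignore: Path, entry: str = "data/") -> None:
    """Add entry to .gitignore if it is not listed there yet."""
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry not in lines:
        gitignore.write_text("\n".join(lines + [entry]) + "\n")
```

At the top of the notebook one would call `ensure_gitignored(Path(".gitignore"))` and then `fetch_dataset(DATA_URL, DATA_DIR / "train.tsv")`; because the download is skipped when the file already exists, re-running the notebook does not hit the network again.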

Regarding large rendered notebooks, we need to think of a solution. Maybe having them all rendered in one repo would be too heavy.

daslu, Mar 03 '19 18:03

OK, maybe we should add this to the guidelines: please don't commit data files.

I will find a way to change my notebook to download the data.

Regarding rendered notebooks (HTML or others): yes, they can become big with plots, and with Vega the data is always embedded in the JavaScript for the plot, so they grow as well. I'm not sure what to do with them.

For me the .ipynb file is already big, since the Vega output puts the data in there every time, unless I actively clear all cells before the last save...
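A minimal sketch of clearing outputs before saving, assuming the notebook uses the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). It uses only Python's `json` module; the function names are hypothetical, and it is not how Jupyter itself clears cells.

```python
import json
from pathlib import Path


def clear_outputs(nb: dict) -> dict:
    """Drop outputs and execution counts from every code cell (nbformat 4 layout)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []  # removes embedded plots, HTML and Vega data
            cell["execution_count"] = None
    return nb


def clear_notebook_file(path: Path) -> None:
    """Rewrite an .ipynb file in place with all code-cell outputs removed."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    clear_outputs(nb)
    path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

Calling `clear_notebook_file(Path("your-notebook.ipynb"))` just before committing keeps the versioned file small, at the cost of the rendered plots no longer showing up on GitHub or nbviewer.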

That's a Vega-specific thing and can be avoided by saving the data to a file first and having the Vega spec point to it... not ideal for interactive work, though.
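A sketch of the "save the data to a file and point Vega at it" workaround, assuming the Vega-Lite spec is built as a plain Python dict. Vega-Lite accepts either inline rows (`"data": {"values": [...]}`) or a reference (`"data": {"url": "..."}`). With the reference, the notebook output holds only the path, but the file must be reachable from wherever the plot is rendered. The field names `label`/`count` and the helper names are hypothetical.

```python
import json
from pathlib import Path


def bar_spec(data: dict) -> dict:
    """Wrap a Vega-Lite data block in a simple bar-chart spec."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": data,
        "mark": "bar",
        # Hypothetical field names, for illustration only.
        "encoding": {
            "x": {"field": "label", "type": "nominal"},
            "y": {"field": "count", "type": "quantitative"},
        },
    }


def embedded_spec(rows: list) -> dict:
    """Inline the rows: the full data ends up in the notebook output."""
    return bar_spec({"values": rows})


def linked_spec(rows: list, data_path: Path) -> dict:
    """Write the rows to a JSON file and reference it by URL instead."""
    data_path.parent.mkdir(parents=True, exist_ok=True)
    data_path.write_text(json.dumps(rows), encoding="utf-8")
    return bar_spec({"url": data_path.as_posix()})
```

The linked spec has the same size however many rows there are, which is the point; the trade-off mentioned above is that the data file has to be written and shipped next to the notebook.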

There are "tons" of places where people might want to host their rendered HTML files. Maybe we leave it to them and just allow a link in the table.

behrica, Mar 04 '19 15:03

You just hit one of the major limitations of notebooks: versioning simply doesn't work as intended with them. I guess we have to live with that if we want to show off stuff directly from GitHub and/or nbviewer (or https://mybinder.org/), but size shouldn't be an issue; there are very large repos with a lot of notebooks and they work pretty well.

alanmarazzi, Mar 04 '19 17:03

Thanks @behrica, good idea, I added the guideline.

@alanmarazzi it is good to hear that you experience no problems with large notebooks (I remembered something different, but maybe it was an extreme case).

So let us, for now, keep working with notebooks rendered in git, and if and when we meet a problem, we can think what to do.

daslu, Mar 04 '19 22:03

This is a nice example of what is achievable in terms of tooling/presentation and size: https://github.com/jakevdp/PythonDataScienceHandbook

alanmarazzi, Mar 05 '19 08:03

> You just hit one of the major limitations of notebooks: versioning simply doesn't work as intended with them. I guess we have to live with that if we want to show off stuff directly from GitHub and/or nbviewer (or https://mybinder.org/), but size shouldn't be an issue; there are very large repos with a lot of notebooks and they work pretty well.

Innocent question here: was using gorilla-repl ever something you all considered for the tutorials? I've never used it and therefore only have a sketchy understanding of how it fits in, but I noticed that, unlike .ipynb files, Gorilla worksheets are saved as something like normal Clojure code. So versioning might work a bit better...

ezmiller, Apr 12 '19 22:04

Thanks @ezmiller, you're right, versioning would be better with text-based formats such as gorilla and org-mode.

I guess it is not a reason not to use Jupyter here, but indeed a limitation to keep in mind.

daslu, Apr 13 '19 04:04

Yes. Definitely not a reason to change anything in the work done here by @behrica :). I was just curious whether there had been a discussion more generally about gorilla-repl. I think I'll take this discussion to Zulip.

ezmiller, Apr 13 '19 13:04