document_cluster IPython notebook stuff just gets in the way?

Open JesseAldridge opened this issue 9 years ago • 3 comments

It took me a while to figure out what an ipython notebook was and how to open it and run the code.

Then I tried to create a pull request for my earlier issue, but when I opened the notebook in Jupyter, it apparently upgraded the notebook file to a more recent format, so my diff would have been huge.

I wanted to use this project in my own code, but it looks like I have to copy and paste snippets out of the notebook in order to do that?

I guess it's kind of cool to have that literate programming style, but mostly the notebook stuff just seems to get in the way and make life difficult. If you got rid of it and just had normal python code, it seems like this project would be significantly easier to work with.

Apr 11 '16 05:04 JesseAldridge

Once you are in a notebook, you can actually save it as a python file. That will contain the code only.

You can also use nbconvert to convert the notebook into a number of different file formats.
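
For a sense of what that conversion involves, here is a minimal sketch that pulls the code cells out of a notebook using only the standard library. It assumes the notebook is stored in the version 4 JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source"). The function name is made up for illustration, and nbconvert itself does considerably more than this.

```python
import json


def notebook_to_script(nb_path, py_path):
    """Write the code cells of a v4-format notebook to a plain .py file.

    Hypothetical helper, not part of nbconvert or of this project.
    Returns the number of code cells written.
    """
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)

    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")

    # separate cells with two blank lines
    with open(py_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks))
    return len(chunks)
```

Calling it with a notebook path and an output path of your choosing produces a code-only script. IPython magics and similar notebook-only lines are copied verbatim and would still need editing by hand.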

Apr 11 '16 08:04 jdejoode

I found "Download as" -> "Python". Is that what you mean? Thanks, that's a step in the right direction. But there are still problems, like "In [*]" comments scattered all over the code and imports in the middle of the file. This code was clearly meant to be run in the IPython notebook format and would need to be refactored to work outside that environment. I may do it at some point if I find the time.
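
The cell markers, at least, can be stripped mechanically. A rough sketch, assuming each marker sits on a line of its own and looks like `# In[12]:` or `# In[ ]:` (adjust the pattern if the exported file differs). The function name is made up, and moving the imports to the top of the file is not attempted here.

```python
import re

# Matches a line holding only a cell marker, e.g. "# In[3]:" or "# In[ ]:".
# The exact marker format is an assumption; loosen or tighten as needed.
CELL_MARKER = re.compile(r"^\s*#\s*In\s*\[[^\]]*\]:?\s*$")


def strip_cell_markers(script_text):
    """Drop cell-marker lines and cap runs of blank lines at two."""
    kept = []
    for line in script_text.splitlines():
        if CELL_MARKER.match(line):
            continue
        # skip a blank line if the two lines already kept before it are blank
        if (not line.strip() and len(kept) >= 2
                and not kept[-1].strip() and not kept[-2].strip()):
            continue
        kept.append(line)
    return "\n".join(kept).strip("\n") + "\n"
```

A trailing comment that merely contains marker-like text is left alone, because the pattern only matches lines that consist of the marker and nothing else.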

Apr 12 '16 05:04 JesseAldridge

Ok, I started on this: https://github.com/JesseAldridge/document_cluster

Note: I found a big speed-up by caching the stemming, like so:

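def cached_stem(t, cache={}):
  # The mutable default argument is deliberate: it is created once and
  # persists across calls, so it serves as the cache.
  # `stemmer` is the stemmer object already defined in the notebook.
  if t not in cache:
    cache[t] = stemmer.stem(t)
  return cache[t]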
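
The same memoization can also be written with `functools.lru_cache` from the standard library, which avoids the mutable default argument. This is only a sketch under assumptions: `StandInStemmer` is a made-up placeholder so the snippet runs on its own, and in the project it would be whatever `stemmer` the notebook defines.

```python
from functools import lru_cache


class StandInStemmer:
    """Placeholder with a .stem() method; NOT a real stemmer, illustration only."""

    def __init__(self):
        self.calls = 0  # counts how often the underlying stem() actually runs

    def stem(self, token):
        self.calls += 1
        # crude suffix stripping, just to produce some output
        return token[:-3] if token.endswith("ing") else token


stemmer = StandInStemmer()


@lru_cache(maxsize=None)  # unbounded cache, like the dict version
def cached_stem(t):
    return stemmer.stem(t)


tokens = ["running", "jumping", "running", "running", "cat"]
stems = [cached_stem(t) for t in tokens]
```

The speed-up comes from repeated tokens: the underlying `stem()` runs once per distinct token rather than once per occurrence, and `cached_stem.cache_info()` reports the hit and miss counts.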

Apr 12 '16 07:04 JesseAldridge
