Results 76 comments of Dean Kayton

Any plans to do the same for a row ID (as opposed to a column ID, aka header)?

I see the same on Ubuntu 12.04 (default Unity DE) and 16.04 (Ubuntu Mate DE), with Atom 1.80 and tablr 1.3.1 (default settings).

Currently the above example would look like this:

```
Here is a sentence. Here is a second sentence. This paragraph doesn't have a paragraph heading. History and infrastructure. Here is...
```

I'm considering using some of the parsing tips here: https://towardsdatascience.com/wikipedia-data-science-working-with-the-worlds-largest-encyclopedia-c08efbac5f5c to simply record the IDs of pages that match the categories I care about, then running wikiextractor (as a subprocess) and only saving...
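A minimal sketch of the "record IDs that match my categories" step, using the standard library's `xml.sax` streaming parser so the dump never has to fit in memory. It assumes a MediaWiki `pages-articles` XML dump, and it assumes categories are detected only from literal `[[Category:...]]` links in the wikitext (categories added by templates will be missed). `CategoryIdHandler` and `ids_in_categories` are hypothetical names, not part of wikiextractor or the linked article.

```python
import bz2
import re
import xml.sax

# Matches wikitext category links such as [[Category:Physics]] or [[Category:Physics|sort key]].
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)")


class CategoryIdHandler(xml.sax.ContentHandler):
    """Hypothetical handler: records the <id> of every <page> linking to a wanted category."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {c.strip() for c in wanted}
        self.matched_ids = []
        self._path = []      # stack of currently open element names
        self._buffer = []    # character data of the element being read
        self._page_id = None
        self._text = ""

    def startElement(self, name, attrs):
        self._path.append(name)
        self._buffer = []
        if name == "page":
            self._page_id, self._text = None, ""

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        self._buffer.append(content)

    def endElement(self, name):
        data = "".join(self._buffer)
        parent = self._path[-2] if len(self._path) > 1 else None
        # Only the page's own <id>, not the <id> nested in <revision> or <contributor>.
        if name == "id" and parent == "page":
            self._page_id = data.strip()
        elif name == "text":
            self._text = data
        elif name == "page":
            cats = {c.strip() for c in CATEGORY_RE.findall(self._text)}
            if cats & self.wanted:
                self.matched_ids.append(self._page_id)
        self._path.pop()
        self._buffer = []


def ids_in_categories(dump_path, wanted):
    """Stream a .xml.bz2 dump (assumed format) and return matching page IDs."""
    handler = CategoryIdHandler(wanted)
    with bz2.open(dump_path, "rb") as f:
        xml.sax.parse(f, handler)
    return handler.matched_ids
```

The resulting ID list could then be used to filter whatever wikiextractor writes out, rather than modifying wikiextractor itself.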

> > I will try to link to a Jupyter Notebook with this working. Please let me know if my thinking is flawed.
>
> *Edited it to include output*.

You can...

If it helps, for an easy way to install the wiki dumps that worked with the above code, see https://gist.github.com/dnk8n/afcd8585865fa29abe625e8ecee94c68

I was about to help out with a fix until I saw that the parsing is done with regex. Why not use an XML parser, since it is XML?...
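To illustrate the XML-parser alternative, here is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`, which streams the file and hands back each finished `<page>` element. It assumes the MediaWiki export layout (`<page>` containing `<title>`, `<id>` and `<revision><text>`), and it strips the XML namespace rather than hard-coding a particular export schema version. `iter_pages` is a hypothetical helper, not an existing API of any wiki tool.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that MediaWiki export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (page_id, title, wikitext) for each <page>, streaming the input.

    `source` is a filename or a binary file object containing MediaWiki export XML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        page_id = title = None
        text = ""
        for child in elem:
            name = _local(child.tag)
            if name == "id":
                page_id = child.text
            elif name == "title":
                title = child.text
            elif name == "revision":
                for rev_child in child:
                    if _local(rev_child.tag) == "text":
                        text = rev_child.text or ""
        yield page_id, title, text
        # Free the finished page's subtree so memory use stays roughly flat.
        elem.clear()
```

For a compressed dump, something like `with bz2.open(path, "rb") as f: for page_id, title, text in iter_pages(f): ...` should work, since `iterparse` accepts any binary file object. The parser handles entities, CDATA and nesting, which is where regex-based parsing of XML tends to break.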

Funnily enough, I fell into exactly the same trap while writing my own parser for the wiki files... I'm about to work it out. I don't use regex, though; I use...

Can confirm that the lambda function would have returned an error if the supplied nltk_data directory (e.g. NLTK_DATA=/cache/nltk) was read-only. It would have been of the following format:

```
Traceback...
```

The results of the following:

```python
import os
import stat

statdata = os.stat(os.environ['NLTK_DATA'])
perm = stat.S_IMODE(statdata.st_mode)
# is it world-writable?
if perm & 0o002:
    print("It is world-writable")
else:
    print("It is...
```
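Note that the world-writable bit alone does not say whether the current process can write there: owner and group bits, running as root, and read-only mounts all matter too. A minimal sketch, assuming a POSIX system, that reports both the mode bit and the effective check from `os.access(path, os.W_OK)`; `describe_writability` is a hypothetical helper name.

```python
import os
import stat


def describe_writability(path):
    """Report whether `path` is world-writable (mode bit) and writable by this process."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return {
        # Same o+w bit as the `perm & 0o002` check (stat.S_IWOTH == 0o002).
        "world_writable": bool(mode & stat.S_IWOTH),
        # Effective check for the current user: also reflects owner/group bits,
        # root privileges and, via access(2), read-only filesystems.
        "writable_by_me": os.access(path, os.W_OK),
    }
```

Usage would be along the lines of `describe_writability(os.environ["NLTK_DATA"])`. A directory can be writable by its owner without being world-writable, so the second key is the one that predicts whether writing into nltk_data will succeed.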