Michael Heilman comments


tokenization issues for non-ascii texts

Indeed.

tokenization issues for non-ascii texts

Probably a lot of them.

tokenization issues for non-ascii texts

I guess another option would be to use unidecode...

tokenization issues for non-ascii texts

Hmm, I think the simpler dictionary approach sounds good, but I think that dict above is missing a few things (http://en.wikipedia.org/wiki/Apostrophe).
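A minimal sketch of what the dictionary approach could look like, assuming it means mapping apostrophe-like code points to the ASCII apostrophe before tokenizing. The table `APOSTROPHE_LIKE` and the function `normalize_apostrophes` are hypothetical names, not the dict discussed in the thread; the code points are ones that commonly stand in for an apostrophe and may not be the full set the project needs.

```python
# Hypothetical mapping (not the dict from the thread). Each key is a code
# point that often appears where an ASCII apostrophe (U+0027) is meant.
APOSTROPHE_LIKE = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u02bb": "'",  # MODIFIER LETTER TURNED COMMA
    "\uff07": "'",  # FULLWIDTH APOSTROPHE
    "\u0060": "'",  # GRAVE ACCENT
    "\u00b4": "'",  # ACUTE ACCENT
    "\u2032": "'",  # PRIME
}

# Build the translation table once; str.translate then does a single pass.
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def normalize_apostrophes(text):
    """Replace apostrophe-like characters with the ASCII apostrophe."""
    return text.translate(_TABLE)
```

Unlike transliterating the whole string, this leaves every other non-ASCII character untouched, so accented letters reach the tokenizer unchanged.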

need more specific use of logging

We should probably use this: https://lukasa.co.uk/2014/05/A_Brief_Digression_About_Logging/ (via @dan-blanchard)
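For context, a sketch of the usual library-side logging pattern, on the assumption that this is roughly what the linked post recommends: each module gets its own named logger, and the library attaches only a `NullHandler`, leaving configuration to the application. The function `count_tokens` is a hypothetical stand-in for project code.

```python
import logging

# Module-level logger named after the module, so an application can filter
# or redirect these messages by name.
logger = logging.getLogger(__name__)

# NullHandler discards records, so the library prints nothing unless the
# application configures logging itself.
logger.addHandler(logging.NullHandler())


def count_tokens(text):
    """Hypothetical example function that logs through the module logger."""
    tokens = text.split()
    logger.debug("split %d characters into %d tokens", len(text), len(tokens))
    return len(tokens)
```

An application that wants to see these messages would call something like `logging.basicConfig(level=logging.DEBUG)` in its own entry point.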

collapse_rst_labels.py could be cleaned up a bit

Note that there is some inconsistency with escaping `-` in the regular expressions, but Python deals with this fine, e.g., `elif re.search(r'^(topic-.*)', relation_lc):` vs. `elif re.search(r'^(temporal\-.*|sequence|inverted\-sequence)',`
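A quick check of that claim, as a sketch: outside a character class `-` is an ordinary literal, so the escaped and unescaped forms should match the same strings. The sample labels are illustrative inputs, not taken from the script.

```python
import re

# The pattern as written in the script, with "-" escaped...
escaped = re.compile(r'^(temporal\-.*|sequence|inverted\-sequence)')
# ...and the same pattern with the escapes dropped.
unescaped = re.compile(r'^(temporal-.*|sequence|inverted-sequence)')

# Illustrative relation labels (assumed for the example).
samples = ["temporal-before", "sequence", "inverted-sequence", "topic-shift"]

# For each sample, record whether each pattern matched.
results = [(bool(escaped.search(s)), bool(unescaped.search(s)))
           for s in samples]

# Both patterns agree on every sample.
assert all(a == b for a, b in results)
```

Dropping the backslashes everywhere would make the expressions consistent without changing what they match.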

parsing evaluation metrics

Commit 12c5b59 implements the basic functionality for doing parseval, but it's not complete. Some edge cases still need to be dealt with (e.g., same-unit relations). See the TODO comments in...
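For readers unfamiliar with the metric, here is a generic sketch of PARSEVAL-style scoring: precision, recall, and F1 over labeled spans. It is not the code from the commit; the function name `parseval_scores` and the `(start, end, label)` span representation are assumptions, and it does nothing special for edge cases such as same-unit relations.

```python
def parseval_scores(gold_spans, pred_spans):
    """Return (precision, recall, f1) over labeled spans.

    Each span is assumed to be a hashable tuple such as (start, end, label).
    """
    gold = set(gold_spans)
    pred = set(pred_spans)
    # A predicted span counts as correct only if an identical gold span exists.
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two of the three predicted spans match the gold spans exactly.
gold = [(0, 1, "elaboration"), (0, 3, "contrast"), (2, 3, "span")]
pred = [(0, 1, "elaboration"), (0, 3, "cause"), (2, 3, "span")]
precision, recall, f1 = parseval_scores(gold, pred)
```

Scoring unlabeled spans instead would only require dropping the label from each tuple before comparison.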

parsing evaluation metrics

The paper about the HILDA system (http://dad.uni-bielefeld.de/index.php/dad/article/viewFile/591/1187) says to see Marcu (2000), pp. 143–144, for a discussion of how PARSEVAL was adapted. (I'm still waiting to get the book from interlibrary...

LSTM model equations

Thanks for your very detailed reply! I'll let you know if I find anything else useful related to this.

LSTM model equations

That's a very useful reference. Thanks!