Add NLP dataset + NLP example

Open mratsim opened this issue 5 years ago • 3 comments

We have an embedding layer (#312), we have GRU with sequence support (#283).

We are missing a dataset and an NLP example. The IMDB dataset is probably the one to have first: http://ai.stanford.edu/~amaas/data/sentiment/

Alternatively, we can use a character-level RNN instead of a word-level RNN, which avoids the tokenizer issue (#316).
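To illustrate why the character-level route sidesteps tokenization, here is a minimal sketch in plain Python (not Arraymancer code). The helper names `build_char_vocab` and `encode_chars` are hypothetical, and reserving id 0 for padding and unknown characters is an assumption made for this example:

```python
def build_char_vocab(texts):
    """Map every distinct character to an integer id, starting at 1.

    Id 0 is reserved (by assumption) for padding and unknown characters.
    """
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_chars(text, vocab, unknown_id=0):
    """Turn a string into a list of character ids; no tokenizer is involved."""
    return [vocab.get(ch, unknown_id) for ch in text]


reviews = ["good film", "bad film"]
vocab = build_char_vocab(reviews)
encoded = [encode_chars(review, vocab) for review in reviews]
```

The resulting id sequences are the kind of integer input an embedding layer followed by a GRU would consume; the vocabulary stays tiny compared with a word-level one.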

mratsim avatar Nov 05 '18 23:11 mratsim

Hi Mamy~!

What kind of example are you looking for? I'm pretty interested in helping with this. Could you provide more details on what you envision?

Related: I made a naive hashing vectorizer implementation, using Arraymancer of course, for a Nim demo at work. It might be somewhat relevant: https://github.com/metasyn/nim-vectorizer-splunk/tree/master/src
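For readers unfamiliar with the idea, a hashing vectorizer can be sketched as follows. This is a generic illustration in plain Python, not the implementation from the linked repository; the function name `hashing_vectorize`, the default of 16 buckets, and the use of CRC32 as the hash are all assumptions made for this example:

```python
import zlib


def hashing_vectorize(text, n_features=16):
    """Count tokens into a fixed-size vector, indexed by a hash of each token.

    No vocabulary is stored; distinct tokens may collide in the same bucket.
    """
    vec = [0] * n_features
    for token in text.lower().split():
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % n_features
        vec[index] += 1
    return vec


features = hashing_vectorize("the movie was great and the cast was great")
```

The trade-off is that the feature size is fixed up front and no fitting pass over the corpus is needed, at the cost of occasional collisions.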

metasyn avatar Nov 15 '18 04:11 metasyn

It can be sentiment analysis on IMDB (positive/negative), like https://www.kaggle.com/c/word2vec-nlp-tutorial.

Or, for example, identifying the author of a short snippet: https://www.kaggle.com/c/spooky-author-identification.

I.e. something short, where ideally the tokenizer can just be splitWhitespace.
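As a rough sketch of what "the tokenizer can just be splitWhitespace" implies for the data pipeline, the following plain-Python example splits on whitespace (the counterpart of `splitWhitespace` from Nim's `strutils`), builds a vocabulary, and pads each review to a fixed length. The names `tokenize`, `build_vocab`, and `encode`, the `PAD`/`UNK` ids, and the lowercasing step are assumptions made for this example, not part of Arraymancer:

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids (assumed convention for this sketch)


def tokenize(text):
    """Whitespace-only tokenizer; punctuation stays attached to words."""
    return text.lower().split()


def build_vocab(texts, max_size=10000):
    """Assign ids 2, 3, ... to the most frequent tokens."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_size))}


def encode(text, vocab, seq_len):
    """Map tokens to ids, then truncate or right-pad to exactly seq_len."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


reviews = ["a great movie", "a dull movie"]
vocab = build_vocab(reviews)
batch = [encode(review, vocab, seq_len=5) for review in reviews]
```

Each row of `batch` has the same length, so the rows can be stacked into a single integer tensor and fed to an embedding layer.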

On the tasks to implement this:

  • [x] Implement a high-level interface to https://github.com/mratsim/Arraymancer/blob/master/src/nn_primitives/nnp_embedding.nim in the nn folder
  • [ ] Add this high-level interface to the DSL in nn_dsl
  • [x] Find an interesting dataset and add a downloader for it
  • [ ] Add an example

mratsim avatar Nov 15 '18 08:11 mratsim

Dataset + Downloader = https://github.com/mratsim/Arraymancer/pull/317

metasyn avatar Nov 18 '18 02:11 metasyn