
sequence to sequence plans

Open pax7 opened this issue 7 years ago • 10 comments

Hello,

Are there any plans to include a sequence to sequence example? Something chatbot like?

pax7 avatar Jun 08 '17 04:06 pax7

I would like to have something like that, I'm hoping I'll get some spare time to look into it soon. :)

CodeReclaimers avatar Jun 25 '17 16:06 CodeReclaimers

Hi - has anyone had a chance to look at this?

pax7 avatar Jul 25 '17 19:07 pax7

Well, you need to be more specific about what you mean by seq to seq. Does it mean comparing two wave functions? (Not trying to be complicated.)

evolvingfridge avatar Jul 28 '17 11:07 evolvingfridge

I don't know enough about natural language processing to take a reasonable stab at a chatbot using NEAT (other than just doing some kind of character-by-character predictor, which I've never tried in any fashion), so I was thinking about trying one of these: https://gym.openai.com/envs#algorithmic

CodeReclaimers avatar Jul 28 '17 13:07 CodeReclaimers

I agree with starting with the OpenAI tasks. For full natural language processing, I suspect that preprocessing with a POS (part-of-speech) tagger and a stemmer (trying -> try, for instance) will be needed for any sane size of network (unless @stark7 is working at Google Brain ;-}). What I mean is something like feeding in the past X terms as their POS plus a set of bits for whether each term's stemmed form is identical to that of another of the X terms, with the output being a POS and a reference to one of the stemmed forms, to be put together into the appropriate form of the word. Alternatively, locating a deep-learning network/autoencoder that has already learned useful ways of representing words/phrases, and using those representations as input and output, may be preferable.
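As a rough illustration of the input side of that encoding, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `POS_TAGS` tag set, the crude `toy_stem` stand-in for a real stemmer, and the `encode_window` helper are hypothetical names, and the POS tags are supplied by hand rather than by an actual tagger. None of this is part of neat-python, and the output side (a POS plus a reference to one of the stems) is not covered.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny tag set; a real POS tagger would define its own.
POS_TAGS = ["NOUN", "VERB", "PRON", "ADJ", "OTHER"]


def toy_stem(word: str) -> str:
    """Crude placeholder stemmer: lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def encode_window(tagged: List[Tuple[str, str]]) -> List[float]:
    """Encode the past X (word, POS) pairs as a flat list of network inputs.

    Each term contributes a one-hot POS vector followed by X bits, where
    bit j is 1.0 if the term shares its stem with the (different) term j.
    """
    stems = [toy_stem(word) for word, _ in tagged]
    features: List[float] = []
    for i, (_, pos) in enumerate(tagged):
        features.extend(1.0 if pos == tag else 0.0 for tag in POS_TAGS)
        features.extend(
            1.0 if i != j and stems[i] == stems[j] else 0.0
            for j in range(len(tagged))
        )
    return features


window = [("trying", "VERB"), ("I", "PRON"), ("try", "VERB")]
inputs = encode_window(window)
```

With a window of X terms this yields X * (len(POS_TAGS) + X) inputs, so the input layer grows quadratically with the window size; that is one reason to keep X small if the vector is fed to an evolved network.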

@d0pa: Comparing wave functions (or comparing multiple, as opposed to outputting an altered version of, sequences of letters/DNA bases/protein) is an interesting problem - alignment of such and the more general problem of recognizing similar sequences is a major interest in bioinformatics and a number of other fields (including, as I mentioned on another thread, input space classification for measuring novelty).

drallensmith avatar Jul 28 '17 19:07 drallensmith

@drallensmith, I am working on such a problem, but it gets complicated even to prove to myself that my work is useful whatsoever. It's very time-consuming to reproduce or understand similar experiments, and there is a trade-off between error rate and computational burden. Additionally, understanding or setting up new real-world-like experiments requires understanding the problem domain itself (the hardest part by far). Instead of working with OpenAI Gym, I find it more useful to work with the UCR Archive.

evolvingfridge avatar Aug 13 '17 05:08 evolvingfridge

@d0pa: It would be nice on the UCR Archive if, perhaps by a link from the first author's name, they gave direct access to the paper they want everyone to read... Does the archive zip file contain a listing of what each sub-dataset is (and in particular the problem domain involved)?

drallensmith avatar Aug 13 '17 17:08 drallensmith

@d0pa: BTW, one way that I keep myself motivated - and one reason that I've wound up doing research on computers as opposed to in a "wet lab" - is by trying to do things that are both needed for the research long-term and useful (to others) short-term. My work on neat-python is an example of this!

drallensmith avatar Aug 13 '17 17:08 drallensmith

@drallensmith Sorry, it's my fault; I should have provided the newer link for the UCR Archive. I think this will answer your questions.

evolvingfridge avatar Aug 13 '17 20:08 evolvingfridge

Thanks! Don't worry about it - they really should have put a (prominent) link to the new page on the old one...

drallensmith avatar Aug 15 '17 01:08 drallensmith