
suggestion for implementing deep learning in chatterbot

Open brightening-eyes opened this issue 7 years ago • 13 comments

hi, since chatterbot doesn't support deep learning, i will propose my suggestions to make it better (since my deep learning library is mxnet with the gluon api, i will focus on giving examples based on that):

  1. instead of a database storage adapter, we can make a class like `MxnetAdapter` which receives a model class in its `**kwargs` argument, plus a file from which to load the model: if the file exists, it loads it, otherwise it initializes the model using `self.model.initialize()`. it can also take a `ctx` in `**kwargs` indicating whether it should be initialized on the cpu or gpu. this storage adapter doesn't need `drop()`, `count()`, etc. methods, since its storage is the trained model itself.
  2. the adapter can also accept a loss in its constructor's `**kwargs`.
  3. alternatively, the loss can live in the trainer class (but then the adapter won't be able to learn on its own and the robot should be in `read_only` mode). in the storage adapter's `update()` method, we should do the forward pass and backpropagation (if we have the loss), otherwise we shouldn't do anything; at last, we should pass the `statement.text` as one-hot input to the model.
  4. for the trainer: `train()` should accept `x`, `y`, `batch_size`, `epochs`, etc.
  5. the trainer should use the storage adapter's model and the loss to do the forward and backward passes. i think by using this approach, everything else (logic adapters, comparison functions, etc.) can keep working nicely.
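To make the shape of the proposal concrete, here is a minimal, framework-agnostic sketch of such an adapter. The class and method names (`ModelStorageAdapter`, `StubModel`, `model_file`) are hypothetical, and the MXNet calls are replaced by a stub so the skeleton runs anywhere; a real implementation would use a Gluon block and its `initialize()` method as described above.

```python
import os
import pickle


class StubModel:
    """Stand-in for an MXNet Gluon model; initialize/forward are placeholders."""

    def __init__(self):
        self.params = {}

    def initialize(self, ctx="cpu"):
        # A Gluon model would allocate its parameters on the given device here.
        self.ctx = ctx

    def forward(self, one_hot):
        # A real model would return a predicted response from the one-hot input.
        return None


class ModelStorageAdapter:
    """Sketch of the proposed MxnetAdapter: the "storage" is the model itself,
    so drop()/count()-style methods are intentionally absent."""

    def __init__(self, **kwargs):
        self.model = kwargs.get("model", StubModel)()
        self.loss = kwargs.get("loss")  # optional; may live in the trainer instead
        self.model_file = kwargs.get("model_file", "model.bin")
        ctx = kwargs.get("ctx", "cpu")
        if os.path.exists(self.model_file):
            # Load previously trained parameters from the flat file.
            with open(self.model_file, "rb") as f:
                self.model.params = pickle.load(f)
            self.model.ctx = ctx
        else:
            self.model.initialize(ctx=ctx)

    def update(self, statement, one_hot):
        # Forward pass; with a loss we would also backpropagate,
        # otherwise the bot is effectively read_only.
        self.model.forward(one_hot)
        if self.loss is not None:
            pass  # loss(prediction, target); backward(); optimizer step
        return statement
```

The key design point is that `update()` replaces the SQL write path: instead of inserting a row, a training step (or nothing, in read-only mode) happens.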

brightening-eyes avatar Feb 23 '19 11:02 brightening-eyes

Exciting, curious to see it. If possible, could you please share an example?

vkosuri avatar Feb 26 '19 12:02 vkosuri

hi, since database adapters return the next response from different sql statements, this is a bit different with neural networks. in neural networks, when we want to train, we have an x and a y: the x is what the user says to the bot, while in y we store the answer. once trained, the model learns to return different responses based on what the user said (based on the training data, of course, and far more accurately than sql statements). so, when the user says "hello", the robot will return "hi, how are you?" based on the training data. but when trained on larger data like the ubuntu corpus, it can learn to interact better (if more parameters like context and previous statements are added, it will be better in terms of accuracy, although it will require lots of data to train).

about the example, i don't know when i'm going to write it. but we would have a self.model and a self.loss in the adapter, an mxnet.gluon.Trainer in the trainer, and maybe a preprocessor to transform the statements into one-hot representation (or they could be transformed into one-hot representation in the adapter). also, mxnet.metric.Accuracy can be used in the trainer class to determine how the model performs during the training and, possibly, validation process.
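The (x, y) pairs and one-hot preprocessing described above can be sketched in plain Python. The helper names (`build_vocab`, `one_hot`) are hypothetical; a real preprocessor would feed these vectors to the model as NDArrays.

```python
def build_vocab(statements):
    """Map each word seen in the training statements to an index."""
    vocab = {}
    for text in statements:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(text, vocab):
    """Bag-of-words one-hot vector: 1.0 where the word occurs, else 0.0."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] = 1.0
    return vec


# Training pairs: x is what the user says, y is the stored answer.
pairs = [("hello", "hi, how are you?"), ("bye", "goodbye!")]
vocab = build_vocab(x for x, _ in pairs)
x_train = [one_hot(x, vocab) for x, _ in pairs]
```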

brightening-eyes avatar Feb 26 '19 20:02 brightening-eyes

Hi @brightening-eyes, I think this is a great suggestion and this would be an awesome addition to ChatterBot. A decent amount of research and testing would be required, but I think it would be worth the effort.

gunthercox avatar Mar 23 '19 14:03 gunthercox

This is a great idea, but since deep learning is now involved, what will this mean for low-power IoT devices currently running ChatterBot?

ignertic avatar Apr 01 '19 23:04 ignertic

Hi @ignertic , With the above-mentioned approach you can train your model on a powerful machine, then ship the software with the pretrained model (which is a flat file) to your IoT device, and it will work as before. :)
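The train-then-ship workflow can be sketched as follows. The file name and the dict of weights are placeholders here (with Gluon, the real calls would be along the lines of the model's save/load parameter methods); the point is only that the device loads a flat file and never runs training code.

```python
import os
import pickle
import tempfile

# On the powerful training machine: train, then dump the weights to a flat file.
trained_params = {"embedding": [[0.1, 0.2], [0.3, 0.4]]}  # placeholder weights
path = os.path.join(tempfile.gettempdir(), "model.params")
with open(path, "wb") as f:
    pickle.dump(trained_params, f)

# On the IoT device: no training code needed, just load and run inference.
with open(path, "rb") as f:
    params = pickle.load(f)
```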

/Gabor

Orfeous avatar Apr 02 '19 21:04 Orfeous

@Orfeous , yes of course :) Just curious though, any progress with this?

ignertic avatar Apr 05 '19 11:04 ignertic

about deep learning: first, it depends on your model (basically a seq2seq model), which is slow even on cpu. regarding the training process, it depends on your data, again on the model and the loss function, and on whether it will learn from new conversations or not (if yes, it will be slower, since we need more training).

brightening-eyes avatar Apr 06 '19 22:04 brightening-eyes

I think it would be more beneficial to implement it as a logic adapter, so it steps in when it can provide a better answer than the other adapters. In this way we can keep our database to log the conversations, and later we can reuse these logs as training data for our model.

The learn-by-chatting functionality can be achieved either by running the training procedure periodically in the background, or manually as a maintenance task for bot operators.
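This "steps in when it can provide a better answer" behavior is, roughly, how multi-adapter response selection already works in ChatterBot: each adapter reports a confidence, and the highest-confidence response wins. A minimal sketch, with hypothetical stand-in adapters that return `(confidence, response)` pairs:

```python
def select_response(statement, adapters):
    """Return the (confidence, response) pair from the most confident
    adapter; a deep-learning adapter only "steps in" when it beats
    the others."""
    best = (0.0, None)
    for adapter in adapters:
        confidence, response = adapter(statement)
        if confidence > best[0]:
            best = (confidence, response)
    return best


# Hypothetical adapters, stand-ins for BestMatch and a DL-based adapter.
def best_match(statement):
    return (0.4, "canned reply")


def deep_learning(statement):
    return (0.9, "generated reply")
```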

Orfeous avatar Apr 08 '19 10:04 Orfeous

regarding logic adapters, the approach has some pros and some cons.

pros:

  • it can be used in conjunction with other adapters
  • when other adapters give a response, it won't get called (but this is less accurate)

cons:

  • training should be implemented differently
  • continuous learning will not be possible, or will require more coding, and it will be less stable than the storage-adapter approach
  • we won't get rid of sql statements; even when they aren't needed, they will be executed, so performance suffers
  • it won't be possible to train it using the default trainers (as stated, it requires training the model differently, either without using the Trainer class, or using a pretrained model)

also, check this out

brightening-eyes avatar Apr 08 '19 16:04 brightening-eyes

Any progress on that?

Mo7mud avatar Dec 25 '21 01:12 Mo7mud

it seems that chatterbot is not maintained anymore. another, but better, idea which came to my mind is to have a model containing an embedding layer and some sort of text similarity detection (to make the BestMatch adapter better without something like spacy and so on), plus a class for transforming textual data into features for the model. passing a custom model could also be supported, to keep training that model separate from training chatterbot. this way, the framework used to train the model (tensorflow, pytorch, mxnet) can be chosen according to the preference of the user.
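The embedding-plus-similarity idea can be illustrated with a toy example. The word vectors below are hand-written stand-ins for what an embedding layer would learn, and `similarity` has the shape of a BestMatch-style comparison function (two statements in, a 0–1 score out):

```python
import math

# Toy word vectors; a real model would learn these in an embedding layer.
EMBEDDINGS = {
    "hello": [1.0, 0.0],
    "hi": [0.9, 0.1],
    "bye": [0.0, 1.0],
}


def sentence_vector(text):
    """Average the vectors of the words we know; zero vector if none match."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def similarity(a, b):
    """Cosine similarity between two statements."""
    va, vb = sentence_vector(a), sentence_vector(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    return dot / norm if norm else 0.0
```

With learned embeddings, "hello" and "hi" score as near-matches even though they share no characters, which is exactly what would make BestMatch better without spaCy.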

brightening-eyes avatar Dec 25 '21 15:12 brightening-eyes

it seems that chatterbot is not maintained anymore

I agree. However, as I think about this, it seems to me that implementing an ML model isn't that big of a deal (you just throw in a custom logic adapter which then implements a huggingface transformer, or even a GPT-3 API client).

The real trick is the evaluation: deciding which logic adapter's answer should be returned, while also remaining consistent (as much as possible).

Orfeous avatar Dec 25 '21 16:12 Orfeous

the thing is, we can make the BestMatch adapter use deep learning (via a custom comparison function) and get rid of things like spacy and so on. to generate the response, another logic adapter could be implemented, which should take an encoder and a decoder (a seq2seq model) with an attention mechanism added, in order to generate the responses. but for bots that do custom things like getting the weather or reserving hotels, things are somewhat different with these models: the model should, for example, extract the location and the time of the reservation and generate the appropriate response (these things require a lot of data).

brightening-eyes avatar Dec 25 '21 17:12 brightening-eyes