DiDev
Can OpenCL also support NVIDIA GPUs?
Oops, that seems to be the reason. By the way, the default learning rate given for Adam seems really low. With around 0.001 it converges faster. That's...
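For reference, this is roughly where that 0.001 enters the update. A minimal sketch of one Adam step, not femtoGPT's actual optimizer code; the function name and the beta/epsilon defaults here are my own assumptions:

```rust
// Illustrative sketch of a single Adam step, only to show where the
// learning rate (e.g. 0.001) enters the parameter update.
// Not femtoGPT's optimizer; names and hyperparameters are assumptions.
fn adam_step(
    params: &mut [f32],
    grads: &[f32],
    m: &mut [f32], // first-moment estimates
    v: &mut [f32], // second-moment estimates
    t: i32,        // step counter, starting at 1
    lr: f32,       // learning rate, e.g. 0.001
) {
    let (beta1, beta2, eps) = (0.9f32, 0.999f32, 1e-8f32);
    for i in 0..params.len() {
        m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];
        // bias-corrected moment estimates
        let m_hat = m[i] / (1.0 - beta1.powi(t));
        let v_hat = v[i] / (1.0 - beta2.powi(t));
        params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
    }
}
```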
Will keep that in mind! Is there any possibility to quantize this model? I'm curious about where we would need to make the changes to make it use 8...
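What I have in mind is something like a simple affine quantization of the stored f32 tensors. A rough sketch only, not tied to femtoGPT's actual tensor types; the struct and function names here are invented for illustration:

```rust
// Rough sketch of affine 8-bit quantization for a block of f32 weights.
// Not femtoGPT code; types and names are made up for illustration.
struct QuantizedTensor {
    data: Vec<u8>,   // quantized values
    scale: f32,      // dequantize as: f = scale * q + zero_point
    zero_point: f32,
}

fn quantize(weights: &[f32]) -> QuantizedTensor {
    let min = weights.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = weights.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // guard against a constant tensor (max == min)
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    let data = weights
        .iter()
        .map(|w| ((w - min) / scale).round() as u8)
        .collect();
    QuantizedTensor { data, scale, zero_point: min }
}

fn dequantize(q: &QuantizedTensor) -> Vec<f32> {
    q.data
        .iter()
        .map(|&b| q.scale * b as f32 + q.zero_point)
        .collect()
}
```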
Okay, one more question: let's say I'm training it on one dataset.txt. After some time I stop the training and switch to another dataset.txt. Is it possible to port...
Yes, 8-bit float instead of f32. Can you explain a little about the embedding.dat, optimizer.dat, pos_embedding.dat, and tensor_num.dat files? Where do the weights for the layers get stored, and is...
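To be clear about what I'm imagining: are these files essentially flat dumps of f32 values? Something like the loader below is the mental model I have; the on-disk layout (raw little-endian f32, no header) is purely my assumption, not what the repo necessarily does:

```rust
use std::fs::File;
use std::io::{Read, Result, Write};

// Speculative sketch: read a flat .dat file as little-endian f32 values.
// The actual layout of embedding.dat etc. is an assumption here.
fn read_f32_dat(path: &str) -> Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

fn write_f32_dat(path: &str, values: &[f32]) -> Result<()> {
    let mut file = File::create(path)?;
    for v in values {
        file.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```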
Got it. If I can get the weights alone to be loadable, so that we can switch the model between different datasets, I'll raise a PR.
@keyvank Solved the weight-interchangeability-between-datasets issue with a simpler approach: limiting the vocab to a fixed range of ASCII characters. Unfortunately, as you can guess, ...
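For anyone following along, the idea is just a fixed vocabulary over a byte range, so the embedding shapes never change between datasets. A simplified sketch of the idea, not the exact code I ended up with:

```rust
// Simplified idea behind the fixed ASCII vocab: every printable ASCII byte
// maps to a fixed token id, so vocab size (and embedding shapes) stay the
// same across datasets. Not the exact implementation.
const FIRST: u8 = 32; // ' '
const LAST: u8 = 126; // '~'

fn vocab_size() -> usize {
    (LAST - FIRST + 1) as usize
}

fn tokenize(text: &str) -> Vec<usize> {
    text.bytes()
        .filter(|b| (FIRST..=LAST).contains(b))
        .map(|b| (b - FIRST) as usize)
        .collect()
}

fn untokenize(tokens: &[usize]) -> String {
    tokens.iter().map(|&t| (t as u8 + FIRST) as char).collect()
}
```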
Awesome, @keyvank!
I'm trying to use the ASCII tokenizer, but maybe a bit of the implementation is missing. Shouldn't the new Tokenizer trait be used instead of SimpleTokenizer everywhere?
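What I mean is something along these lines, so the training code only depends on the trait and any tokenizer can be swapped in. Just a sketch; I'm not sure these are the exact method signatures in the repo:

```rust
// Sketch of "use the trait everywhere": the training path takes any
// Tokenizer implementation instead of hard-coding SimpleTokenizer.
// Method names here are assumptions, not necessarily the repo's exact trait.
trait Tokenizer {
    fn vocab_size(&self) -> usize;
    fn tokenize(&self, text: &str) -> Vec<usize>;
    fn untokenize(&self, tokens: &[usize]) -> String;
}

fn train<T: Tokenizer>(tokenizer: &T, dataset: &str) {
    let tokens = tokenizer.tokenize(dataset);
    // ... feed `tokens` to the model; vocab size also comes from the trait
    println!("vocab={}, tokens={}", tokenizer.vocab_size(), tokens.len());
}
```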
Understood. I haven't tried larger models yet; I'm actually playing with ones even smaller than the default settings. The previous output I provided was with 4 layers, 4 heads...