DiDev
Can OpenCL also support NVIDIA GPUs?
Oops, that seems to be the reason. By the way, the default learning rate given for Adam seems really low. With around 0.001 it converges faster. That's...
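For reference, this is roughly where that 0.001 enters the update. A minimal sketch of one Adam step, not femtoGPT's actual optimizer code; the function name and the beta/epsilon defaults here are my own assumptions:

```rust
// Illustrative sketch of a single Adam step, only to show where the
// learning rate (e.g. 0.001) enters the parameter update.
// Not femtoGPT's optimizer; names and hyperparameters are assumptions.
fn adam_step(
    params: &mut [f32],
    grads: &[f32],
    m: &mut [f32], // first-moment estimates
    v: &mut [f32], // second-moment estimates
    t: i32,        // step counter, starting at 1
    lr: f32,       // learning rate, e.g. 0.001
) {
    let (beta1, beta2, eps) = (0.9f32, 0.999f32, 1e-8f32);
    for i in 0..params.len() {
        m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];
        // bias-corrected moment estimates
        let m_hat = m[i] / (1.0 - beta1.powi(t));
        let v_hat = v[i] / (1.0 - beta2.powi(t));
        params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
    }
}
```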
Will keep that in mind! Is there any possibility to quantize this model? I'm curious about where we would need to make the changes to make it use 8...
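What I have in mind is something like a simple affine quantization of the stored f32 tensors. A rough sketch only, not tied to femtoGPT's actual tensor types; the struct and function names here are invented for illustration:

```rust
// Rough sketch of affine 8-bit quantization for a block of f32 weights.
// Not femtoGPT code; types and names are made up for illustration.
struct QuantizedTensor {
    data: Vec<u8>,   // quantized values
    scale: f32,      // dequantize as: f = scale * q + zero_point
    zero_point: f32,
}

fn quantize(weights: &[f32]) -> QuantizedTensor {
    let min = weights.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = weights.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // guard against a constant tensor (max == min)
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    let data = weights
        .iter()
        .map(|w| ((w - min) / scale).round() as u8)
        .collect();
    QuantizedTensor { data, scale, zero_point: min }
}

fn dequantize(q: &QuantizedTensor) -> Vec<f32> {
    q.data
        .iter()
        .map(|&b| q.scale * b as f32 + q.zero_point)
        .collect()
}
```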
Okay, one more question: let's say I'm training it on one dataset.txt. After some time I stop the training and switch to another dataset.txt. Is it possible to port...
Yes, 8-bit float instead of f32. Can you explain a little about the embedding.dat, optimizer.dat, pos_embedding.dat, and tensor_num.dat files? Where do the weights for the layers get stored, and is...
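To be clear about what I'm imagining: are these files essentially flat dumps of f32 values? Something like the loader below is the mental model I have; the on-disk layout (raw little-endian f32, no header) is purely my assumption, not what the repo necessarily does:

```rust
use std::fs::File;
use std::io::{Read, Result, Write};

// Speculative sketch: read a flat .dat file as little-endian f32 values.
// The actual layout of embedding.dat etc. is an assumption here.
fn read_f32_dat(path: &str) -> Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

fn write_f32_dat(path: &str, values: &[f32]) -> Result<()> {
    let mut file = File::create(path)?;
    for v in values {
        file.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```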
Got it. If I can get the weights alone to be loadable, so that we can switch the model between different datasets, I'll raise a PR.
@keyvank Solved the weight-interchangeability-between-datasets issue with a simpler approach: limiting the vocab to a fixed range of ASCII characters. Unfortunately, as you can guess, ...
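For anyone following along, the idea is just a fixed vocabulary over a byte range, so the embedding shapes never change between datasets. A simplified sketch of the idea, not the exact code I ended up with:

```rust
// Simplified idea behind the fixed ASCII vocab: every printable ASCII byte
// maps to a fixed token id, so vocab size (and embedding shapes) stay the
// same across datasets. Not the exact implementation.
const FIRST: u8 = 32; // ' '
const LAST: u8 = 126; // '~'

fn vocab_size() -> usize {
    (LAST - FIRST + 1) as usize
}

fn tokenize(text: &str) -> Vec<usize> {
    text.bytes()
        .filter(|b| (FIRST..=LAST).contains(b))
        .map(|b| (b - FIRST) as usize)
        .collect()
}

fn untokenize(tokens: &[usize]) -> String {
    tokens.iter().map(|&t| (t as u8 + FIRST) as char).collect()
}
```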
Awesome, @keyvank!
I'm trying to use the ASCII tokenizer, but maybe a bit of the implementation is missing. Shouldn't the new Tokenizer trait be used instead of SimpleTokenizer everywhere?
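What I mean is something along these lines, so the training code only depends on the trait and any tokenizer can be swapped in. Just a sketch; I'm not sure these are the exact method signatures in the repo:

```rust
// Sketch of "use the trait everywhere": the training path takes any
// Tokenizer implementation instead of hard-coding SimpleTokenizer.
// Method names here are assumptions, not necessarily the repo's exact trait.
trait Tokenizer {
    fn vocab_size(&self) -> usize;
    fn tokenize(&self, text: &str) -> Vec<usize>;
    fn untokenize(&self, tokens: &[usize]) -> String;
}

fn train<T: Tokenizer>(tokenizer: &T, dataset: &str) {
    let tokens = tokenizer.tokenize(dataset);
    // ... feed `tokens` to the model; vocab size also comes from the trait
    println!("vocab={}, tokens={}", tokenizer.vocab_size(), tokens.len());
}
```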
Understood. I haven't tried larger models yet; I'm actually playing with ones even smaller than the default settings. The previous output I provided was with 4 layers, 4 heads...