first-pass-optimization-of-pico-scale-generative-model-0001
From #212 (Optimizations to proof of concept generative model)
- [x] Manual first-pass optimizations of the projection layers after the embedding (e.g. parameterize them)
- [x] Fix logic for handling text samples with no tokens
- [x] Add more training data
- [x] Load more text samples and capture the curve of perplexity versus number of samples
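
A minimal sketch of the last item, combined with the empty-sample fix, might look like the following. It is not the project's code: `tokenize`, `perplexity`, and `perplexity_curve` are hypothetical names, the whitespace tokenizer is an assumption, and an add-one-smoothed unigram model stands in for the actual generative model so the example runs on its own.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; the project's real tokenizer is not shown here.
    return text.split()


def perplexity(train_samples, eval_samples):
    """Perplexity of an add-one-smoothed unigram model (a stand-in for the real model).

    Returns None when either side has no tokens, so callers can skip that point
    instead of dividing by zero.
    """
    train_tokens = [tok for s in train_samples for tok in tokenize(s)]
    eval_tokens = [tok for s in eval_samples for tok in tokenize(s)]
    if not train_tokens or not eval_tokens:
        return None
    counts = Counter(train_tokens)
    vocab = set(train_tokens) | set(eval_tokens)
    denom = len(train_tokens) + len(vocab)
    # Average negative log-likelihood per evaluation token.
    nll = -sum(math.log((counts[t] + 1) / denom) for t in eval_tokens)
    return math.exp(nll / len(eval_tokens))


def perplexity_curve(train_samples, eval_samples, sizes):
    """Return (number_of_training_samples, perplexity) pairs for each requested size."""
    # Drop samples that produce no tokens before slicing.
    usable = [s for s in train_samples if tokenize(s)]
    curve = []
    for n in sizes:
        ppl = perplexity(usable[:n], eval_samples)
        if ppl is not None:
            curve.append((min(n, len(usable)), ppl))
    return curve
```

Calling `perplexity_curve(samples, held_out, sizes=[100, 1000, 10000])` would give the points to plot. With the real model, `perplexity` would be replaced by an evaluation of the trained network on the held-out set.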