五三模拟比英语周报更适合垫锅
Results: 2 issues of 五三模拟比英语周报更适合垫锅
Regarding https://github.com/karpathy/minGPT/blob/4050db60409b5bbaaa3302cee1e49847fc145c65/mingpt/model.py#L62, which I reached from http://jalammar.github.io/illustrated-gpt2/: I am still confused about what each name in `B, T, C = x.size()` stands for. Are they the vocabulary size, the batch size, the tokenizer size, or something else? Thanks.
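For readers landing on this question: my understanding is that at that line `x` is the input to the attention layer, so `B` would be the batch size, `T` the sequence length (number of tokens per sequence), and `C` the embedding width, with the vocabulary size being none of the three. The toy sketch below illustrates that reading using plain Python lists instead of tensors. The sizes, the `embedding` table, and the `shape` helper are all made up for illustration and are not part of minGPT.

```python
# Hypothetical toy sizes, chosen only for illustration (not minGPT defaults).
vocab_size = 10      # number of distinct token ids; not one of B, T, C
B, T, C = 2, 3, 4    # batch size, sequence length, embedding width

# Toy embedding table: one C-dimensional vector per vocabulary entry.
embedding = [[float(tok * C + j) for j in range(C)] for tok in range(vocab_size)]

# A batch of B sequences, each holding T token ids.
idx = [[1, 5, 7], [0, 2, 9]]

# Embedding lookup turns the (B, T) ids into a (B, T, C) nested list.
x = [[embedding[tok] for tok in seq] for seq in idx]


def shape(nested):
    """Return the dimensions of a regular nested list, similar to x.size()."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


print(shape(idx))  # (2, 3)
print(shape(x))    # (2, 3, 4)
```

Under this reading, the vocabulary size only determines how many rows the embedding table has; it does not appear in the shape of `x`.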
It seems that the results u and v from the SVD, stored via `region_params['u'] = u` and `region_params['d'] = d` in region_generator.py, do not take part in the training process. Is that correct?