Convert-PolyAI-Torch
Really not clear from the paper: 'computed as cosine similarity with annealing between the encodings hx and hy. It starts at 1 and ends at √d, linearly increasing over the first...'
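One possible reading of that sentence is that the cosine similarity is multiplied by a scale that grows linearly from 1 to √d and then stays there. A minimal sketch under that assumption follows. The names `annealed_scale`, `score` and `warmup_steps` are hypothetical, and the warm-up length is left as a parameter because the quote above is cut off before it gives one.

```python
import math


def annealed_scale(step, dim, warmup_steps):
    """Scale that rises linearly from 1 to sqrt(dim) over warmup_steps, then holds."""
    # Clamp progress to [0, 1] so the scale stays at sqrt(dim) after warm-up.
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return 1.0 + progress * (math.sqrt(dim) - 1.0)


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def score(h_x, h_y, step, warmup_steps):
    """Assumed form of the score: annealed scale times cosine(h_x, h_y)."""
    return annealed_scale(step, len(h_x), warmup_steps) * cosine(h_x, h_y)
```

If the paper means something else by "annealing" (for example a learned scale), only `annealed_scale` would need to change.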
Probably want no biases, to stop the model bloating.
Need to check that my label smoothing matches how the authors smoothed their objective, since the objective function includes negative sampling.
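To make the thing being checked concrete, here is a rough sketch of one way label smoothing can be combined with in-batch negatives. It is not necessarily what the authors did. It assumes a square score matrix in which row i's positive response is column i and every other column is a negative. The function names and the `smoothing` parameter are hypothetical.

```python
import math


def smoothed_targets(n, positive, smoothing):
    """Target distribution over n candidates: 1 - smoothing on the positive,
    with the remaining mass spread evenly over the n - 1 negatives."""
    off = smoothing / (n - 1)
    return [1.0 - smoothing if j == positive else off for j in range(n)]


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def smoothed_loss(score_matrix, smoothing):
    """Mean cross-entropy over rows; row i's positive is column i (assumption)."""
    n = len(score_matrix)
    total = 0.0
    for i, row in enumerate(score_matrix):
        targets = smoothed_targets(n, i, smoothing)
        log_probs = log_softmax(row)
        total -= sum(t * lp for t, lp in zip(targets, log_probs))
    return total / n
```

With `smoothing=0.0` this reduces to ordinary softmax cross-entropy over the batch, which gives a baseline to compare the Torch implementation against.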
Currently using torch GELU; the paper uses fast GELU.
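For reference, the sketch below compares exact GELU with the two common fast approximations (tanh-based and sigmoid-based). Which of the two the paper means by "fast GELU" is an assumption either way, so both are shown. The function names are illustrative only.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_sigmoid(x):
    """Sigmoid-based approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


if __name__ == "__main__":
    # Print the three variants side by side for a few inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, gelu_exact(x), gelu_tanh(x), gelu_sigmoid(x))
```

In recent PyTorch versions, `torch.nn.functional.gelu(x, approximate="tanh")` selects the tanh form, so switching may only need that one argument if the tanh variant is the intended one.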
Implement BPE from scratch with unknown tokens hashed (although this may achieve worse results on downstream tasks, as it is perhaps not as general as bpemb's 25000.model).
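A minimal sketch of the hashing part only (not the BPE merge learning), assuming unknown subwords are mapped to a fixed number of extra ids placed after the known vocabulary. `unk_bucket`, `encode`, `num_buckets` and the toy vocabulary are all hypothetical, and the vocabulary ids are assumed to run contiguously from 0 to `len(vocab) - 1`.

```python
import hashlib


def unk_bucket(token, num_buckets=1000):
    """Deterministically map a token to one of num_buckets hash buckets."""
    # hashlib is used instead of the built-in hash(), which is salted per process.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def encode(tokens, vocab, num_buckets=1000):
    """Look up known subwords; send unknown ones to ids after the vocabulary."""
    ids = []
    for tok in tokens:
        if tok in vocab:
            ids.append(vocab[tok])
        else:
            ids.append(len(vocab) + unk_bucket(tok, num_buckets))
    return ids
```

The embedding table would then need `len(vocab) + num_buckets` rows, and distinct unknown subwords can collide in the same bucket.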