hierarchical-attention-networks
                        Embeddings for special tokens/padding?
I was wondering where in the code you initialize the embeddings for the special tokens in the vocabulary (such as the unknown and padding tokens). Shouldn't the padding embedding be set to a zero vector and excluded from training? Or how are you dealing with these?
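For reference, this is roughly what I had in mind, as a minimal PyTorch sketch (the indices `PAD_IDX`/`UNK_IDX` and the sizes are placeholders I made up, not taken from your code):

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary layout -- adjust to match the actual repo.
PAD_IDX = 0
UNK_IDX = 1
vocab_size, emb_dim = 10_000, 200

# Passing padding_idx zero-initializes the <pad> row and excludes it
# from gradient updates, so it stays a zero vector during training.
embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=PAD_IDX)

# <unk> has no built-in handling; one common choice is to keep it
# trainable, optionally initialized to the mean of pretrained vectors:
# with torch.no_grad():
#     embedding.weight[UNK_IDX] = pretrained_vectors.mean(dim=0)
```

As far as I understand, `padding_idx` takes care of the padding token automatically (zero-initialized and never updated), but the unknown token would still be trained like any other word unless it is handled explicitly.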