hyperformer
What is the strategy for initializing the task_embedding, layer_id_embeddings, and adapters_block_type embeddings?
@rabeehk It seems all of these embeddings are left at PyTorch's default initialization, i.e. drawn from a standard Gaussian (normal) distribution N(0, 1).
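For anyone who wants to check this, here is a minimal sketch. It assumes the embeddings are created with `nn.Embedding` or `torch.randn` and no custom initializer is applied afterwards; I have not confirmed which one the repo uses for each embedding. The sizes and variable names below are hypothetical and only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only so the statistics are stable.
num_entries, embedding_dim = 1000, 64

# nn.Embedding fills its weight from N(0, 1) by default.
layer_id_embeddings = nn.Embedding(num_entries, embedding_dim)

# A raw parameter built from torch.randn is also drawn from N(0, 1).
task_embedding = nn.Parameter(torch.randn(num_entries, embedding_dim))

emb_mean = layer_id_embeddings.weight.mean().item()
emb_std = layer_id_embeddings.weight.std().item()
param_mean = task_embedding.mean().item()
param_std = task_embedding.std().item()

print(f"nn.Embedding: mean={emb_mean:.3f}, std={emb_std:.3f}")
print(f"torch.randn:  mean={param_mean:.3f}, std={param_std:.3f}")
```

Both should print a mean close to 0 and a standard deviation close to 1. If a different scheme is wanted, something like `nn.init.normal_(layer_id_embeddings.weight, std=0.02)` can be applied after construction.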