BertSum
requirements for bert-large?
What issues, if any, would come up if bert-large were used instead, for example in GPU requirements and training time? Would it be too costly? Is there a reason bert-base was used rather than bert-large?
I'm also guessing that Yang Liu used bert-base instead of bert-large because bert-large would need more GPU memory and a longer training time. It's also possible that bert-large wouldn't improve performance by much, but as far as I can tell the original paper doesn't discuss this and has no ablation study on it, so this is only my guess.
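To put a rough number on the memory side of that guess, here is a back-of-the-envelope sketch that counts the encoder parameters of the two standard BERT sizes (12 layers with hidden size 768 for base, 24 layers with hidden size 1024 for large). The function names are made up for illustration and are not part of BertSum. It assumes the uncased English vocabulary of 30522 tokens, and the memory figure assumes fp32 training with Adam (weights, gradients and two moment buffers, so 16 bytes per parameter). It ignores activations and the summarization layers on top, so real usage will be higher.

```python
def count_bert_params(hidden, layers, vocab=30522, max_pos=512, type_vocab=2):
    """Count parameters of a standard BERT encoder (embeddings, layers, pooler)."""
    intermediate = 4 * hidden          # feed-forward width used by both sizes
    layer_norm = 2 * hidden            # scale + bias

    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: two linear layers plus a LayerNorm.
    ffn = (hidden * intermediate + intermediate
           + intermediate * hidden + hidden
           + layer_norm)

    # Pooler: one dense layer applied to the [CLS] position.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + ffn) + pooler


def training_state_gib(n_params, bytes_per_param=16):
    """Assumed fp32 + Adam footprint: weights, grads, two moment buffers."""
    return n_params * bytes_per_param / 1024 ** 3


if __name__ == "__main__":
    for name, hidden, layers in [("bert-base", 768, 12), ("bert-large", 1024, 24)]:
        n = count_bert_params(hidden, layers)
        print(f"{name}: {n:,} params, ~{training_state_gib(n):.1f} GiB before activations")
```

Under these assumptions bert-large comes out at roughly three times the parameters of bert-base (about 335M versus about 110M), and activation memory grows as well because there are twice as many layers and wider hidden states. That fits the guess that cost was a factor, though it says nothing about whether the summarization quality would improve.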