BertGCN
BertGCN copied to clipboard
Computation Requirements
We have 2.8M data. Would it be feasible to train your architecture on this large data? If yes, how much memory requirements and time would be needed for it approx?
it might be difficult to implement full graph training on large dataset, but our method is orthogonal to graph sampling methods and can be integrated into them. With graph sampling methods the memory cost should be close to ordinary bert models