Using alternative sequence lengths for SQuAD-based models in the Open division
For BERT Large, we tokenize the SQuAD v1.1 dataset into sequences of up to 384 tokens. For smaller models such as BERT Base, a sequence length of 128 is often used. Would it be allowed to tokenize the dataset into shorter sequences for submissions to the Open division?
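For reference, here is a minimal sketch of what this tokenization looks like using the Hugging Face transformers tokenizer; the reference implementation's preprocessing may differ in details, and the question/context strings below are purely illustrative:

```python
# Minimal sketch (not the reference preprocessing): tokenizing a SQuAD v1.1
# question/context pair at the benchmark sequence length of 384 tokens.
from transformers import AutoTokenizer

MAX_SEQ_LEN = 384   # fixed sequence length used for BERT Large in the benchmark
DOC_STRIDE = 128    # overlap between windows when a long context is split

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")

question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."

encoded = tokenizer(
    question,
    context,
    max_length=MAX_SEQ_LEN,
    truncation="only_second",        # truncate only the context, never the question
    stride=DOC_STRIDE,
    return_overflowing_tokens=True,  # emit extra windows for contexts longer than 384
    padding="max_length",
)

print(len(encoded["input_ids"][0]))  # 384 regardless of the actual text length
```

Tokenizing to a shorter `max_length` (e.g. 128) would simply change `MAX_SEQ_LEN` in the sketch above, which is the substance of the question.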
@rnaidu02 Let us discuss this in the first WGM in 2023.
No disagreement with the proposal from the 1/3/2023 IWG meeting.
I think this idea breaks uniformity and should not be allowed in the Open division.
By reducing the sequence length, you get a "free" speedup and sidestep one of the main costs of transformers, the attention mechanism, whose time complexity is quadratic in sequence length (384² is roughly 9× larger than 128², for example; see the sketch below).
In addition, it makes claimed "advancements" and comparisons with previous MLPerf submissions meaningless: comparing performance at seq_len 384 against seq_len 128 is not an apples-to-apples comparison.
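A quick back-of-the-envelope illustration of that quadratic scaling, counting only attention score pairs per head (other costs, such as the feed-forward layers, scale linearly and are ignored here):

```python
# Attention computes a score for every (query, key) token pair,
# so the per-head score count grows with seq_len ** 2.
for seq_len in (128, 384):
    print(f"seq_len={seq_len}: {seq_len ** 2:,} score pairs")

# seq_len=128:  16,384 score pairs
# seq_len=384: 147,456 score pairs  -> (384 / 128) ** 2 = 9x more work
```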
From the 1/10/2023 IWG meeting: it was determined that the sequence length is part of the benchmark definition (sequence length of 384).