tapas
Question about creation of column ranks
Hi,
Are the numeric column ranks created based on the original table, or after `drop_rows_to_fit`? I'm asking because some users of the Transformers library have reported problems with these column ranks: they are computed on the original table, so they can exceed the vocab size of the column ranks (which is 256). Looking at the code, it seems that the original implementation also computes the column ranks on the original table, correct?
Kind regards,
Niels
IIRC, we compute them before pruning the table. That was by design, so that the ranks would match the original (pre-pruning) numeric ranks. It's true that a rank could thus exceed the vocab size. We could add some trimming to prevent that.
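For illustration, the trimming mentioned above could look roughly like the sketch below. It is not the TAPAS or Transformers implementation: `column_ranks` and `RANK_VOCAB_SIZE` are hypothetical names, the 1-based dense ranking with 0 for non-numeric cells is an assumption, and only the vocab size of 256 comes from this thread.

```python
from typing import List, Optional

# Vocab size of the column-rank embedding, as quoted in this thread.
RANK_VOCAB_SIZE = 256


def column_ranks(
    values: List[Optional[float]], vocab_size: int = RANK_VOCAB_SIZE
) -> List[int]:
    """Rank the numeric cells of one column of the unpruned table.

    Assumptions (not taken from the TAPAS source): ranks are dense and
    1-based, and 0 marks a cell without a numeric value.
    """
    # Dense ranking: equal values share a rank, ranks have no gaps.
    distinct = sorted({v for v in values if v is not None})
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}

    # Largest index that still fits into an embedding of size vocab_size.
    max_rank = vocab_size - 1

    # Clip instead of failing when a column has too many distinct values.
    return [0 if v is None else min(rank_of[v], max_rank) for v in values]


if __name__ == "__main__":
    print(column_ranks([3.0, None, 1.0, 3.0]))
```

With clipping, every value beyond the 255th distinct one shares the top rank, so the ordering among the largest values of a very long column is lost, but the indices always stay inside the embedding table.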
Hi @NielsRogge, when preparing the WTQ training data for TAPAS, I could not always get reasonable "answer_coordinates", and I also ran into out-of-range values when creating the column ranks (I had to truncate them to 256). If you managed to process the WTQ data for the PyTorch version of TAPAS, could you share some scripts to shed some light on this? Thanks! Regards, Ariel
Hi @arielsho, I also found that the `answer_coordinates` are not always reasonable. This is because the authors converted all datasets to the SQA format using automated scripts, as explained here. I only tested fine-tuning TAPAS on SQA.