
Question about creation of column ranks

Open NielsRogge opened this issue 3 years ago • 3 comments

Hi,

Are the numeric column ranks created based on the original table, or after drop_rows_to_fit? I'm asking because some people using the Transformers library are running into problems with these column ranks: they are computed on the original table, so a rank can exceed the vocab size of the column ranks (which is 256). Looking at the code, it seems that the original implementation also computes the column ranks on the original table, correct?

Kind regards,

Niels

NielsRogge, Feb 09 '21 13:02

IIRC, we compute them before pruning the table. That was by design, so that the ranks match the original numeric rank (pre-pruning). It's true that a rank can therefore exceed the vocab size. We could add some trimming to prevent that.
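
A minimal sketch of what such trimming could look like. This is not the TAPAS implementation: the helper names (`column_ranks`, `clamp_ranks`, `ranks_for_pruned_table`) are hypothetical, and it assumes dense 1-based ranks, rank 0 for non-numeric cells, and a rank vocab of 256 entries (valid ids 0 to 255).

```python
from typing import List, Optional

# Assumption: the rank vocab has 256 entries, so the largest valid id is 255.
MAX_RANK = 255


def column_ranks(values: List[Optional[float]]) -> List[int]:
    """Dense 1-based rank of each cell within its column; 0 for non-numeric cells."""
    distinct = sorted({v for v in values if v is not None})
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}
    return [rank_of[v] if v is not None else 0 for v in values]


def clamp_ranks(ranks: List[int], max_rank: int = MAX_RANK) -> List[int]:
    """Trim every rank to max_rank so it stays inside the rank vocab."""
    return [min(r, max_rank) for r in ranks]


def ranks_for_pruned_table(
    values: List[Optional[float]], max_rows: int, max_rank: int = MAX_RANK
) -> List[int]:
    """Rank on the full column first (pre-pruning), clamp, then drop rows."""
    ranks = clamp_ranks(column_ranks(values), max_rank)
    return ranks[:max_rows]


# A column with 300 distinct numbers produces ranks 1..300 before clamping.
big_column = [float(i) for i in range(300)]
print(max(column_ranks(big_column)))               # 300
print(max(clamp_ranks(column_ranks(big_column))))  # 255
```

Clamping keeps the pre-pruning rank order for the first 255 distinct values and collapses everything above into one bucket. Recomputing the ranks after pruning would be the alternative, at the cost of no longer matching the original table.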

ghost, Feb 09 '21 14:02

@NielsRogge Hi, when preparing WTQ training data for TAPAS, I could not always get reasonable "answer_coordinates", and I also ran into column ranks exceeding the vocab size (I had to truncate them to 256). If you managed to process WTQ data for the PyTorch version of TAPAS, would it be possible to share some scripts to shed some light on this? Thanks! Regards, Ariel

arielsho, Mar 08 '21 04:03

Hi @arielsho, I also found that the answer_coordinates are not always reasonable. This is because the authors converted all datasets to the SQA format using automated scripts, as explained here. I only tested fine-tuning TAPAS on SQA.

NielsRogge, Mar 08 '21 09:03