TableNet
Reasoning behind output shape
Is there a reason why the output (for both the column mask and the table mask) has 3 channels? The training is essentially binary classification (which also means the mask does not need to be of dtype float32, but that's another matter), so we could work with either 2 output channels (the logits for "belongs to the table/column" and "does not belong") or even a single channel (which, after applying the sigmoid function, gives the probability that a pixel belongs to the table/column).
Since the labels contain only zeros and ones, the third channel should end up with arbitrarily small values after training long enough anyway.
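
To illustrate why one or two channels should be enough, here is a minimal sketch in plain Python (not taken from this repository). It checks that a 2-channel softmax output and a 1-channel sigmoid output give the same per-pixel probability when the single logit is the difference of the two. The logit values and the 2x2 mask size are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps a single logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-pixel logits (background, foreground) for a tiny 2x2 mask.
two_channel_logits = [
    [(0.3, 1.2), (-0.5, -2.0)],
    [(2.0, 2.0), (-1.0, 3.5)],
]

# Two-channel head: probability of "foreground" via softmax.
p_softmax = [[softmax(px)[1] for px in row] for row in two_channel_logits]

# One-channel head: a single logit equal to (foreground - background),
# turned into a probability via sigmoid.
p_sigmoid = [[sigmoid(fg - bg) for bg, fg in row] for row in two_channel_logits]

# Largest per-pixel disagreement between the two formulations.
max_abs_diff = max(
    abs(a - b)
    for row_a, row_b in zip(p_softmax, p_sigmoid)
    for a, b in zip(row_a, row_b)
)
print(max_abs_diff)
```

The two formulations agree up to floating-point error, so as far as I can tell a third channel adds nothing for a binary mask.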