TableNet Reasoning behind output shape

Open schoennenbeck opened this issue 3 years ago • 0 comments

Is there a reason why the output (for both the column mask and the table mask) has 3 channels? The training is essentially binary classification (which also means the mask does not need to be of dtype float32, but that's another matter), so we could work with either 2 output channels (the logits for "belongs to the table/column" and "does not") or even just one channel (which, after applying the sigmoid function, would give the probability that a pixel belongs to the table/column).
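To make the comparison concrete, here is a minimal NumPy sketch (not taken from the TableNet code) of why the 2-channel and 1-channel variants carry the same information: a softmax over two logits equals a sigmoid applied to their difference. The array shape, the channel order, and the names `two_channel_logits` and `one_channel_logit` are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)

# Hypothetical per-pixel logits for a 4x4 mask with 2 channels.
# Assumed channel order: 0 = "not table", 1 = "table".
two_channel_logits = rng.normal(size=(4, 4, 2))

# Two-channel variant: probability of "table" from a softmax.
p_table_softmax = softmax(two_channel_logits)[..., 1]

# One-channel variant: a single logit (the difference of the two) plus a sigmoid.
one_channel_logit = two_channel_logits[..., 1] - two_channel_logits[..., 0]
p_table_sigmoid = sigmoid(one_channel_logit)

# Both variants yield the same per-pixel probability.
print(np.allclose(p_table_softmax, p_table_sigmoid))

# Thresholding gives a boolean mask, so float32 is not required for the mask itself.
binary_mask = p_table_sigmoid > 0.5
print(binary_mask.dtype, binary_mask.shape)
```

So a single output channel per mask would be enough to express the same prediction, under the assumption that each mask is a plain per-pixel binary decision.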

Since the labels contain only zeros and ones, the third channel should get arbitrarily small values after training long enough anyway.
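A toy illustration of that claim, assuming a softmax cross-entropy loss over the 3 channels (I have not checked that this matches the actual training setup): if no label ever uses class 2, gradient descent keeps pushing that channel's logit down. The single shared logit vector, the labels, and the learning rate below are made up for the sketch.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax for a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


# Toy assumption: every "pixel" shares one vector of 3 logits,
# and the labels only ever take the values 0 and 1.
labels = np.array([0, 1, 1, 0, 1, 0, 0, 1])
one_hot = np.eye(3)[labels]
mean_target = one_hot.mean(axis=0)  # third entry is exactly 0

logits = np.zeros(3)
learning_rate = 0.5
third_channel_prob = []

for step in range(2000):
    probs = softmax(logits)
    # Gradient of the mean softmax cross-entropy w.r.t. the logits.
    grad = probs - mean_target
    logits -= learning_rate * grad
    third_channel_prob.append(softmax(logits)[2])

# The probability assigned to the unused third channel at the start and the end.
print(third_channel_prob[0], third_channel_prob[-1])
```

In this toy setting the third channel's probability keeps decreasing but never reaches exactly zero, which fits the "arbitrarily small after training long enough" intuition.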

Aug 09 '21 12:08 schoennenbeck
