dfencoder
Clarification on training inputs and outputs
The HLD is helpful, but what I still find hard to understand is how you actually train the model, i.e. what the inputs and outputs of the model that are used to calculate the loss look like.
Following the diagram and assuming one input of each feature type (binary, numeric, categorical), you have three inputs that look like this:
[ 1   // binary
  44  // numeric
  9 ] // categorical index
These are the inputs to your model, which are first transformed and concatenated into a vector with 8 features, like:
[ 1 // binary
0.22 // rescaled numeric
0.2 // rest is the (randomly initialized) embedding of size 6
-0.12
0.65
0.11
-0.96
-1.01 ]
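To make sure I'm reading the diagram correctly, here is a minimal sketch of how I imagine that 8-dim vector is assembled. Everything here is my own illustration, not dfencoder's actual API: `embedding_table` is a stand-in for the embedding lookup, and I'm assuming min-max scaling with a made-up range of [0, 200] just so that 44 maps to 0.22:

```python
# Hypothetical lookup: category index -> 6-dim embedding (values made up)
embedding_table = {
    9: [0.2, -0.12, 0.65, 0.11, -0.96, -1.01],
}

def build_input(binary, numeric, category, num_min=0.0, num_max=200.0):
    """Concatenate one binary, one rescaled numeric, and one embedded
    categorical feature into a single 8-dim vector."""
    rescaled = (numeric - num_min) / (num_max - num_min)  # assumed min-max scaling
    return [float(binary), rescaled] + embedding_table[category]

x = build_input(1, 44, 9)
print(len(x))  # 8 features in total
```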
As I understand it, you use an autoencoder to reconstruct the concatenated layer, i.e. you minimize the reconstruction error between the concatenated input vector and the output layer, both with 8 features. But doesn't this completely ignore the training of the embedding?
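In code, my current understanding of the loss would be something like the following: a plain MSE over the 8 concatenated dimensions (purely illustrative, with a made-up decoder output, not dfencoder's actual code):

```python
def mse(target, reconstruction):
    """Mean squared error between the 8-dim concatenated input
    and the 8-dim decoder output."""
    assert len(target) == len(reconstruction)
    return sum((t - r) ** 2 for t, r in zip(target, reconstruction)) / len(target)

# Concatenated input from the example above, and a hypothetical decoder output
x     = [1.0, 0.22, 0.20, -0.12, 0.65, 0.11, -0.96, -1.01]
x_hat = [0.9, 0.25, 0.18, -0.10, 0.60, 0.15, -0.90, -0.95]
loss = mse(x, x_hat)
print(loss)  # ≈ 0.0029
```

Is this roughly what happens, or is the loss instead computed per feature type (e.g. cross-entropy on the categorical output) so that gradients do flow back into the embedding?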
Would you be so kind as to give a simple example of which errors you minimize between which inputs and outputs (concrete values are also fine)?
Thanks a lot!