save_weights fails when a large number of input features is present
Hi @montanalow. This is really great work. I really like how you abstract away the common pitfalls in machine learning and streamline the process in this project. I see a lot of potential in it from a data scientist's perspective. If you don't mind, I'd like to share my feedback from using this tool.
For this particular issue, I encountered an `h5py` error because of too many `Input` layers. As shown here, we have to pass one encoder for each column in the dataframe, and each encoder corresponds to one `Input` layer. I deal with a lot of DNA sequence data, which usually has >5000 columns. I think it makes sense to at least combine the columns using `Continuous` or `Pass` encoders into one `Input`.
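To make the failure mode concrete, here is a minimal sketch, assuming a Keras functional model (which is what lore builds under the hood); the column names and sizes are illustrative, not lore's actual internals:

```python
from tensorflow import keras

n_columns = 5000  # e.g. one column per DNA sequence position

# One Input layer per column: save_weights has to serialize one layer
# name per column, which is what fails at this scale.
inputs = [keras.Input(shape=(1,), name='col_%d' % i) for i in range(n_columns)]
merged = keras.layers.Concatenate()(inputs)
per_column = keras.Model(inputs, keras.layers.Dense(1)(merged))

# The suggestion: columns handled by Continuous/Pass-style encoders
# share a single Input of shape (n_columns,), so there is only one
# layer name to serialize.
combined_in = keras.Input(shape=(n_columns,), name='continuous')
combined = keras.Model(combined_in, keras.layers.Dense(1)(combined_in))
```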
@Guzzii This is something we've run into internally as well. The current workaround is to set `short_names = True`, which will get you to hundreds, but probably not thousands, of inputs.
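For context on why shorter names only buy so much: my understanding (an assumption on my part, not verified against this issue's stack trace) is that Keras writes every layer name into a single HDF5 attribute during `save_weights`, and HDF5 caps an object header at roughly 64 KiB, so the budget divides out along these lines:

```python
# Back-of-the-envelope arithmetic, assuming the failure is HDF5's
# ~64 KiB object header limit on the layer_names attribute written
# by Keras save_weights.
HEADER_LIMIT = 64 * 1024  # bytes, approximate

def approx_max_layers(avg_name_length):
    # Each layer name consumes roughly its length in bytes.
    return HEADER_LIMIT // avg_name_length

print(approx_max_layers(80))  # long generated names   -> ~819 layers
print(approx_max_layers(16))  # short_names=True style -> ~4096 layers
```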
What if encoders that share a common base name followed by a number, e.g. `'sequence_1', 'sequence_2', 'sequence_3', ... 'sequence_n'`, were mapped into a single input `'sequence'` with shape `(n,)`, for all types where that is possible?
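A quick sketch of that grouping rule (illustrative only; `group_columns` is a hypothetical helper, not part of lore's API):

```python
import re
from collections import defaultdict

def group_columns(columns):
    """Group column names that share a base name followed by a number."""
    pattern = re.compile(r'^(.+)_(\d+)$')
    groups = defaultdict(list)
    for column in columns:
        match = pattern.match(column)
        base = match.group(1) if match else column
        groups[base].append(column)
    return groups

columns = ['sequence_1', 'sequence_2', 'sequence_3', 'label']
for base, members in group_columns(columns).items():
    print(base, '-> input shape', (len(members),))
# sequence -> input shape (3,)
# label -> input shape (1,)
```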
Hi @montanalow. I think that makes sense. Just want to make sure I understand correctly: in this case, it would aggregate columns with a shared base name like `sequence_col_{}`, and encoder-generated inputs like `one_hot_{}`, respectively:

```
sequence_col_{} -> sequence (input_shape=n_1)
one_hot_{} -> one_hot (input_shape=n_2)
```
Correct. I think there will be a little bit of complexity around encoders that have a `sequence_length`, like the `Token` encoder, because they will need to go to a 2D-shaped input, but it should still work in theory.
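Something like this for the 2D case, as a sketch (shapes and names are assumptions, not lore's actual wiring):

```python
from tensorflow import keras

# n token columns, each encoded to a fixed sequence_length, collapse
# into one 2D Input of shape (n_columns, sequence_length) rather than
# n_columns separate 1D inputs.
n_columns, sequence_length = 100, 10
tokens = keras.Input(shape=(n_columns, sequence_length), dtype='int32', name='token')
# The Embedding broadcasts over the leading axes:
# (batch, n_columns, sequence_length) -> (..., embedding_dim)
embedded = keras.layers.Embedding(input_dim=10000, output_dim=16)(tokens)
flat = keras.layers.Flatten()(embedded)
model = keras.Model(tokens, keras.layers.Dense(1)(flat))
```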