save_weights fails when a large number of input features is present
Hi @montanalow. This is really great work. I really like how you abstract away the common pitfalls in machine learning and streamline the process in this project. I see a lot of potential in it from a data scientist's perspective. If you don't mind, I'd like to share my feedback from using this tool.
For this particular issue, I encountered an `h5py` error because of too many `Input` layers. As shown here, we have to pass one encoder for each column in the dataframe, and each encoder corresponds to one `Input` layer. I deal with a lot of DNA sequence data, which usually has >5000 columns. I think it makes sense to at least combine the columns using `Continuous` or `Pass` encoders into one `Input`.
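To make the failure mode concrete, here is a minimal sketch, assuming a Keras functional model (which is what lore builds under the hood); the column names and sizes are illustrative, not lore's actual internals:

```python
from tensorflow import keras

n_columns = 5000  # e.g. one column per DNA sequence position

# One Input layer per column: save_weights has to serialize one layer
# name per column, which is what fails at this scale.
inputs = [keras.Input(shape=(1,), name='col_%d' % i) for i in range(n_columns)]
merged = keras.layers.Concatenate()(inputs)
per_column = keras.Model(inputs, keras.layers.Dense(1)(merged))

# The suggestion: columns handled by Continuous/Pass-style encoders
# share a single Input of shape (n_columns,), so there is only one
# layer name to serialize.
combined_in = keras.Input(shape=(n_columns,), name='continuous')
combined = keras.Model(combined_in, keras.layers.Dense(1)(combined_in))
```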
@Guzzii This is something we've run into internally as well. The current workaround is to set `short_names = True`, which will get you to hundreds, but probably not thousands, of inputs.
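For context on why shorter names only buy so much: my understanding (an assumption on my part, not verified against this issue's stack trace) is that Keras writes every layer name into a single HDF5 attribute during `save_weights`, and HDF5 caps an object header at roughly 64 KiB, so the budget divides out along these lines:

```python
# Back-of-the-envelope arithmetic, assuming the failure is HDF5's
# ~64 KiB object header limit on the layer_names attribute written
# by Keras save_weights.
HEADER_LIMIT = 64 * 1024  # bytes, approximate

def approx_max_layers(avg_name_length):
    # Each layer name consumes roughly its length in bytes.
    return HEADER_LIMIT // avg_name_length

print(approx_max_layers(80))  # long generated names   -> ~819 layers
print(approx_max_layers(16))  # short_names=True style -> ~4096 layers
```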
What if encoders that share a common base name followed by a number, e.g. `'sequence_1', 'sequence_2', 'sequence_3', ... 'sequence_n'`, were mapped into a single input `'sequence'` with shape `(n,)`, for all types where that is possible?
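A quick sketch of that grouping rule (illustrative only; `group_columns` is a hypothetical helper, not part of lore's API):

```python
import re
from collections import defaultdict

def group_columns(columns):
    """Group column names that share a base name followed by a number."""
    pattern = re.compile(r'^(.+)_(\d+)$')
    groups = defaultdict(list)
    for column in columns:
        match = pattern.match(column)
        base = match.group(1) if match else column
        groups[base].append(column)
    return groups

columns = ['sequence_1', 'sequence_2', 'sequence_3', 'label']
for base, members in group_columns(columns).items():
    print(base, '-> input shape', (len(members),))
# sequence -> input shape (3,)
# label -> input shape (1,)
```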
Hi @montanalow. I think that makes sense. Just want to make sure I understand correctly: in this case, it would aggregate columns with a shared base name like `sequence_col_{}`, and encoder-generated inputs like `one_hot_{}`, respectively:

```
sequence_col_{} -> sequence (input_shape=n_1)
one_hot_{} -> one_hot (input_shape=n_2)
```
Correct. I think there will be a little bit of complexity around encoders that have a `sequence_length`, like the `Token` encoder, because they will need to go to a 2D-shaped input, but it should still work in theory.
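Something like this for the 2D case, as a sketch (shapes and names are assumptions, not lore's actual wiring):

```python
from tensorflow import keras

# n token columns, each encoded to a fixed sequence_length, collapse
# into one 2D Input of shape (n_columns, sequence_length) rather than
# n_columns separate 1D inputs.
n_columns, sequence_length = 100, 10
tokens = keras.Input(shape=(n_columns, sequence_length), dtype='int32', name='token')
# The Embedding broadcasts over the leading axes:
# (batch, n_columns, sequence_length) -> (..., embedding_dim)
embedded = keras.layers.Embedding(input_dim=10000, output_dim=16)(tokens)
flat = keras.layers.Flatten()(embedded)
model = keras.Model(tokens, keras.layers.Dense(1)(flat))
```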