scidata
Make IMDB Reviews dataset consistent
Currently they return a map and still support transforms. Is it possible to normalize them to a similar result as the other datasets? For example, return a tuple of {{input_binary, input_type, input_shape}, {label_binary, label_type, label_shape}}?
Thanks for flagging! I'll open a PR to get rid of the transforms.
As for normalizing, we could truncate or pad the binaries to make them uniform in shape (as @seanmor5 suggested). That way we can provide an input_shape that describes the data exactly. Otherwise, we could make the input_shape something like {25000, nil} to indicate that the length of each binary varies. Or do you have another suggestion?
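To make the pad-or-truncate option concrete, here is a minimal sketch in plain Elixir. The module PadSketch and both of its functions are hypothetical and not part of Scidata. It assumes reviews are padded on the right with zero bytes, that the target length is chosen by the caller, and that {:u, 8} is an acceptable input_type for raw bytes. Truncating by bytes can split a multi-byte UTF-8 character, which a real implementation would need to handle.

```elixir
defmodule PadSketch do
  # Hypothetical helper for illustration only; not part of Scidata.

  # Review is at least `len` bytes long: keep only the first `len` bytes.
  def pad_or_truncate(review, len) when is_binary(review) and byte_size(review) >= len do
    binary_part(review, 0, len)
  end

  # Review is shorter than `len` bytes: right-pad with zero bytes.
  def pad_or_truncate(review, len) when is_binary(review) do
    review <> :binary.copy(<<0>>, len - byte_size(review))
  end

  # Builds an {input_binary, input_type, input_shape} tuple from a list of reviews.
  # The {:u, 8} type is an assumption: each byte is treated as one unsigned integer.
  def to_input(reviews, len) do
    data =
      reviews
      |> Enum.map(&pad_or_truncate(&1, len))
      |> IO.iodata_to_binary()

    {data, {:u, 8}, {length(reviews), len}}
  end
end

# Usage: two reviews normalized to 8 bytes each, giving a shape of {2, 8}.
{_binary, _type, _shape} = PadSketch.to_input(["good", "terrible film"], 8)
```

With every review at a fixed length, the shape describes the binary exactly, so no nil dimension is needed.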
We may want to return the map that IMDB.download/1 currently returns in a new IMDB.to_columns/1 function for use with Explorer, similar to Squad.to_columns/1.
I see… Hrm. It would be nice to see how this dataset would be used with Axon or Explorer then before making a decision.