nprintml
Train / Test dirs
It occurs to me that many public datasets have predefined training and testing splits for comparison purposes. We need the ability to supply a --train_dir and --test_dir, or a train.pcap and test.pcap, for this purpose; right now we randomly split whatever data we get.
Sure, that could work (and I've done it that way in the past).
Alternatively, though this might overload things slightly, it might be easier (for both the user and the implementation) to identify test files via the labeling file…
That's a good idea! I did not think of that. It's likely easier in the end. Add a column to the label file?
Right. We can make it optional, even.
And perhaps make it smart-ish (though we needn't). Say, if some rows are marked "test" and the rest left blank, then we know which are test and which are train (and the same if only some are marked "train" and the rest left blank). But if some are marked "test", some "train", and any are unmarked, we error. (Alternatively, we make it less smart, and/or make this column boolean.)
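For what it's worth, here is a rough sketch of how those "smart-ish" rules might look. This is only an illustration, not what nprintml does: the `split` column name, the `resolve_splits` helper, and the choice to represent rows as dicts (as `csv.DictReader` would yield them) are all assumptions made for the example.

```python
def resolve_splits(rows, column="split"):
    """Return one 'train'/'test' mark per row, or None if no row is marked.

    `rows` is a list of dicts, e.g. from csv.DictReader over the label file.
    The column name "split" is hypothetical; the column itself is optional.
    """
    # Normalize: a missing column, None, or whitespace all count as blank.
    marks = [(row.get(column) or "").strip().lower() for row in rows]

    unknown = sorted(set(marks) - {"", "train", "test"})
    if unknown:
        raise ValueError(f"unrecognized split value(s): {unknown}")

    used = set(marks) - {""}

    if not used:
        # Nothing marked: caller keeps today's behavior (random split).
        return None

    if used == {"train", "test"}:
        # Both marks in use, so every row must be marked explicitly.
        if "" in marks:
            raise ValueError("some rows are marked 'train', some 'test', "
                             "and some are left blank")
        return marks

    # Only one mark in use: blank rows implicitly get the other one.
    (marked,) = used
    other = "train" if marked == "test" else "test"
    return [mark or other for mark in marks]
```

With something like this, a label file that only marks its test rows would be enough, and a file with no such column would fall back to the random split. The boolean-column variant would be simpler still, at the cost of not catching partially filled-in files.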