nprintml

Train / Test dirs

Open JordanHolland opened this issue 4 years ago • 3 comments

It occurs to me that many public datasets have predefined training and testing splits for comparison purposes. We need the ability to supply a --train_dir and --test_dir, or a train.pcap and test.pcap, for this purpose, since right now we randomly split whatever data we get.

JordanHolland, Jan 13 '21 18:01

Sure, that could work (and I've done it that way in the past).

Alternatively, though this might overload things slightly, it might be easier (for the user and the implementation) to identify test files via the labeling file….

jesteria, Jan 13 '21 20:01

That's a good idea! Did not think of that. Likely easier in the end. Add a column to the label file?

JordanHolland, Jan 13 '21 21:01

Right. We can make it optional, even.

And perhaps make it smart-ish (though we needn't): say, if only some rows are marked "test", then we treat those as test and the rest as train (and likewise if only some are marked "train" and the rest are left blank). But if some are marked "test", some "train", and any are left unmarked, we error. (Alternatively, we make it less smart, and/or make this column boolean.)

jesteria, Jan 13 '21 21:01
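
For reference, the "smart-ish" rule from the last comment could be sketched roughly as below. This is only an illustration of the logic discussed in the thread, not nprintml code: the function name `resolve_split`, the accepted values `"train"`/`"test"`, and the choice to return `None` (meaning "fall back to the existing random split") when the column is entirely blank are all assumptions.

```python
from typing import List, Optional, Sequence

# Hypothetical set of accepted markers for the optional split column.
VALID_MARKS = {"train", "test"}


def resolve_split(marks: Sequence[Optional[str]]) -> Optional[List[str]]:
    """Resolve a per-row split column into explicit "train"/"test" values.

    Returns None when no row is marked, so the caller can keep using
    a random split (assumed fallback, not confirmed by the thread).
    """
    # Treat None and whitespace-only cells as blank; ignore case.
    normalized = [(m or "").strip().lower() for m in marks]

    unknown = sorted({m for m in normalized if m and m not in VALID_MARKS})
    if unknown:
        raise ValueError(f"unrecognized split value(s): {unknown}")

    seen = {m for m in normalized if m}
    has_blank = any(not m for m in normalized)

    if not seen:
        # Column absent or entirely blank: no predefined split.
        return None

    if len(seen) == 2:
        # Both "train" and "test" appear, so blanks are ambiguous.
        if has_blank:
            raise ValueError(
                'some rows are marked "train", some "test", and some are unmarked'
            )
        return normalized

    # Only one kind of mark appears: blanks take the other value.
    (marked,) = seen
    default = "train" if marked == "test" else "test"
    return [m or default for m in normalized]
```

The marks could come from an extra column in the label file (for instance read with `csv.DictReader`, using a hypothetical column name such as `split`). The boolean-column alternative mentioned above would skip the inference step entirely.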