kraken
Validate VGSL spec before loading training data
Add a method to lib.vgsl.TorchVGSLModel to validate a (partial) spec by feeding a dummy line or image into it, shaped according to the input specification. This would allow training to abort before the training dataset is loaded when an invalid or unworkable spec is given to the KrakenTrainer constructors.
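As a rough illustration of the idea, and not kraken's actual API, a validator could derive a dummy input shape from the input block of the spec and run one forward pass on it. In the sketch below, dummy_input_shape, validate_spec, SpecValidationError, and the fill size of 32 are all hypothetical names and values; it assumes the spec starts with an input block of four integers (batch, height, width, channels) and that a zero marks a variable-sized dimension. The forward pass is supplied by the caller, so no particular model class is assumed.

```python
import re
from typing import Callable, Tuple

# Matches the four integers at the start of an input block such as "[1,48,0,1 ...".
_INPUT_RE = re.compile(r'^\s*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)')

Shape = Tuple[int, int, int, int]


class SpecValidationError(ValueError):
    """Hypothetical error raised when a spec cannot be used."""


def dummy_input_shape(spec: str, fill: int = 32) -> Shape:
    """Return a concrete shape in spec order (batch, height, width, channels).

    Assumption: a zero dimension means "variable size", so it is replaced
    by the arbitrary placeholder size `fill`.
    """
    match = _INPUT_RE.match(spec)
    if match is None:
        raise SpecValidationError(f'no input block found in spec: {spec!r}')
    dims = tuple(int(group) or fill for group in match.groups())
    return dims  # type: ignore[return-value]


def validate_spec(spec: str, forward: Callable[[Shape], object]) -> Shape:
    """Run `forward` once on a dummy shape and convert any failure into
    a SpecValidationError. `forward` is supplied by the caller and is
    expected to build the network and push a dummy input through it."""
    shape = dummy_input_shape(spec)
    try:
        forward(shape)
    except Exception as exc:
        raise SpecValidationError(f'spec is not workable: {exc}') from exc
    return shape
```

A caller would pass a function that builds the network from the spec, creates a tensor of the returned shape (reordered to whatever layout the network expects), and runs it through the model. Because nothing here touches the training data, the check can run before any dataset is loaded.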
Why should this require adding a method to lib.vgsl.TorchVGSLModel? In the KrakenTrainer constructors, immediately after the first element has been added to the dataset, it is possible to feed the dataset to the network inside a try...except and raise a specific error. That would require only a couple of lines of code.
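To make the suggestion concrete, here is a minimal sketch of that check, written independently of kraken's real classes. The names probe_first_sample and UnworkableSpecError are hypothetical, and forward stands in for whatever call pushes one sample through the network.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar('T')


class UnworkableSpecError(RuntimeError):
    """Hypothetical error for a network that cannot process the data."""


def probe_first_sample(dataset: Iterable[T], forward: Callable[[T], object]) -> None:
    """Feed only the first sample of `dataset` to `forward` and raise a
    specific error if the network rejects it."""
    try:
        sample = next(iter(dataset))
    except StopIteration:
        raise ValueError('dataset is empty, nothing to probe with') from None
    try:
        forward(sample)
    except Exception as exc:
        raise UnworkableSpecError(f'network rejected the first sample: {exc}') from exc
```

The check only needs one sample, so it can run as soon as the first element is available instead of after the whole dataset has been read.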
A separate validator is preferable because it would allow tools that use the API, such as escriptorium, to validate a user-provided spec. In the KrakenTrainer object we would have to instantiate the model twice: once without the output layer to validate the spec, and once with it after the complete dataset has been loaded, so that the codec alphabet can be determined. I'd like to avoid that, as the constructors are already annoyingly large, complex pieces of code and should really be slimmed down.