vak
ENH: allow specifying which split to `predict` (ignore annotation format in that case)
@NickleDave edit:
The main issue here is that we can't run `vak predict` on a dataset .csv that has an annotation format specified, because the annotations then get passed into the `item_transform`, causing an error.
Running `vak predict` with my annotated canary data, I got a crash because too many arguments were passed to `item_transform()` in `vocal_dataset.py`, line 86.
That line calls `item = self.item_transform(spect, lbl_tb, spect_path)`, but the transform it calls is `PredictItemTransform.__call__(self, source, spect_path=None)`, which accepts only 2 arguments.
This could be a mix-up with `EvalItemTransform.__call__(self, source, annot, spect_path=None)`, which does accept 3 arguments.
Alternatively, the logic in `VocalDataset.__getitem__()` may be meant to reach line 86 only when running `vak eval`, and it got there because my data is already annotated (so the condition on line 75 of `vocal_dataset.py` is True).
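A minimal reproduction of the mismatch, using hypothetical stand-in classes that copy only the two `__call__` signatures quoted above (the real vak transforms also load, scale and window the spectrogram). Calling either one with three positional arguments, as line 86 does, works for the eval transform and raises `TypeError` for the predict transform:

```python
# Hypothetical stand-ins: only the __call__ signatures come from the issue.
class PredictItemTransform:
    def __call__(self, source, spect_path=None):
        return {"source": source, "spect_path": spect_path}


class EvalItemTransform:
    def __call__(self, source, annot, spect_path=None):
        return {"source": source, "annot": annot, "spect_path": spect_path}


def call_like_getitem(item_transform, spect, lbl_tb, spect_path):
    # Mirrors the call on line 86: three positional arguments.
    return item_transform(spect, lbl_tb, spect_path)


# Illustrative values, not real vak data.
spect, lbl_tb, spect_path = [[0.0, 1.0]], [0, 0, 1], "bird1.spect.npz"

# Works: EvalItemTransform accepts (source, annot, spect_path).
item = call_like_getitem(EvalItemTransform(), spect, lbl_tb, spect_path)

# Fails: PredictItemTransform accepts at most (source, spect_path).
try:
    call_like_getitem(PredictItemTransform(), spect, lbl_tb, spect_path)
except TypeError as err:
    print(f"TypeError: {err}")
```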
There are several ways to fix this, for example:
- Allow `PredictItemTransform.__call__()` to accept another argument
- Change the condition on line 75 of `VocalDataset.__getitem__()` to distinguish between `predict` and `eval`
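A rough sketch of the second option, assuming a hypothetical `mode` attribute on a simplified dataset class (vak's real `VocalDataset` reads rows from the .csv and loads spectrograms from disk; neither `VocalDatasetSketch` nor `mode` is vak's API). Labels are only passed to the transform when the row is annotated *and* we are not predicting:

```python
# Hypothetical stand-ins; signatures copied from the issue.
class PredictItemTransform:
    def __call__(self, source, spect_path=None):
        return {"source": source, "spect_path": spect_path}


class EvalItemTransform:
    def __call__(self, source, annot, spect_path=None):
        return {"source": source, "annot": annot, "spect_path": spect_path}


class VocalDatasetSketch:
    """Simplified, hypothetical stand-in for VocalDataset.

    `rows` replaces the dataset .csv; `mode` ('predict' or 'eval') is assumed.
    """

    def __init__(self, rows, item_transform, mode):
        self.rows = rows
        self.item_transform = item_transform
        self.mode = mode

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        row = self.rows[index]
        spect, spect_path = row["spect"], row["spect_path"]
        # Checking annot_format alone sends lbl_tb to the predict transform;
        # also checking the mode keeps predict on the 2-argument call.
        if row["annot_format"] is not None and self.mode != "predict":
            return self.item_transform(spect, row["lbl_tb"], spect_path)
        return self.item_transform(spect, spect_path=spect_path)


# One annotated row with illustrative values.
rows = [{"spect": [[0.0, 1.0]], "spect_path": "bird1.spect.npz",
         "annot_format": "notmat", "lbl_tb": [0, 0, 1]}]

pred_item = VocalDatasetSketch(rows, PredictItemTransform(), mode="predict")[0]
eval_item = VocalDatasetSketch(rows, EvalItemTransform(), mode="eval")[0]
```

The first option would instead change the predict signature to something like `__call__(self, source, annot=None, spect_path=None)` and ignore `annot`, which fixes the crash but still builds labels that predict never uses.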
Try re-creating the .csv without an `annot_format` option in the .toml config file, i.e. as if the data were not annotated.
This should make `annot_format` None for all rows, which will stop vak from creating `lbl_tb` and passing it in inside `VocalDataset`.
I think that will make it work for now; we can think later about a better way to handle this.
The simplest way to be able to run predict on any dataset .csv would be to just remove the logic that checks which split to use.
But then we lose the ability to specify a `predict` split, or at least it becomes meaningless.
What would we specify when we just want to predict? Using some other split (e.g. `test`) feels hack-ish.
The second simplest way would be to add a config option that says "predict on this split". That might be the solution.
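A minimal sketch of what a "predict on this split" option could do once the .csv is loaded, assuming rows are plain dicts with `split` and `annot_format` keys and a hypothetical `split` argument standing in for the config option (this is not an existing vak option). Following the issue title, annotation format is dropped for the selected rows so nothing label-related reaches the item transform:

```python
from typing import Any, Dict, List, Optional

# Each dict stands in for one row of the dataset .csv (illustrative values).
ROWS = [
    {"spect_path": "a.spect.npz", "split": "train", "annot_format": "notmat"},
    {"spect_path": "b.spect.npz", "split": "test", "annot_format": "notmat"},
    {"spect_path": "c.spect.npz", "split": "predict", "annot_format": None},
]


def rows_to_predict(rows: List[Dict[str, Any]],
                    split: Optional[str] = "predict") -> List[Dict[str, Any]]:
    """Select rows for predict.

    `split` plays the role of the hypothetical "predict on this split"
    config option; None means "use every row in the .csv".
    """
    if split is None:
        selected = rows
    else:
        selected = [row for row in rows if row["split"] == split]
    # Predicting does not need labels, so ignore the annotation format
    # (copies are returned; the input rows are left unchanged).
    return [{**row, "annot_format": None} for row in selected]
```

With this shape, predicting on `test` stops being a hack: it is just `rows_to_predict(ROWS, split="test")`.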
Changed the title to reflect this.
> The second simplest way would be to add a config option that says "predict on this split"
We'd still need to work around the core issue that we pass the wrong arguments into the item transform. But the item transform probably needs refactoring anyway.
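One possible direction for that refactor, sketched under the assumption that every item transform takes a single mapping of fields and picks out what it needs (a hypothetical interface, not vak's current one). The dataset would then make the same call regardless of mode, and the predict transform simply ignores annotation fields:

```python
from typing import Any, Dict


class PredictItemTransform:
    """Hypothetical refactor: every transform takes one mapping of fields."""

    def __call__(self, fields: Dict[str, Any]) -> Dict[str, Any]:
        # Predict only uses the spectrogram; any annotation fields are ignored.
        return {"source": fields["spect"], "spect_path": fields.get("spect_path")}


class EvalItemTransform:
    def __call__(self, fields: Dict[str, Any]) -> Dict[str, Any]:
        # Eval additionally needs the labeled timebins.
        return {
            "source": fields["spect"],
            "annot": fields["lbl_tb"],
            "spect_path": fields.get("spect_path"),
        }


# The dataset makes the same call whatever the transform (illustrative values).
fields = {"spect": [[0.0, 1.0]], "lbl_tb": [0, 0, 1], "spect_path": "bird1.spect.npz"}
pred_item = PredictItemTransform()(fields)
eval_item = EvalItemTransform()(fields)
```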