
ENH: allow specifying which split to `predict` (ignore annotation format in that case)

Open yardencsGitHub opened this issue 5 years ago • 3 comments

@NickleDave edit: the main issue here is that we can't run `vak predict` on a dataset .csv that has an annotation format specified, because that format gets passed in to the item_transform, causing an error

Running 'vak predict' on my annotated canary data, I got a crash because too many arguments were passed to the function item_transform() in vocal_dataset.py, line 86.

The line calls `item = self.item_transform(spect, lbl_tb, spect_path)`. But that ends up in `PredictItemTransform.__call__(self, source, spect_path=None)`, which only accepts 2 arguments besides `self`.

This could be a mix-up with `EvalItemTransform.__call__(self, source, annot, spect_path=None)`, which does accept 3 arguments.
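The mismatch can be reproduced outside vak with stand-in classes. This is a minimal sketch: only the two `__call__` signatures follow the ones quoted above, while the method bodies and the helpers `call_like_dataset` and `raises_type_error` are hypothetical names made up for illustration.

```python
class PredictItemTransform:
    """Stand-in with the signature quoted above; the body is hypothetical."""

    def __call__(self, source, spect_path=None):
        return {"source": source, "spect_path": spect_path}


class EvalItemTransform:
    """Stand-in that also accepts an annotation argument."""

    def __call__(self, source, annot, spect_path=None):
        return {"source": source, "annot": annot, "spect_path": spect_path}


def call_like_dataset(item_transform, spect, lbl_tb, spect_path):
    # Mirrors the call described in the report: three positional arguments.
    return item_transform(spect, lbl_tb, spect_path)


def raises_type_error(item_transform):
    """Return True if the three-argument call fails with a TypeError."""
    try:
        call_like_dataset(item_transform, "spect", "lbl_tb", "path.npz")
    except TypeError:
        return True
    return False
```

With these stand-ins, the three-argument call fails for the predict transform and succeeds for the eval transform, which matches the behavior described in the report.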

Alternatively, it could be that the logic in `VocalDataset.__getitem__()` is such that it shouldn't reach line 86 unless we're running 'vak eval', and it only got there because my data is already annotated (so the condition on line 75 in vocal_dataset.py is True).

There are several ways to fix. For example:

  1. Allow `PredictItemTransform.__call__()` to accept another argument
  2. Change the condition in `VocalDataset.__getitem__()` on line 75 to account for the difference between 'predict' and 'eval'
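The second option could look roughly like the sketch below. It is not vak's actual code: `SketchDataset`, the `pass_annot` flag, the dict-based rows, and the two transform functions are all hypothetical, and only the idea of checking more than "is this row annotated" comes from the discussion above.

```python
def predict_transform(source, spect_path=None):
    # Hypothetical predict-style transform: no annotation argument.
    return {"source": source, "spect_path": spect_path}


def eval_transform(source, annot, spect_path=None):
    # Hypothetical eval-style transform: requires an annotation.
    return {"source": source, "annot": annot, "spect_path": spect_path}


class SketchDataset:
    """Hypothetical stand-in for the dataset's item-loading logic."""

    def __init__(self, rows, item_transform, pass_annot):
        self.rows = rows
        self.item_transform = item_transform
        # Hypothetical flag: True when evaluating, False when predicting.
        self.pass_annot = pass_annot

    def __getitem__(self, idx):
        row = self.rows[idx]
        spect, spect_path = row["spect"], row["spect_path"]
        has_annot = row.get("annot_format") is not None
        # Only pass labels when the caller asked for them AND they exist,
        # instead of branching on the annotation format alone.
        if self.pass_annot and has_annot:
            return self.item_transform(spect, row["lbl_tb"], spect_path)
        return self.item_transform(spect, spect_path)
```

With this shape, an annotated row no longer forces the three-argument call when predicting; the annotation is simply ignored.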

yardencsGitHub avatar Aug 22 '20 19:08 yardencsGitHub

try re-creating the .csv but without an annot_format argument in the toml config file -- i.e. as if it was not annotated

this should cause annot_format to be `None` for all rows, and that will stop vak from creating and passing in lbl_tb inside VocalDataset

I think that will make it work for now -- we can think later about a better way to handle this

NickleDave avatar Aug 22 '20 22:08 NickleDave

The simplest way to be able to run predict on any dataset csv would be to just remove the logic that checks for what split to use.

But then we lose the ability to specify a predict split, or at least it becomes meaningless.
What would we then specify when we just want to predict? Some other split (e.g. test) feels hack-ish.

The second simplest way would be to add a config option that says "predict on this split". That might be the solution.
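As a rough illustration of what such an option would do, the sketch below selects rows by split name. It assumes the dataset rows carry a `split` column and represents them as plain dicts; the function name `rows_for_split` and its `split` parameter are hypothetical, not an existing vak option.

```python
def rows_for_split(rows, split="predict"):
    """Return the rows whose 'split' value equals `split`.

    `rows` is a list of dicts, one per dataset row (an assumption made
    for this sketch, not vak's internal representation).
    """
    selected = [row for row in rows if row["split"] == split]
    if not selected:
        raise ValueError(f"no rows found for split {split!r}")
    return selected
```

A config value such as "test" would then be passed through as `split`, so predicting on an annotated split becomes an explicit choice rather than a side effect of how the dataset was prepared.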

NickleDave avatar Aug 14 '22 02:08 NickleDave

Changed title to reflect

> The second simplest way would be to add a config option that says "predict on this split"

We'd still need to work around the core issue, which is that we pass the wrong arguments into the item transform. But the item transform probably needs refactoring anyway.

NickleDave avatar Aug 17 '22 13:08 NickleDave