KeyError when running prediction mode

Open irleader opened this issue 9 months ago • 4 comments

I downloaded mus_musculus.ipc and instanovo.pt, then ran the command "python -m instanovo.transformer.predict ms_ninespecies_benchmark/data/ninespecies_updated/mus_musculus.ipc instanovo.pt". It failed with the following error:

100%|██████████████████████████████████████████████████████████████████████████████| 408/408 [1:42:54<00:00, 15.13s/it]
INFO:root:Time taken for ms_ninespecies_benchmark/data/ninespecies_updated/mus_musculus.ipc is 6209.5 seconds
INFO:root:Average time per batch (bs=64): 15.2 seconds
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\site-packages\instanovo\transformer\predict.py", line 205, in <module>
    main()
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\site-packages\instanovo\transformer\predict.py", line 189, in main
    get_preds(data_path, model, config, denovo, output_path, knapsack_path)
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\site-packages\instanovo\transformer\predict.py", line 149, in get_preds
    aa_prec, aa_recall, pep_recall, pep_prec = metrics.compute_precision_recall(
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\site-packages\instanovo\utils\metrics.py", line 109, in compute_precision_recall
    n_match = self._novor_match(targ, pred)
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\site-packages\instanovo\utils\metrics.py", line 191, in _novor_match
    mass_a: list[float] = [self.residues[x] for x in a]
  File "C:\ProgramData\miniconda3\envs\instanovo\lib\site-packages\instanovo\utils\metrics.py", line 191, in <listcomp>
    mass_a: list[float] = [self.residues[x] for x in a]
KeyError: '.'

Also, I am unable to pass the --config argument; it fails with "predict.py: error: unrecognized arguments: --config base.yaml". I would also like to know where the generated knapsack file is saved, so that I can move it to a folder and reuse it next time instead of regenerating it.

irleader avatar May 02 '24 10:05 irleader

I have this problem too. Did you manage to solve it?

xiaohu-shi avatar May 07 '24 02:05 xiaohu-shi

Hi, the mus_musculus.ipc on HuggingFace is currently not directly supported by this release of InstaNovo. The representation used is slightly different and hasn't been converted to our spec yet: this dataset has a "." at the start and end of each sequence, uses square brackets instead of standard brackets, and includes N-terminal modifications.
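To illustrate why the traceback ends in KeyError: '.', here is a minimal sketch of the failing lookup, not InstaNovo's actual code. The RESIDUES table, the unsupported_tokens helper, and the per-character tokenization are all assumptions made for illustration; the real vocabulary lives in the model config. A check like this could flag unsupported tokens before starting a long prediction run.

```python
# Hypothetical residue table with a few standard monoisotopic residue masses (Da).
# The real table comes from the model config; this one is only for illustration.
RESIDUES = {
    "G": 57.021464,
    "A": 71.037114,
    "P": 97.052764,
    "T": 101.047679,
    "I": 113.084064,
    "D": 115.026943,
    "E": 129.042593,
}


def unsupported_tokens(tokens, residues):
    """Return tokens that have no entry in `residues`, in order of first appearance."""
    missing = []
    for tok in tokens:
        if tok not in residues and tok not in missing:
            missing.append(tok)
    return missing


# Naive per-character split of a made-up sequence with leading/trailing dots.
tokens = list(".PEPTIDE.")
missing = unsupported_tokens(tokens, RESIDUES)
print(missing)
```

Any token reported here would raise a KeyError in a plain dictionary lookup such as residues[token].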

If you want to make it compatible right now, you would need to modify the modified_sequence column of the .ipc file using Polars so that it fits our schema, and add the N-terminal modifications to the residues in the model config.
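A rough sketch of the string cleanup described above, under stated assumptions: the normalize_sequence helper and the example sequence are hypothetical, and the target format (round brackets, no surrounding dots) is inferred from the description in this thread rather than from the InstaNovo schema itself. N-terminal modifications are left untouched, since they still need matching entries in the model config.

```python
def normalize_sequence(seq: str) -> str:
    """Strip one leading and one trailing '.', then turn square brackets into round ones."""
    if seq.startswith("."):
        seq = seq[1:]
    if seq.endswith("."):
        seq = seq[:-1]
    # Map '[' -> '(' and ']' -> ')' character by character.
    return seq.translate(str.maketrans("[]", "()"))


# Made-up example sequence; the real column contents may look different.
print(normalize_sequence(".PEPM[+15.995]TIDE."))
```

With Polars, something like pl.col("modified_sequence").map_elements(normalize_sequence, return_dtype=pl.Utf8) inside with_columns should apply this to the whole column after pl.read_ipc, with write_ipc saving the result. Native Polars string expressions would likely be faster on large files.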

These files will be natively supported in the next major release of InstaNovo.

As for the --config argument, you need to remove the .yaml extension (i.e. pass "--config base"), as it is automatically appended to the filename. This will also be updated in the future to be easier to use.

KevinEloff avatar May 07 '24 09:05 KevinEloff

Thank you very much for your reply, it is very helpful to me.

xiaohu-shi avatar May 08 '24 01:05 xiaohu-shi

Thanks a lot for your prompt answer. Can you say when the next major release will be available? If it is coming soon, I will wait for it.

irleader avatar May 08 '24 05:05 irleader