Subunit detection is returning wrong numbers
I just spotted this while rewriting the code for diamond. If the subunits of an enzyme are named with letters, subunits I, V and X get the wrong numbers in complex_detection.R, because they are treated as Roman numerals.
So I would expect A -> 1, B -> 2, C -> 3, ..., I -> 9,
but instead it gives I -> 1.
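One possible way to disambiguate, sketched here in Python rather than in the R of complex_detection.R: look at all subunit labels found for one enzyme together. If every label is a single letter and at least one of them cannot be a Roman numeral (for example A, B or F), treat the whole set as letter nomenclature. The function names and the heuristic itself are assumptions for illustration, not the current behaviour of the script. A set that contains only I, V or X remains ambiguous and falls back to Roman numerals in this sketch.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}
# Roman numerals from 1 to 39, which should cover typical subunit counts.
ROMAN_RE = re.compile(r"(X{0,3})(IX|IV|V?I{0,3})")


def is_roman(label):
    """Return True if label is a well-formed Roman numeral built from I, V, X."""
    return bool(label) and ROMAN_RE.fullmatch(label) is not None


def roman_to_int(label):
    """Convert a Roman numeral made of I, V and X to an integer."""
    total, prev = 0, 0
    for ch in reversed(label):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value  # subtractive notation, e.g. the I in IX
        else:
            total += value
            prev = value
    return total


def subunit_numbers(labels):
    """Map the subunit labels of ONE enzyme to numbers (hypothetical helper).

    Assumption: labels are letters if all are single letters and at least
    one of them is not I, V or X. Otherwise they are read as Roman numerals.
    """
    upper = [label.upper() for label in labels]
    single_letters = all(len(l) == 1 and "A" <= l <= "Z" for l in upper)
    letter_scheme = single_letters and any(l not in ROMAN_VALUES for l in upper)

    numbers = {}
    for l in upper:
        if letter_scheme:
            numbers[l] = ord(l) - ord("A") + 1  # A -> 1, ..., I -> 9
        elif is_roman(l):
            numbers[l] = roman_to_int(l)
        else:
            numbers[l] = None  # label not understood
    return numbers
```

With this heuristic, `subunit_numbers(["A", "B", "I"])` maps I to 9, while `subunit_numbers(["I", "II", "IV"])` still maps I to 1.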
You can test this, for example, with the single UniProt entry A0A7V5FFT7,
or you can just use seq/Bacteria/unrev/1.6.5.3.fasta from the repository.
Furthermore, in the very same test case the extraction of the subunits can fail if one of the keywords for detecting subunits is preceded by a single capital letter. For example, if the FASTA header looks like this: "UniRef50_U3TYP0 NADH dehydrogenase I chain F n=1 Tax=Plautia stali symbiont TaxID=891974 RepID=U3TYP0_9ENTR", the script will extract "I chain" as the subunit instead of the expected "chain F".
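A minimal sketch of one way to avoid this, again in Python and not taken from complex_detection.R: prefer a label that follows the keyword, and only fall back to a label that precedes it when nothing follows. The keyword list (`chain`, `subunit`), the label pattern and the function name are assumptions for illustration. The real script may use other keywords and may also need to handle Greek-letter names.

```python
import re

# Assumed keywords; the actual list in the script may differ.
KEYWORD_ALT = "|".join(("chain", "subunit"))
# A label is a single capital letter, a Roman numeral (I, V, X) or a number.
LABEL = r"(?:[A-Z]|[IVX]+|\d+)"

AFTER = re.compile(rf"\b({KEYWORD_ALT})\s+({LABEL})\b")
BEFORE = re.compile(rf"\b({LABEL})\s+({KEYWORD_ALT})\b")


def extract_subunit(header):
    """Return 'keyword label' for a UniRef-style FASTA header, or None.

    Hypothetical helper: a label AFTER the keyword wins over one BEFORE it.
    """
    # Drop the UniRef metadata (n=..., Tax=..., RepID=...) after the name.
    description = header.split(" n=", 1)[0]

    match = AFTER.search(description)
    if match:
        return f"{match.group(1)} {match.group(2)}"

    # Fallback for headers such as "... B chain".
    match = BEFORE.search(description)
    if match:
        return f"{match.group(2)} {match.group(1)}"
    return None
```

For the header above this returns "chain F", because "chain F" is matched before the fallback ever considers "I chain". The fallback can still misread a header like "NADH dehydrogenase I chain" with no trailing label, so it may be safer to drop it if such headers occur.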
My current plan is to implement diamond only for the -p all option, and I will try to correct these errors there. However, I thought I would report it here, as I am not sure if and when I will find the time to finish it.