cobrapy
Prepend underscores for tab completion of IDs that start with a number
Tab completion of reaction and metabolite IDs is convenient and widely used in IPython and Jupyter. Falling back on `get_by_id` for IDs that do not match Python's variable-naming rules is a common annoyance (for me at least, and, I imagine, for many other users).
IDs from existing databases (BiGG, KEGG, etc.) generally use a limited character set that is consistent with Python variable naming, so tab completion works. The main exception is identifiers that start with a number. These are common in BiGG – I'm not sure about other databases – and it would take quite a lot of work to move existing models to new BiGG IDs not starting with numbers. It would also be less intuitive because chemical names commonly begin with numbers ("10-Formyltetrahydrofolate", "1-propanol", etc.).
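To make the distinction concrete, here is a small sketch that sorts IDs into three groups: IDs that already work as attribute names, IDs that would work if an underscore were prepended (the digit-leading case), and IDs that would still need `get_by_id`. The function name `classify_ids` and the sample IDs are illustrative assumptions, not taken from COBRApy or from a specific model; the checks rely only on the standard `str.isidentifier()` and `keyword.iskeyword()`.

```python
import keyword

def classify_ids(ids):
    """Hypothetical helper: split IDs into (usable, fixable with '_', other)."""
    usable, digit_start, other = [], [], []
    for id_ in ids:
        if id_.isidentifier() and not keyword.iskeyword(id_):
            usable.append(id_)        # tab completion already works
        elif id_[:1].isdigit() and ("_" + id_).isidentifier():
            digit_start.append(id_)   # would work with a leading underscore
        else:
            other.append(id_)         # e.g. contains '-' or is a keyword
    return usable, digit_start, other

# Illustrative IDs only.
usable, digit_start, other = classify_ids(["atp", "10fthf", "glc__D", "2pg", "ala-L"])
print(usable)       # ['atp', 'glc__D']
print(digit_start)  # ['10fthf', '2pg']
print(other)        # ['ala-L']
```

Only the middle group benefits from the proposal below; IDs with characters such as `-` would still need `get_by_id`.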
What if, as a compromise, COBRApy prepended an underscore to IDs that start with a number, just for tab completion (and only in cases where the resulting name does not conflict with another ID in the model)?
So this would work:
```python
iJO1366.metabolites._10ft<tab>
# completes to:
iJO1366.metabolites._10fthf
# which is equivalent to:
iJO1366.metabolites.get_by_id('10fthf')
```
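One way such a shortcut could be wired up, sketched under assumptions: the class `TabCompletableIDs` below is hypothetical and is not COBRApy's `DictList`. It uses two standard Python hooks: `__dir__`, which IPython's attribute completion typically consults, and `__getattr__`, which Python calls only when normal attribute lookup fails. An exact ID match is tried before the underscore shortcut.

```python
class TabCompletableIDs:
    """Hypothetical container (not cobrapy's DictList) exposing stored
    objects as attributes, with '_' prepended to digit-leading IDs."""

    def __init__(self, objects_by_id):
        self._objects = dict(objects_by_id)  # ID string -> object

    def get_by_id(self, id_):
        return self._objects[id_]

    @staticmethod
    def _attribute_name(id_):
        # Only IDs starting with a digit get the underscore shortcut.
        return "_" + id_ if id_[:1].isdigit() else id_

    def __dir__(self):
        # Advertise ID-based names so interactive completers can list them.
        names = set(super().__dir__())
        for id_ in self._objects:
            name = self._attribute_name(id_)
            if name.isidentifier():
                names.add(name)
        return sorted(names)

    def __getattr__(self, name):
        # Read __dict__ directly to avoid recursion before __init__ has run.
        objects = self.__dict__.get("_objects", {})
        if name in objects:  # an exact ID match wins over the shortcut
            return objects[name]
        if name.startswith("_") and name[1:2].isdigit() and name[1:] in objects:
            return objects[name[1:]]
        raise AttributeError(name)


mets = TabCompletableIDs({"10fthf": "10-Formyltetrahydrofolate", "atp": "ATP"})
print(mets._10fthf)                              # 10-Formyltetrahydrofolate
print(mets._10fthf is mets.get_by_id("10fthf"))  # True
```

Because `__getattr__` is only a fallback, the underscore name never shadows real methods such as `get_by_id`.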
Yes, I've been considering that as well. I think the big pro is usability in terms of fast interactive access to objects. The downsides I see, which do not necessarily outweigh that pro, are the inconsistency between `DictList.get_by_id` and `DictList.<tab>`, and a further inconsistency between the identifier in the SBML file and the one in the `cobra.Model`.
Alternatively, we could prepend everything with `M_`, `R_` or `G_`, but that suffers from the same problems.
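For comparison, a sketch of the prefix alternative. The SBML `SId` type requires identifiers to start with a letter or underscore, so a type prefix such as `M_` turns a digit-leading ID into a valid Python name as well. The helpers `add_prefix` and `remove_prefix` and the `PREFIXES` table are hypothetical illustrations, not COBRApy functions.

```python
# Hypothetical mapping from object type to SBML-style ID prefix.
PREFIXES = {"metabolite": "M_", "reaction": "R_", "gene": "G_"}

def add_prefix(id_, kind):
    """Prepend the SBML-style type prefix to a COBRA ID."""
    return PREFIXES[kind] + id_

def remove_prefix(prefixed_id, kind):
    """Inverse of add_prefix; rejects IDs lacking the expected prefix."""
    prefix = PREFIXES[kind]
    if not prefixed_id.startswith(prefix):
        raise ValueError(f"{prefixed_id!r} does not start with {prefix!r}")
    return prefixed_id[len(prefix):]

print(add_prefix("10fthf", "metabolite"))                 # M_10fthf
print(add_prefix("10fthf", "metabolite").isidentifier())  # True
print(remove_prefix("M_10fthf", "metabolite"))            # 10fthf
```

The attribute name (`M_10fthf`) still differs from the ID stored in the model (`10fthf`), which is the same inconsistency noted above, and an ID containing `-` stays invalid even with a prefix.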
What would you do if there is already another reaction with ID `_10fthf`?
In that case, the shortcut would point to the exact match:
```python
iJO1366.metabolites._10ft<tab>
# completes to:
iJO1366.metabolites._10fthf
# which is equivalent to:
iJO1366.metabolites.get_by_id('_10fthf')
```
I imagine that's a rare case, and BiGG IDs cannot begin with an underscore, but it would be a potential source of confusion.
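A sketch of how the collision rule could be made explicit, assuming a hypothetical `build_shortcuts` helper (not part of COBRApy): exact IDs that are valid identifiers claim their names first, digit-leading IDs get a `_` alias only if that alias is free, and IDs whose alias is taken are reported so the confusing case can at least be surfaced, for example as a warning.

```python
def build_shortcuts(ids):
    """Hypothetical sketch: map attribute names to IDs, letting exact IDs
    win and recording digit-leading IDs whose '_' alias is taken."""
    id_set = set(ids)
    shortcuts = {}
    shadowed = []
    # Pass 1: IDs that are already valid identifiers map to themselves.
    for id_ in ids:
        if id_.isidentifier():
            shortcuts[id_] = id_
    # Pass 2: digit-leading IDs get '_' + ID unless that ID already exists.
    for id_ in ids:
        if id_[:1].isdigit():
            alias = "_" + id_
            if alias in id_set:
                shadowed.append(id_)  # reachable only via get_by_id
            elif alias.isidentifier():
                shortcuts[alias] = id_
    return shortcuts, shadowed

shortcuts, shadowed = build_shortcuts(["10fthf", "_10fthf", "atp"])
print(shortcuts)  # {'_10fthf': '_10fthf', 'atp': 'atp'}
print(shadowed)   # ['10fthf']
```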
The preference seems to be to implement this along the lines of SBML identifiers.