Michael Heinzinger
For per-residue embeddings, I think this would blow the disk space for most people, and if you only allow bulk download instead of access to single/few sequences, downloading might in fact...
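To make the disk-space concern concrete, here is a back-of-the-envelope estimate. ProtT5 per-residue embeddings are 1024-dimensional; the average sequence length (350 residues) and proteome size (20k proteins) below are illustrative assumptions, not numbers from this thread:

```python
# Rough storage estimate for per-residue ProtT5 embeddings.
# Assumptions (illustrative): 1024-dim embeddings, float16 (2 bytes/value),
# an average protein length of ~350 residues.
EMBED_DIM = 1024
BYTES_PER_VALUE = 2          # half precision (float16)
AVG_LENGTH = 350             # assumed average sequence length

bytes_per_protein = AVG_LENGTH * EMBED_DIM * BYTES_PER_VALUE
print(f"~{bytes_per_protein / 1024:.0f} KiB per protein")   # ~700 KiB per protein

# Scaling to a proteome-sized set, e.g. ~20k human proteins:
n_proteins = 20_000
total_gib = n_proteins * bytes_per_protein / 1024**3
print(f"~{total_gib:.1f} GiB for {n_proteins} proteins")    # ~13.4 GiB
```

So even in half precision, per-residue storage for a single proteome lands in the tens of GiB, which is why per-protein (pooled) embeddings are usually what gets distributed in bulk.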
I've checked, and what we had was only UniRef50 for SeqVec, not ProtTrans. So I am afraid that you will have to compute those on your end. I would actually...
In fact, I also noticed some slow-down when switching from my own [script](https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing) to the [bio_embeddings implementation](https://github.com/sacdallago/bio_embeddings/blob/develop/bio_embeddings/embed/prottrans_t5_embedder.py#L123). I could imagine that this is due to bio_embeddings first moving per_residue embeddings...
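One plausible contributor to such a slow-down (my assumption here, not something confirmed from the bio_embeddings code): transferring the full per-residue matrix off the GPU for every sequence moves far more data than pooling to a per-protein vector first. A minimal NumPy sketch, with arrays standing in for device tensors:

```python
import numpy as np

# Stand-in for a per-residue embedding as ProtT5 returns it:
# one 1024-dim vector per residue (here, a 350-residue protein).
rng = np.random.default_rng(0)
per_residue = rng.standard_normal((350, 1024), dtype=np.float32)

# Mean-pooling on the "device" side before any transfer collapses the
# (L, 1024) matrix to a single (1024,) per-protein vector, so only a
# tiny fraction of the data has to cross the GPU->CPU boundary.
per_protein = per_residue.mean(axis=0)

print(per_residue.nbytes // per_protein.nbytes)  # 350x less data to move
```

Whether this is the actual bottleneck in bio_embeddings would need profiling; the sketch only illustrates the data-volume argument.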
Thanks for jumping in, @johnnytam100! - In fact, our computing resources are currently also mostly eaten up by a few new projects, which is why I would also have...
In case you want to generate embeddings on a large scale, just a minor tip (maybe obvious; if so, simply ignore it): - For convenience you can use the ProtT5...
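The tip itself is truncated above, but one generally useful trick for large-scale embedding runs (my addition, not necessarily what was meant here) is sorting sequences by length before batching, so each batch carries little padding. A hedged sketch; `max_residues` is an illustrative knob, not a bio_embeddings option:

```python
def length_sorted_batches(seqs, max_residues=4000):
    """Group sequences into batches capped at `max_residues` total residues.

    Sorting by length first keeps similarly sized sequences together,
    which minimizes padding when a batch is fed to the model.
    """
    batches, current, current_len = [], [], 0
    for seq in sorted(seqs, key=len):
        if current and current_len + len(seq) > max_residues:
            batches.append(current)
            current, current_len = [], 0
        current.append(seq)
        current_len += len(seq)
    if current:
        batches.append(current)
    return batches

# Three short sequences fit one batch; the long one gets its own.
batches = length_sorted_batches(["MK", "MKTAYIAK", "MKV", "M" * 4000])
```

A residue cap (rather than a fixed batch size) also guards against out-of-memory errors on the occasional very long sequence.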
Hi, you can download pre-computed ProtT5 embeddings for human here: https://zenodo.org/record/5047020 By now, there is also an integration of ProtT5 embeddings in UniProt, which allows you to download pre-computed embeddings...
Perfectly correct: lines/line_indices in the CSV refer to the entries in the H5 file. So you can use the line-index of an entry in the CSV to query the...
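The lookup described above can be sketched roughly as follows. Note the file contents here are a tiny stand-in, and the H5 key scheme (stringified line index) is an assumption on my part, so check it against the actual release:

```python
import os
import tempfile

import h5py
import numpy as np

# Tiny stand-in for the released files: an H5 file whose datasets are
# keyed by line index, plus a CSV listing accessions in the same order.
# (Keying datasets by stringified line index is an assumption here.)
path = os.path.join(tempfile.mkdtemp(), "embeddings.h5")
with h5py.File(path, "w") as h5:
    for idx in range(3):
        h5.create_dataset(str(idx), data=np.full(1024, idx, dtype=np.float16))

csv_text = "P12345\nQ67890\nA0A001\n"
line_index = {acc: i for i, acc in enumerate(csv_text.splitlines())}

# Query a single protein: CSV line index -> H5 dataset.
with h5py.File(path, "r") as h5:
    emb = h5[str(line_index["Q67890"])][:]

print(emb.shape, emb[0])  # (1024,) 1.0
```

Because H5 supports random access per dataset, this lets you pull single embeddings without loading the whole file into memory.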
Thanks for the pointer - that is definitely extremely helpful! I just wondered whether there is sort of a README or brief summary of the technical details or the bells & whistles...
Hi, we (aka @t03i) fixed the server issue on our end. It should work again now :)
Hey :) first of all: thanks for your feedback! On your issue: the problem is that the current embedder, ProtT5, is run in half precision, which also produces embeddings of this...
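For downstream tools that expect float32, the half-precision embeddings can simply be up-cast after loading. A minimal sketch (assuming the embeddings arrive as NumPy float16 arrays):

```python
import numpy as np

# ProtT5 run in half precision yields float16 embeddings. Up-casting to
# float32 is lossless in this direction: every float16 value is exactly
# representable as a float32.
emb_fp16 = np.ones(1024, dtype=np.float16)   # stand-in for a loaded embedding
emb_fp32 = emb_fp16.astype(np.float32)
```

The cast doubles the in-memory footprint, of course, which is presumably why the embeddings are distributed in float16 in the first place.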