Eff10k

Open onebluesky37 opened this issue 8 months ago • 3 comments

Thanks to your help, I have re-trained the CONSCNN model on the ConSurf-DB dataset using ESM embeddings. Could you also provide the Eff10k dataset so I can re-train the 10-fold logistic regression? Thanks a lot. Happiness to you.

onebluesky37 Apr 28 '25 08:04

Hi :) I currently do not have access to the dataset, and it might take a while until I do. In the meantime, I would suggest working with the new version of vespa: https://academic.oup.com/bioinformatics/article/40/11/btae621/7907184 and the respective training data: https://zenodo.org/records/11085958

C-Marquet Apr 29 '25 08:04

Thank you for your response! Could you please share a link to the Eff10k dataset? I’d love to keep an eye on its accessibility over time. I’m also diving into the vespaG method, which is such an impressive piece of work! I have one question: is there a way to convert the vespaG training data into an Eff10k-style format with “effect” and “neutral” labels? Or perhaps you know of other datasets, beyond Eff10k, that are already annotated with those two labels? I’ve been really impressed by vespa’s zero-shot performance and would love to reproduce it, and of course I’ll keep following vespaG as well. Thanks a lot. Wishing you a wonderful life!

onebluesky37 Apr 29 '25 14:04

Could you please reach out via the email provided here so I can send you the data? https://doi.org/10.1007/s00439-021-02411-y

C-Marquet May 22 '25 08:05