
Training process

maksimallist opened this issue 1 year ago · 23 comments

Hello. Can you share the details of how the model was trained? Did you train it yourself? Did you assemble the training data from the basenji dataset files? I am unable to reproduce the claimed results when training.

maksimallist avatar Jul 10 '23 14:07 maksimallist

Has anyone looked into this yet? I am also interested, since training Enformer from scratch using your implementation doesn't reproduce the same Pearson correlation values (the max I am getting is ~0.4).

fransilvionGenomica avatar Oct 23 '23 22:10 fransilvionGenomica
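For context, the figure quoted here is the per-track Pearson correlation between predictions and targets, averaged over tracks. A minimal sketch of that metric, assuming `preds` and `targets` are tensors of shape `(positions, tracks)` accumulated over the validation set:

```python
import torch

def per_track_pearson(preds: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # center each track, then take the normalized dot product per track
    p = preds - preds.mean(dim=0, keepdim=True)
    t = targets - targets.mean(dim=0, keepdim=True)
    cov = (p * t).sum(dim=0)
    denom = (p.norm(dim=0) * t.norm(dim=0)).clamp(min=1e-8)  # avoid division by zero
    return cov / denom

# the headline number is the mean over tracks, e.g.
# mean_r = per_track_pearson(preds, targets).mean()
```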

@fransilvionGenomica @maksimallist i tried a while ago using TPUs (didn't have access to a large GPU cluster at the time) and didn't hit the mark (got around 0.5-0.6). this was before Ziga officially released their model over at deepmind

the training script i used is all open sourced here. the original reason for making the repo was a contracting project for a local startup

lucidrains avatar Nov 01 '23 18:11 lucidrains

@fransilvionGenomica are you planning on training it on proprietary data with your own GPU cluster?

lucidrains avatar Nov 01 '23 18:11 lucidrains

@lucidrains I am training your pytorch implementation on a single A100 GPU node with the original basenji dataset and gradient accumulation. I was using the following deepmind notebook as a reference: https://github.com/google-deepmind/deepmind-research/blob/master/enformer/enformer-training.ipynb. I do believe that it is possible to train the model on GPUs, since in the recent Borzoi paper from the Enformer co-authors they did not use TPUs (https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1). Unfortunately, they don't provide any training script (https://github.com/calico/borzoi).

fransilvionGenomica avatar Nov 01 '23 18:11 fransilvionGenomica
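A rough sketch of what single-GPU training with gradient accumulation can look like with this repo, following the pattern of the deepmind notebook linked above; the learning rate, accumulation steps, and `train_loader` are illustrative assumptions, not values from this thread:

```python
import torch
from enformer_pytorch import Enformer

model = Enformer.from_hparams(dim=1536, depth=11, heads=8).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # assumed lr

ACCUM_STEPS = 32  # micro-batch of 1 -> effective batch size of 32

for step, (seq, target) in enumerate(train_loader):  # train_loader is assumed
    # enformer-pytorch returns the Poisson NLL loss directly when
    # given a `target` along with the head to use
    loss = model(seq.cuda(), head='human', target=target.cuda())
    (loss / ACCUM_STEPS).backward()

    if (step + 1) % ACCUM_STEPS == 0:
        # the Enformer paper clips gradients to norm 0.2
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.2)
        optimizer.step()
        optimizer.zero_grad()
```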

@fransilvionGenomica ahh, i have not checked out Borzoi yet, although someone else told me it is the successor to Enformer

why are you still using this repository if Borzoi supersedes it? i haven't read the paper, but did Borzoi set a new SOTA?

lucidrains avatar Nov 01 '23 18:11 lucidrains

@fransilvionGenomica where do you work btw?

lucidrains avatar Nov 01 '23 18:11 lucidrains

Oh I see, that makes sense. Even the Borzoi paper mentions training took ~25 days on 2 GPUs, and I am training on a single GPU. I guess I will just have to wait then. Thanks!

fransilvionGenomica avatar Nov 01 '23 18:11 fransilvionGenomica

@fransilvionGenomica that is strange they waited that long. i thought calico had google level resources

lucidrains avatar Nov 01 '23 19:11 lucidrains

@fransilvionGenomica i'll revisit genomics maybe end of the month and read the Borzoi paper in detail. knee-deep in other projects at the moment.

lucidrains avatar Nov 01 '23 19:11 lucidrains

ahh ok, was told that Borzoi is essentially Enformer applied to RNA-seq data. in that case using this repository is fine

lucidrains avatar Nov 01 '23 20:11 lucidrains

Yes, architecture-wise they are very similar. Borzoi is actually less complex.

fransilvionGenomica avatar Nov 01 '23 20:11 fransilvionGenomica

@fransilvionGenomica ok, i'll just copy/paste the existing code and remove that complexity for Borzoi later this month after i read the paper. hopefully they got rid of the annoying gamma positions

lucidrains avatar Nov 01 '23 20:11 lucidrains

Just curious, have you noticed anything about batch size while training Enformer from scratch? Does it have to be relatively big (at least 32), or can you train decently even with a batch size of 1 or 2?

fransilvionGenomica avatar Nov 01 '23 20:11 fransilvionGenomica

@fransilvionGenomica it has to be big (32 or 64). managing the data and long sequences was also a huge pain

lucidrains avatar Nov 01 '23 21:11 lucidrains
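On the data side, the basenji dataset ships as ZLIB-compressed TFRecords. A hedged sketch of decoding one example into torch tensors, following the parsing logic in the deepmind notebook linked earlier; the shape constants are placeholders and should be read from the dataset's `statistics.json` in practice:

```python
import tensorflow as tf
import torch

# placeholder shapes; read the real values from the dataset's statistics.json
SEQ_LENGTH, TARGET_LENGTH, NUM_TARGETS = 131_072, 896, 5313

def parse_example(serialized: bytes):
    feature_map = {
        'sequence': tf.io.FixedLenFeature([], tf.string),
        'target': tf.io.FixedLenFeature([], tf.string),
    }
    example = tf.io.parse_single_example(serialized, feature_map)
    # sequences are stored one-hot as raw bools, targets as raw float16
    seq = tf.reshape(tf.io.decode_raw(example['sequence'], tf.bool), (SEQ_LENGTH, 4))
    tgt = tf.reshape(tf.io.decode_raw(example['target'], tf.float16), (TARGET_LENGTH, NUM_TARGETS))
    return (torch.from_numpy(tf.cast(seq, tf.float32).numpy()),
            torch.from_numpy(tf.cast(tgt, tf.float32).numpy()))

# dataset = tf.data.TFRecordDataset(tfrecord_files, compression_type='ZLIB')
```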

@fransilvionGenomica the code in this repository isn't even set up for distributed training. i didn't set up synchronized batchnorm, which is required for it to train well.

lucidrains avatar Nov 01 '23 21:11 lucidrains

@fransilvionGenomica actually let me just throw that in there for now

lucidrains avatar Nov 01 '23 21:11 lucidrains
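For reference, converting the model's batchnorm layers to synchronized batchnorm is a one-liner in plain pytorch. A minimal sketch, assuming `torch.distributed.init_process_group` has already been called:

```python
import torch
from enformer_pytorch import Enformer

model = Enformer.from_hparams(dim=1536, depth=11, heads=8)

# replace every BatchNorm layer with SyncBatchNorm so statistics are
# computed across all processes instead of each GPU's micro-batch
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model.cuda())
```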

Have you tried running your Enformer implementation with PyTorch Lightning?

fransilvionGenomica avatar Nov 01 '23 21:11 fransilvionGenomica

@fransilvionGenomica no i haven't. as i said above, my training was done in tensorflow sonnet on TPUs, since i had access to a large TPU cluster in collaboration with EleutherAI back then

lucidrains avatar Nov 01 '23 21:11 lucidrains

@fransilvionGenomica if you ever wire up a working training script, a pull request is always welcome, in the spirit of open source science.

lucidrains avatar Nov 01 '23 21:11 lucidrains
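For anyone attempting that, a rough sketch of what a pytorch-lightning wrapper could look like; this is untested, and the optimizer and trainer settings are assumptions rather than anything from this thread:

```python
import torch
import pytorch_lightning as pl
from enformer_pytorch import Enformer

class LitEnformer(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = Enformer.from_hparams(dim=1536, depth=11, heads=8)

    def training_step(self, batch, batch_idx):
        seq, target = batch
        # the model returns the Poisson NLL loss when given a target
        loss = self.model(seq, head='human', target=target)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=5e-4)  # assumed lr

# lightning handles sync batchnorm and accumulation via trainer flags:
# trainer = pl.Trainer(devices=8, strategy='ddp', sync_batchnorm=True,
#                      accumulate_grad_batches=4, gradient_clip_val=0.2)
```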

> @fransilvionGenomica ahh, i have not checked out Borzoi yet, although someone else told me it is the successor to Enformer
>
> why are you still using this repository if Borzoi supersedes it? i haven't read the paper, but did Borzoi set a new SOTA?

What the paper says: "Performance is difficult to compare directly to Enformer due to differences in data processing. Nevertheless, test accuracies on the overlapping datasets are broadly similar, indicating competitive model training" (https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1.full)

Let's wait until reviewers ask about this =)

minjaf avatar Nov 02 '23 03:11 minjaf

@lucidrains do you still have the training/validation loss curves by any chance? For your tensorflow training code, I mean.

fransilvionGenomica avatar Nov 20 '23 18:11 fransilvionGenomica

@fransilvionGenomica hey yes, actually still have it lying around (thanks wandb) https://api.wandb.ai/links/lucidrains/9ac4x106

lucidrains avatar Nov 20 '23 18:11 lucidrains