eyal-converge

18 comments by eyal-converge

@mani-aiml a bit late to the party, but I'm facing the same issue. Do you remember what the issue was?

Hi @BaldPulse, I'm facing the same issue when launching SageMaker training jobs using _accelerate_, and I'm also getting `RuntimeError: SMDDP does not support: _allgather_base`. Can you explain what you did?

> @eyal-converge I presume you followed the instructions [here](https://huggingface.co/docs/accelerate/en/usage_guides/sagemaker) and used accelerate to **launch a Sagemaker training job**. The solution is instead of launching a job with accelerate, launch a...

@tlitfin-unsw thanks for the initiative. We faced similar issues internally and made a very similar change.

> Can you upload the rest of the log too please? The snippet doesn't really reveal why `/home/ec2-user/result/prof_res` was not created.

Full logs for `colabfold_search --mmseqs mmseqs --gpu 1 input_sequences.fasta...

> What's the input? MMseqs2 detected nucleotide input, which doesn't work with colabfold.

That's my input `cat input_sequences.fasta`

```
>seq1_test
MFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQLTLRSEGFDTYRCDRNLAMGVNLTSMSKILKCAGNEDIITLRAEDNADTLALVFEAPNQEKVSDYEMKLMDLDVEQLGIPEQEYSCVVKMPSGEFARICRDLSHIGDAVVISCAKDGVKFSASGELGNGNIKLSQTSNVDKEEEAVTIEMNEPVQLTFALRYLNFFTKATPLSSTVTLSMSADVPLVVEYKIADMGHLKYYLAPKIEDEEGS
```
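For anyone wanting to sanity-check a FASTA file before running the search, here is a minimal Python sketch that measures how much of each sequence is made of nucleotide letters. It is only an illustrative heuristic and not how MMseqs2 itself decides; the helper names, the `ACGTUN` alphabet, and the 0.9 threshold are all my own assumptions.

```python
# Illustrative heuristic only: this is NOT MMseqs2's detection logic.
# The alphabet and the 0.9 threshold below are assumptions for this sketch.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[header] += line.upper()
    return records


def nucleotide_fraction(seq):
    """Return the fraction of characters that are nucleotide letters."""
    if not seq:
        return 0.0
    return sum(ch in NUCLEOTIDE_LETTERS for ch in seq) / len(seq)


def looks_like_nucleotide(seq, threshold=0.9):
    """Guess whether a sequence is nucleotide (hypothetical threshold)."""
    return nucleotide_fraction(seq) >= threshold


if __name__ == "__main__":
    with open("input_sequences.fasta") as handle:
        for name, seq in read_fasta(handle.read()).items():
            print(name, round(nucleotide_fraction(seq), 2), looks_like_nucleotide(seq))
```

A protein sequence normally scores well below the threshold, so a high fraction here would suggest checking the file contents (for example, a header and sequence that ended up on one line) rather than the tool.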

@milot-mirdita sorry for the ping. Is my input wrong?

> Only the contents of `uniref30_2302_db` (8.2 GB) and `colabfold_envdb_202108` (36GB) need to be in VRAM. These are the cluster consensus sequences that are searched against on GPU. The rest...

Hi @julien-c, I saw you commented on similar issues 🙏 Any chance you can shed some light here?