eyal-converge
@mani-aiml a bit late to the party, but I'm facing the same issue. Do you remember what the issue was?
Hi @BaldPulse, I'm facing the same issue launching SageMaker training jobs with _accelerate_ and I'm also getting `RuntimeError: SMDDP does not support: _allgather_base`. Can you explain what you did?
> @eyal-converge I presume you followed the instructions [here](https://huggingface.co/docs/accelerate/en/usage_guides/sagemaker) and used accelerate to **launch a Sagemaker training job**. The solution is instead of launching a job with accelerate, launch a...
@tlitfin-unsw thanks for the initiative! We faced similar issues internally and made a very similar change.
Hi @alexr00, any updates?
> Can you upload the rest of the log too please? The snippet doesn't really reveal why `/home/ec2-user/result/prof_res` was not created. Full logs for `colabfold_search --mmseqs mmseqs --gpu 1 input_sequences.fasta...
> What's the input? MMseqs2 detected nucleotide input, which doesn't work with colabfold.

This is my input (`cat input_sequences.fasta`):

```
>seq1_test
MFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQLTLRSEGFDTYRCDRNLAMGVNLTSMSKILKCAGNEDIITLRAEDNADTLALVFEAPNQEKVSDYEMKLMDLDVEQLGIPEQEYSCVVKMPSGEFARICRDLSHIGDAVVISCAKDGVKFSASGELGNGNIKLSQTSNVDKEEEAVTIEMNEPVQLTFALRYLNFFTKATPLSSTVTLSMSADVPLVVEYKIADMGHLKYYLAPKIEDEEGS
```
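Since the reply says MMseqs2 detected nucleotide input, a quick local sanity check on the FASTA file can help before re-running `colabfold_search`. Below is a minimal Python sketch that parses a FASTA string and flags records whose letters are mostly A/C/G/T/U/N. The helper names (`parse_fasta`, `looks_like_nucleotides`) and the 0.9 threshold are hypothetical choices for illustration; this is not how MMseqs2 itself decides the input type.

```python
from __future__ import annotations

# Hypothetical composition heuristic: if nearly all letters are nucleotide
# symbols, the record probably is DNA/RNA rather than protein.
# ASSUMPTION: alphabet and threshold are illustrative, not MMseqs2's actual rule.
NUCLEOTIDE_CHARS = set("ACGTUN")


def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for every record in a FASTA string."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def looks_like_nucleotides(seq: str, threshold: float = 0.9) -> bool:
    """True if at least `threshold` of the letters are A/C/G/T/U/N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return fraction >= threshold


if __name__ == "__main__":
    fasta = ">seq1_test\nMFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQ\n"
    for name, seq in parse_fasta(fasta).items():
        print(name, looks_like_nucleotides(seq))  # prints "seq1_test False"
    print(looks_like_nucleotides("ACGTACGTACGTNNACGU"))  # prints "True"
```

By this heuristic the posted sequence reads as protein, so if MMseqs2 still reports nucleotide input, the file actually passed to the tool may differ from the one shown.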
@milot-mirdita sorry for the ping. Is my input wrong?
> Only the contents of `uniref30_2302_db` (8.2 GB) and `colabfold_envdb_202108` (36GB) need to be in VRAM. These are the cluster consensus sequences that are searched against on GPU. The rest...
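Taking the quoted sizes at face value, the two databases searched on GPU add up to roughly 44.2 GB of VRAM. A minimal Python sketch of that arithmetic, assuming everything must fit on a single GPU and ignoring any extra working memory the search itself needs; the helper names and the GPU capacities in the usage lines are illustrative only.

```python
from __future__ import annotations

# Sizes (GB) of the databases quoted as needing to sit in VRAM.
DB_SIZES_GB = {
    "uniref30_2302_db": 8.2,
    "colabfold_envdb_202108": 36.0,
}


def required_vram_gb(db_sizes: dict[str, float]) -> float:
    """Total size of the databases that must be resident in GPU memory."""
    return sum(db_sizes.values())


def fits_in_vram(gpu_vram_gb: float, db_sizes: dict[str, float] = DB_SIZES_GB) -> bool:
    """True if one GPU with `gpu_vram_gb` could hold all listed databases at once.

    ASSUMPTION: single GPU, no allowance for the search's own working memory.
    """
    return required_vram_gb(db_sizes) <= gpu_vram_gb


if __name__ == "__main__":
    print(f"{required_vram_gb(DB_SIZES_GB):.1f} GB needed")  # prints "44.2 GB needed"
    print(fits_in_vram(40))  # prints "False"
    print(fits_in_vram(80))  # prints "True"
```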
Hi @julien-c, I saw you commented on similar issues 🙏 Any chance you could shed some light here?