eyal-converge comments

Results: 18 comments of eyal-converge

getting error on Distributed Data Parallel training on multiple ml.p4.24xlarge instances

@mani-aiml a bit late to the party, but I'm facing the same issue. Do you remember what the issue was?

Using accelerate launch to initialize sagemaker job doesn't work properly with multiple GPUs

Hi @BaldPulse, I'm facing the same issue launching SageMaker training jobs using _accelerate_, and I'm also getting `RuntimeError: SMDDP does not support: _allgather_base`. Can you explain what you did?

Using accelerate launch to initialize sagemaker job doesn't work properly with multiple GPUs

> @eyal-converge I presume you followed the instructions [here](https://huggingface.co/docs/accelerate/en/usage_guides/sagemaker) and used accelerate to **launch a Sagemaker training job**. The solution is instead of launching a job with accelerate, launch a...

Add command-line chunking parameters for flexible VRAM utilization

@tlitfin-unsw thanks for the initiative. We faced similar issues internally and made a very similar change.

Allow commenting on lines not changed in a PR

Hi @alexr00, any updates?

`colabfold_search` fails GPU search

> Can you upload the rest of the log too please? The snippet doesn't really reveal why `/home/ec2-user/result/prof_res` was not created.

Full logs for `colabfold_search --mmseqs mmseqs --gpu 1 input_sequences.fasta...

`colabfold_search` fails GPU search

> What's the input? MMseqs2 detected nucleotide input, which doesn't work with colabfold.

That's my input: `cat input_sequences.fasta`

```
>seq1_test
MFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQLTLRSEGFDTYRCDRNLAMGVNLTSMSKILKCAGNEDIITLRAEDNADTLALVFEAPNQEKVSDYEMKLMDLDVEQLGIPEQEYSCVVKMPSGEFARICRDLSHIGDAVVISCAKDGVKFSASGELGNGNIKLSQTSNVDKEEEAVTIEMNEPVQLTFALRYLNFFTKATPLSSTVTLSMSADVPLVVEYKIADMGHLKYYLAPKIEDEEGS
```

eyal-converge

getting error on Distributed Data Parallel training on multiple ml.p4.24xlarge instances

Using accelerate launch to initialize sagemaker job doesn't work properly with multiple GPUs

Using accelerate launch to initialize sagemaker job doesn't work properly with multiple GPUs

Add command-line chunking parameters for flexible VRAM utilization

Allow commenting on lines not changed in a PR

`colabfold_search` fails GPU search

`colabfold_search` fails GPU search

`colabfold_search` fails GPU search

Required DB's for GPU Server setup

Production system breaks due to HTTP 429 errors