phalexo

Results 137 comments of phalexo

I loaded falcon-40b-instruct into 4 GPUs, 12.3GiB each. It runs extremely slowly. I have seen others report loading it into a single GPU with 48GiB of VRAM, and it was still...
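For context, a minimal sketch of how one might shard falcon-40b-instruct across four smaller GPUs with Hugging Face `transformers` and `accelerate`. The 8-bit quantization and the per-GPU memory caps are assumptions chosen to roughly match the ~12GiB-per-card footprint mentioned above, not the exact settings used here.

```python
# Hedged sketch (assumed setup): shard falcon-40b-instruct across 4 x ~12GiB GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # shrink the 40B weights to roughly 45GiB (assumption)
    device_map="auto",                          # let accelerate split layers across the visible GPUs
    max_memory={i: "12GiB" for i in range(4)},  # cap each card; adjust to your hardware
    trust_remote_code=True,                     # Falcon shipped custom modeling code at the time
)

prompt = "Write a haiku about GPUs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that `device_map="auto"` places layers pipeline-fashion across the cards, so only one GPU works at a time during generation, which would be consistent with the slowness reported here.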

> Yes. I tested it with 8× A100; it occupies 12G on each GPU and is still not fast, especially after the input length exceeds 500. If falcon-40b produces high...

Check your config.json (the one that comes with the model weights) and see if the name is misspelled. That happens often with mixed-case names.
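As an illustration, a small snippet for inspecting the name fields in a downloaded model's config.json; the path is hypothetical.

```python
# Hypothetical path: point this at the config.json shipped with the model weights.
import json

with open("/path/to/model/config.json") as f:
    cfg = json.load(f)

# These are the fields loaders typically match against; a case mismatch
# (e.g. "Falcon" vs "falcon") in them is a common cause of "unknown model" errors.
print("model_type:   ", cfg.get("model_type"))
print("architectures:", cfg.get("architectures"))
```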

I am curious what GPU setup was used for Falcon. I have tried to run fine-tuning on Llama-65B and Guanaco-65B on 4 12.288GiB GPUs. I get various errors, the...

> I also found in my experiments that Falcon 40B was not as good as LLaMA 65B on MMLU. One thing to note on the experiments above is whether they...

Is there a download for the older version somewhere? I'd like to try it.

> How much RAM does your machine have? You mentioned VRAM. I know you are asking the original poster, but I have 330GiB on the host and 12.2GiB per GPU...

Fantastic. Before dropping to 0.1.11 it was printing junk and dying on the second query. Now it seems to work, and quickly too.

> @madsamjp did you try with 0.1.14 that is out now? I have tried it with 0.1.14 as modified for Mixtral, and the error is back from the dead. So, if...

@technovangelist Has anyone discovered anything new on this? Perhaps in other threads? I have tried to use the Mixtral branch, derived from 0.1.14 (I assume), and the error is still there...