Sean Owen

245 comments by Sean Owen

I don't think you're hitting the model's limit; rather, you're running out of memory fitting even 1 or 2 batches smaller than that in memory

Yes, that doesn't sound right. The 3B model was working OK for me on the 32GB V100s, though I didn't run it to completion for testing. I didn't make more deepspeed...

If you're truncating, then yeah, that would cause this problem. If you go that route, just throw out long inputs entirely
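For illustration, a minimal sketch (not from the original thread; the model name, column name, and length cap are all assumptions) of dropping over-long inputs instead of truncating them:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
MAX_LENGTH = 1024  # assumed context limit for the model

def fits_context(example):
    # Keep only examples that fit the context window without truncation
    return len(tokenizer(example["text"])["input_ids"]) <= MAX_LENGTH

# With a Hugging Face datasets.Dataset (hypothetical variable `dataset`):
# dataset = dataset.filter(fits_context)
```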

I haven't tried this, but I think you can just use compute_transition_scores for this in the transformers API, like at https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationMixin.compute_transition_scores.example
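As a rough sketch of what that might look like, following the transformers docs linked above (the model name is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    return_dict_in_generate=True,  # needed so outputs.scores is populated
    output_scores=True,
)

# Per-token log probabilities of the generated continuation
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
print(transition_scores)
```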

You've set device_map="auto". Look at how it has assigned the layers with base_model.hf_device_map. Did it assign to all GPUs? From your output, it seems like almost all of it loaded on the...
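A hedged sketch of that check (the model name here is only an example; this requires `accelerate` to be installed):

```python
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-2.8b",  # placeholder; use your actual model
    device_map="auto",
)
# Maps each module to the device it was placed on (GPU index, "cpu", or "disk")
print(base_model.hf_device_map)
```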

Yeah, it seems to be an HF issue: https://github.com/huggingface/datasets-server/issues/1137. I reported it at https://github.com/huggingface/datasets-server/issues/1139. If it doesn't resolve today, we'll have to roll back to putting a copy here for...

See https://huggingface.co/blog/how-to-generate and look for the repetition penalty. You don't need to use generate.py, but you're welcome to start from it. This is just a matter of using transformers settings, not...
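For illustration, a minimal sketch (model name and values are assumptions, not from the thread) of passing a repetition penalty through the transformers generate API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Explain what a DataFrame is.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    repetition_penalty=1.2,   # >1.0 discourages repeating tokens
    no_repeat_ngram_size=3,   # optionally forbid exact 3-gram repeats
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```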

Same as https://github.com/databricks-demos/dbdemos/issues/28. I'm not quite sure what you're asking. You can extract the log probability of a response from a model, and you can decide whether the model feels...

You can try that in the prompt, but it doesn't guarantee it will do that. You can get the log prob of the response and decide when the model isn't...
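One way to sketch that (my own illustration, not the author's code; the threshold is a hypothetical value you'd tune yourself):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("What is Databricks?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    return_dict_in_generate=True,
    output_scores=True,
)
# Per-token log probs, as in the compute_transition_scores sketch above
token_logprobs = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
avg_logprob = token_logprobs.mean().item()

CONFIDENCE_THRESHOLD = -2.0  # hypothetical cutoff; tune on your own data
if avg_logprob < CONFIDENCE_THRESHOLD:
    print("Low confidence; consider falling back to 'I don't know'.")
```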

That's a lot of data for fine-tuning, maybe too much. After all, I think dolly saw about 10 epochs x 15k examples for all of its fine-tuning, which is...