Sean Owen

245 comments by Sean Owen

I don't think you're hitting the model's limit; rather, you're running out of memory fitting even 1 or 2 batches smaller than that in memory

Yes, that doesn't sound right. The 3B model was working OK for me on the 32GB V100s, though I didn't run it to completion for testing. I didn't make more deepspeed...

If you're truncating, then yeah, that would cause this problem. If you go that route, just throw out long inputs entirely
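For illustration, a minimal sketch (not from the original thread; the model name, column name, and length cap are all assumptions) of dropping over-long inputs instead of truncating them:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
MAX_LENGTH = 1024  # assumed context limit for the model

def fits_context(example):
    # Keep only examples that fit the context window without truncation
    return len(tokenizer(example["text"])["input_ids"]) <= MAX_LENGTH

# With a Hugging Face datasets.Dataset (hypothetical variable `dataset`):
# dataset = dataset.filter(fits_context)
```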

I haven't tried this, but I think you can just use compute_transition_scores for this in the transformers API, like at https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationMixin.compute_transition_scores.example
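As a rough sketch of what that might look like, following the transformers docs linked above (the model name is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    return_dict_in_generate=True,  # needed so outputs.scores is populated
    output_scores=True,
)

# Per-token log probabilities of the generated continuation
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
print(transition_scores)
```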

You've set device_map="auto". Look at how it has assigned the layers with base_model.hf_device_map. Did it assign to all GPUs? From your output, it seems like almost all of it loaded on the...
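A hedged sketch of that check (the model name here is only an example; this requires `accelerate` to be installed):

```python
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-2.8b",  # placeholder; use your actual model
    device_map="auto",
)
# Maps each module to the device it was placed on (GPU index, "cpu", or "disk")
print(base_model.hf_device_map)
```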

Yeah, it seems to be an HF issue: https://github.com/huggingface/datasets-server/issues/1137. I reported it at https://github.com/huggingface/datasets-server/issues/1139. If it doesn't resolve today, we'll have to roll back to putting a copy here for...

See https://huggingface.co/blog/how-to-generate and look for the repetition penalty. You don't need to use generate.py, but you're welcome to start from it. This is just a matter of using transformers settings, not...
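For illustration, a minimal sketch (model name and values are assumptions, not from the thread) of passing a repetition penalty through the transformers generate API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Explain what a DataFrame is.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    repetition_penalty=1.2,   # >1.0 discourages repeating tokens
    no_repeat_ngram_size=3,   # optionally forbid exact 3-gram repeats
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```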

Same as https://github.com/databricks-demos/dbdemos/issues/28. I'm not quite sure what you're asking. You can extract the log probability of a response from a model, and you can decide whether the model feels...

You can try that in the prompt, but it doesn't guarantee it will do that. You can get the log prob of the response and decide when the model isn't...
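One way to sketch that (my own illustration, not the author's code; the threshold is a hypothetical value you'd tune yourself):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("What is Databricks?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    return_dict_in_generate=True,
    output_scores=True,
)
# Per-token log probs, as in the compute_transition_scores sketch above
token_logprobs = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
avg_logprob = token_logprobs.mean().item()

CONFIDENCE_THRESHOLD = -2.0  # hypothetical cutoff; tune on your own data
if avg_logprob < CONFIDENCE_THRESHOLD:
    print("Low confidence; consider falling back to 'I don't know'.")
```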

That's a lot of data for fine-tuning, maybe too much. After all, I think dolly saw about 10 epochs x 15k examples for all of its fine-tuning, which is...