Tobias Pitters
Ok, with the merge of #2870 some things changed here. Sorry for this but things move quickly here. We need to implement this for the [`get_formatted` method of `DatasetEntry` class](https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/formatting.py#L84)....
I agree we should be able to flag content.
That sounds interesting, especially since our models are not really good at coding. Does anyone have the time to add this?
I would appreciate a test for this
Regardless of the terms of service this should not happen! We should probably add more safety datasets, especially anything that helps the model to respond well in chats about abuse.
This might be a duplicate of https://github.com/LAION-AI/Open-Assistant/issues/1927. Also note that tools like pdfplumber or textract can be used for this task.
@olliestanley We found that our model believes it was created by OpenAI because of the statements you mentioned. The linked PR removes this (from the 14 datasets mentioned in the...
Thanks for reaching out, @richardliaw. Currently we are using DeepSpeed for training our models. Could you elaborate a bit on the differences to DeepSpeed and the advantages of Ray or...
@djaym7 I just checked our Weights & Biases project and did not find any runs with Cerebras-GPT. Is testing this model still something we are considering, @andreaskoepf?
Please raise your concerns in English and give a bit more background on them. Otherwise I'll just close them.