gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more

31 gpt-2-output-dataset issues, sorted by most recently updated

I get the following error after running:
```
pip install -r requirements.txt
python -m detector.server detector-base.pt
```
Error:
```
RuntimeError: Error(s) in loading state_dict for RobertaForSequenceClassification: Missing key(s)...
```
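Missing-key errors like this often come from loading the checkpoint with a `transformers` version newer than the one it was saved with. A minimal diagnostic sketch, assuming the checkpoint stores its weights under a `model_state_dict` key (an assumption about the checkpoint layout; adjust if yours differs):

```python
# Diagnostic sketch, not the repository's code: load the checkpoint
# non-strictly and print which keys failed to match, which usually
# points to a transformers version mismatch.
import torch
from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained('roberta-base')
data = torch.load('detector-base.pt', map_location='cpu')
# 'model_state_dict' is an assumed key; fall back to the raw dict.
state_dict = data.get('model_state_dict', data)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)
```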

Can I use this project for commercial purposes? I want to create a website that classifies whether a given text was generated by the GPT-2 model or written by...

I modified the script to use data classes, JSON serialization, and the `tqdm` library, ensuring a seamless and informative data download process. It also offers options to specify data sizes,...
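As a rough illustration of the pattern this describes (the dataclass fields, file names, and manifest layout here are hypothetical, not the actual PR's code):

```python
# Hypothetical sketch: a dataclass per downloadable file, a JSON-serialized
# manifest, and a tqdm progress bar during download.
import json
from dataclasses import dataclass, asdict

import requests
from tqdm import tqdm

@dataclass
class DatasetFile:
    name: str
    size: int  # expected size in bytes

def download(file: DatasetFile, base_url: str) -> None:
    resp = requests.get(f'{base_url}/{file.name}', stream=True)
    resp.raise_for_status()
    with open(file.name, 'wb') as out, tqdm(
        total=file.size, unit='B', unit_scale=True, desc=file.name
    ) as bar:
        for chunk in resp.iter_content(chunk_size=8192):
            out.write(chunk)
            bar.update(len(chunk))

# The manifest round-trips through JSON for reproducibility.
manifest = [DatasetFile('webtext.test.jsonl', 0)]
print(json.dumps([asdict(f) for f in manifest]))
```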

There is an assignment error in the train.py script: after the assignment, the loss and logits end up with type 'str' and hence have to be updated....
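This symptom typically appears when a newer `transformers` version returns a `ModelOutput` (a dict subclass) from the forward pass: tuple-unpacking it iterates over its string keys, so `loss` and `logits` become the literal strings `'loss'` and `'logits'`. A minimal sketch of the fix, under that assumption:

```python
# Minimal sketch, assuming the 'str' types come from tuple-unpacking a
# ModelOutput (a dict subclass whose iteration yields its string keys).
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForSequenceClassification.from_pretrained('roberta-base')

batch = tokenizer(['example text'], return_tensors='pt')
labels = torch.tensor([0])

output = model(**batch, labels=labels)
# Buggy pattern: `loss, logits = model(...)` yields the strings
# 'loss' and 'logits' on recent transformers versions.
loss, logits = output.loss, output.logits
```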

Error text includes: OpenAI error. That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error...
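Since the error message itself suggests retrying, one common workaround is retrying with exponential backoff. A minimal sketch; `request_fn` is a hypothetical stand-in for whatever call is hitting the API:

```python
# Retry-with-backoff sketch for transient 'model overloaded' errors.
# request_fn is a hypothetical placeholder, not part of this repository.
import random
import time

def with_retries(request_fn, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter before the next attempt.
            time.sleep((2 ** attempt) + random.random())
```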

Tested with a second ChatGPT sample, and the detection result does not match the server's. The test result of https://openai-openai-detector.hf.space/ ![image](https://user-images.githubusercontent.com/54778084/211030406-85c0330f-a52b-45e3-8b4d-97a22e8c132d.png) Test result with the `roberta-base` model on localhost ![image](https://user-images.githubusercontent.com/54778084/211030544-70d68d76-1676-483f-a482-e29d7682fecf.png) Test...
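To compare apples to apples, it helps to inspect the local server's raw JSON response for the exact same input string. A minimal sketch; the query-string interface mirrors the repository's detector/server.py, but the port and exact response fields are assumptions:

```python
# Sketch: fetch the local detector's raw JSON output so it can be
# compared against the hosted demo on the same input. Port 8080 and
# the response schema are assumptions; adjust for your setup.
import json
import urllib.parse
import urllib.request

text = 'paste the same sample you gave the hosted demo here'
url = 'http://localhost:8080/?' + urllib.parse.quote(text)
with urllib.request.urlopen(url) as resp:
    print(json.load(resp))
```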

This PR:
* makes sure the download script doesn't clobber any existing files if they seem correct enough (same size as the remote)
* ensures 404 and similar errors don't get written...
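A sketch of the two behaviors described above (my own illustration, not the PR's actual code): skip files whose size already matches the remote, and fail on HTTP errors instead of writing an error page to disk.

```python
# Illustration of the PR's described behaviors, not its actual diff.
import os

import requests

def safe_download(url: str, dest: str) -> None:
    head = requests.head(url, allow_redirects=True)
    head.raise_for_status()
    remote_size = int(head.headers.get('Content-Length', -1))
    # Don't clobber a file that already matches the remote size.
    if os.path.exists(dest) and os.path.getsize(dest) == remote_size:
        return
    resp = requests.get(url, stream=True)
    resp.raise_for_status()  # a 404 raises here instead of being saved
    with open(dest, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
```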

Some of the training data (specifically, the GPT-2-generated datasets) contains texts of length 0. This causes training (and would cause inference) to error out. Is this expected? Please see...
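A minimal workaround sketch, assuming the dataset's JSONL layout with a `text` field per record: filter out zero-length texts before they reach training.

```python
# Workaround sketch: skip empty texts when reading the JSONL files.
# The 'text' field name is an assumption about the record layout.
import json

def iter_nonempty(path: str):
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if record.get('text'):  # drop empty or missing texts
                yield record
```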

I see that GPT-2 was trained on WebText, but I'm not sure how the datasets here were generated. Specifically, what prompt was used with GPT-2 to generate the "fake" datasets?