Unlawful use of my code

Open adryzz opened this issue 1 year ago • 9 comments

The readme of this repo reads the following:

StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 [...]

The dataset linked contains my code, without following its license (or lack thereof).

Consent is not opt-out. You trained an LLM on code you are not allowed to use.

adryzz avatar Mar 22 '24 17:03 adryzz

Stop being a little cry baby

yamiteru avatar Jun 30 '24 11:06 yamiteru

womp womp you cant use my code without following its license

adryzz avatar Jun 30 '24 13:06 adryzz

Quote from https://policies.stackoverflow.co/company/trademark-guidance/:

"We decided early on that all user-generated content in the Stack Exchange Network would be given back to the community under a Creative Commons license."

Furthermore, see Point 6 of the ToS Section "Subscriber Content":

You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0), and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content, even if such Subscriber Content has been contributed and subsequently removed by you as reasonably necessary [...] [...] This means that you cannot revoke permission for Stack Overflow to publish, distribute, store and use such content and to allow others to have derivative rights to publish, distribute, store and use such content. The CC BY-SA 4.0 license terms are explained in further detail by Creative Commons, and the license terms applicable to content are explained in further detail here. You should be aware that all Public Content you contribute is available for public copy and redistribution, and all such Public Content must have appropriate attribution.

Snowman-25 avatar Sep 13 '24 09:09 Snowman-25

this isnt stack overflow lmfao i dont care about their tos

adryzz avatar Sep 24 '24 16:09 adryzz

Your code was taken from Stack Overflow, where it was available under the license mentioned above. By posting it on Stack Overflow, you forfeited all rights to the content you posted. Thus, inclusion in starcoder2 is lawful.

Snowman-25 avatar Sep 24 '24 18:09 Snowman-25

i have never posted it on stack overflow, show me where exactly i have done so

adryzz avatar Sep 26 '24 13:09 adryzz

Where is the code you're referring to and why do you think it's in the dataset?

Snowman-25 avatar Sep 26 '24 19:09 Snowman-25

[two images attached]

this issue was created before the opt-out was even a thing, meaning the model was trained on code it wasn't allowed to use (and even then, consent is not opt-out). you can't "untrain" stuff.

https://github.com/bigcode-project/opt-out-v2/issues/1814

as this reply says:

Your opt-out request has been processed and your data was removed in version v2.1 of The Stack and all future versions.

this means that my code is in all previous versions of the dataset, and this repository was created before the opt-out (which wouldn't hold water anyway because consent is not opt-out) was even a thing. stop making this about stack overflow when it wasn't ever mentioned.

adryzz avatar Sep 27 '24 08:09 adryzz

Hi, was there any resolution to this?

joelberkeley avatar Aug 11 '25 20:08 joelberkeley