
Democratise AI by allowing ALL individuals access to the model.

Open elephantpanda opened this issue 1 year ago • 20 comments

Facebook says it wants to "democratise AI", yet it also says only elite institutions will be able to use this model.

So that excludes:

  • independent researchers
  • non-aligned scientists
  • people from countries without big institutions

This does not seem very democratic. In fact, if Einstein or Isaac Newton were alive today, they would be excluded, since Einstein worked in a patent office and Newton did independent research outside the Royal Society.

In fact Zuckerberg himself would be excluded as he dropped out of University and hence was not aligned with a big institution.

If history is our guide, it suggests that it is the individual, non-aligned researchers who are most likely to make the big breakthroughs.

The democratic thing to do would be to allow ALL individuals the right to download the model, even if that meant a small fee to cover download bandwidth costs.

It seems like Facebook might just want the institutions to come up with good ideas, which they cannot commercialise under this license, so that Facebook can take those ideas for free.

What do you think?

elephantpanda avatar Feb 25 '23 09:02 elephantpanda

Strongly agree

hellojiaru avatar Feb 25 '23 13:02 hellojiaru

open?

leerelive avatar Feb 25 '23 13:02 leerelive

After what happened with their Galactica model, I think Meta is right to limit access. If responsible people from the AI community use this model, it won't be misused. Even Bing Chat was restricted for a while because people intentionally tried to make it hallucinate. If you are angry, be angry at the reporters who want to cause trouble to increase their readership.

MikeyBeez avatar Feb 25 '23 17:02 MikeyBeez

After what happened with their Galactica model, I think Meta is right to limit access. If responsible people from the AI community use this model, it won't be misused. Even Bing Chat was restricted for a while because people intentionally tried to make it hallucinate. If you are angry, be angry at the reporters who want to cause trouble to increase their readership.

This assumes that all people from the AI community are "responsible". I would say the real danger of AI lies in keeping it in the hands of the elites.

Anyway, I'm not angry - I'm just pointing out the mistake of calling this "democratising AI" while simultaneously restricting it to the elites. This is more like the ancient Greek idea of democracy than the modern one where everyone gets a vote.

elephantpanda avatar Feb 25 '23 18:02 elephantpanda

Not to mention that it’s trained on all of our data without explicit consent or reimbursement (assuming you use facebook/insta etc)

Nyx7s avatar Feb 25 '23 19:02 Nyx7s

Not to mention that it’s trained on all of our data without explicit consent or reimbursement (assuming you use facebook/insta etc)

This is not true. Llama was trained only on publicly available sources, to ensure simple reproducibility.

bartman081523 avatar Feb 26 '23 08:02 bartman081523

Not to mention that it’s trained on all of our data without explicit consent or reimbursement (assuming you use facebook/insta etc)

This is not true. Llama was trained only on publicly available sources, to ensure simple reproducibility.

I suppose it depends on their definition of publicly available. Everything on the internet is publicly available one way or another.

elephantpanda avatar Feb 26 '23 08:02 elephantpanda

My profile is ‘public’, at least, my posts are visible to non friends…

but yes… if our data is fed into it because its publicly available, then the LLM should be available to said public

Nyx7s avatar Feb 26 '23 08:02 Nyx7s

Not to mention that it’s trained on all of our data without explicit consent or reimbursement (assuming you use facebook/insta etc)

This is not true. Llama was trained only on publicly available sources, to ensure simple reproducibility.

I suppose it depends on their definition of publicly available. Everything on the internet is publicly available one way or another.

They cited Project Gutenberg as a specific source, which is public domain material. They also mention that they used only open-source sources for the dataset. That they sourced facebook/ig threads is a far-fetched accusation and, afaik, is not mentioned in the paper. It's rather the opposite.

bartman081523 avatar Feb 26 '23 09:02 bartman081523

Not to mention that it’s trained on all of our data without explicit consent or reimbursement (assuming you use facebook/insta etc)

This is not true. Llama was trained only on publicly available sources, to ensure simple reproducibility.

I suppose it depends on their definition of publicly available. Everything on the internet is publicly available one way or another.

They cited Project Gutenberg as a specific source, which is public domain material. They also mention that they used only open-source sources for the dataset. That they sourced facebook/ig threads is a far-fetched accusation and, afaik, is not mentioned in the paper. It's rather the opposite.

It's really not that far-fetched... this is the same company that fed user info to Cambridge Analytica under the table. It's also precisely how their algorithm works when suggesting content: everything you post and interact with is fed through it and connected to a profile associated with your account. It would be more surprising if that model wasn't contaminated with unethical data... is that algorithm Llama-13b? Who knows, but does it really matter?

Regardless of whether Facebook (Meta) says "We are using your data and actions on Facebook and across the internet and with your phone to train Artificial Intelligence by creating an Avatar. We reserve the right to use your Avatar indefinitely, even if you terminate services with us.", they aren't compensating people, and the elite are privatizing and monopolizing access to technology that will ultimately be used to exploit the underclasses. So I think the point, ultimately, is that this technology should be open source, if only for the simple fact that it is built on, and privatizes, open data one way or another.

Nyx7s avatar Feb 26 '23 12:02 Nyx7s

They cited Project Gutenberg as a specific source, which is public domain material. They also mention that they used only open-source sources for the dataset. That they sourced facebook/ig threads is a far-fetched accusation and, afaik, is not mentioned in the paper. It's rather the opposite.

It's really not that far-fetched... this is the same company that fed user info to Cambridge Analytica under the table. It's also precisely how their algorithm works when suggesting content: everything you post and interact with is fed through it and connected to a profile associated with your account. It would be more surprising if that model wasn't contaminated with unethical data... is that algorithm Llama-13b? Who knows, but does it really matter?

Yes, it does matter, with respect to the scientific principle of reproducibility. If the dataset that was trained on, and therefore the training itself, could not be replicated, the model evaluation would also not be reproducible, which would render the paper's tagline and its whole stance on open source and reproducibility worthless. I doubt that the researchers would jeopardize that for a little more data. The gating seems to be inherited from the Galactica failure/"scandal", as far as I understand.

Regardless of whether Facebook (Meta) says "We are using your data and actions on Facebook and across the internet and with your phone to train Artificial Intelligence by creating an Avatar. We reserve the right to use your Avatar indefinitely, even if you terminate services with us.", they aren't compensating people, and the elite are privatizing and monopolizing access to technology that will ultimately be used to exploit the underclasses.

Thats another topic

So I think the point, ultimately, is that this technology should be open source, if only for the simple fact that it is built on, and privatizes, open data one way or another.

Unrelated to the other points, here I agree with you.

bartman081523 avatar Feb 26 '23 13:02 bartman081523

The way some people are already bad-mouthing this model and Meta seems like plenty of evidence that they should be careful about how they release it. These models, in the right hands, will move science and engineering along at a startling pace. They're too valuable to humanity to let attention-seeking nitwits cause trouble.

MikeyBeez avatar Feb 26 '23 14:02 MikeyBeez

I believe that if you don't release the models to the public, the errors in the training will be noticed much later. It was open access that made it possible to quickly find bugs in the Galactica models. So this model should also be made publicly available and improved with the help of the community.

Sumanai avatar Feb 26 '23 15:02 Sumanai

Looking at the model [code] itself, there are no new methods or techniques. The [breakthrough] claim of LLaMA is that it is "open source", which also turns out to be false, because it is the model weights that would have "democratized" things, removing the need for another researcher to spend a million dollars to train the model.

Having a non-commercial bespoke license isn't the same as an open source license.

victor-iyi avatar Feb 26 '23 17:02 victor-iyi

The way some people are already bad-mouthing this model and Meta seems like plenty of evidence that they should be careful about how they release it. These models, in the right hands, will move science and engineering along at a startling pace. They're too valuable to humanity to let attention-seeking nitwits cause trouble.

That's just 'effective altruist' nonsense. These tech/Silicon Valley billionaires are not going to 'save humanity'; it's going to save money for corporations at best, and it's still functionally useless for anything other than spam, art projects, and hype for 'effective altruists' to maintain power. That's why Google lost over a billion dollars. It's an automated chatbot system disguised as an 'intelligent' system. It is not capable of engineering or developing anything, or of creating classifiers independently beyond its scope.

Keeping it private enables this false mystique and facilitates the very abuse it claims to be preventing by keeping it hidden from the public.

Nyx7s avatar Feb 26 '23 18:02 Nyx7s

The way some people are already bad-mouthing this model and Meta seems like plenty of evidence that they should be careful about how they release it. These models, in the right hands, will move science and engineering along at a startling pace. They're too valuable to humanity to let attention-seeking nitwits cause trouble.

Ironically, you are one of the people demonstrating the open source Stable Diffusion on your YouTube channel. So presumably you think you are one of the "right hands" rather than one of the "attention-seeking" people? 😂 Heh. Just messing with you, buddy.

elephantpanda avatar Feb 26 '23 19:02 elephantpanda

I do what I can to help people and move AI forward. I think it's important. The fact is, Meta has tried to give the world a model, Galactica, and they were attacked for it. I want them to give this model to the world, but they have good reason to believe it has to be done carefully. Hierarchical context transformer architecture may reduce hallucination and confabulation enough to make these models safe for the braindead. Or they'll need to hobble this model like Bing Chat. But what if Meta wants to see how this is used in the wild? That's their right. So maybe they need to be careful about to whom they release this. The FACT that they released Galactica to the world is enough evidence for me of their desire to share this technology.

MikeyBeez avatar Feb 26 '23 22:02 MikeyBeez

My point of view is this:

If you create a program which generates random letters, eventually it's going to arrange those letters into a "bad thing".

Does this make a random character generator inherently bad?
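
As a minimal sketch of this point (in Python, with a hypothetical one-word blocklist standing in for the "bad thing"), a uniform random letter generator will almost surely emit any fixed short string given enough output:

```python
import random
import string

def random_text(n, seed=None):
    """Generate n uniformly random lowercase letters."""
    rng = random.Random(seed)
    return "".join(rng.choices(string.ascii_lowercase, k=n))

# Hypothetical blocklist: a fixed 4-letter word occurs at any given
# position with probability 1/26^4 (~1 in 457,000), so in 10 million
# characters we expect it to appear roughly 20 times purely by chance.
BAD_WORDS = ["dang"]

text = random_text(10_000_000, seed=0)
for word in BAD_WORDS:
    if word in text:
        print(f"'{word}' appeared by chance; the generator itself isn't 'bad'.")
```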

Should we ban dictionaries that contain swear words? Perhaps only let the elites have dictionaries? (Since only they can be trusted not to misuse them?)

What we should do is allow the dictionaries and the random word generators, and simply put a disclaimer on them saying "may contain bad words". Is anyone really offended by a computer saying a bad word? I don't think so. It's just newspapers making clickbait headlines.

elephantpanda avatar Feb 26 '23 22:02 elephantpanda

Regardless of whether "people can/should download the model", calling something open when it is gated is... misleading at best.

To clarify further, here is a quick explanation by our friend:

why using "open" might be misleading

In the modern era, there is a language model that has been created using advanced technology. This model is called "open" because it is based on the idea that anyone who wants to use it can do so freely. However, the people who created this model have prefaced it with a survey that requires users to agree to certain terms and conditions before accessing the model.

This means that even though the language model is technically "open" and available to everyone, it is not really open in practice because people have to meet certain requirements in order to access it. In other words, it is not fully accessible to everyone.

Therefore, some people might argue that the language model is not truly open because it is not completely free and available to everyone. It is more like a gated community where only those who meet certain criteria can enter.
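
To make the "gated community" concrete, here is a minimal sketch in Python of what such gating amounts to; the eligibility criteria below are hypothetical illustrations, not Meta's actual form fields:

```python
# Hypothetical sketch of gated "open" access: the artifact is nominally
# free, but a policy check stands between the public and the download.

APPROVED_AFFILIATIONS = {"university", "industry_lab", "government"}  # hypothetical

def may_download(requester):
    """Return True only if the requester clears the (arbitrary) gate."""
    return (
        requester.get("agreed_to_license", False)
        and requester.get("affiliation") in APPROVED_AFFILIATIONS
    )

independent_researcher = {"agreed_to_license": True, "affiliation": None}
print(may_download(independent_researcher))  # False: "open", yet inaccessible
```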

pszemraj avatar Feb 26 '23 23:02 pszemraj

To clarify further, here is a quick explanation by our friend:

why using "open" might be misleading

In the modern era, there is a language model that has been created using advanced technology. This model is called "open" because it is based on the idea that anyone who wants to use it can do so freely. However, the people who created this model have prefaced it with a survey that requires users to agree to certain terms and conditions before accessing the model.

This means that even though the language model is technically "open" and available to everyone, it is not really open in practice because people have to meet certain requirements in order to access it. In other words, it is not fully accessible to everyone.

Therefore, some people might argue that the language model is not truly open because it is not completely free and available to everyone. It is more like a gated community where only those who meet certain criteria can enter.

This is just exemplary of this entire company's (and its sycophants') 'ethics', really: they simultaneously claim they care about people and think those same people are too 'braindead' to participate. It's for self-aggrandizement, control, and profit. The purpose is to ensure that exclusive access remains within their corporate inner circle, relative to contractors and the public... the exact kind of people who should be nowhere near this tech are painting themselves as our 'saviors' and 'the right hands'. You'll find that it's just colonialism in electronic/digital form.

Nyx7s avatar Feb 27 '23 01:02 Nyx7s

Hi @pauldog, as Llama 2 has a more permissive license, closing this one for now. Feel free to reopen with additional feedback or open a new one as well.

albertodepaola avatar Sep 06 '23 16:09 albertodepaola