8values.github.io

Dataset availability

Open gus-morales opened this issue 8 years ago • 35 comments

It would be nice to have access to the dataset and see how the results compare across where people live, what they study, their education level, etc. A friend and I would gladly submit code with a statistical analysis along these lines. So, is it feasible to add an option to download the latest dataset?

gus-morales avatar Apr 26 '17 15:04 gus-morales

I strongly agree that there could be an interesting opportunity to collect important data here. And it shouldn't be too difficult: after the poll, you can politely ask users if they want to share their age, country, gender (maybe), educational status, etc. (Actually, maybe you can find something interesting in the Google Analytics data.)

ommirandap avatar Apr 26 '17 16:04 ommirandap

While I agree this would be a great thing to have, especially to help balance the test, I'm not currently collecting data.

TristanBomb avatar Apr 28 '17 06:04 TristanBomb

One solution would be to pull the server logs to see what final results people end up with, since they're included in the URL. Should be possible to post a text file of all the result URLs called and let people extract individual parameters for analysis.

Of course you'd have to accept some level of error from people like me who adjusted results until I got Nazism to scare my friends...
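To make the "extract individual parameters" step concrete, here is a minimal sketch of parsing such a file. It assumes result URLs carry the four axis scores as query parameters named `e`, `d`, `g`, and `s`, each in the range 0 to 100; both the names and the range are assumptions to verify against real URLs, and `parse_result_url` and `parse_log` are hypothetical helpers.

```python
from urllib.parse import urlparse, parse_qs

# Assumed axis parameter names; check them against real result URLs.
AXES = ("e", "d", "g", "s")


def parse_result_url(url):
    """Return {axis: score} for one result URL, or None if it is malformed."""
    query = parse_qs(urlparse(url).query)
    scores = {}
    for axis in AXES:
        values = query.get(axis)
        if not values:
            return None  # axis missing from the URL
        try:
            score = float(values[0])
        except ValueError:
            return None  # non-numeric value
        if not 0.0 <= score <= 100.0:
            return None  # outside the assumed 0..100 range
        scores[axis] = score
    return scores


def parse_log(lines):
    """Parse an iterable of URLs (one per line), skipping malformed entries."""
    parsed = (parse_result_url(line.strip()) for line in lines)
    return [p for p in parsed if p is not None]
```

Range checks like this only drop impossible values; hand-edited but plausible URLs would still end up in the data.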

Patrick330 avatar May 04 '17 15:05 Patrick330

Ehm, what about those who do not want to have their data stored? Maybe a better way would be to ask at the end if they want to contribute their data to a common public dataset and in that case data gets submitted to the server. Opt-in.

mitar avatar May 04 '17 15:05 mitar

In that case I vote for an opt-out approach.

gus-morales avatar May 04 '17 15:05 gus-morales

I really do think it should be opt-in. This is to empower people to make a decision. We already have enough tracing on the web, and enough guessing at my preferences.

mitar avatar May 04 '17 15:05 mitar

I agree with your attitude towards tracing, but I assume the dev is not looking to sell that data, or to participate in marketing. These would be used for simple statistics, which in turn would go back to the general public.

gus-morales avatar May 04 '17 15:05 gus-morales

Still. Even if you do academic research on human subjects, it has to be opt-in with the full consent of the person. Otherwise you cannot publish a paper. If you really want to make this dataset fully useful, we should follow similar guidelines.

mitar avatar May 04 '17 16:05 mitar

I think you misunderstand. When you visit websites your URL requests and IP address are logged by the webserver. You can use VPNs to anonymize your IP, but you cannot avoid having your requests logged because the server must know what pages to send you.

Patrick330 avatar May 04 '17 16:05 Patrick330

Also, there are tons of useful datasets that do not follow similar guidelines. Similarly, the opposite is also true. Tracing has more to do with ethics, not with how useful the dataset is. And website cookies do this anyway already.

gus-morales avatar May 04 '17 16:05 gus-morales

Not necessarily, if it is a single-page app where the client side changes the URL. This seems to be the case for this site at the moment. Furthermore, it is hosted on GitHub, so any URL request logs are on GitHub.

mitar avatar May 04 '17 16:05 mitar

Yes, that's true.

Patrick330 avatar May 04 '17 16:05 Patrick330

So we would purposely change this to make users more traceable? Why would that be a good thing? On the other hand, we could make it opt-in, where JavaScript submits the results to a server for those who do want to participate in the research. I have done similar things in the past, and in the end many users do like to contribute and help, especially if you tell them that they can also get the data. I think this is a much better proposition.
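A minimal sketch of the opt-in logic, written in Python for illustration rather than as the site's actual JavaScript. `Result`, `submit_if_consented`, and the `send` callback are hypothetical names, and the four score fields are an assumption about what a result contains.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Result:
    # Assumed shape of a quiz result: one score per axis.
    e: float
    d: float
    g: float
    s: float


def submit_if_consented(
    result: Result,
    consent: bool = False,  # opt-in: nothing is sent unless the user says yes
    send: Optional[Callable[[dict], None]] = None,
) -> bool:
    """Send only the scores, and only when the user explicitly opted in."""
    if not consent or send is None:
        return False
    send({"e": result.e, "d": result.d, "g": result.g, "s": result.s})
    return True
```

The whole opt-in versus opt-out question comes down to the default value of `consent`: with `False`, inaction means no data leaves the browser; with `True`, inaction means it does.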

mitar avatar May 04 '17 17:05 mitar

Because to me (the guy who originally proposed to work on this dataset), this is not about tracing people. My code would not trace anyone, nor would I sell the data to third parties.

Now, your second argument seems more appropriate to the discussion. I was suggesting an opt-out approach which I assume would yield more data (e.g. Dan Ariely's TED talk on decision making), but of course I am open to personal experiences on that matter.

gus-morales avatar May 04 '17 17:05 gus-morales

Yes, opt-out would probably get more data, but I am saying that in this case opt-in will probably also give you enough data, and in a more ethical way. Just because you can do something does not mean you should.

mitar avatar May 04 '17 17:05 mitar

There is nothing unethical about collecting anonymous data provided to a website in a form request.

Patrick330 avatar May 04 '17 17:05 Patrick330

If you inform users at the beginning that you will be doing so, true.

mitar avatar May 04 '17 17:05 mitar

Exactly. Also, this has become lecturing and I prefer to stay out of it. All I am saying is that collecting data is not necessarily unethical, especially if you tell users beforehand. Yes, that could lead to tracing, and that, I agree, might be unethical, but it is hardly the point of this post.

gus-morales avatar May 04 '17 17:05 gus-morales

Collecting data is not, on its own, unethical. In fact, as I pointed out above, it's completely inevitable. Every server you ping keeps a log. It's very silly to claim it's unethical to track which buttons your anonymous users click on. That's like saying it's unethical to use web counters on your geocities page in 1996.

Patrick330 avatar May 04 '17 17:05 Patrick330

That is why websites have privacy policies where they explain what they are collecting, even if they are collecting just HTTP logs. And no, you do not have to collect HTTP logs. You might decide to collect them, or maybe you did it automatically because everyone else collects them, but it is still a decision you made. And no, not all websites collect them, precisely for privacy reasons.

And sites which collect psychological data, like this website, should keep such decisions in mind. It really depends on what you are collecting. Here, the data being collected is highly personal; it is not just "clicking buttons".

mitar avatar May 04 '17 17:05 mitar

A web counter does not reveal your personality. This site here tries to do exactly that. You do not see the difference?

mitar avatar May 04 '17 18:05 mitar

Well yes, I agree a privacy policy is totally on point here. That and a notification of data collection. But that does not make the act of collecting anonymous data "less ethical" on its own; it's just an act of transparency.

gus-morales avatar May 04 '17 18:05 gus-morales

Also, as long as it is anonymous data I don't see it as "highly personal", since you cannot relate this information to any person (like tracing does). The fact that the algorithm is trying to predict personality traits does not make it "more personal".

gus-morales avatar May 04 '17 18:05 gus-morales

Yes, this is one approach. The other is to give them an option (opt-in) to make a consensual decision to be part of this. You said yourself that many people will probably just leave the default option. And why is that? Why are opt-in and opt-out not the same? Because of those people who do not read the privacy policy, who do not think about potential implications. That is why it is important to choose good defaults.

And no. Opt-in and opt-out do not have the same ethical value. Opt-in is more ethical, because you are not assuming that people will have no issue with their highly personal data being collected.

mitar avatar May 04 '17 18:05 mitar

> Also, as long as it is anonymous data I don't see it as "highly personal", since you cannot relate this information to any person (like tracing does).

Not by itself. But the site also calls statcounter.com. So combining this site's statcounter.com calls with other statcounter.com calls where you may have revealed your identity could deanonymize those users.
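A toy sketch of the kind of linkage this describes. It is not a claim about what statcounter.com actually stores or does: the `visitor_id` field, the record layout, and the `link_records` helper are all hypothetical, and the point is only that a shared identifier across two logs is enough to attach a name to an "anonymous" result.

```python
def link_records(quiz_hits, identified_hits):
    """Join two logs on a shared visitor ID, attaching names to quiz results."""
    # Map each visitor ID seen on an identifying page to the name revealed there.
    names = {hit["visitor_id"]: hit["name"] for hit in identified_hits}
    return [
        {"name": names[hit["visitor_id"]], "result": hit["result"]}
        for hit in quiz_hits
        if hit["visitor_id"] in names
    ]


# Made-up example data, for illustration only.
quiz_hits = [
    {"visitor_id": "v1", "result": "e=61.6&d=50.0&g=42.1&s=70.3"},
    {"visitor_id": "v2", "result": "e=20.0&d=80.0&g=55.0&s=30.0"},
]
identified_hits = [{"visitor_id": "v1", "name": "Alice"}]

linked = link_records(quiz_hits, identified_hits)
```

Visitors who never appear in an identifying log (like `v2` here) stay unlinked, which is why the risk depends on what else the same tracker sees.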

mitar avatar May 04 '17 18:05 mitar

As long as they can read (which I am assuming they can), an opt-out approach with a notification is consensual.

And if you want to keep discussing this please open another thread. Both approaches are fairly clear by now, and I greatly appreciate your feedback.

gus-morales avatar May 04 '17 18:05 gus-morales

I think the whole academic world that deals with human subjects would disagree with you here. Just Google it.

For example: http://www.ethicsguidebook.ac.uk/Opt-in-and-opt-out-sampling-94

> This approach is seen as problematic by many ethics committees because it undermines the principle that consent should be freely given.

mitar avatar May 04 '17 18:05 mitar

" ‘Opt-out’ samples are those where participants are contacted without volunteering to take part in the research..." If you don't see the difference between that definition and what I have suggested, I have not much to say.

gus-morales avatar May 04 '17 18:05 gus-morales

I think you should read a bit more. What I think you are describing (can you point me to exactly where you describe the protocol the participants would follow?) is a pretty standard definition of opt-out. Everyone doing the quiz will have their data collected unless they say that they do not want that. That is opt-out.

mitar avatar May 04 '17 18:05 mitar

Yes, but the "without volunteering" part is key to understanding my point of view, assuming there is a public policy and a notification of collection.

And honestly we are splitting hairs now. The idea I originally suggested would have zero impact on the current status of privacy on the Internet, in either direction.

gus-morales avatar May 04 '17 18:05 gus-morales