Get names of immediate relatives of each deputy and senator

Open Irio opened this issue 8 years ago • 40 comments

Super useful for detecting different forms of nepotism. Even better if we can go up to the third degree of kinship.

Irio avatar Aug 21 '16 02:08 Irio

@Irio, where can I get that information?

jvsl avatar Aug 28 '16 05:08 jvsl

Maybe Wikipedia, maybe Facebook… we have to be creative on this point…

cuducos avatar Aug 28 '16 09:08 cuducos

A good start is collecting the relatives politicians declare on their own Facebook profiles.

Let's take Renato Molling, the RS federal deputy who spent the most from the Quota for Exercising Parliamentary Activity last year. He has a personal Facebook profile and lists two people as his family members, Larissa Molling and Vinicius Molling. In this specific case, the listed relatives don't link directly to any Facebook profile, but searching for their names brings them up as top results.

I believe the Facebook API has ways of searching for people by name and also returns the list of family members from the profile.

P.S. Even if this method brings just a short list of politicians and relatives, we can place a high level of trust in this information, since it's self-declared.

@jvsl

Irio avatar Aug 28 '16 15:08 Irio

@Irio @cuducos, I was looking for a free service like this: https://www.myheritage.com.br/, but I didn't find one. I think Wikipedia and Facebook are a good start, as cuducos said.

jvsl avatar Aug 28 '16 16:08 jvsl

Kudos for that, @irio:

"P.S. Even if this method brings just a short list of politicians and relatives, we can place a high level of trust in this information, since it's self-declared."

cuducos avatar Aug 28 '16 18:08 cuducos

I've been trying to use the Graph API to get family from deputy and senator profiles, but I didn't have success. The API requires an access token to get this information, and a user token is obtained by logging in to Facebook. So I'm only allowed to get my own information from Facebook. I'll try Wikipedia instead.

jvsl avatar Sep 07 '16 04:09 jvsl

If someone wants to try using Facebook, this link will help: https://developers.facebook.com/tools/explorer/

jvsl avatar Sep 07 '16 04:09 jvsl

"I've been trying to use the Graph API to get family from deputy and senator profiles, but I didn't have success."

I'm sorry to hear that, @jvsl. We already knew that, and it was documented in the README. I don't want to sound like a dick, or like mommy saying "I told you so", but is there a way to make it clearer so people start off looking for alternatives?

Also I want to apologize for being less clear than I could have been. When I said earlier in this topic maybe Facebook I was assuming that Graph API wouldn't work but we could try to get some Facebook data (for example, using PhantomJS, Selenium and reaching URLs like https://www.facebook.com/USERNAMEj/about?section=relationship).

And surely using Wikipedia is a good idea too! This message was just to say I'm sorry, and to say I'm happy for your support and enthusiasm ; )

cuducos avatar Sep 07 '16 15:09 cuducos

No worries. It was just a communication failure. I just think you all could have better control of "who is doing what", with a tool like Trello or something similar. It's just a suggestion. I'm interested in continuing with this task, but I don't know if someone has already taken it. Do you know what I mean? :)

jvsl avatar Sep 07 '16 20:09 jvsl

@jvsl We are following what we see in many open-source communities here on GitHub. A comment in an issue saying "I got it" is enough. People interested in the topic of the issue usually read the issue thread and can get an overview of what's going on with the design and execution ; )

cuducos avatar Sep 07 '16 20:09 cuducos

Ok, I got it. I imagined that.

jvsl avatar Sep 07 '16 20:09 jvsl

Maybe include second- and third-degree relatives? Use a degree of proximity? E.g., a son-in-law may not be listed directly, but could be found as the daughter's husband.

janosimas avatar Sep 10 '16 01:09 janosimas

Here you can get the names of the mother and father! It's a beginning... maybe work top-down from the parents to find brothers and sisters; you can start with this: http://www2.camara.leg.br/deputados/pesquisa/layouts_deputados_biografia?pk=73481

allantorres avatar Sep 13 '16 13:09 allantorres

And you can try to build something on top of Wikipedia pages like this one; the information is there, but it is not easy to extract: https://pt.wikipedia.org/wiki/Fernando_Marroni

allantorres avatar Sep 13 '16 13:09 allantorres

Yes, I've been getting relatives from Wikipedia with good results. I'm about to finish the script. :)

jvsl avatar Sep 13 '16 13:09 jvsl

It would be great if we could see whether any relatives are working on another deputy's or senator's staff. That would give us information about nepotism.

allantorres avatar Sep 14 '16 01:09 allantorres

@allantorres These 2 issues raise these ideas. https://github.com/datasciencebr/serenata-de-amor/issues/17 https://github.com/datasciencebr/serenata-de-amor/issues/18 Willing to help?

Irio avatar Sep 14 '16 19:09 Irio

Here are some other possible sources for this data.

Text-mine news that mentions family members of politicians

Most family members won't have been mentioned in any bit of news. However, among those that have, it is more likely to have been about some previous suspicion of corruption, and that makes it all the more important for them to be considered for analysis.

For an example of such information out in the open, open Google's news search, type in the name of a politician followed by the word "filha" (daughter) and, if she's been mentioned in the news, you're likely to find the name of the politician's daughter.

Check the dataset of campaign donors

The Superior Electoral Court of Brazil releases data about the campaign donors of each candidate. Starting from the 2016 local election, only natural persons can contribute to campaigns. Close relatives are more likely to contribute more money to the politician's campaign. This correlation could be explored by cross-referencing with other sources of data to find which donors are relatives.
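As a rough illustration of that cross-referencing idea, here is a minimal Python sketch that flags donors sharing a surname with the candidate. It is only one possible heuristic, not something the TSE dataset provides: the helper names (`surnames`, `possible_relatives`) and the sample names are hypothetical, and a shared surname is at best a hint to be confirmed against other sources.

```python
import unicodedata

# Portuguese name particles that are not surnames.
PARTICLES = {"de", "da", "do", "das", "dos", "e"}


def surnames(full_name):
    """Return the set of normalized name parts after the first name."""
    # Strip accents so "Conceição" and "Conceicao" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    parts = ascii_name.lower().split()
    return {part for part in parts[1:] if part not in PARTICLES}


def possible_relatives(candidate_name, donor_names):
    """List donors who share at least one surname with the candidate."""
    candidate_surnames = surnames(candidate_name)
    return [
        donor
        for donor in donor_names
        if donor != candidate_name and surnames(donor) & candidate_surnames
    ]


# Hypothetical names, for illustration only (not real donation records).
donors = ["Maria Exemplo da Silva", "João Pereira", "Ana Exemplo"]
matches = possible_relatives("Carlos Exemplo", donors)
```

Common surnames will produce many false positives, so the output is better treated as a list of candidates for manual checking than as a list of relatives.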

augusto-herrmann avatar Sep 23 '16 11:09 augusto-herrmann

Hey,

I don't know if I'm a bit late, but a valid possibility is to build a "Relative Index" using the six degrees of separation theory.

So all parliamentarians would have a factor of 0, direct relatives and past campaign donors a factor of 1, people related to factor-1 people a factor of 2, and so on.

This can be an efficient way to build a heat map of potential scapegoats circling the parliamentary core.

For those who aren't familiar with the concept, the Kevin Bacon game is a good example of it in action.
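A minimal sketch of how such an index could be computed, assuming the relationships are already available as pairs of people: a breadth-first search starting from all parliamentarians at once gives each person the smallest number of hops to any parliamentarian. The function name `relative_index` and the sample people are hypothetical.

```python
from collections import deque


def relative_index(links, parliamentarians):
    """Breadth-first search: fewest hops from any parliamentarian to each person."""
    # Build an undirected adjacency map from the (person, person) pairs.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every parliamentarian starts with factor 0.
    factor = {person: 0 for person in parliamentarians}
    queue = deque(parliamentarians)
    while queue:
        person = queue.popleft()
        for other in neighbors.get(person, ()):
            if other not in factor:
                factor[other] = factor[person] + 1
                queue.append(other)
    return factor


# Hypothetical people and links, for illustration only.
links = [
    ("deputy_a", "spouse_a"),
    ("deputy_a", "donor_x"),
    ("spouse_a", "cousin_b"),
    ("donor_x", "senator_c"),
]
index = relative_index(links, ["deputy_a", "senator_c"])
```

People with no path to any parliamentarian simply do not appear in the result.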

wfzyx avatar Sep 29 '16 03:09 wfzyx

@braunmagrin and I will use the congresspeople's biographies from the Câmara to get their parents' names (filiation) and export them to a CSV with the following columns: congressperson_id,relationship,relative_name
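For reference, a file with those three columns could be written with Python's standard `csv` module, roughly as sketched below. This is not the actual script; the function name `write_relatives`, the `parent` relationship value, and the sample rows are placeholders.

```python
import csv
import io

FIELDNAMES = ["congressperson_id", "relationship", "relative_name"]


def write_relatives(rows, output):
    """Write a header line plus one CSV line per (congressperson, relative) pair."""
    writer = csv.DictWriter(output, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows; the id and names are placeholders, not real data.
rows = [
    {"congressperson_id": 1, "relationship": "parent", "relative_name": "Nome da Mãe"},
    {"congressperson_id": 1, "relationship": "parent", "relative_name": "Nome do Pai"},
]
buffer = io.StringIO()
write_relatives(rows, buffer)
csv_text = buffer.getvalue()
```

Keeping one row per relative (rather than one column per relative) means other kinds of relationship can be added later without changing the layout.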

anaschwendler avatar Oct 18 '16 16:10 anaschwendler

Maybe my comment helps with this

jonasporto avatar Oct 18 '16 22:10 jonasporto

Unfortunately, I have not had as much time as I would like to dedicate to the project. But I did develop a script that gets relatives of senators using the Wikipedia page found through Google and the Wikipedia page itself. The script needs improvement and can be extended to get relatives of deputies as well. At least 30 senators have no family information on their Wikipedia pages, so it was possible to generate a JSON file with family information for 51 senators.

Here's the link: https://github.com/jvsl/script-get-relatives-from-wikipedia

jvsl avatar Oct 19 '16 01:10 jvsl

The script needs improvements to remove repeated logic and apply some good practices.

jvsl avatar Oct 19 '16 01:10 jvsl

There's also the DBpedia project, which maps Wikipedia infoboxes to an ontology and provides a queryable endpoint. This avoids some pitfalls of web scraping and is based on a graph data model.

For instance, this query:

select ?property ?object
where {
 <http://dbpedia.org/resource/Tasso_Jereissati> ?property ?object .
}

returns all facts available about Tasso Jereissati. A more specific query would be:

select distinct ?conjuge
where {
 <http://dbpedia.org/resource/Tasso_Jereissati> <http://dbpedia.org/property/conjuge> ?conjuge .
}

You can try them here: http://pt.dbpedia.org/sparql

To do this programmatically in Python, you can try rdflib and SPARQLWrapper.

The drawback of this approach is that the endpoint is not synced live with the Wikipedia database, so it depends on periodic data dumps. I personally think that Linked Data technologies (RDF, SPARQL, etc.) can be very helpful in the project.
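As a dependency-free alternative to SPARQLWrapper, the query above can also be sent with the Python standard library alone. The sketch below is an assumption-laden illustration: `build_url`, `values` and `run` are hypothetical helper names, the `format` parameter is something Virtuoso-based endpoints usually accept (not verified against this endpoint), and the sample response is hand-written in the standard SPARQL JSON results shape rather than real data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://pt.dbpedia.org/sparql"  # endpoint mentioned in this thread

QUERY = """
select distinct ?conjuge
where {
 <http://dbpedia.org/resource/Tasso_Jereissati> <http://dbpedia.org/property/conjuge> ?conjuge .
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request (query passed in the "query" parameter)."""
    # Assumption: the endpoint honors a "format" parameter for JSON results.
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def values(results, variable):
    """Extract the values bound to one variable in a SPARQL JSON results document."""
    return [
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        if variable in binding
    ]


def run(query):
    """Send the query over the network and return the decoded JSON document."""
    with urlopen(build_url(query)) as response:
        return json.load(response)


# Hand-written response in the standard JSON results shape (placeholder value).
sample = {
    "head": {"vars": ["conjuge"]},
    "results": {
        "bindings": [{"conjuge": {"type": "literal", "value": "Nome do Cônjuge"}}]
    },
}
spouses = values(sample, "conjuge")
```

With network access, `values(run(QUERY), "conjuge")` would return whatever the endpoint currently holds for that property.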

talespaiva avatar Oct 19 '16 08:10 talespaiva

It is possible to retrieve all name pairs of politicians and their spouses from DBpedia directly by using SPARQL, like this:

select distinct ?nome_politico ?nome_conjuje where
{
    ?politico a dbpedia-owl:Politician .
    ?politico rdfs:label ?nome_politico .
    ?politico dbpprop:conjuge ?conjuje .
    ?conjuje dbpprop:nome ?nome_conjuje .
}

Unfortunately, the Portuguese DBpedia returns only 12 pairs for that query.

augusto-herrmann avatar Oct 19 '16 11:10 augusto-herrmann

Indeed, the DBpedia datasets are not complete. I managed to get a few more results with these two queries:

The first one in the Portuguese DBpedia [http://pt.dbpedia.org/sparql]:

select distinct *
where {
 ?person dbpprop:título ?title .
 ?person dbpprop:conjuge ?spouse .
 FILTER regex(?title, "Senador|Deputad", "g")
}

and this one in the English DBpedia [http://dbpedia.org/sparql] (only for senators):

select *
    where {
     ?person dct:subject dbc:Members_of_the_Federal_Senate .
     OPTIONAL { ?person dbp:spouse ?spouse . }
     OPTIONAL { ?person dbp:children ?children . }
    }

I don't have much time now, but it's a matter of exploring the ontologies to discover the relationships.

talespaiva avatar Oct 19 '16 17:10 talespaiva

@anaschwendler and I created the script in PR #93. It gets the names of the congresspeople's parents. Unfortunately the information is not complete: only approx. 800 out of 1150 have it. Also, it's only their parents' names, so I guess we still need to find a way to get more degrees of relationship.

braunmagrin avatar Oct 21 '16 05:10 braunmagrin

@braunmagrin remember that we were having trouble telling the mother's name from the father's? @turicas sent me a link to help us with this task: https://genderize.io/ We decided that we'll keep all info as parent, but it could be useful in the future :)

anaschwendler avatar Oct 26 '16 18:10 anaschwendler

Hi! Maybe this could help: http://facebook4j.github.io/en/api-support.html But let's start by collecting all the politicians' profiles and later look for their family connections.

Do we already have this Facebook profile field?

lucasa avatar Nov 01 '16 18:11 lucasa

@lucasa have you tried it for real? As I mentioned earlier, the Facebook API only shares user data among users of the same Facebook app (I mean, we'd have an app and an API key for this app, and we'd only have access to users that sign in to our app).

cuducos avatar Nov 01 '16 20:11 cuducos