PowerGraph
Big vertex fixes
As reported here, I am interested in extending graphlab so that it can be used with ids that are larger than the standard. This pull request does not change the vertex_id_type, but makes some minor changes which are necessary in the code that make it possible for someone who is interested to use a larger vertex id space.
Description of the changes
The current version of graphlab is not prepared for larger vertex id types. Using a larger vertex id type requires configuring the project to compile with C++11. Slight modifications have been made to the following files:
- graphlab/graph/graph_basic_types.hpp: The type for local vertex id should be decoupled from the larger type (currently they are enforced to be the same type). We do this by introducing a new intermediate type called standard_vertex_id_type. For regular users this is just an implementation detail.
- graphlab/engine/distributed_chandy_misra.hpp: The assessment of sequentialization keys misses a cast to unsigned char.
- graphlab/graph/distributed_graph.hpp and graphlab/graph/ingress/distributed_ingress_base.hpp: num_in_edges and num_out_edges should have type size_t.
- graphlab/graph/graph_hash.hpp: three casts from vertex_id_type to size_t are missing.
- Two variables have been introduced into CMakeLists.txt to control the usage of an extended vertex id type.
- Some casts have been added to several files in apps and tests.
To the best of my knowledge, these changes have no effect when the vertex_id_type is kept "standard" (that is, either uint32_t or uint64_t), whether compiling with C++11 or without it. I am therefore submitting the changes for inclusion in the graphlab project.
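To make the decoupling described above concrete, here is a minimal sketch of how the two build variables and the intermediate type could fit together. It is not the actual content of graph_basic_types.hpp: the namespace sketch, the helper bucket_of and the choice of uint64_t as the fallback are assumptions made for illustration. Only the names EXTERNAL_VERTEX_ID_TYPE, EXTERNAL_VERTEX_ID_TYPE_INCLUDE, standard_vertex_id_type, vertex_id_type and lvid_type are taken from the pull request or from graphlab.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Both macros are expected to come from the build (-D ...). When they are
// absent, the sketch falls back to a built-in integer type.
#ifdef EXTERNAL_VERTEX_ID_TYPE
#include EXTERNAL_VERTEX_ID_TYPE_INCLUDE
#endif

namespace sketch {  // hypothetical namespace, not graphlab's

// Always a built-in integer, regardless of the configured id type.
// Assumption: uint64_t is used here; the real default may be uint32_t.
typedef std::uint64_t standard_vertex_id_type;

// Global vertex ids: the external (possibly wider) type if configured.
#ifdef EXTERNAL_VERTEX_ID_TYPE
typedef EXTERNAL_VERTEX_ID_TYPE vertex_id_type;
#else
typedef standard_vertex_id_type vertex_id_type;
#endif

// Local vertex ids index in-memory arrays, so they stay a built-in integer.
typedef standard_vertex_id_type lvid_type;

// Hypothetical helper: the explicit cast to size_t is what a wide id type
// needs. Note that the cast keeps only the low bits of a wider id.
inline std::size_t bucket_of(const vertex_id_type& vid,
                             std::size_t num_buckets) {
  assert(num_buckets > 0);
  return static_cast<std::size_t>(vid) % num_buckets;
}

}  // namespace sketch
```

With neither macro defined, vertex_id_type and lvid_type are the same built-in type, which matches the claim that the standard configuration is unaffected.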
Steps to use a different vertex_id_type (assuming that the changes in this pull request are accepted into the project)
There are two ways of doing it.
Simple way
- Select one of the types included in graphlab/util/multiprecision_vertex_id_types.hpp. They are boost::multiprecision::int128_t, boost::multiprecision::int256_t, boost::multiprecision::int512_t and boost::multiprecision::int1024_t.
- Configure the project as follows:
./configure --c++11 -D EXTERNAL_VERTEX_ID_TYPE_INCLUDE="'<graphlab/util/multiprecision_vertex_id_types.hpp>'" -D EXTERNAL_VERTEX_ID_TYPE=boost::multiprecision::int128_t
Complex way. Implementing another type
Here follows a step by step guide on how to use other types as vertex_id_type.
- Decide on the type that you would prefer for your vertex id. Constraints on the type are:
  - It should be an arithmetic type implementing +, *, ^ and so on. Currently I have only tested with boost::multiprecision::int128_t, and it fulfills all the needs in this sense.
  - It should be castable into common types such as size_t, int, unsigned char, .... In our case this is also provided by the boost multiprecision library.
  - It should be graphlab serializable (for boost::multiprecision::int128_t, I used out of place serialization, see graphlab/util/multiprecision_vertex_id_types.hpp for examples).
  - There should exist a function size_t hash_value(const my_large_id_type& x) in the same namespace where my_large_id_type is defined (see graphlab/util/multiprecision_vertex_id_types.hpp for examples).
- Configure the project as follows:
./configure --c++11 -D EXTERNAL_VERTEX_ID_TYPE_INCLUDE="'[path to your include file here]'" -D EXTERNAL_VERTEX_ID_TYPE=[your vertex_id_type class here]
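As an illustration of the constraints listed above, the following is a rough sketch of a hand-written id type. Everything in it (the namespace my_ids, the struct wide_id, the hash mixing and the smoke check) is hypothetical and has not been tested against graphlab. It shows only + and ^ from the arithmetic requirement, the explicit casts, and hash_value in the type's own namespace. Serialization is left out because it depends on graphlab's archive classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace my_ids {  // hypothetical namespace

// Hypothetical 128-bit unsigned id stored as two 64-bit halves.
struct wide_id {
  std::uint64_t hi;
  std::uint64_t lo;

  wide_id() : hi(0), lo(0) {}
  wide_id(std::uint64_t low) : hi(0), lo(low) {}  // implicit: `id += 1` works
  wide_id(std::uint64_t high, std::uint64_t low) : hi(high), lo(low) {}

  // Addition with carry from the low half into the high half.
  wide_id& operator+=(const wide_id& o) {
    const std::uint64_t old_lo = lo;
    lo += o.lo;
    hi += o.hi + (lo < old_lo ? 1u : 0u);
    return *this;
  }
  wide_id& operator^=(const wide_id& o) {
    hi ^= o.hi;
    lo ^= o.lo;
    return *this;
  }

  // Explicit cast to any built-in integer type (requires C++11).
  // Only the low bits that fit in T are kept.
  template <typename T,
            typename = typename std::enable_if<std::is_integral<T>::value>::type>
  explicit operator T() const { return static_cast<T>(lo); }
};

inline wide_id operator+(wide_id a, const wide_id& b) { a += b; return a; }
inline wide_id operator^(wide_id a, const wide_id& b) { a ^= b; return a; }
inline bool operator==(const wide_id& a, const wide_id& b) {
  return a.hi == b.hi && a.lo == b.lo;
}
inline bool operator<(const wide_id& a, const wide_id& b) {
  return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}

// Lives in the same namespace as wide_id, so an unqualified call to
// hash_value(x) finds it through argument-dependent lookup.
inline std::size_t hash_value(const wide_id& x) {
  std::size_t seed = static_cast<std::size_t>(x.hi);
  seed ^= static_cast<std::size_t>(x.lo) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
  return seed;
}

// Small self-check of the carry logic and of the hash lookup.
inline void wide_id_smoke_check() {
  wide_id a(0, ~std::uint64_t(0));
  a += 1;  // overflows the low half, so the carry must reach the high half
  assert(a.hi == 1 && a.lo == 0);
  assert(hash_value(a) == hash_value(wide_id(1, 0)));
}

}  // namespace my_ids
```

A type along these lines would then be named in EXTERNAL_VERTEX_ID_TYPE (here my_ids::wide_id), with its header given in EXTERNAL_VERTEX_ID_TYPE_INCLUDE. A real type would also need the remaining operators and graphlab serialization.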
I have also modified all the code in the toolkits to work with larger vertex_ids. However, I would like to hear your opinion on the pull request before making it bigger.
Hi Jesus, Let us take a look (it will take a couple of days) and we will get back to you ASAP. We highly appreciate your work!!
http://www.graphlab.com Danny Bickson Co-Founder US phone: 206-691-8266 Israeli phone: 073-7312889 https://twitter.com/graphlabteam http://www.linkedin.com/company/graphlab https://www.facebook.com/graphlabinc http://www.youtube.com/user/GraphLabInc
Thanks Danny. For now, just take a look at the main idea of how to do the extension. I have identified some issues that still need to be fixed.
Making the whole of graphlab compatible with large vertex_ids is a big change that impacts different parts of the code. Although I have tried to minimize the impact, it is still a major change. As such, adding it to the graphlab master branch right now does not seem the right way to go. On the other hand, I think it could be very interesting to other people who, like myself, need to read the graph from a database that already has its own ids. Thus, to me the best option right now is to create a branch where the work is committed.
This raises questions about the development process that you expect for the graphlab project. Do you plan to have a single branch? How would major changes such as this make it into the project? How do you decide which functionality should be on the next release? Does the project have any defined set of tests that can be automatically run? If so, how?
These are very relevant questions to clarify if your idea is to have people contributing to the code base. By the way, given the amount of movement going on in the code base right now, a simple model such as "Danny will decide" could be the most appropriate.
Sorry for wandering ;)
Hi, those are great questions and I think they will be far more visible in our user forum. We are setting up an internal discussion to decide on the best way to help you with what you need. Please repost at the forum and we promise to get back with some advice.