django-mysql-shard
django-mysql-shard copied to clipboard
A Model for Django Models to inherit (and utilities) for abstracting horizontal sharding on MySQL. Follow the instructions and grow tables infinitely.
django-mysql-shard
A Model for Django Models to inherit (and utilities) for abstracting horizontal sharding on MySQL. Follow the instructions and grow tables infinitely.
This is essentially a port of Instagram's methods:
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
And uses lots of inspiration from Disqus: https://github.com/disqus/sharding-example
Why not just do range based?
Because all new entries would just end up on one machine rather than balance. Range based is good for things that are read in a balanced way.
Why not just use a NoSQL database?
Relations in your code and/or in your database? I like having some relations in the database.
Building
On each MySQL machine, build the hash generator thing:
gcc -shared -fPIC -o now_msec.so now_msec.cc -I /usr/include/mysql
sudo cp -rfp now_msec.so /usr/lib/mysql/plugin/
Then, on each machine execute, change this line in create_shard_table.sql and replace 5 with each MySQL shard ID number. Just make sure each MySQL machine has a different number.
DECLARE shard_id int DEFAULT 5;
Then run the MySQL:
mysql -uroot -pasdfasdf dbname < create_shard_table.sql
Now if you run:
mysql> select next_sharded_id();
+--------------------+
| next_sharded_id() |
+--------------------+
| 247562734190477317 |
+--------------------+
1 row in set (0.00 sec)
import bistring
a = bitstring.BitArray(bin(247562734190477317))
#print part where shard_id exists
shard_id = a[41:49].int
And there's your shard_id. I hope you can imagine what we're going to do with that! It means that any object we get will tell us where it lives without a lookup.
What we'll go over next is how to integrate this all into django -- Inserts will put the file into its proper shard. Selects will map-reduce to get all of the matching queries in all of the shards.