azure-tables-hadoop
Performance against large Table Storage collection
We are having trouble getting this library to perform against a Table Storage collection that has about 2 million records in it. Each record is approximately 4KB.
For example, a simple SELECT LIMIT 10 statement times out on a 7-node HDInsight cluster. Has anyone else tried this library yet, and if so, are you seeing similar results? Perhaps we are not using it properly.
Thanks and regards, Clayton
I think the performance issue is related to fetching all partitions in DefaultTablePartitioner.java. You may need to rewrite that code to match the partitioning logic of your own table.
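To illustrate what I mean, here is a rough sketch of building partition key range filters up front instead of scanning the table to discover its partitions. It assumes you already know a few boundary values that divide your PartitionKey space, sorted ascending. The class and method names (PartitionRangeSketch, buildRangeFilters) are made up for this example and are not part of azure-tables-hadoop; how you plug the resulting filters into your own partitioner depends on the library's interfaces, which I have not checked here. The filter strings use the standard Table Storage query operators (ge, lt, and).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of azure-tables-hadoop.
class PartitionRangeSketch {

    // Wrap a value as an OData string literal, doubling any single quotes.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Turn sorted boundary keys into one filter per range, so each split
    // can query its own slice of the PartitionKey space.
    // Assumption: boundaries are sorted ascending and contain no duplicates.
    static List<String> buildRangeFilters(List<String> boundaries) {
        List<String> filters = new ArrayList<>();
        if (boundaries.isEmpty()) {
            // No boundaries known: a single unfiltered range.
            filters.add("");
            return filters;
        }
        // Everything below the first boundary.
        filters.add("PartitionKey lt " + quote(boundaries.get(0)));
        // Half-open ranges [boundary i, boundary i+1).
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            filters.add("PartitionKey ge " + quote(boundaries.get(i))
                    + " and PartitionKey lt " + quote(boundaries.get(i + 1)));
        }
        // Everything from the last boundary upward.
        filters.add("PartitionKey ge " + quote(boundaries.get(boundaries.size() - 1)));
        return filters;
    }

    public static void main(String[] args) {
        // Example boundaries; pick values that match your own key scheme.
        for (String filter : buildRangeFilters(Arrays.asList("g", "n", "t"))) {
            System.out.println(filter);
        }
    }
}
```

Each filter would then back one input split, so the job never has to enumerate every partition before it starts reading. Whether this helps in your case depends on how evenly those ranges divide your 2 million records.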