
Add pilosa shard-distribution subcommand

Open: travisturner opened this pull request • 2 comments

Overview

This PR adds a shard-distribution subcommand to Pilosa that returns the shard placement for a given set of cluster parameters (index name, number of nodes, replication, max shard).

If a --host is provided, the command bases its calculations on an existing Pilosa cluster. If not, it starts an in-memory cluster to determine the shard distribution.
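
For intuition, here is a minimal sketch of a placement computation of this general shape. This is not Pilosa's actual placement algorithm, and the function distribute and its parameters are illustrative only; it just shows how node count, replication, and max shard determine an output grouped as shards-per-node.

package main

import "fmt"

// distribute returns, for each node, the shards it holds, placing each
// shard's replicaN copies on consecutive nodes starting at shard % numNodes.
func distribute(numNodes, replicaN int, maxShard uint64) [][]uint64 {
	byNode := make([][]uint64, numNodes)
	for shard := uint64(0); shard <= maxShard; shard++ {
		for r := 0; r < replicaN; r++ {
			n := (int(shard) + r) % numNodes
			byNode[n] = append(byNode[n], shard)
		}
	}
	return byNode
}

func main() {
	// Two nodes, replication 2, max shard 0 yields [[0] [0]], the same
	// shape as the command output shown later in this thread.
	fmt.Println(distribute(2, 2, 0))
}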

This PR also adds a new API method (and corresponding client methods), so I would like some feedback on the decisions made there.

Pull request checklist

  • [x] I have read the contributing guide.
  • [x] I have agreed to the Contributor License Agreement.
  • [ ] I have updated the documentation.
  • [x] I have resolved any merge conflicts.
  • [x] I have included tests that cover my changes.
  • [x] All new and existing tests pass.
  • [ ] Make sure PR title conforms to convention in CHANGELOG.md.
  • [ ] Add appropriate changelog label to PR (if applicable).

Code review checklist

This is the checklist that the reviewer will follow while reviewing your pull request. You do not need to do anything with this checklist, but be aware of what the reviewer will be looking for.

  • [ ] Ensure that any changes to external docs have been included in this pull request.
  • [ ] If the changes require that minor/major versions need to be updated, tag the PR appropriately.
  • [ ] Ensure the new code is properly commented and follows Idiomatic Go.
  • [ ] Check that tests have been written and that they cover the new functionality.
  • [ ] Run tests and ensure they pass.
  • [ ] Build and run the code, performing any applicable integration testing.
  • [ ] Make sure PR title conforms to convention in CHANGELOG.md.
  • [ ] Make sure PR is tagged with appropriate changelog label.

travisturner, Jun 08 '19

I've tested the PR on a 2-node cluster using a custom-built Pilosa (defaults) with the following configuration:

node1.toml:

data-dir = "/home/yuce/ramdisk/cr1"
bind = "127.0.0.1:10101"

[gossip]
port = 14000

[cluster]
replicas = 2
coordinator = true

[metric]
diagnostics = false

node2.toml:

data-dir = "/home/yuce/ramdisk/cr2"                                                                                                                                                                                                           
bind = "127.0.0.1:10102"

[gossip]
port = 15000
seeds = "127.0.0.1:14000"

[cluster]
replicas = 2

[metric]
diagnostics = false

Created the schema:

$ curl localhost:10101/index/i1 -d ''
$ curl localhost:10101/index/i1/field/f1 -d ''

Then:

$ ./pilosa shard-distribution --host :10101 --index i1
[[0],[0]]

That's expected (with replication 2, each of the two nodes holds a replica of shard 0), but:

$ curl localhost:10101/index/i1/query -d 'Set(922337203685477580, f1=1)'
{"results":[true]}
$ ./pilosa shard-distribution --host :10101 --index i1
[[0],[0]]

I would expect shard 922337203685477580 >> 20 = 879609302220 to appear in the output. Is that wrong?
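
For reference, Pilosa's default shard width is 2^20 (1,048,576), so a column's shard is its ID shifted right by 20 bits. A quick check of the value actually set above:

package main

import "fmt"

const shardWidth = 1 << 20 // Pilosa's default shard width (2^20)

func main() {
	col := uint64(922337203685477580)
	fmt.Println(col / shardWidth) // 879609302220
}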

yuce, Jun 10 '19

@yuce there are a few issues that your example raised, all of which I need to address:

  • the HTTP query-param getter q.Get("maxShard") is not behaving as I expected
  • I need to add a test that exercises the HTTP args
  • for an existing index, the logic does not use availableShards, but rather the full range 0 to maxShard based on the value from api.MaxShards(). In your example that could return every shard in that very large range, and I'm actually not sure what the user would expect in this case (see the sketch after this list).
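
To illustrate the availableShards point, here is a sketch under assumptions: availableShards is modeled as a plain slice rather than whatever representation the API actually uses, and the shard number comes from the example above.

package main

import "fmt"

func main() {
	// After setting only column 922337203685477580, exactly one shard is
	// populated, but the maximum shard number is enormous.
	availableShards := []uint64{879609302220}
	maxShard := uint64(879609302220)

	// Placing 0 through maxShard would walk ~8.8e11 mostly-empty shards;
	// placing only availableShards touches the single shard that exists.
	fmt.Println("available shards:", len(availableShards))
	fmt.Println("shards in 0..maxShard:", maxShard+1)
}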

Finally, a test is failing in CI that I cannot reproduce locally, so I'm still blocked on that.

travisturner, Jun 11 '19