
production-relevant cassandra schema

Open codefromthecrypt opened this issue 8 years ago • 9 comments

from @mikewrighton

Hi, does anyone know if there's an easy way to modify the Cassandra schema? I'd like to change the replication factor, which is fixed at 1 in cassandra-schema-cql3.

It's currently tribal knowledge that the built-in schema isn't ideal for all production environments. We take some steps that make it easier for tests to pass, etc.

It would be nice for users and also for benchmarkers to use more realistic schema settings.

@openzipkin/cassandra do you know of a list of things about the schema that would certainly need to change in a multi-node cassandra cluster in production? If you can enumerate them, I can help document and maybe we can brainstorm a "dev mode" flag or some such that makes test-level options not the default.

codefromthecrypt, Jul 20 '16 01:07

We added an off-heap memtable allocation of 20G, which reduced the number of flushes and resulted in less compaction.

prat0318, Jul 20 '16 06:07

I guess I was thinking that since there is some useful code around schema loading, like in zipkin.storage.cassandra.Schema, it might be good if it were extensible, e.g. if you could provide your own schema or 'upgrade schema' file, and/or modify some of the parameters in the default schema, like the replication factor.

mikewrighton, Jul 20 '16 14:07
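A minimal sketch of the parameterization described above, assuming the schema file is plain CQL text containing a `'replication_factor': 'N'` entry. The class, the method, and the sample statement are hypothetical; nothing here is part of zipkin.storage.cassandra.Schema. If the keyspace statement uses `IF NOT EXISTS`, changing its text only affects keyspaces created afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemaReplication {
  // Matches 'replication_factor': '1' as well as the unquoted form 'replication_factor': 1.
  private static final Pattern RF =
      Pattern.compile("('replication_factor'\\s*:\\s*'?)(\\d+)('?)");

  /** Returns the schema text with every replication_factor value replaced by rf. */
  static String withReplicationFactor(String schemaCql, int rf) {
    if (rf < 1) throw new IllegalArgumentException("replication factor must be >= 1: " + rf);
    Matcher m = RF.matcher(schemaCql);
    // Keep the surrounding key and quotes (groups 1 and 3), swap only the number.
    return m.replaceAll("$1" + rf + "$3");
  }

  public static void main(String[] args) {
    // Hypothetical keyspace statement; the bundled file's exact wording may differ.
    String cql = "CREATE KEYSPACE IF NOT EXISTS zipkin WITH replication = "
        + "{'class': 'SimpleStrategy', 'replication_factor': '1'};";
    System.out.println(withReplicationFactor(cql, 3));
  }
}
```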

The only way to do this is somewhat basic: put your own copy of the file in front of the classpath!

codefromthecrypt, Jul 21 '16 01:07
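To illustrate why "front of the classpath" works: a class loader returns the first match for a resource name in its search order. This sketch (Java 11+) builds a class loader over two temporary directories, one holding a hypothetical override and one standing in for the bundled copy, and reads `cassandra-schema-cql3.txt` through it. It assumes the schema is loaded as a class-path resource, as the comment implies; the directories and file contents are made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathOverride {
  static final String RESOURCE = "cassandra-schema-cql3.txt";

  /** Reads RESOURCE through a class loader that searches exactly `entries`, in order. */
  static String loadFirst(Path... entries) throws IOException {
    URL[] urls = new URL[entries.length];
    for (int i = 0; i < entries.length; i++) urls[i] = entries[i].toUri().toURL();
    // Parent null: only the bootstrap loader is consulted before our entries,
    // so the first entry containing RESOURCE wins.
    try (URLClassLoader loader = new URLClassLoader(urls, null);
         InputStream in = loader.getResourceAsStream(RESOURCE)) {
      if (in == null) throw new IOException(RESOURCE + " not found");
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    Path override = Files.createTempDirectory("override");
    Path bundled = Files.createTempDirectory("bundled");
    Files.writeString(override.resolve(RESOURCE), "-- my production schema");
    Files.writeString(bundled.resolve(RESOURCE), "-- default schema");
    System.out.println(loadFirst(override, bundled)); // override comes first, so it is returned
  }
}
```

Swapping the argument order returns the other file, which is the whole mechanism: whichever entry comes first wins.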

In the case of Docker, you'd overwrite the file at /zipkin/cassandra-schema-cql3.txt.

Doing arbitrary upgrades could be dodgy. There's careful logic around the upgrade, and it checks for very specific things because CQL can't do everything. A log message might be misleading if we used this check but the file did something else.

For example, "/cassandra-schema-cql3-upgrade-1.txt" has this check:

```java
// KeyspaceMetadata comes from the DataStax Java driver (com.datastax.driver.core).
static boolean hasUpgrade1_defaultTtl(KeyspaceMetadata keyspaceMetadata) {
  // TODO: we need some approach to forward-check compatibility as well.
  // backward: this code knows the current schema is too old.
  // forward: this code knows the current schema is too new.
  return keyspaceMetadata.getTable("traces").getOptions().getDefaultTimeToLive() > 0;
}
```

We have tests that show the effects of this work, but we can't make promises about arbitrary changes, so we're unlikely to be able to support them.

I'd recommend only replacing the semantic contents of the existing schema files for this reason. Also, a lot of folks use Cassandra; maybe there are other tools available to keep the schema up to date that don't require Zipkin's ENSURE_SCHEMA feature?

codefromthecrypt, Jul 21 '16 01:07
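A hypothetical sketch of why replacing an upgrade file with arbitrary CQL is risky. The upgrade is gated by a check for one specific effect (here, a non-zero default TTL on `traces`, mirroring hasUpgrade1_defaultTtl above). The record types are stand-ins for the DataStax driver metadata, and the gating loop is an assumption about how such checks are used, not Zipkin's actual code (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class UpgradeGate {
  /** Hypothetical stand-in for the table options the driver would report. */
  record TableOptions(int defaultTimeToLive) {}

  /** Hypothetical upgrade step: a resource name plus the check that says it is already applied. */
  record Upgrade(String resource, Predicate<TableOptions> alreadyApplied) {}

  static final Upgrade UPGRADE_1 =
      new Upgrade("/cassandra-schema-cql3-upgrade-1.txt", opts -> opts.defaultTimeToLive() > 0);

  /** Returns the upgrade files that would be applied; a real loader would execute their CQL. */
  static List<String> pendingUpgrades(TableOptions traces, List<Upgrade> upgrades) {
    List<String> pending = new ArrayList<>();
    for (Upgrade u : upgrades) {
      // The check only recognizes the exact change the bundled file makes.
      if (!u.alreadyApplied().test(traces)) pending.add(u.resource());
    }
    return pending;
  }

  public static void main(String[] args) {
    System.out.println(pendingUpgrades(new TableOptions(0), List.of(UPGRADE_1)));
    System.out.println(pendingUpgrades(new TableOptions(604800), List.of(UPGRADE_1)));
  }
}
```

If a custom upgrade-1 file did something other than set a default TTL, this check would keep reporting it as pending, and any log message keyed to the check would describe a change that never happened.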

Increasing RF to 3+ is important in production.

But I don't know the best way to do that without breaking dev environments. Currently a warning is printed; see https://github.com/openzipkin/zipkin/blob/master/zipkin-storage/cassandra/src/main/java/zipkin/storage/cassandra/Schema.java#L43

michaelsembwever, Jul 28 '16 02:07
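A sketch of raising the replication factor on an existing keyspace, assuming a keyspace named `zipkin` and a single data center named `dc1` (both placeholders; use your own keyspace and the data-center names your cluster reports). It only builds the CQL string; you would run it with cqlsh or a driver session.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ReplicationCql {
  /** Builds an ALTER KEYSPACE statement using NetworkTopologyStrategy with an RF per data center. */
  static String alterKeyspace(String keyspace, Map<String, Integer> rfPerDc) {
    StringJoiner opts = new StringJoiner(", ", "{", "}");
    opts.add("'class': 'NetworkTopologyStrategy'");
    rfPerDc.forEach((dc, rf) -> {
      if (rf < 1) throw new IllegalArgumentException("RF must be >= 1 for " + dc);
      opts.add("'" + dc + "': " + rf);
    });
    return "ALTER KEYSPACE " + keyspace + " WITH replication = " + opts + ";";
  }

  public static void main(String[] args) {
    Map<String, Integer> rf = new LinkedHashMap<>(); // keeps data centers in insertion order
    rf.put("dc1", 3); // placeholder data-center name
    System.out.println(alterKeyspace("zipkin", rf));
  }
}
```

After raising the replication factor on a cluster that already holds data, run a full `nodetool repair` on the keyspace so existing rows are streamed to the new replicas.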

Other important things to do in a production environment are:

  • disable assertions (remove "-ea" from cassandra-env.sh)
  • run Java 8 and G1GC
  • have all Cassandra and Zipkin servers sync regularly against an internal NTP server
  • enable cross_node_timeout in cassandra.yaml
  • disable swap
  • set ulimits to unlimited

michaelsembwever, Jul 28 '16 03:07
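Two items on that list, assertions and the garbage collector, can be checked from inside a JVM through the standard management API. This sketch reports on the JVM it runs in, so it is illustrative only; checking Cassandra's own JVM would mean reading the same MXBeans remotely over JMX. The `G1` name prefix reflects HotSpot's bean names (e.g. `G1 Young Generation`); treat it as an assumption on other JVMs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmChecks {
  /** True if this class runs with assertions enabled (e.g. the JVM was started with -ea). */
  static boolean assertionsEnabled() {
    boolean enabled = false;
    assert enabled = true; // the assignment only executes when assertions are on
    return enabled;
  }

  /** Names of the garbage collectors active in this JVM. */
  static List<String> collectorNames() {
    List<String> names = new ArrayList<>();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      names.add(gc.getName());
    }
    return names;
  }

  /** Assumes G1's beans report names starting with "G1", as HotSpot's do. */
  static boolean usingG1() {
    return collectorNames().stream().anyMatch(n -> n.startsWith("G1"));
  }

  public static void main(String[] args) {
    System.out.println("assertions enabled: " + assertionsEnabled());
    System.out.println("collectors: " + collectorNames() + ", G1: " + usingG1());
  }
}
```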

@adriancole is this still relevant? If so, I can search through the issues and add some 'hints' to the documentation like the ones above, as well as a warning that sites should not rely on the 'demo' Cassandra schema configuration we provide. We should not become Cassandra tuning experts, though; we should merely hint that sites are responsible for squeezing the most out of their storage, and tell them what matters for Zipkin's storage and indexing needs.

If not and all this is hopelessly outdated, feel free to close :-)

jorgheymans, Oct 17 '20 19:10

We could probably handle the replication factor as an ENV variable, as we do in Elasticsearch, and leave it at that for now.

codefromthecrypt, Oct 18 '20 01:10
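A minimal sketch of the environment-variable approach, assuming a hypothetical variable named `CASSANDRA_REPLICATION_FACTOR` (not an existing Zipkin setting) that defaults to today's value of 1. Taking the environment as a `Map` keeps the parsing testable; `main` passes `System.getenv()` (Java 11+ for `isBlank`).

```java
import java.util.Map;

public class ReplicationFactorEnv {
  /** Hypothetical variable name, used only for illustration. */
  static final String ENV = "CASSANDRA_REPLICATION_FACTOR";

  /** Reads the replication factor from env, defaulting to 1 (the current dev-friendly value). */
  static int replicationFactor(Map<String, String> env) {
    String raw = env.get(ENV);
    if (raw == null || raw.isBlank()) return 1;
    int rf;
    try {
      rf = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(ENV + " must be an integer: " + raw, e);
    }
    if (rf < 1) throw new IllegalArgumentException(ENV + " must be >= 1: " + rf);
    return rf;
  }

  public static void main(String[] args) {
    int rf = replicationFactor(System.getenv());
    // Echoes the advice in this thread that production wants RF 3 or more.
    if (rf < 3) System.err.println("warning: replication factor " + rf + " is low for production");
    System.out.println("replication_factor = " + rf);
  }
}
```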

Alright, I can give that a go after you land the DataStax Driver 4.0 Mothership: https://github.com/openzipkin/zipkin/pull/3246

jorgheymans, Oct 18 '20 15:10