featurebase
Translate store readonly
For bugs, please provide the following:
What's going wrong?
Unable to create/write fields.
What was expected?
Steps to reproduce the behavior
The issue cannot be reproduced deterministically; it occasionally happens after a restart of one or more nodes in the cluster.
Information about your environment (OS/architecture, CPU, RAM, cluster/solo, configuration, etc.)
/ # uname -srvpio
Linux 4.14.138+ #1 SMP Tue Sep 3 02:58:08 PDT 2019 unknown unknown Linux
2 nodes + 1 coordinator cluster configuration.
4vcpu cores 26GB memory per node
pilosa 2.0 from image pilosa/pilosa@sha256:c9632c248ed8bd08c9aaf164af39277078c14df0c3c5b555d192e5b3321771aa
Description
Writing data fails with a single error message that gives little to debug with:
Caused by: java.util.concurrent.ExecutionException: com.pilosa.client.exceptions.PilosaException: Server error (500): translating rows: translate store could not find or create key, translate store read only
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.pilosa.client.BitImportManager.run(PilosaClient.java:1070)
... 31 more
Caused by: com.pilosa.client.exceptions.PilosaException: Server error (500): translating rows: translate store could not find or create key, translate store read only
This happens while the cluster is up and reporting the following status:
{
"state": "NORMAL",
"nodes": [
{
"id": "39605e4d-b699-4376-b1c7-706d6dbcdca6",
"uri": {
"scheme": "http",
"host": "10.0.98.4",
"port": 10101
},
"isCoordinator": false,
"state": "READY"
},
{
"id": "7304edfb-a659-457c-a7e9-d44b3cf6ff37",
"uri": {
"scheme": "http",
"host": "pilosa-0.pilosa.pilosa.svc.cluster.local",
"port": 10101
},
"isCoordinator": true,
"state": "READY"
},
{
"id": "7baaeb45-4604-4731-b91d-7c4d3e83d580",
"uri": {
"scheme": "http",
"host": "10.0.96.4",
"port": 10101
},
"isCoordinator": false,
"state": "READY"
}
],
"localID": "7baaeb45-4604-4731-b91d-7c4d3e83d580"
}
Success criteria (What criteria will consider this ticket closeable?)
When the cluster is in the READY state, I should be able to write data.
Edit: while this is happening, there is nothing in the Pilosa log except "holder sync beginning" - "holder sync complete" every 10 minutes.
Some more messages from the log that may or may not be relevant:
2020/06/05 03:26:52 cannot replicate: nodeURL=http://10.0.97.10:10101 err=http: invalid translate store endpoint status: code=500 url=http://10.0.97.10:10101/internal/translate/data body="index not found"
Our "solution" at this time is to keep restarting until it works, which I'd love to change. If possible, I'd also like to know:
- Why is the translate store read only? What triggered this?
- Can I see this in some sort of status query before actually trying to write to the index?
- Can I, as a client connected to the server, issue a request to re-evaluate whatever is causing the translate store to be read only?
@baaa I'm not completely confident in my understanding of this version of Pilosa and the java client, and I haven't set it up locally to be sure, but I can explain generally what the translate store is doing and why you might be seeing that error.
The translate store is used when either your index or field(s) are configured to use string keys. In this case, there is a translate store which allocates column and/or row IDs for the given string keys. In this version of Pilosa, that ID allocation is not distributed among the nodes in the cluster; a single node is responsible for handling ID allocation, and I believe I'm correct in saying that node is the one configured to be the coordinator. To keep that node from becoming a bottleneck for reads, the translate store on the coordinator is replicated around the cluster. Those replicas, meant only for read queries, are read only. When you see the error "translate store read only", it means that you're trying to write data that contains (new) string keys to one of the replicas instead of to the coordinator.
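To make the behaviour described above concrete, here is a toy model in Python. It is only a sketch of the general idea (one writable store, read-only replicas), not Pilosa's actual implementation; the class and method names (`TranslateStore`, `translate`, `replicate_to`) are made up for illustration.

```python
class ReadOnlyError(Exception):
    """Raised when a replica is asked to allocate an ID for a new key."""


class TranslateStore:
    """Toy key -> ID store. Hypothetical; NOT Pilosa's real code."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.ids = {}

    def translate(self, key):
        # Known keys can be served by any store, primary or replica.
        if key in self.ids:
            return self.ids[key]
        # Only the writable (primary) store may allocate IDs for unseen keys.
        if self.read_only:
            raise ReadOnlyError("translate store read only")
        self.ids[key] = len(self.ids) + 1
        return self.ids[key]

    def replicate_to(self, replica):
        # Copy the primary's current mappings into a replica.
        replica.ids.update(self.ids)


primary = TranslateStore()                 # plays the coordinator's store
replica = TranslateStore(read_only=True)   # plays a non-coordinator node

primary.translate("alice")                 # primary allocates an ID for a new key
primary.replicate_to(replica)
known_id = replica.translate("alice")      # replica can read a replicated key

try:
    replica.translate("bob")               # new key sent to a replica: rejected
    outcome = "allocated"
except ReadOnlyError as err:
    outcome = str(err)
```

In this model, reads of existing keys succeed everywhere, and only a write carrying a key the replica has never seen fails, which would match an error that shows up on imports but not on queries.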
The Java client should be smart enough to know not to send writes needing key translation to any node but the coordinator, but I'm not 100% sure about that. You could try configuring your client to be aware of only the coordinator node (in your example: pilosa-0.pilosa.pilosa.svc.cluster.local), and confirm that that prevents the read-only error. I'll have to dig into it more to see exactly what that Java client is doing with the writes needing key translation.
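If it helps with that experiment, the coordinator's address can be pulled out of the cluster status rather than hard-coded. A minimal sketch, assuming the status JSON has the shape posted in this issue; the helper name `coordinator_url` is hypothetical, and the sample below is trimmed to two of the three nodes.

```python
import json


def coordinator_url(status_json):
    """Return the coordinator's base URL from a status document, or None."""
    status = json.loads(status_json)
    for node in status.get("nodes", []):
        # Field names follow the status output shown earlier in this issue.
        if node.get("isCoordinator"):
            uri = node["uri"]
            return f'{uri["scheme"]}://{uri["host"]}:{uri["port"]}'
    return None


# Trimmed copy of the status posted above, used as sample input.
sample_status = """
{
  "state": "NORMAL",
  "nodes": [
    {"id": "39605e4d-b699-4376-b1c7-706d6dbcdca6",
     "uri": {"scheme": "http", "host": "10.0.98.4", "port": 10101},
     "isCoordinator": false, "state": "READY"},
    {"id": "7304edfb-a659-457c-a7e9-d44b3cf6ff37",
     "uri": {"scheme": "http",
             "host": "pilosa-0.pilosa.pilosa.svc.cluster.local",
             "port": 10101},
     "isCoordinator": true, "state": "READY"}
  ]
}
"""

url = coordinator_url(sample_status)
```

The resulting URL is what you would hand to the indexers' client for writes. Note that this only tells you which node claims to be the coordinator; it does not tell you whether that node's translate store is currently writable.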
@travisturner thanks for the reply, apologies for raising so many issues :) I got one more to raise soon.
As far as I can see from the Java client, it is smart enough, so I doubt that we are trying to create new keys anywhere other than the coordinator. At the moment we provide a single URL, that of the cluster, to create the Pilosa client.
Next time I see this error, I am going to explicitly give the coordinator URL to the indexers and see if that resolves the issue. If it does, then this implies that we, in our code, have to create clients differently for read and write operations. Even if that is the case, it would be very peculiar behaviour, since this only happens after cluster restarts.
I will monitor and update this issue.
FYI, the reason we are forced to use a build from master/2.0 is this issue: https://github.com/pilosa/pilosa/issues/2083 It is marked as resolved, but the fix has not been released in any 1.4 branch, which makes Pilosa 1.4 very much unusable.
Thanks @baaa for letting me know. We'll work on cutting a new release so you don't have to build from master.