Wire protocol Client message code 2 is missing

Open joseims opened this issue 4 years ago • 16 comments

Client message code 2 of the wire protocol is missing from the docs, but the Go implementation uses it for a health check. Is it missing on purpose? I'm starting to implement the wire protocol in C.

joseims avatar Aug 04 '21 14:08 joseims

Yeah, if you look at src/protocol.h you can see:

#define DQLITE_REQUEST_HEARTBEAT 2

This was meant to be a heartbeat that clients send on a regular basis to show that they are still alive; however, it hasn't been implemented yet. You can safely skip it in your client for now.

freeekanayaka avatar Aug 04 '21 14:08 freeekanayaka

I'm having trouble instantiating only the server side of dqlite (so I can test my client). Is there some doc I can read on how to do it? Are there any other docs about the server side of dqlite, or even docs about the Go client whose structure I could try to mimic?

joseims avatar Aug 04 '21 20:08 joseims

Just look at the Go dqlite demo and follow the code. Basically you'll want to call the dqlite_node_start() C function (it's done here in the Go code, you'll reach that point if you follow the demo code as mentioned).

freeekanayaka avatar Aug 05 '21 13:08 freeekanayaka

I'm having a hard time making it work. My code looks like this.

I'm using version 1.6 since it's the one in the PPA. I currently can't tell whether this works, because my client receives nothing. I also don't know whether I need to define a connect function with dqlite_node_set_connect_func: the docs say it is for setting a custom function, so can I assume a default exists?

I'm also not sure whether my data is properly formatted, but I would expect my client to receive at least a protocol-level error; instead, the connection just hangs.

Hope you can help me!

Server code

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>
#include <stdint.h>
#include <signal.h>
#include <sys/un.h>
#include <dqlite.h>
#include <raft.h>
#include <sqlite3.h>
#include <sys/socket.h>

//got this func from dqlite/test/lib/server.c
// static int endpointConnect(void *data, 
// 			   const char *address,
// 			   int *fd)
// {
// 	struct sockaddr_un addr;
// 	int rv;
// 	(void)address;
// 	(void)data;
// 	memset(&addr, 0, sizeof addr);
// 	addr.sun_family = AF_UNIX;
// 	strcpy(addr.sun_path + 1, address + 1);
// 	*fd = socket(AF_UNIX, SOCK_STREAM, 0);
// 	// munit_assert_int(*fd, !=, -1);
// 	rv = connect(*fd, (struct sockaddr *)&addr, sizeof(sa_family_t) + strlen(address + 1) + 1);
// 	// munit_assert_int(rv, ==, 0);
// 	return 0;
// }

int main() {
    int a;
    dqlite_node_id id = 5;
    char address[] = "0.0.0.0:9002";
    dqlite_node *node;
    a = dqlite_node_create(id,address, "./",&node);
    a = dqlite_node_set_bind_address(node, address);
    // a = dqlite_node_set_connect_func(node, endpointConnect, NULL);
    a = dqlite_node_set_network_latency(node, 5000000000); //deprecated
    a = dqlite_node_start(node);
    printf("%d\n",a);
    int testInteger;
    scanf("%d", &testInteger);  //just to keep hanging
    return 0;
};

Client code

// Client side C/C++ program to demonstrate Socket programming
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <string.h>
#include "protocol.h"
#include <stdbool.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include "../dqlite/src/lib/byte.h"
#define PORT 9002

int main(int argc, char const *argv[])
{
	int sock = 0, valread;
	struct sockaddr_in serv_addr;
	uint64_t hello = byte__flip64(1);
	char buffer2[1024] = {0};
	if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0)
	{
		printf("\n Socket creation error \n");
		return -1;
	}

	serv_addr.sin_family = AF_INET;
	serv_addr.sin_port = htons(PORT); // network byte order (endianness)
	
	// Convert IPv4 and IPv6 addresses from text to binary form
	if(inet_pton(AF_INET, "0.0.0.0", &serv_addr.sin_addr)<=0)
	{
		printf("\nInvalid address/ Address not supported \n");
		return -1;
	}

	if (connect(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
	{
		printf("\nConnection Failed \n");
		return -1;
	}

	int rc = send(sock , &hello , sizeof(hello) , 0 );
	printf("Initial protocol version\n");
	printf("%d\n", rc );
	printf("%d\n",errno);
	printf("instancing buffer\n");
	void *buffer = malloc(16);

	printf("size=%ld\n",sizeof(buffer));
	size_t offset = 0;
	void *pointer = buffer;
	char bit_order[16] = {   0,0x1,0,0 //size 
							,0  //schema
							,0 //revision
							,0,0 //unused
							,0,0,0,0,0,0,0,0x1}; //unused uint64
	printf("putting the bytes on the buffer");
	for (int i=0; i > 16; i++) {
		*(unsigned char*)(pointer) = bit_order[i];
		pointer += sizeof(char);
	}
	printf(":)\n");
	printf("sending message\n");
	rc = send(sock , buffer , 1 , 0 );
	printf("%d\n", rc );
	printf("reading answre\n");
	valread = read(sock , &buffer2, sizeof(buffer2));
	printf("%d\n",valread);
	printf("Oh dear, something went wrong with read()! %s\n", strerror(errno));
	printf("%s\n",buffer2);



	return 0;
}
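
For reference, here is a minimal sketch of how the handshake and a single request could be framed. It assumes the layout described in the dqlite wire protocol documentation: a 64-bit little-endian protocol version as the handshake, then messages made of an 8-byte header (32-bit body size counted in 8-byte words, 8-bit type, 8-bit schema, 16 bits unused) followed by the body. The helper names (put_le32, put_le64, encode_handshake, encode_request_u64, send_all) and the REQUEST_LEADER value are illustrative assumptions, not part of dqlite's API. Note that in the client above the loop condition `i > 16` is false on the first check, so the buffer is never filled, and the final send() transmits a single byte; a server still waiting for the rest of the header would plausibly just keep the connection open.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumed code of the "get current leader" request; check protocol.h. */
#define REQUEST_LEADER 0

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		p[i] = (uint8_t)(v >> (8 * i));
	}
}

/* Store a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)(v >> (8 * i));
	}
}

/* Encode the handshake (protocol version). Returns the bytes written. */
static size_t encode_handshake(uint8_t *buf, uint64_t version)
{
	assert(buf != NULL);
	put_le64(buf, version);
	return 8;
}

/* Encode a request whose body is a single uint64 (one 8-byte word).
 * buf must hold at least 16 bytes. Returns the bytes written. */
static size_t encode_request_u64(uint8_t *buf, uint8_t type, uint64_t value)
{
	assert(buf != NULL);
	put_le32(buf, 1); /* body size in 8-byte words */
	buf[4] = type;    /* request type code */
	buf[5] = 0;       /* schema version */
	buf[6] = 0;       /* unused */
	buf[7] = 0;       /* unused */
	put_le64(buf + 8, value);
	return 16;
}

/* Keep calling send() until the whole buffer has been written. */
static int send_all(int fd, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	while (len > 0) {
		ssize_t n = send(fd, p, len, 0);
		if (n < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A client would call encode_handshake() once right after connect(), then encode_request_u64(buf, REQUEST_LEADER, 0), passing each returned length to send_all() so that the full 8 or 16 bytes reach the server.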

joseims avatar Aug 11 '21 21:08 joseims

I think I figured out how to reuse the client part that is already implemented, but I still don't know about the server side. Also, can I assume the already implemented client part works properly, or should I treat it with caution?

joseims avatar Aug 12 '21 20:08 joseims

What do you mean that you don't know about the server side?

The client part implemented in the unit tests should work properly, although it's very primitive and lacks a lot of functionality for real-world usage.

freeekanayaka avatar Aug 13 '21 09:08 freeekanayaka

I don't know whether I need to define a connect function with dqlite_node_set_connect_func. The docs say it is for setting a custom function, but they also say I must call it before start, so I'm not sure whether there is a default function or whether I should write a new one. And if I have to write it, I'm not sure what it should look like.

This is what my server code looks like now. I'm not sure if I'm going in the right direction with it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>
#include <stdint.h>
#include <signal.h>
#include <sys/un.h>
#include <dqlite.h>
#include <raft.h>
#include <sqlite3.h>
#include <sys/socket.h>

//got this func from dqlite/test/lib/server.c
// static int endpointConnect(void *data, 
// 			   const char *address,
// 			   int *fd)
// {
// 	struct sockaddr_un addr;
// 	int rv;
// 	(void)address;
// 	(void)data;
// 	memset(&addr, 0, sizeof addr);
// 	addr.sun_family = AF_UNIX;
// 	strcpy(addr.sun_path + 1, address + 1);
// 	*fd = socket(AF_UNIX, SOCK_STREAM, 0);
// 	// munit_assert_int(*fd, !=, -1);
// 	rv = connect(*fd, (struct sockaddr *)&addr, sizeof(sa_family_t) + strlen(address + 1) + 1);
// 	// munit_assert_int(rv, ==, 0);
// 	return 0;
// }

int main() {
    int a;
    dqlite_node_id id = 5;
    char address[] = "0.0.0.0:9002";
    dqlite_node *node;
    a = dqlite_node_create(id,address, "./",&node);
    a = dqlite_node_set_bind_address(node, address);
    // a = dqlite_node_set_connect_func(node, endpointConnect, NULL);
    a = dqlite_node_set_network_latency(node, 5000000000); //deprecated, but i'm on version 1.6
    a = dqlite_node_start(node);
    printf("%d\n",a);
    int testInteger;
    scanf("%d", &testInteger);  //just to keep hanging
    return 0;
};

joseims avatar Aug 13 '21 12:08 joseims

There should be a default connect function that can connect via TCP using IPv4 addresses. What the docs mean is that if you want to set a custom connect function, then dqlite_node_set_connect_func() must be called before dqlite_node_start().

Please look at the test code for the details of what you need to start the dqlite server engine. What you pasted above seems about right though, there shouldn't be much else needed.
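
As an illustration of what a custom connect function could look like if one were needed, here is a minimal sketch for plain TCP over IPv4. It assumes the callback shape used by the commented-out endpointConnect above (an opaque argument, an address string, and an output file descriptor) and assumes that a non-zero return signals failure. The names split_host_port and tcp_connect are hypothetical, and this is not dqlite's built-in default implementation.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Split "host:port" into two NUL-terminated buffers.
 * Returns 0 on success, -1 if the address is malformed or too long. */
static int split_host_port(const char *address,
			   char *host, size_t host_len,
			   char *port, size_t port_len)
{
	const char *colon = strrchr(address, ':');
	size_t n;
	if (colon == NULL || colon == address || colon[1] == '\0') {
		return -1;
	}
	n = (size_t)(colon - address);
	if (n >= host_len || strlen(colon + 1) >= port_len) {
		return -1;
	}
	memcpy(host, address, n);
	host[n] = '\0';
	strcpy(port, colon + 1);
	return 0;
}

/* Hypothetical custom connect callback: opens a TCP/IPv4 connection to
 * `address` and stores the socket in *fd. Returns 0 on success and -1 on
 * failure (the failure value is an assumption of this sketch). */
static int tcp_connect(void *arg, const char *address, int *fd)
{
	char host[256];
	char port[16];
	struct addrinfo hints;
	struct addrinfo *res;
	struct addrinfo *ai;
	int sock = -1;

	(void)arg;
	assert(fd != NULL);
	if (split_host_port(address, host, sizeof host, port, sizeof port) != 0) {
		return -1;
	}
	memset(&hints, 0, sizeof hints);
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(host, port, &hints, &res) != 0) {
		return -1;
	}
	/* Try each resolved address until one connects. */
	for (ai = res; ai != NULL; ai = ai->ai_next) {
		sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
		if (sock < 0) {
			continue;
		}
		if (connect(sock, ai->ai_addr, ai->ai_addrlen) == 0) {
			break;
		}
		close(sock);
		sock = -1;
	}
	freeaddrinfo(res);
	if (sock < 0) {
		return -1;
	}
	*fd = sock;
	return 0;
}
```

Such a function would be registered with dqlite_node_set_connect_func(node, tcp_connect, NULL) before dqlite_node_start(). With the default TCP connect function in place, this step should not be required.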

freeekanayaka avatar Aug 19 '21 10:08 freeekanayaka

I was able to write some client code that connected to the Go example server, but I was unable to connect to my own server: an error appeared when I tried to connect to the socket. I feel that some server-side configuration is needed, but I can't figure out what it is. Below are the server code I used and the logs. I tried looking at the Go code and the test code, but couldn't figure it out. I also tried to follow the logs, with no success.

My server code (pretty similar to the one above)

int main() {
    dqliteTracingMaybeEnable(true);
    int a;
    dqlite_node_id id = 5;
    char *address = "127.0.0.1:9002";
    char *path = "/home/ubuntu/workspace/dqlite-client/src";
    printf("Address = %s\n", address);
    dqlite_node *node;
    a = dqlite_node_create(id,address, path, &node);
    printf("1: %d\n",a);
    a = dqlite_node_set_bind_address(node, address);
    printf("2: %d\n",a);
    // a = dqlite_node_set_network_latency_ms(node, 5000);
    a = dqlite_node_set_network_latency(node, 5000000000); //deprecated
    printf("3: %d\n",a);
    // a = dqlite_node_set_snapshot_params(node, 2048 , 2048); //dont exist in this version
    a = dqlite_node_start(node);
    printf("4: %d\n",a);
    int testInteger;
    scanf("%d", &testInteger);
    printf("errno = %d\n",errno);
    printf("errno readable = %s\n", strerror(errno));
    return 0;
};

The logs:

Address = 127.0.0.1:9002
LIBDQLITE VfsInit:2084 vfs init
LIBDQLITE raftProxyInit:262 raft proxy init
LIBDQLITE fsm__init:483 fsm init
LIBDQLITE impl_init:44 impl init
1: 0
2: 0
3: 0
LIBDQLITE dqlite_node_start:702 dqlite node start
LIBDQLITE impl_listen:54 impl listen
4: 0
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE read_protocol_cb:229 read error -4095
LIBDQLITE conn__stop:328 conn stop
LIBDQLITE gateway__close:34 gateway close
exit  #my input
errno = 25
errno readable = Inappropriate ioctl for device

joseims avatar Aug 23 '21 18:08 joseims

I'd suggest looking at the server setup in the test code and trying that; alternatively, @MathieuBordere might be able to offer some support.

freeekanayaka avatar Aug 25 '21 13:08 freeekanayaka

I've made good progress, but now I'm stuck on a new problem. I have a cluster of 3 servers, but when I kill one of them (a non-leader), the leader dies and the other one gets stuck in a reconnect loop. I start the servers, add them to the cluster, assign them as voters, open the database, create a table, then add two items to the table. These are the logs:

Server 1 (first server, node leader):

dqlite node created
dqlite address bound
LIBDQLITE dqlite_node_start:702 dqlite node start
LIBDQLITE impl_listen:54 impl listen
dqlite node started at address: 127.0.0.1:24000
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_add:617 handle add
LIBDQLITE raftChangeCb:600 raft change cb status 0
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_assign:649 handle assign
LIBDQLITE impl_connect:168 impl connect id:2 address:127.0.0.1:25000

LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE raft_connect:110 raft_connect
LIBDQLITE raftProxyAccept:291 raft proxy accept
LIBDQLITE raftChangeCb:600 raft change cb status 0
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_add:617 handle add
LIBDQLITE raftChangeCb:600 raft change cb status 0
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_assign:649 handle assign
LIBDQLITE impl_connect:168 impl connect id:3 address:127.0.0.1:26000

LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE raft_connect:110 raft_connect
LIBDQLITE raftProxyAccept:291 raft proxy accept
LIBDQLITE raftChangeCb:600 raft change cb status 0
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_open:193 handle open
LIBDQLITE db__init:18 db init db1
LIBDQLITE leader__init:113 leader init
LIBDQLITE openConnection:29 open connection filename db1
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_leader:153 handle leader
LIBDQLITE handle_leader:167 handle leader - dispatch to 1
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_prepare:228 handle prepare
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_exec:298 handle exec
LIBDQLITE leader__exec:435 leader exec
LIBDQLITE leader__barrier:473 leader barrier
LIBDQLITE leader__barrier:476 not needed
LIBDQLITE execBarrierCb:418 exec barrier cb status 0
LIBDQLITE VfsPoll:2145 vfs poll filename:db1
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 2
LIBDQLITE leaderApplyFramesCb:266 apply frames cb
LIBDQLITE leaderMaybeCheckpoint:173 leader maybe checkpoint
LIBDQLITE leaderMaybeCheckpoint:198 wal size < threshold
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_leader:153 handle leader
LIBDQLITE handle_leader:167 handle leader - dispatch to 1
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_prepare:228 handle prepare
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_exec:298 handle exec
LIBDQLITE leader__exec:435 leader exec
LIBDQLITE leader__barrier:473 leader barrier
LIBDQLITE leader__barrier:476 not needed
LIBDQLITE execBarrierCb:418 exec barrier cb status 0
LIBDQLITE VfsPoll:2145 vfs poll filename:db1
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 1
LIBDQLITE leaderApplyFramesCb:266 apply frames cb
LIBDQLITE leaderMaybeCheckpoint:173 leader maybe checkpoint
LIBDQLITE leaderMaybeCheckpoint:198 wal size < threshold
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_leader:153 handle leader
LIBDQLITE handle_leader:167 handle leader - dispatch to 1
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_prepare:228 handle prepare
LIBDQLITE gateway__resume:1031 gateway resume - finished
LIBDQLITE gateway__handle:991 gateway handle
LIBDQLITE handle_exec:298 handle exec
LIBDQLITE leader__exec:435 leader exec
LIBDQLITE leader__barrier:473 leader barrier
LIBDQLITE leader__barrier:476 not needed
LIBDQLITE execBarrierCb:418 exec barrier cb status 0
LIBDQLITE VfsPoll:2145 vfs poll filename:db1
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 1
LIBDQLITE leaderApplyFramesCb:266 apply frames cb
LIBDQLITE leaderMaybeCheckpoint:173 leader maybe checkpoint
LIBDQLITE leaderMaybeCheckpoint:198 wal size < threshold
LIBDQLITE gateway__resume:1031 gateway resume - finished

Server 2:

dqlite node created
dqlite address bound
LIBDQLITE dqlite_node_start:702 dqlite node start
LIBDQLITE impl_listen:54 impl listen
dqlite node started at address: 127.0.0.1:26000
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE raft_connect:110 raft_connect
LIBDQLITE raftProxyAccept:291 raft proxy accept
LIBDQLITE impl_connect:168 impl connect id:1 address:127.0.0.1:24000
LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE db__init:18 db init db1
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 2
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 1
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 1

LIBDQLITE impl_connect:168 impl connect id:2 address:127.0.0.1:25000

LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_work_cb:79 connect failed to 2@127.0.0.1:25000

LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE impl_connect:168 impl connect id:2 address:127.0.0.1:25000

LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_work_cb:79 connect failed to 2@127.0.0.1:25000

LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE impl_connect:168 impl connect id:2 address:127.0.0.1:25000

LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_work_cb:79 connect failed to 2@127.0.0.1:25000

LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE impl_connect:168 impl connect id:2 address:127.0.0.1:25000

LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_work_cb:79 connect failed to 2@127.0.0.1:25000

LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
^C # if I had not stopped it, it would have kept trying to reconnect until it died

Server 3 (the one that died):

dqlite node created
dqlite address bound
LIBDQLITE dqlite_node_start:702 dqlite node start
LIBDQLITE impl_listen:54 impl listen
dqlite node started at address: 127.0.0.1:25000
LIBDQLITE conn__start:288 conn start
LIBDQLITE gateway__init:17 gateway init
LIBDQLITE raft_connect:110 raft_connect
LIBDQLITE raftProxyAccept:291 raft proxy accept
LIBDQLITE impl_connect:168 impl connect id:1 address:127.0.0.1:24000
LIBDQLITE connect_work_cb:62 connect work cb
LIBDQLITE connect_after_work_cb:138 connect after work cb status 0
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE db__init:18 db init db1
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 2
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 1
LIBDQLITE fsm__apply:222 fsm apply
LIBDQLITE apply_frames:66 fsm apply frames
LIBDQLITE VfsApply:2356 vfs apply filename db1 n 1
^C # here I stop it

joseims avatar Sep 06 '21 18:09 joseims

There is another thing that happened that I couldn't explain. When I added the nodes to the cluster as non-voters and then tried to create a table on the leader, nothing happened on the other nodes (no logs after the init). But when I added them as voters and did the same thing, they received messages and I could see some logs. Is that what is supposed to happen? I thought that even non-voting nodes should receive the data for replication.

joseims avatar Sep 06 '21 18:09 joseims

> There is another thing that happened that I couldn't explain. When I added the nodes to the cluster as non-voters and then tried to create a table on the leader, nothing happened on the other nodes (no logs after the init). But when I added them as voters and did the same thing, they received messages and I could see some logs. Is that what is supposed to happen? I thought that even non-voting nodes should receive the data for replication.

There are three roles: voter, standby and idle. Only a voter or a standby receives replication data.

freeekanayaka avatar Sep 06 '21 19:09 freeekanayaka

So when I use add, the node is added as idle?

joseims avatar Sep 08 '21 12:09 joseims

That's right, see https://github.com/canonical/raft/blob/1bf733d57adeca4c3ff65f70bcedfdd229f8a380/src/client.c#L190. You need to call raft_assign (https://github.com/canonical/raft/blob/1bf733d57adeca4c3ff65f70bcedfdd229f8a380/src/client.c#L214) if you wish to change the role.

MathieuBordere avatar Sep 08 '21 14:09 MathieuBordere

Oh, got it, thanks! Additionally, do you have any hint as to why all my servers die when I kill one of them (as I explained in my comment above)?

joseims avatar Sep 08 '21 18:09 joseims