Bad Behaviour in New v1.2.0 Release?

Open steve-gray opened this issue 3 years ago • 7 comments

What version of Cassandra are you using?

ScyllaDB 5.0 (but same all the way down to 4.0 at each major point release, which is as far as we test back)

What version of Gocql are you using?

Latest/v1.2.0

What version of Go are you using?

1.18

What did you do?

Our CI scripts wait for port 9042 to open, then start creating the test databases. In CI the new release fails consistently during startup, regardless of any extra delays/sleeps we add (even many minutes). However, if we go back to the prior release or switch over to the ScyllaDB fork specifically, the issue goes away.

What did you expect to see?

Our CI tests should pass as usual.

What did you see instead?

=== RUN   TestUpsertModel
    operations_model_test.go:20: Using test keyspace: unittest_model_manager_1657252471641166683
    operations_model_test.go:30: 
        	Error Trace:	operations_model_test.go:30
        	Error:      	Expected nil, but got: &fmt.wrapError{msg:"no connections were made when creating the session", err:(*fmt.wrapError)(0xc00048b1c0)}

steve-gray avatar Jul 08 '22 04:07 steve-gray

Hi @steve-gray ! Thanks for the report.

It is interesting that the CI for gocql passed.

I just tried to reproduce the issue locally and I see the error with a program like this:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/gocql/gocql"
)

func main() {
	logger := log.New(os.Stderr, "> ", 0)
	err := mainErr(logger)
	if err != nil {
		logger.Fatal(err)
	}
}

func mainErr(logger *log.Logger) error {
	cfg := gocql.ClusterConfig{
		Hosts:        []string{"localhost:9042"},
		ProtoVersion: 4,
		Logger:       logger,
	}
	session, err := cfg.CreateSession()
	if err != nil {
		return err
	}
	defer session.Close()
	it := session.Query("SELECT host_id from system.peers").Iter()
	var hostID string
	for it.Scan(&hostID) {
		fmt.Println(hostID)
	}
	return it.Close()
}

However, it happens to me with both v1.1.0 and v1.2.0, which is suspicious.

martin-sucha avatar Jul 11 '22 11:07 martin-sucha

It's good that it's reproducible at least. We thought we were going insane, as it's inverted in our scenario: CI doesn't work but local testing does.

steve-gray avatar Jul 12 '22 05:07 steve-gray

Hi @martin-sucha - is there a workaround for this besides switching over to the Scylla fork of GoCQL?

steve-gray avatar Jul 25 '22 02:07 steve-gray

Hi @martin-sucha,

Your recent commit regarding system.peers in master: is that a potential attempt to resolve this? Just wondering if it's worth trying it out or if it's unrelated.

-Steve

steve-gray avatar Aug 28 '22 03:08 steve-gray

Hi!

If you mean #1646, that is unrelated. #1646 is a fix for DataStax Enterprise.

I used a select from system.peers in the reproducer only because it is a table that already exists; any select or other query could be there instead.

I haven't looked into this issue (#1640) yet as I had a lot of other stuff to handle, sorry.

Could you maybe try to git bisect which commit introduced the issue? That might help narrow down the cause. Just to confirm, as it's not explicitly stated above, did v1.1.0 work for you or have you used a different version previously?

martin-sucha avatar Aug 29 '22 17:08 martin-sucha

This issue appears to have made its way into the ScyllaDB fork of the driver. We stepped up from 1.6 to 1.7.2 and found the issue now occurs there too. Reverting to 1.6 seems to resolve the issue. That should hopefully narrow it down to a very specific window?

steve-gray avatar Sep 29 '22 23:09 steve-gray