databricks-sql-go

Data race when reading row values of timestamp data type

Open 7phs opened this issue 1 year ago • 3 comments

Hello,

I found a data race that occurs when reading data from Databricks concurrently, for example when reading from different tables or reading catalog metadata at the same time.

Code to reproduce the data race:

import (
	"context"
	"database/sql"
	"fmt"
	"sync"
	"testing"

	dbsql "github.com/databricks/databricks-sql-go"
)

func TestDatabricksDataRace(t *testing.T) {
	var (
		wg         sync.WaitGroup
		ctx        = context.Background()
		w          = make(chan bool)
		routineNum = 10
	)

	wg.Add(routineNum)

	for i := 0; i < routineNum; i++ {
		go func() {
			defer wg.Done()

			<-w

			if err := listCatalogs(ctx); err != nil {
				fmt.Println("ERROR:", err)
			}
		}()
	}

	close(w)

	wg.Wait()
}

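// listCatalogs opens a connection and runs SHOW CATALOGS ten times.
// cfg holds the connection settings (host, port, access token, HTTP path);
// its definition is omitted here.
func listCatalogs(ctx context.Context) error {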
	connector, err := dbsql.NewConnector(
		dbsql.WithServerHostname(cfg.Host),
		dbsql.WithPort(int(cfg.Port)),
		dbsql.WithAccessToken(cfg.AccessToken),
		dbsql.WithHTTPPath(cfg.HTTPPath),
	)
	if err != nil {
		return err
	}

	db := sql.OpenDB(connector)
	defer db.Close()

	for i := 0; i < 10; i++ {
		err := func() error {
			r, err := db.QueryContext(ctx, "SHOW CATALOGS;")
			if err != nil {
				return err
			}
			defer r.Close()

			for r.Next() {
				var s string
				if err := r.Scan(&s); err != nil {
					return err
				}
			}

			// Report any error hit while iterating over the rows.
			return r.Err()
		}()

		if err != nil {
			return err
		}
	}

	return nil
}

Command to run this test with the data race detector:

go test -race -run TestDatabricksDataRace .

The root cause of the data race is the uninitialised field loc of arrow.TimestampType. It is initialised lazily on the first call to arrow.TimestampType.GetZone(), so concurrent first calls read and write the field at the same time.
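To illustrate the pattern behind this kind of race, here is a minimal sketch using only the standard library. It is not the Arrow source: lazyZone, racyLocation and safeLocation are hypothetical names made up for this example. The first method shows an unsynchronised lazy initialisation, the second guards the same work with sync.Once.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lazyZone is a hypothetical stand-in for a type that caches a
// *time.Location on first use. It is NOT the Arrow implementation.
type lazyZone struct {
	name string

	once sync.Once
	loc  *time.Location
	err  error
}

// racyLocation shows the problematic pattern: an unsynchronised
// check-then-write. Concurrent first calls read and write z.loc at
// the same time, which is a data race.
func (z *lazyZone) racyLocation() (*time.Location, error) {
	if z.loc != nil {
		return z.loc, nil
	}
	loc, err := time.LoadLocation(z.name)
	if err != nil {
		return nil, err
	}
	z.loc = loc
	return loc, nil
}

// safeLocation guards the one-time initialisation with sync.Once, so
// concurrent first calls are safe.
func (z *lazyZone) safeLocation() (*time.Location, error) {
	z.once.Do(func() {
		z.loc, z.err = time.LoadLocation(z.name)
	})
	return z.loc, z.err
}

func main() {
	z := &lazyZone{name: "UTC"}

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := z.safeLocation(); err != nil {
				fmt.Println("ERROR:", err)
			}
		}()
	}
	wg.Wait()

	loc, _ := z.safeLocation()
	fmt.Println(loc)
}
```

Swapping safeLocation for racyLocation inside the goroutines and running with go run -race would be expected to make the detector flag the concurrent access to loc. Forcing the initialisation once before any goroutine starts avoids the race without changing the type itself.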

A workaround for the data race:

func init() {
	// Initialise `arrow.TimestampType` before it is used concurrently.
	_, _ = arrow.FixedWidthTypes.Timestamp_us.(*arrow.TimestampType).GetToTimeFunc()
}

Environment:

  • go v1.21
  • databricks-sql-go v1.5.2

7phs, Nov 20 '23 16:11

Related Apache Arrow issue: https://github.com/apache/arrow/issues/38795

7phs, Nov 20 '23 17:11

databricks-sql-go has been updated and now uses Apache Arrow Go v16, in which the data race is fixed.

This bug is fixed now.

7phs, May 29 '24 09:05

The data race bug is present again after the Apache Arrow version was reverted to v12.

7phs, Jun 05 '24 05:06