databricks-sql-go
databricks-sql-go copied to clipboard
Data race of reading row value of timestamp data type
Hello,
I found a data race with reading data from Databricks simultaneously. For example, from different tables or reading catalogs metadata simultaneously.
A code to reproduce data race is
import (
"context"
"database/sql"
"fmt"
"sync"
"testing"
dbsql "github.com/databricks/databricks-sql-go"
)
func TestDatabricksDataRace(t *testing.T) {
var (
wg sync.WaitGroup
ctx = context.Background()
w = make(chan bool)
routineNum = 10
)
wg.Add(routineNum)
for i := 0; i < routineNum; i++ {
go func() {
defer wg.Done()
<-w
if err := listCatalogs(ctx); err != nil {
fmt.Println("ERROR:", err)
}
}()
}
close(w)
wg.Wait()
}
func listCatalogs(ctx context.Context) error {
connector, err := dbsql.NewConnector(
dbsql.WithServerHostname(cfg.Host),
dbsql.WithPort(int(cfg.Port)),
dbsql.WithAccessToken(cfg.AccessToken),
dbsql.WithHTTPPath(cfg.HTTPPath),
)
if err != nil {
return err
}
db := sql.OpenDB(connector)
defer db.Close()
for i := 0; i < 10; i++ {
err := func() error {
r, err := db.QueryContext(ctx, "SHOW CATALOGS;")
if err != nil {
return err
}
defer r.Close()
for r.Next() {
var s string
if err := r.Scan(&s); err != nil {
return err
}
}
return nil
}()
if err != nil {
return err
}
}
return nil
}
A command to run this test with data race detector:
go test -race -run TestDatabricksDataRace .
A root cause of data race is not initialised field loc of arrow.TimestampType. It initialised in the first call of function arrow.TimestampType :: GetZone().
A workaround of data race is:
func init() {
// init `arrow.TimestampType` before use it.
_, _ = arrow.FixedWidthTypes.Timestamp_us.(*arrow.TimestampType).GetToTimeFunc()
}
Environment:
- go v1.21
- databricks-sql-go v1.5.2
Related issue of Apache Arrow - https://github.com/apache/arrow/issues/38795
databricks-sql-go
updated and uses Apache Arrow Go v16 with fixed data race.
This bug is fixed now.
A bug of a data race still exists with reverting the Apache Arrow version to v12
.