How to use regexp in @filter
Question.
I have about 50 million nodes. If I use `regexp` in the root function, like

{ resources(func: regexp(name, /abc/i)) @filter(eq(workspace_key, "def")) { id name resource_key } }

it is fast and returns within 1 second. However, if I use it in the filter, like

{ resources(func: eq(workspace_key, "def")) @filter(regexp(name, /abc/i)) { id name resource_key } }

it is unusable and times out. Most of the time I have to spell out a complex logical combination, so it's impossible to put the regexp in `func`; it has to go in a filter. So I really want to know: is the index not used when `regexp` appears in `@filter`? And does anybody have a good idea to solve this problem?
We don't have a query planner yet, so Dgraph may not do a good job of figuring out what order to execute the filters in. Is it possible for you to share the data? It may still be worth looking into this.
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"strconv"
	"sync"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

// nodeExists reports whether a node with the given id predicate already exists.
func nodeExists(dg *dgo.Dgraph, id string) (bool, error) {
	query := fmt.Sprintf(`{ data(func: eq(id, "%s")) { uid } }`, id)
	resp, err := dg.NewTxn().Query(context.Background(), query)
	if err != nil {
		return false, err
	}
	var result struct {
		Data []struct {
			UID string `json:"uid"`
		} `json:"data"`
	}
	if err := json.Unmarshal(resp.Json, &result); err != nil {
		return false, err
	}
	return len(result.Data) > 0, nil
}

func saveNode(dg *dgo.Dgraph, j int) error {
	id := "test_text" + strconv.Itoa(j)
	name := "name" + id
	node := map[string]interface{}{
		"name":          name,
		"workspace_key": "_xGWV7",
		"create_by":     "xxw" + strconv.Itoa(j),
		"qa":            "wws",
		"id":            id,
	}
	ctx := context.Background()
	nodeJSON, err := json.Marshal(node)
	if err != nil {
		return err
	}
	mutation := &api.Mutation{
		CommitNow: true,
		SetJson:   nodeJSON,
	}
	_, err = dg.NewTxn().Mutate(ctx, mutation)
	if err != nil {
		return err
	}
	fmt.Printf("Created node %s\n", id)
	return nil
}

func saveRelation(dg *dgo.Dgraph, srcID, dstID string) error {
	addRelationQuery := `{ left as var(func: eq(id, "%s")) right as var(func: eq(id, "%s")) }`
	addQuads := fmt.Sprintf(`uid(left) <%s> uid(right) .`, "test_parent_to_child")
	addReq := &api.Request{
		CommitNow: true,
		Query:     fmt.Sprintf(addRelationQuery, srcID, dstID),
		Mutations: []*api.Mutation{
			{
				SetNquads: []byte(addQuads),
			},
		},
	}
	_, err := dg.NewTxn().Do(context.Background(), addReq)
	if err != nil {
		return err
	}
	return nil
}

func main() {
	conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
	if err != nil {
		fmt.Println("Error connecting to Dgraph server:", err)
		return
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	concurrency := 100
	totalNodes := 1000000
	startIndex := 0
	nodeBatchSize := (totalNodes - startIndex) / concurrency
	var wg sync.WaitGroup
	wg.Add(concurrency)
	for i := 0; i < concurrency; i++ {
		go func(workerID int) {
			defer wg.Done()
			start := startIndex + workerID*nodeBatchSize
			end := start + nodeBatchSize
			if workerID == concurrency-1 {
				end = totalNodes
			}
			for j := start; j < end; j++ {
				//id := "wsc_test_text" + strconv.Itoa(j)
				//if isExist, _ := nodeExists(dg, id); !isExist {
				if err := saveNode(dg, j); err != nil {
					fmt.Println("Error saving node:", err)
					return
				}
				//}
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Node creation completed!")
}
```

It's my simple demo for creating data; you can change `totalNodes` to decide how many nodes you want to generate. I set the predicate `workspace_key` with an `exact` index, and the others with `trigram` and `term` indexes.
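The index setup described here corresponds to a DQL schema along these lines (a sketch reconstructed from the description above; the predicate names are taken from the demo, and `trigram` is the index type that `regexp` requires):

```
workspace_key: string @index(exact) .
id: string @index(trigram, term) .
name: string @index(trigram, term) .
create_by: string @index(trigram, term) .
qa: string @index(trigram, term) .
```

It can be applied from Go with `dg.Alter(ctx, &api.Operation{Schema: schema})` before running the generator.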
Definitely a problem. Regexp in `@filter` should use the index. See also this Discuss thread for the same issue: https://discuss.dgraph.io/t/how-to-use-regexp-in-filter/18901/8 .
My guess is that when regexp was allowed in filters, someone mistakenly or naively thought it would not need to be indexed. So being able to post-filter with regexp is a feature, but it's misleading that it does not use the index. I suspect GraphQL builds this underlying (non-optimized) DQL structure under the hood, making this more impactful for those users.
This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.