[How to use regexp in @filter]: <regexp use>

Open WSC998 opened this issue 1 year ago • 3 comments

Question.

I have about 50 million nodes. If I use `regexp` at the root, as in `{ resources(func: regexp(name, /abc/i)) @filter(eq(workspace_key, "def")) { id name resource_key } }`, the query is fast and returns within 1 second. However, if I move the regexp into the filter, as in `{ resources(func: eq(workspace_key, "def")) @filter(regexp(name, /abc/i)) { id name resource_key } }`, the query is unusably slow and times out. Most of the time I have to express a complex logical combination, so I cannot put the regexp in `func`; it has to go in the filter. So I would like to know: is the index not used when `regexp` appears in `@filter`? And does anybody have a good idea for solving this problem?

WSC998 avatar Sep 09 '23 06:09 WSC998

We don't have a query planner yet, so Dgraph may not do a good job of figuring out the order in which to execute the filters. Is it possible for you to share the data? It may still be worth looking into this.

mangalaman93 avatar Sep 09 '23 12:09 mangalaman93
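Since there is no query planner, one possible workaround is to fix the execution order by hand: run the regexp at the root of a `var` block, which is the position reported above to be fast, and then reference the matched UIDs from the main block. The sketch below only builds the query string. The function name `buildRegexpFirstQuery` and the variable name `matched` are made up for illustration, and whether this is actually faster on a given dataset is an assumption that would need to be measured.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildRegexpFirstQuery returns a DQL query with two blocks:
//  1. a var block that evaluates the regexp at the root and stores the
//     matching UIDs in the query variable "matched";
//  2. the main block, which starts from those UIDs and applies the
//     remaining condition as a filter.
//
// pattern is inserted verbatim between the slashes, so it must be a
// trusted regular expression. workspaceKey is quoted with strconv.Quote.
func buildRegexpFirstQuery(pattern, workspaceKey string) string {
	return fmt.Sprintf(`{
  matched as var(func: regexp(name, /%s/i))
  resources(func: uid(matched)) @filter(eq(workspace_key, %s)) {
    id
    name
    resource_key
  }
}`, pattern, strconv.Quote(workspaceKey))
}

func main() {
	// Print the query for the example values used in the question.
	fmt.Println(buildRegexpFirstQuery("abc", "def"))
}
```

For more complex logic, `uid(matched)` can also be used inside `@filter` and combined with other conditions using `AND`, `OR`, and `NOT`, so the regexp itself never has to appear in the filter.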

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"strconv"
	"sync"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

// nodeExists reports whether a node with the given id already exists.
func nodeExists(dg *dgo.Dgraph, id string) (bool, error) {
	query := fmt.Sprintf(`{ data(func: eq(id, "%s")) { uid } }`, id)
	resp, err := dg.NewTxn().Query(context.Background(), query)
	if err != nil {
		return false, err
	}
	var result struct {
		Data []struct {
			UID string `json:"uid"`
		} `json:"data"`
	}
	if err := json.Unmarshal(resp.Json, &result); err != nil {
		return false, err
	}
	return len(result.Data) > 0, nil
}

// saveNode creates one test node whose id and name are derived from j.
func saveNode(dg *dgo.Dgraph, j int) error {
	id := "test_text" + strconv.Itoa(j)
	name := "name" + id // assumes a "name" prefix was intended
	node := map[string]interface{}{
		"name":          name,
		"workspace_key": "_xGWV7",
		"create_by":     "xxw" + strconv.Itoa(j),
		"qa":            "wws",
		"id":            id,
	}

	ctx := context.Background()

	nodeJSON, err := json.Marshal(node)
	if err != nil {
		return err
	}

	mutation := &api.Mutation{
		CommitNow: true,
		SetJson:   nodeJSON,
	}

	_, err = dg.NewTxn().Mutate(ctx, mutation)
	if err != nil {
		return err
	}
	fmt.Printf("Created node %s\n", id)
	return nil
}

// saveRelation links two existing nodes, looked up by id, with an upsert.
func saveRelation(dg *dgo.Dgraph, srcID, dstID string) error {
	addRelationQuery := `{
		left as var(func: eq(id, "%s"))
		right as var(func: eq(id, "%s"))
	}`
	addQuads := fmt.Sprintf(`uid(left) <%s> uid(right) .`, "test_parent_to_child")
	addReq := &api.Request{
		CommitNow: true,
		Query:     fmt.Sprintf(addRelationQuery, srcID, dstID),
		Mutations: []*api.Mutation{
			{SetNquads: []byte(addQuads)},
		},
	}
	_, err := dg.NewTxn().Do(context.Background(), addReq)
	return err
}

func main() {
	conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
	if err != nil {
		fmt.Println("Error connecting to Dgraph server:", err)
		return
	}
	defer conn.Close()

	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	concurrency := 100
	totalNodes := 1000000
	startIndex := 0
	nodeBatchSize := (totalNodes - startIndex) / concurrency

	var wg sync.WaitGroup
	wg.Add(concurrency)

	for i := 0; i < concurrency; i++ {
		go func(workerID int) {
			defer wg.Done()

			// Each worker handles its own contiguous range of node numbers.
			start := startIndex + workerID*nodeBatchSize
			end := start + nodeBatchSize
			if workerID == concurrency-1 {
				end = totalNodes
			}

			for j := start; j < end; j++ {
				//id := "wsc_test_text" + strconv.Itoa(j)
				//if isExist, _ := nodeExists(dg, id); !isExist {
				// Use a goroutine-local err to avoid a data race on main's err.
				if err := saveNode(dg, j); err != nil {
					fmt.Println("Error saving node:", err)
					return
				}
				//}
			}
		}(i)
	}

	wg.Wait()
	fmt.Println("Node creation completed!")
}
```

This is my simple demo for creating the data. You can change `totalNodes` to decide how many nodes to generate. I set the predicate `workspace_key` with an `exact` index, and the others with `trigram` and `term` indexes.

WSC998 avatar Sep 11 '23 00:09 WSC998
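The index setup described above could be written as a Dgraph schema roughly as follows. This is a minimal sketch that only builds the schema text. The helper names (`indexSpec`, `schemaLine`, `buildSchema`) are hypothetical, and showing only `name` with `trigram` and `term` is an assumption, since the post does not list exactly which other predicates were indexed.

```go
package main

import (
	"fmt"
	"strings"
)

// indexSpec pairs a string predicate with the index tokenizers to apply.
type indexSpec struct {
	predicate  string
	tokenizers []string
}

// schemaLine renders one schema entry, for example:
//   name: string @index(trigram, term) .
func schemaLine(s indexSpec) string {
	return fmt.Sprintf("%s: string @index(%s) .", s.predicate, strings.Join(s.tokenizers, ", "))
}

// buildSchema joins the schema entries with newlines.
func buildSchema(specs []indexSpec) string {
	lines := make([]string, 0, len(specs))
	for _, s := range specs {
		lines = append(lines, schemaLine(s))
	}
	return strings.Join(lines, "\n")
}

func main() {
	schema := buildSchema([]indexSpec{
		{"workspace_key", []string{"exact"}},
		// Assumption: only "name" is shown; the post says "others" without listing them.
		{"name", []string{"trigram", "term"}},
	})
	fmt.Println(schema)
}
```

With the dgo client from the demo, a schema string like this would typically be applied once before loading data, via `dg.Alter` with an `api.Operation` whose `Schema` field holds the text.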

Definitely a problem. Regexp in a filter should use the index. See also this Discuss thread on the same issue: https://discuss.dgraph.io/t/how-to-use-regexp-in-filter/18901/8

My guess is that when regexp was allowed in the filter, someone mistakenly or naively thought it would not need to be indexed. So being able to post-filter with regexp is a feature, but it is misleading that it does not use the index. I suspect GraphQL builds this same non-optimized DQL structure under the hood, which would make the problem more impactful for those users.

damonfeldman avatar Sep 15 '23 14:09 damonfeldman

This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

github-actions[bot] avatar Jul 13 '24 01:07 github-actions[bot]