go-driver icon indicating copy to clipboard operation
go-driver copied to clipboard

[FEATURE] Get raw result from cursor and add a Next method to a cursor to iterate results batch by batch

Open charlyjna opened this issue 2 years ago • 4 comments

Hi,

I just realize that querying ArangoDB is limited to 1,000 records. How may I change it to 'unlimited'?

Thanks.

Charly

charlyjna avatar Aug 10 '23 13:08 charlyjna

Please use Cursor to iterate over batches: https://www.arangodb.com/docs/3.11/appendix-references-cursor-object.html

Here is an example from our tests in V2: https://github.com/arangodb/go-driver/blob/9907b87d5fb88a74cbc69e97bb5cf2eb9ac5e98b/v2/tests/database_query_test.go#L144-L180

jwierzbo avatar Aug 10 '23 13:08 jwierzbo

I am using cursor but, in this case, I want to get raw JSON. I created a generic query execution method that return directly the result:

func (a *ArangoDB) ExecuteQuery(query string) (string, error) {
	errMsg := ""
	if cur, err := a.printCollection(query); err != nil {
		errMsg = fmt.Sprintf("Error while executing query 'query': %v", err)
		logger.Errorf("ArangoDB - %s", errMsg)
	} else {
		if cur != nil {
			defer (*cur).Close()
			if res, err := json.Marshal(cur); err != nil {
				errMsg = fmt.Sprintf("Cannot convert result to JSON: %v", err)
				logger.Errorf("ArangoDB - %s", errMsg)
			} else {
				result := make(map[string]json.RawMessage)
				if err := json.Unmarshal(res, &result); err != nil {
					errMsg = fmt.Sprintf("Cannot extract from JSON: %v", err)
					logger.Errorf("ArangoDB - %s", errMsg)
				} else {
					if raw, ok := result["result"]; ok && raw != nil {
						return string(raw), nil
					} else {
						return "", nil
					}
				}
			}
		}
	}
	return "", fmt.Errorf(errMsg)
} ```

In your example I understand that I should iterate the cursor, but in this case I should know which struct to use as a result recipient, isn't it?

charlyjna avatar Aug 10 '23 13:08 charlyjna

Ok, I read deeper the cursor readDocument method to understand how it works. What I did is adding 2 methods:

// Return raw batch result
func (c *cursor) GetRawData() []*RawObject {
    return c.cursorData.Result
}
// A method based on readDocument but only to fetch the next batch
func (c *cursor) Next(ctx context.Context) error {
   ...
}

Now, I can use a generic method to execute a query and return a raw JSON file instead of iterating record by record.

I updated the title because it could be nice to have these methods in standard.

charlyjna avatar Aug 10 '23 17:08 charlyjna

To explain more what I did:

First, I added these 2 methods in cursor_impl.go:

// Return raw batch result
func (c *cursor) GetRawData() []*RawObject {
	return c.cursorData.Result
}
// A method based on readDocument but only to fetch the next batch
func (c *cursor) Next(ctx context.Context, retryBatchID string) error {
	if c.resultIndex < 0 {
		return WithStack(NoMoreDocumentsError{})
	}

	// Force use of initial endpoint
	ctx = WithEndpoint(ctx, c.endpoint)
	if len(c.cursorData.NextBatchID) > 0 {
		c.NextBatchID = c.cursorData.NextBatchID
	}

	// This is required since we are interested if this was a dirty read,
	// but we do not want to trash the users bool reference.
	var wasDirtyRead bool
	fetchCtx := ctx
	if c.allowDirtyReads {
		fetchCtx = WithAllowDirtyReads(ctx, &wasDirtyRead)
	}

	p := path.Join(c.relPath(), c.cursorData.ID)

	// If we have a NextBatchID, use it
	if c.NextBatchID != "" {
		p = path.Join(c.relPath(), c.cursorData.ID, c.NextBatchID)
	}

	// We have to retry the batch instead of fetching the next one
	if retryBatchID != "" {
		p = path.Join(c.relPath(), c.retryData.cursorID, retryBatchID)
	}

	// Update currentBatchID before fetching the next batch (no retry case)
	if c.NextBatchID != "" && retryBatchID == "" {
		c.retryData.currentBatchID = c.NextBatchID
	}

	// Fetch the next batch
	req, err := c.conn.NewRequest("POST", p)
	if err != nil {
		return WithStack(err)
	}

	cs := applyContextSettings(fetchCtx, req)
	resp, err := c.conn.Do(fetchCtx, req)
	if err != nil {
		return WithStack(err)
	}
	if err := resp.CheckStatus(200); err != nil {
		return WithStack(err)
	}
	loadContextResponseValues(cs, resp)
	var data cursorData
	if err := resp.ParseBody("", &data); err != nil {
		return WithStack(err)
	}
	c.cursorData = data
	c.resultIndex = 0
	c.lastReadWasDirty = wasDirtyRead

	if !c.cursorData.HasMore {
		// Out of data
		c.resultIndex = -1
	}

	return nil
}

Then I use this code to query the database and get a result in JSON format:

...
if cur != nil {
	defer (*cur).Close()
	var raw []*driver.RawObject
	for {
		raw = append(raw, (*cur).GetRawData()...)
		if !(*cur).HasMore() {
			break
		}
		if err := (*cursor).Next(context.Background(), ""); err != nil {
			logger.Errorf("Cannot fetch next batch: %v", err)
			break
		}
	}
}
...

Do you think you could add these methods (GetRawData() and Next()) in a next release?

I wrote a small web server to be able to query ArangoDB from any external application using a POST query and get a JSON result. It is very helpful and very fast this way for example with Grafana or PowerBI.

Thank you.

Charly

charlyjna avatar Aug 24 '23 12:08 charlyjna