
json.match_schema() does not work reliably with array input and schema

Open · mscudlik opened this issue Mar 06 '24

Short description

json.match_schema() does not work reliably with array input and schema.

Steps To Reproduce

I have created a small example: https://github.com/mscudlik/opa-json-match-schema

Expected behavior

In my example, the command

opa eval -b . -f pretty data.array_input.policy --input array_input/input/valid/data.json

should return matches true, and the test https://github.com/mscudlik/opa-json-match-schema/blob/main/array_input/test/string_regex_pattern_test.rego should pass.

There should also be error messages when the input does not match the schema.

Somehow

opa eval -b . -f pretty data.array_input.policy --input array_input/input/invalid_no_array/data.json

works with the expected error message, while

opa eval -b . -f pretty data.array_input.policy --input array_input/input/invalid_type/data.json

produces the expected matches value, but the errors are missing.

Additional context

My use case is to validate the input (which I cannot trust).
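For context, the policy is essentially a thin wrapper around json.match_schema, which returns a two-element result: a boolean telling whether the document matches, and an array of validation errors. A simplified sketch of the kind of policy involved (the schema and rule shape here are illustrative; the actual files in the linked repo may differ):

package array_input

# Illustrative schema: the input must be an array of strings matching a pattern.
schema := {
	"type": "array",
	"items": {
		"type": "string",
		"pattern": "^[a-z]+$"
	}
}

policy := {"matches": matches, "errors": errors} {
	[matches, errors] := json.match_schema(input, schema)
}

With a valid array input I expect matches to be true and errors to be empty; with an invalid input I expect matches to be false and the errors array to explain why.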

opa version

Version: 0.62.0
Build Commit: 1d0ab93822e83a4165c78372a7fb4c05e14a8bca
Build Timestamp: 2024-02-29T17:07:11Z
Build Hostname: Mac-1709226273754.local
Go Version: go1.22.0
Platform: darwin/amd64
WebAssembly: available

mscudlik · Mar 06 '24

I think this feature is important in general, but we need to fix error handling. The challenge here is not the DX when all goes well (i.e., that part is a no-brainer) but when an async vectorization fails (e.g., something as simple as an incorrect API key or rate limiting on the model provider's side).

bobvanluijt · Apr 03 '24

This is an interesting feature, but I agree that errors with async present a more obscure problem space for DX. In the synchronous world, the feedback loop is tight.

It would be nice if all the known errors provided users with detailed information on how to resolve them. In some places this is done with links, but those links need to land in CI for continuous validation. In other cases, improved error handling is done with code snippets that lead users to resolve the problem if it is in their control. However, you will know the best way to handle the known errors so that they lead to a speedy resolution and the feedback loop feels almost as tight as a synchronous thread of execution.

Of course, there will still be errors that are unknown or non-fatal (in the sense that they do not kill a query's execution). In those cases, users may favor a lexical fallback and a warning log. If you do consider a lexical fallback, it's very important to be noisy about it (log it) so that users can look into it if they choose.

Uptime reigns supreme.

MarcusSorealheis · Apr 03 '24

One way we could make errors more observable might be to add a DLQ and retries to batches. If a batch hits a specified number of retries, it would land on the DLQ with a single associated error. In pseudocode, something like:

package main

import (
	"fmt"
	"time"
)

type Batch struct {
	ID          string
	Jobs        []Job
	ProcessTime time.Time
	Error       error
	RetryCount  int
}

type Job struct {
	ID   string
	Data string
}

type FailedBatch struct {
	Batch Batch
	Err   error
}

type DeadLetterQueue []FailedBatch

// ProcessBatch retries a failing batch with exponential backoff and moves it
// onto the dead-letter queue once maxRetries is exceeded.
func ProcessBatch(batch Batch, maxRetries int, dlq *DeadLetterQueue) {
	// For simplicity, simulate a processing error for one specific batch ID
	if batch.ID == "errorBatch" {
		batch.Error = fmt.Errorf("processing error")
	}

	if batch.Error != nil {
		if batch.RetryCount >= maxRetries {
			*dlq = append(*dlq, FailedBatch{Batch: batch, Err: batch.Error})
			return
		}

		batch.RetryCount++

		// Some sort of exponential backoff strategy
		// (note: Go's ^ is XOR, so use a bit shift for powers of two)
		backoff := time.Duration(1<<batch.RetryCount) * time.Second
		time.Sleep(backoff)

		ProcessBatch(batch, maxRetries, dlq)
		return
	}
}

func main() {
	var dlq DeadLetterQueue

	batches := []Batch{
		{ID: "batch1", Jobs: []Job{{ID: "job1", Data: "data1"}}, ProcessTime: time.Now()},
		{ID: "errorBatch", Jobs: []Job{{ID: "job2", Data: "data2"}}, ProcessTime: time.Now()},
	}

	maxRetries := 3

	for _, batch := range batches {
		ProcessBatch(batch, maxRetries, &dlq)
	}

	// Each failed batch carries a single top-level error describing what happened
	for _, failed := range dlq {
		fmt.Printf("batch %s failed after %d retries: %v\n", failed.Batch.ID, failed.Batch.RetryCount, failed.Err)
	}
}

The queues themselves would be on disk in practice rather than in memory like this, but the idea would be the same. I think having a DLQ with retries would help with rate-limit-related errors, and for errors due to other causes each failed batch would carry a top-level error message that can be inspected to see what happened.

cdpierse · Apr 03 '24