go-cloud
go-cloud copied to clipboard
docstore/awsdynamodb panics on iter.Next()
Describe the bug
Calling iter.Next
on a docstore/awsdynamodb
derived Query
instance panics with an "index out of range" error along the lines of:
panic: runtime error: index out of range [0] with length 0
goroutine 98 [running]:
gocloud.dev/docstore/awsdynamodb.(*documentIterator).Next(0x14000056050, {0x101ba36b0?, 0x10258bb40?}, {{0x101b386e0, 0x1400019bea0}, 0x0, {0x101b454e0, 0x1400019bea0, 0x199},\
{0x140004fc808, ...}})
/usr/local/sfomuseum/go-activitypub/vendor/gocloud.dev/docstore/awsdynamodb/query.go:492 +0x260
The Query
itself has two "Where" statements bracketing a date range:
https://github.com/sfomuseum/go-activitypub/blob/main/accounts_database_docstore.go#L45-L69
Curiously, this problem only seems to affect one table (accounts). The same code (essentially) run against a different table (deliveries) does not panic:
https://github.com/sfomuseum/go-activitypub/blob/main/deliveries_database_docstore.go#L46-L58
To Reproduce
The code in question is here:
https://github.com/sfomuseum/go-activitypub/blob/main/cmd/counts-for-date/main.go
Specifically:
https://github.com/sfomuseum/go-activitypub/blob/main/stats/counts.go#L44
Which is run like this:
go run cmd/counts-for-date/main.go \
-accounts-database-uri 'awsdynamodb://accounts?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-blocks-database-uri 'awsdynamodb://blocks?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-boosts-database-uri 'awsdynamodb://boosts?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-deliveries-database-uri 'awsdynamodb://deliveries?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-followers-database-uri 'awsdynamodb://followers?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-following-database-uri 'awsdynamodb://following?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-likes-database-uri 'awsdynamodb://likes?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-messages-database-uri 'awsdynamodb://messages?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-notes-database-uri 'awsdynamodb://notes?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-posts-database-uri 'awsdynamodb://ap_posts?partition_key=Id&allow_scans=true®ion={REGION}&credentials={CREDENTIALS}' \
-date {YYYY-MM-DD} \
-verbose
If you need/want to set up all those tables, the code to do so is here:
https://github.com/sfomuseum/go-activitypub/blob/main/cmd/create-dynamodb-tables/main.go
And:
https://github.com/sfomuseum/go-activitypub/tree/main/app/dynamodb/tables/create
For example:
go run -mod vendor cmd/create-dynamodb-tables/main.go \
-dynamodb-client-uri 'awsdynamodb://?region={REGION}&credentials={CREDENTIALS}'
Note that all of these URIs are using a custom awsdynamodb://
scheme. Under the hood all of this code is using the aaronland/go-aws-dynamodb
and aaronland/go-aws-session
packages in to support a wider range of declarative credentialing options than those offered by the default AWS libraries:
- https://github.com/aaronland/go-aws-dynamodb
- https://github.com/aaronland/go-aws-session
- https://github.com/aaronland/go-aws-session?tab=readme-ov-file#credentials
There are also Makefile
targets for running everything locally against a containerizer DynamoDB instance:
- https://github.com/sfomuseum/go-activitypub/tree/main?tab=readme-ov-file#example
Expected behavior
An iterator whose Next
method triggers an io.EOF
error.
Version
0.37.0
Additional context
Documentation (albeit incomplete) for the application's database design and specifics about the use of gocloud.dev/docstore
are here:
- https://github.com/sfomuseum/go-activitypub?tab=readme-ov-file#databases
- https://github.com/sfomuseum/go-activitypub?tab=readme-ov-file#goclouddevdocstore
The really long think-y think-y version is here:
- https://millsfield.sfomuseum.org/blog/2024/03/12/activitypub/
This is hard to debug from here, can you use gohack and add some Printfs in the relevant code to see what's happening?
In here: https://github.com/google/go-cloud/blob/master/docstore/awsdynamodb/query.go#L472
Specifically it would be good to know what it.offset
, it.limit
, it.curr
, it.last
, it.count
, it.limit
, and len(it.items)
is when this happens.
I think that this line: https://github.com/google/go-cloud/blob/master/docstore/awsdynamodb/query.go#L480 is supposed to handle this, but it's not.
It's possible that this is a regression in https://github.com/google/go-cloud/pull/3385.
@bartventer FYI
I've been digging into this issue and although I was unable to reproduce the panic, I added a preventive check for awsdynamodb
here. This check returns io.EOF
if there are no more items, which should prevent an index out of range
panic when Next
is called after it has already returned EOF
once.
I also added a test case to verify this behavior. The test repeatedly calls Next
until it gets io.EOF
, then calls Next
again to ensure it still returns io.EOF
.
I've pushed these changes in a PR. Please have a look when you get a chance and let me know if there's anything that needs adjusting.
Thanks for the fix. I can confirm that I am no longer seeing panics calling iter.Next
–
https://github.com/sfomuseum/go-activitypub/blob/iter-fix/go.mod#L5
Question: In the process of investigating the panics I often noticed that the docstore.Query
iterators would return/report no results. For example if I call a simple shell script wrapper around the "counts" code like this:
> sh ./work/counts.sh 2024-03-18
{"time":"2024-03-21T16:16:56.081668-07:00","level":"DEBUG","msg":"Verbose logging enabled"}
{"time":"2024-03-21T16:16:57.021555-07:00","level":"DEBUG","msg":"Get counts for date","date":"2024-03-18","location":"America/Los_Angeles","start":1710720000,"end":1710806400}
{"date":"2024-03-18","location":"America/Los_Angeles","accounts":0,"blocks":0,"boosts":0,"deliveries":0,"followers":0,"following":0,"likes":0,"messages":0,"notes":0,"posts":0}
All the query iterators return nothing. But then if I call the same code immediate afterwards the queries return results:
> sh ./work/counts.sh 2024-03-18
{"time":"2024-03-21T16:17:03.270628-07:00","level":"DEBUG","msg":"Verbose logging enabled"}
{"time":"2024-03-21T16:17:04.100218-07:00","level":"DEBUG","msg":"Get counts for date","date":"2024-03-18","location":"America/Los_Angeles","start":1710720000,"end":1710806400}
{"date":"2024-03-18","location":"America/Los_Angeles","accounts":0,"blocks":0,"boosts":0,"deliveries":9045,"followers":1,"following":0,"likes":0,"messages":0,"notes":0,"posts":13}
The code in question is:
- https://github.com/sfomuseum/go-activitypub/blob/main/stats/counts.go
- https://github.com/sfomuseum/go-activitypub/blob/main/deliveries_database_docstore.go#L44
I am uncertain whether this is related to the iter.Next
problem or something entirely different. If it's more helpful to create a separate issue I am happy to do that.
Are you using Offset
at all?
and/or Limit
?
@bartventer I think there's a bug in Offset
that may be causing both of these issues.
Example:
- Let's say
Offset
is 10. - We do the initial
it.qr.run
inRunGetQuery
, let's say it returns 5 items. - When
Next
is called, it will recurse 10 times to move past the offset. - On the next
Next
call, we'll do a newit.qr.run
("Make a new query request at the end of this page"), and then setcurr = 0
, but we may have just gotten items 6 through 8 (for example).
I am not currently using either Offet
or Limit
.
Hrm that damages my theory, but I still think there's a bug in those.
@thisisaaronland it would be really helpful if you could provide some debugging info as requested above, as it seems like you can reproduce this easily and we can't. Maybe even comment out the new lines that fixed the panic so we can understand what's going on there too.
Sure thing. I may not be able to do that until tomorrow but I will get to it as soon as I can. Thanks for all the help so far.
Per AWS, "A Query operation can return an empty result set and a LastEvaluatedKey if all the items read for the page of results are filtered out."
I think we're not handling that correctly, which causes the empty results sometimes, as well as the panic.
@bartventer can you take a look at https://github.com/google/go-cloud/pull/3408 ?
@thisisaaronland can you try it out and see if that looks better?
I couldn't figure out what the replace
directive for the changes in #3408 should be so I just brute-force cloned the query.go
file in my vendor
folder. With only limited testing it doesn't seem to have helped and may, in fact, have made things worse. Specifically:
- Running the code (mentioned above) for the first time the
Query
iterator continues to return no results. - Running the code (mentioned above) a second time simply hung and never returned any results.
I will carve out some time in the morning to add the debugging hooks you suggested (above) to try and get more concrete details to work with.
If it's at all helpful what I have observed is that the Query
iterator seems to return no results the first time it is invoked but then subsequently it starts to return results, even though the queries themselves are happening in entirely new processes. So maybe there is something happening at the AWS/Dynamo layer where the iterator returns prematurely on the first invocation and that, by extension, Dynamo queries (in AWS) exist in some kind of "window of time" between invocations ?
Can you try again, I made some changes. If it doesn't work let me know, printfs...
On Thu, Mar 21, 2024, 6:16 PM Aaron Straup Cope @.***> wrote:
I couldn't figure out what the replace directive for the changes in #3408 https://github.com/google/go-cloud/pull/3408 should be so I just brute-force cloned the query.go file in my vendor folder. With only limited testing it doesn't seem to have helped and may, in fact, have made things worse. Specifically:
- Running the code (mentioned above) for the first time the Query iterator continues to return no results.
- Running the code (mentioned above) a second time simply hung and never returned any results.
I will carve out some time in the morning to add the debugging hooks you suggested (above) to try and get more concrete details to work with.
If it's at all helpful what I have observed is that the Query iterator seems to return no results the first time it is invoked but then subsequently it starts to return results, even though the queries themselves are happening in entirely new processes. So maybe there is something happening at the AWS/Dynamo layer where the iterator returns prematurely on the first invocation and that, by extension, Dynamo queries exist in some kind of "window of time" between invocations ?
— Reply to this email directly, view it on GitHub https://github.com/google/go-cloud/issues/3405#issuecomment-2014139958, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFMCKTSPUUQCJOERCSE74TLYZOA55AVCNFSM6AAAAABE4FMXTGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJUGEZTSOJVHA . You are receiving this because you modified the open/close state.Message ID: @.***>
Per AWS, "A Query operation can return an empty result set and a LastEvaluatedKey if all the items read for the page of results are filtered out."
I think we're not handling that correctly, which causes the empty results sometimes, as well as the panic.
@bartventer can you take a look at #3408 ?
@thisisaaronland can you try it out and see if that looks better?
I've had a look at PR #3408 and I agree with the logic there. I've created a new PR (#3409) that builds on that work. @vangent, could you take a look when you get a chance?
Just a quick note that I applied github.com/bartventer/go-cloud v0.0.0-20240322082227-9d139c92796d
and I am not seeing time-outs but I am still seeing inconsistent behaviour:
https://github.com/sfomuseum/go-activitypub/blob/iter-fix/go.mod#L7
Specifically: The first time the query is run it will sometimes return 0 results but then if the query is run again inside a short window of time (but in an entirely new process) the query will return results. That's all I've been able to do so far but will try to carve out time to get more concrete debugging information soon.
Ping @thisisaaronland
Any updates here @thisisaaronland ?