
Getting the next page of the activity logs fails with invalid semicolon separator in query

Open alxndr13 opened this issue 1 year ago • 25 comments

Bug Report

  • import path of package in question: github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor
  • SDK version: latest (github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor v0.11.0)
  • output of go version:
go version go1.22.0 linux/amd64
  • What happened?

When fetching the activity logs in some subscriptions, the code errors out with the following message:

invalid semicolon separator in query

Our code (minimized) looks like this:

	query := fmt.Sprintf("eventTimestamp ge '%s' and eventTimestamp le '%s'", startTime.Format(time.RFC3339Nano), endTime.Format(time.RFC3339Nano))
	pager := clientFactory.NewActivityLogsClient().NewListPager(query, nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return nil, fmt.Errorf("failed to advance page: %w", err) // this is where it fails
		}
		_ = page // minimized: the real code processes page.Value here
	}
  • What did you expect or want to happen?

Get the logs, as in other subs.

  • How can we reproduce it?

idk, check if you can fetch activity logs on all subs.

Additional info

a screenshot from debugging:

image

A similar error happens when using a semicolon in a query, see: https://github.com/golang/go/issues/50034

my guess is: the content of the *NextLink can't be parsed in some / all cases.

alxndr13 avatar Feb 20 '24 15:02 alxndr13

I suspect you're correct. Would you mind enabling logging and paste the value of NextLink?

jhendrixMSFT avatar Feb 20 '24 16:02 jhendrixMSFT

i got the link from debugging with delve:

https://management.azure.com/subscriptions/$$$subscriptionID$$$/providers/Microsoft.Insights/eventtypes/management/values?%24filter=eventTimestamp+ge+%272024-02-19T04%3A05%3A37Z%27+and+eventTimestamp+le+%272024-02-20T16%3A05%3A37Z%27&api-version=2015-04-01&$skipToken=hoboshim~sbs~querymodeint9;diffreport1;queryutcnow:0638440419441225639;workspaceids:c4e1bdb5-8f3d-44a4-8a25-56924af57454~96dfd74e-3fe7-41ec-a72e-4e1b0d4ad6e7;sessionidf53b1a47-d12e-488d-885b-12c2a395f835;page1;hobo:0638440358902794048~0638440418436268230~0638439151240566239~1~0638440354570000000~1296~94~133~234~0~62~85~2~0~0~20~74~52~95~14~28~54~58~50~30~0~0~2~0~0~2~0~0~2~0~0~2~0~2~2~0~0~2~0~0~2~0~0~2~0~0~2~0~0~4~0~0~0~2~0~0~0~6~0~0~2~0~0~3~0~0~4~42~0~2~2~30~219~163~18~256~0~2~0~14~2~0~0~127~78~139~7~8~0~12~0~0~2~0~0~2~sbs~laqs~sbs~

it seems like a new issue arose yesterday: the data model changed yesterday at around 10 pm CET.

Instead of the semicolon error, the lib now can't unmarshal the data in the response.

unmarshalling type *armmonitor.EventDataCollection: unmarshalling type *armmonitor.EventDataCollection: struct field Value: unmarshalling type *armmonitor.EventData: struct field Level: json: cannot unmarshal number into Go value of type armmonitor.EventLevel

Will update this comment once I find out more.

UPDATE:

In constants.go the EventLevel is defined as a string and has some consts defined:

// EventLevel - the event level
type EventLevel string

const (
	EventLevelCritical      EventLevel = "Critical"
	EventLevelError         EventLevel = "Error"
	EventLevelInformational EventLevel = "Informational"
	EventLevelVerbose       EventLevel = "Verbose"
	EventLevelWarning       EventLevel = "Warning"
)

seems like the API returns a number instead of one of the expected strings.

alxndr13 avatar Feb 21 '24 09:02 alxndr13

+1 I am also getting the same error as of yesterday.

unmarshalling type *armmonitor.EventDataCollection: unmarshalling type *armmonitor.EventDataCollection: struct field Value: unmarshalling type *armmonitor.EventData: struct field Level: json: cannot unmarshal number into Go value of type armmonitor.EventLevel

Edit: Seems like a high severity breakage. I have production applications that rely on being able to consume and process monitor events for customers.

CyrusJavan avatar Feb 21 '24 18:02 CyrusJavan

I found the same issue. When will it be fixed?

jeun-kim avatar Feb 22 '24 06:02 jeun-kim

The response is "4", which cannot be unmarshaled into EventLevel.

Alancere avatar Feb 23 '24 06:02 Alancere

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @ArcturusZhang @AzmonActionG @AzmonAlerts @AzMonEssential @AzmonLogA @chlowell @dadunl @gracewilcox @jairmyree @jhendrixMSFT @joshfree @KarishmaGhiya @kurtzeborn @lirenhe @nisha-bhatia @pvaneck @SameergMS @sarangan12 @scottaddie @srnagar @tadelesh.

github-actions[bot] avatar Feb 23 '24 06:02 github-actions[bot]

any updates on this one?

we also have apps that rely on this functionality. this is production-critical for us.

alxndr13 avatar Feb 25 '24 13:02 alxndr13

@alxndr13 , if this issue has arisen recently, could you please create a support ticket in the Azure portal? This will help you receive prompt assistance.

raych1 avatar Feb 26 '24 08:02 raych1

> @alxndr13 , if this issue has arisen recently, could you please create a support ticket in the Azure portal? This will help you receive prompt assistance.

Why should I open up a ticket in the Azure Portal for this? This is something that can either be fixed in the API or this SDK. (which this repo is for)

alxndr13 avatar Feb 26 '24 08:02 alxndr13

i fixed the issue with the eventlevel in my azure-sdk-for-go fork here: https://github.com/alxndr13/azure-sdk-for-go

no guarantee that I properly matched the levels to the corresponding integers. I use this now until this is fixed here.

can be used in your go program using this entry in the go.mod file:

replace github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor => github.com/alxndr13/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor v0.0.0-20240226093305-b58ccfd438c1

@jhendrixMSFT with my fork now the semicolon error doesn't occur anymore, at least i can't reproduce it at the moment. Did you guys change anything?

EDIT: nevermind, issue still persists.

alxndr13 avatar Feb 26 '24 09:02 alxndr13

The service shouldn't be returning EventLevel by its ordinal value. We're following up on this.

For the semicolons in the query params, we'll have this fixed in [email protected] which will be released soon.

jhendrixMSFT avatar Feb 27 '24 17:02 jhendrixMSFT

Keeping open while we follow up on the unmarshaling issue.

jhendrixMSFT avatar Feb 27 '24 18:02 jhendrixMSFT

Hi, Thank you for letting us know. I will check the issue and update you as soon as possible.

osalzberg avatar Feb 28 '24 06:02 osalzberg

a fix should be deployed in the next 1-2 days

osalzberg avatar Feb 28 '24 12:02 osalzberg

> a fix should be deployed in the next 1-2 days

thanks for the update.

alxndr13 avatar Feb 28 '24 12:02 alxndr13

Curious to know if the fix is going to be on the server end. Or should the client end be using the upgraded version once the fix is available?

Note: We are using github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor v0.8.0, and this issue started surfacing since Feb 21st on our end.

The fix is deployed world-wide, please let us know if there are any issues.

This issue can be marked as resolved.

osalzberg avatar Feb 29 '24 13:02 osalzberg

@osalzberg Just got the error again at 2024-02-29T18:19:45 UTC

2024-02-29T18:19:45.265670919Z	error	azure/azure_resource_event.go:111	error retrieving events	{"failed_time": "2024-02-29T18:19:45Z", "error": "unmarshalling type *armmonitor.EventDataCollection: unmarshalling type *armmonitor.EventData: struct field Level: json: cannot unmarshal number into Go value of type armmonitor.EventLevel"}

Is there some propagation delay in deploying the fix?

Update:

Still receiving the same error as of 2024-03-01T20:12:11 UTC.

CyrusJavan avatar Feb 29 '24 18:02 CyrusJavan

[email protected] has been released which includes the fix for semicolons in query params.

jhendrixMSFT avatar Feb 29 '24 23:02 jhendrixMSFT

@osalzberg The issue still persists. When will this be resolved?

CyrusJavan avatar Mar 04 '24 22:03 CyrusJavan

What region are you in? Can you share some more details? The fix was deployed worldwide. Please send me an email and I'll try to assist. [email protected]

osalzberg avatar Mar 05 '24 01:03 osalzberg

@osalzberg @jhendrixMSFT can confirm it works again.

Thanks a lot.

alxndr13 avatar Mar 05 '24 08:03 alxndr13

Glad to hear it works. if there are any issues, please ping me or open a ticket.

osalzberg avatar Mar 05 '24 14:03 osalzberg

I can confirm that there was an issue with "unselected" columns. While we are working on a fix, you could remedy the situation by explicitly specifying "level" as a selected column in your "selectColumns" variable. (you might get a similar error for the "channels" field, you can also resolve it the same way)
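A small sketch of the workaround, assuming the column list is a comma-separated string that ends up in the `$select` parameter of the list call (in armmonitor this is presumably the `Select` field of the list options, which is an assumption here). `withRequiredColumns` is a hypothetical helper, not an SDK function.

```go
package main

import (
	"fmt"
	"strings"
)

// withRequiredColumns is a hypothetical helper: it returns a comma-separated
// column list that contains every name in required, appending any that are
// missing. Names are compared case-insensitively and duplicates are dropped.
func withRequiredColumns(selectColumns string, required ...string) string {
	var cols []string
	seen := map[string]bool{}
	for _, c := range strings.Split(selectColumns, ",") {
		c = strings.TrimSpace(c)
		if c == "" || seen[strings.ToLower(c)] {
			continue
		}
		seen[strings.ToLower(c)] = true
		cols = append(cols, c)
	}
	for _, r := range required {
		if !seen[strings.ToLower(r)] {
			seen[strings.ToLower(r)] = true
			cols = append(cols, r)
		}
	}
	return strings.Join(cols, ",")
}

func main() {
	// Example column names; "level" and "channels" are the two fields
	// mentioned in the workaround above.
	selectColumns := withRequiredColumns("eventTimestamp,operationName", "level", "channels")
	fmt.Println(selectColumns)
}
```

The resulting string would then be passed wherever the original `selectColumns` value was used.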

osalzberg avatar Mar 06 '24 11:03 osalzberg

> I can confirm that there was an issue with "unselected" columns. While we are working on a fix, you could remedy the situation by explicitly specifying "level" as a selected column in your "selectColumns" variable.

Can confirm this workaround works successfully in my use case. Thanks a lot @osalzberg !

CyrusJavan avatar Mar 11 '24 15:03 CyrusJavan

@osalzberg have the issues with unselected and channels been resolved?

jhendrixMSFT avatar Apr 08 '24 17:04 jhendrixMSFT

yes. are you experiencing any issues? if so, please open a ticket and let the CSS know you were talking to me (product group)

osalzberg avatar Apr 09 '24 05:04 osalzberg

No issues. Closing this then as things should be resolved.

jhendrixMSFT avatar Apr 09 '24 13:04 jhendrixMSFT