agi icon indicating copy to clipboard operation
agi copied to clipboard

Samsung devices w/ Qualcomm GPU fails validation (needs to be rooted to access GPU counters)

Open EricAtPlanGrid opened this issue 2 years ago • 29 comments

Environment information:

  • AGI version: Version 2022-3.0.1
  • Host OS: [e.g. Linux] macOS If tracing on Android:
  • Device model: Samsung S21 SM-G991U1 and SM-G996U1
  • Android Version: Android 12 (One UI 4.1, Build SP1A.210812.016.G991U1UEU5CVDC)

Bug description

Device validation fails: Check failed for counter: {1 Clocks / Second 0x46e6a20}

Reproduction steps Steps to reproduce the behavior: Attempt to use AGI w/ these devices

Additional debugging information

  • Please attach the generated gapis.log and gapic.log files you will find in the temp folder (e.g. /tmp/ on linux).
  • If using Android: Please attach a full logcat dump (adb logcat -d > logcat-full.txt) that contains logs since AGI was started. gapis.log gapic.log logcat-full.txt

EricAtPlanGrid avatar May 23 '22 21:05 EricAtPlanGrid

Hello, thanks for filing this issue!

This part of the validation process can be non-deterministic. Please retry while making sure that, the screen remains unlocked and the sample application remains in focus. If this continues to persists, please upload the trace file (from the 'View trace file' link)

Screenshot of validation window with 'View trace file'

ttanatb avatar May 25 '22 12:05 ttanatb

Hi @ttanatb, I can consistently repro this as well (S21+) with the screen unlocked and the sample app visible until the validation fails. Here's my trace:

validation4229315264.perfetto.zip

alusch avatar May 25 '22 15:05 alusch

Looking through the logcat it looks like there's an issue with getting counter values from the driver as everything is set to 0 in the perfetto trace.

Are you by chance using a custom build or a preview build of the OS? Or was the validation working correctly earlier and only recently started breaking?

ttanatb avatar May 26 '22 09:05 ttanatb

Thanks @ttanatb. It's just the normal official OS build 🤔 I also tried AGI 2.1.0, which failed, too. That definitely worked a month or two ago on this device, so I suppose the only thing that could have changed is an OS update, the last of which I installed on May 17. Is there a way to get in touch with Samsung to see if something changed with the driver?

alusch avatar May 26 '22 15:05 alusch

It looks like there was a breaking change in the driver recently (not sure if it was included in the May 17 update or the one before that). Unfortunately, reverting the AGI version will not fix this issue, and I don't have an easy workaround for this at the moment. We're working on getting this sorted out and will update once we have something.

ttanatb avatar May 27 '22 18:05 ttanatb

Apologies for the delay, here's a workaround to get around issues with getting GPU counters on Adreno GPUs:

adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter"

It's worth noting that you may want to set the contents of /sys/class/kgsl/kgsl-3d0/perfcounter back to 0 as a precaution after profiling, so that other apps/processes cannot access GPU counters.

I'm working on a change that runs these commands before profiling, will notify once it's in the new release.

ttanatb avatar Jun 24 '22 17:06 ttanatb

I have also the same problem on a Galaxy Tab S8 (SM-X700, Android 12, OneUI 4.1, build SP1A.210812.016.X700XXU2AVF5). I can't apply the workaround since the device is not rooted (and I don't want to) and /sys/class/kgsl/kgsl-3d0/perfcounter is only writable by root. So I guess I'll have to wait for driver update...

Mikayex avatar Jul 13 '22 09:07 Mikayex

I see, I've verified that /sys/class/kgsl/kgsl-3d0/perfcounter does not require root access on pixel devices, will try to contact folks on Samsung to try to resolve this, but cannot comment on the timeline or feasibility of this. Apologies for the breakage and lack of convenient workaround, but we're working to ensure that we don't end up with a breaking change like this in the future.

ttanatb avatar Jul 20 '22 14:07 ttanatb

Adding in here, ASUS devices (ROG 6 and Zenfone 9) are also affected.

JakeDaynes avatar Aug 26 '22 19:08 JakeDaynes

Hi @JakeDaynes have you tried this with the latest dev release https://github.com/google/agi-dev-releases/releases/tag/v3.1.0-dev-20220830 and just to confirm, are you able to run adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter" or is the access to the file gated by root (instead of shell)?

If the file access is set to root, we'll have to ask ASUS to release a patch to fix this

ttanatb avatar Aug 30 '22 15:08 ttanatb

@ttanatb Sorry for any confusion - I was more commenting on the fact that the ASUS devices are affected by the kgsl-3d0/perfcounter file being gated by root. I've gotten in contact with ASUS over the issue, but having Google reach out and push for a patch would help.

JakeDaynes avatar Aug 30 '22 16:08 JakeDaynes

Thanks for the clarification -- just wanted to make sure because I couldn't get a hold of an ASUS device to repro. I'm working with the rest of the team to push for a resolution on this

The team is also working on efforts to prevent this in the future by tightening our test suites to ensure that a patch will not break our product like this in the future. Apologies for the inconvenience

ttanatb avatar Aug 30 '22 16:08 ttanatb

@JakeDaynes Could you provide your trace file (the *.perfetto file that is linked in the trace dialog) for me to double-check as well? Would like to avoid case in which it happens to be something else -- Thanks!

ttanatb avatar Aug 30 '22 16:08 ttanatb

All good - kgsl drivers/counters have been a nightmare since A7, so wasn't really surprised :)

I'm actually not using AGI (I've got my own IOCTL communication to the driver), but I've confirmed that Snapdragon Profiler is also blocked by this root gate issue as well

JakeDaynes avatar Aug 30 '22 16:08 JakeDaynes

Just got an update on my Samsung tablet (SM-X700, Android 12, OneUI 4.1.1, build SP2A.220305.013.X700XXU2AVH4) and /sys/class/kgsl/kgsl-3d0/perfcounter is now writable by shell group! Device validation now passes after enabling counters :smiley:

Mikayex avatar Sep 02 '22 08:09 Mikayex

Apologies for the delay, here's a workaround to get around issues with getting GPU counters on Adreno GPUs:

adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter"

It's worth noting that you may want to set the contents of /sys/class/kgsl/kgsl-3d0/perfcounter back to 0 as a precaution after profiling, so that other apps/processes cannot access GPU counters.

I'm working on a change that runs these commands before profiling, will notify once it's in the new release.

Working like a charm. My Note S20 (Qualcomm Adreno (TM) 650, Android 12) did not pass the validation test before using this command.

But another error comes out after this. (Image below) image

Can this be related to this problem?

Cliwo avatar Oct 04 '22 08:10 Cliwo

@Cliwo that error looks like your adb connection might've been severed (i.e. faulty cable or locked phone screen), if the issue continues to persist (i.e. not a flakey issue) please file a separate issue.

I'm closing this for now because the update seems to have landed and we can finally look at wiggly little GPU counter numbers again 🥳 Future versions of AGI will run the adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter" command for you and I'm working to hopefully make sure that future versions of Android will not have this same issue again.

ttanatb avatar Oct 04 '22 14:10 ttanatb

Same problem with S21 FE 5G (SM-G990B2)

adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter" fails with /system/bin/sh: can't create /sys/class/kgsl/kgsl-3d0/perfcounter: Permission denied

CPU: Qualcomm Snapdragon 888 5G GPU: Adreno 660 Android: 13 Build: TP1A.220624.014.G990B2XXS1DVL1 UI: OneUI 5.0

Host OS: Windows 10 Enterprise 21H2 AGI version: 2022-3.2.1:d808a5ebcdca788a1bd50d105cc21f9888074d4d

Roman-Skabin avatar Jan 24 '23 11:01 Roman-Skabin

Apologize for the delay in getting this resolved, I've communicated the issue with folks at Samsung and am hoping to get this resolved as soon as possible.

In the future, we've added tests in Android 14 to ensure that we don't get this kind of regression again.

ttanatb avatar Jan 24 '23 12:01 ttanatb

Honestly a standardised interface with an available counter map would be a godsend in future versions of the OS. One of Android's basic tenants is transparency, yet over the last 10 major releases I've been doing performance profiling development, we've seen access to certain information killed or modified heavily time and time again.

Really appreciate you working towards keeping this open at least.

JakeDaynes avatar Jan 24 '23 17:01 JakeDaynes

Thank you soo much. Is there any way to follow along with Samsung's side of this bug?

phx-ddevoe avatar Feb 14 '23 03:02 phx-ddevoe

I'm not a certain of a way to track this issue publicly, but we've communicated this problem to them via internal channels previously. I can try to ping this issue again to try to understand the timeline on getting this fixed.

ttanatb avatar Feb 14 '23 14:02 ttanatb

I don't want to spam this thread (as I spam this thread) but I do want to add my voice here to say that many of us would benefit from this being resolved! Thank you for your hard work @ttanatb et al!

EricAtPlanGrid avatar Mar 06 '23 20:03 EricAtPlanGrid

Pinged the folks at Samsung -- will update this thread once I receive a response

ttanatb avatar Mar 07 '23 19:03 ttanatb

and still, they didn't fix this!

chakibchemso avatar Jul 18 '23 23:07 chakibchemso

Dare I ask, has there been any further contact with Samsung?

EricAtPlanGrid avatar Aug 29 '23 17:08 EricAtPlanGrid

Any updates here @ttanatb ?

joshuawilde avatar Sep 19 '23 19:09 joshuawilde

Any updates?

ppamorim avatar Apr 09 '24 19:04 ppamorim