
CoreSimulator folder permissions issue on some Mac machines

Open ivanpovazan opened this issue 1 year ago • 23 comments

Build

https://dev.azure.com/dnceng-public/public/_build/results?buildId=822026

Build leg reported

E2E Apple - Simulator Commands Helix Tests Build_Debug

Pull Request

https://github.com/dotnet/xharness/pull/1282

Known issue core information

Fill out the known issue JSON section by following the step by step documentation on how to create a known issue

{
  "ErrorPattern": "permission to save the file.*in the folder.*CoreSimulator",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
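The pattern above can be sanity-checked locally against a captured console log. This is a minimal sketch that assumes the known-issue tooling treats `ErrorPattern` as a regular expression; `grep -E` only approximates whatever engine the tooling actually uses, and the helper name `matches_known_issue` is hypothetical.

```shell
# Pattern copied from the "ErrorPattern" field above.
ERROR_PATTERN='permission to save the file.*in the folder.*CoreSimulator'

# Hypothetical helper: succeeds if any line on stdin matches the pattern.
# grep -E (POSIX ERE) stands in for the real matcher, which is an assumption.
matches_known_issue() {
    grep -E -q -- "$ERROR_PATTERN"
}

# Example: matches_known_issue < console.log && echo "known issue"
```

Piping a downloaded Helix console log through the helper shows quickly whether a failure would be attributed to this issue.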

@dotnet/dnceng

Release Note Category

  • [ ] Feature changes/additions
  • [ ] Bug fixes
  • [ ] Internal Infrastructure Improvements

Release Note Description

Additional information about the issue reported

Build error leg or test failing:

  • Failed job: E2E Apple - Simulator Commands Helix Tests Build_Debug
  • Failed test: System.Numerics.Vectors.Tests

When testing simulator commands on some machines we get:

[08:21:48] dbug: Running /Applications/Xcode_14.3.app/Contents/Developer/usr/bin/simctl boot 4B1F47EA-7853-45DF-B459-37A3E6BCABF8
[08:21:49] dbug: An error was encountered processing the command (domain=NSCocoaErrorDomain, code=513):
[08:21:49] dbug: You don’t have permission to save the file “4B1F47EA-7853-45DF-B459-37A3E6BCABF8” in the folder “CoreSimulator”.
[08:21:49] dbug: You don’t have permission.
[08:21:49] dbug: To view or change permissions, select the item in the Finder and choose File > Get Info.
[08:21:49] dbug: Underlying error (domain=NSPOSIXErrorDomain, code=13):
[08:21:49] dbug: The operation couldn’t be completed. Permission denied
[08:21:49] dbug: Permission denied
[08:21:49] dbug: Process simctl exited with 1

Full log for context as it repeats the permission denied throughout the sequence: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-xharness-refs-pull-1282-merge-6620621afd364a37b1/System.Numerics.Vectors.Tests.Attempt.3/1/console.fd153acd.log?helixlogtype=result

Additional info

This was occurring before as reported in: https://github.com/dotnet/dnceng/issues/1878

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=822026
Error message validated: [permission to save the file.*in the folder.*CoreSimulator]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 10/1/2024 2:44:39 PM UTC

Report

Build    Definition       Test                                              Pull Request
1041775  dotnet/xharness  System.Numerics.Vectors.Tests.WorkItemExecution  dotnet/xharness#1410
1039341  dotnet/xharness  System.Numerics.Vectors.Tests.WorkItemExecution
1028095  dotnet/xharness  System.Numerics.Vectors.Tests.WorkItemExecution

Summary

24-Hour Hit Count  7-Day Hit Count  1-Month Count
0                  2                3

ivanpovazan avatar Oct 01 '24 14:10 ivanpovazan

This is happening on:

  • dci-mac-build-250
  • dci-mac-build-240
  • dci-mac-build-225
  • dci-mac-build-221
  • dci-mac-build-220
  • dci-mac-build-234
  • dci-mac-build-224
  • dci-mac-build-223
  • dci-macm2-build-033
  • dci-macm2-build-038
  • dci-macm2-build-032

This is similar to https://github.com/dotnet/dnceng/issues/1878

@dotnet/dnceng can someone please log onto the machines from above and check/set the right folder permissions?

ivanpovazan avatar Oct 01 '24 14:10 ivanpovazan

full path should be /Library/Developer/PrivateFrameworks/CoreSimulator.framework/Versions/A/CoreSimulator

dougbu avatar Feb 13 '25 16:02 dougbu

@premun could you take a look at this and see if there is anything else that we need to do here? I know we have a work item (#5288) but that is for future deployments and should not affect existing systems

ilyas1974 avatar Apr 16 '25 14:04 ilyas1974

Is it safe to run chmod -R on this folder?

@ivanpovazan FYI, full passwordless sudo is available, so this can be done via a Helix job with no manual logon needed. We can do it quite quickly, but I don't know whether it would break Xcode.

I understand you've done something similar as part of https://github.com/dotnet/dnceng/issues/5024? Do you know what exactly was done?

premun avatar Apr 17 '25 08:04 premun

It would also be good to understand how the permissions went wrong in the first place. Is it just the wrong user?

premun avatar Apr 17 '25 08:04 premun

I am seeing this issue on one of our internal pipelines, which uses Helix. I'm not sure whether I should be using sudo to run the various xcrun simctl commands we use to start the simulator. Is that the best workaround for now?

dellis1972 avatar Apr 17 '25 10:04 dellis1972

We never really run these commands manually but instead use XHarness. Did Xamarin/MAUI start using Helix recently?

I am curious if that would be the source of the recent problems - maybe things like running the provisionator or similar on the on-prem machines.

I am asking because I know MAUI has a much different approach to running iOS/Android workloads than Helix does.

premun avatar Apr 17 '25 10:04 premun

We can't use xharness. This is not for maui or xamarin. I'm adding mobile tests for the debugger. So I have to call those commands manually.

dellis1972 avatar Apr 17 '25 10:04 dellis1972

Ah, okay. Running xcrun simctl is fine; I was just double-checking that things like provisionator are not run. I ask only because I know some repos' workloads mostly target VMs and always prepare Xcode and all dependencies from the ground up, while Helix usually treats the on-prem machines more carefully, as they are not VMs.

premun avatar Apr 17 '25 10:04 premun

I see the previous IcM ticket and I will reset the owner of these directories to helix-runner

premun avatar Apr 17 '25 10:04 premun

> Ah, okay. Running xcrun simctl is fine; I was just double-checking that things like provisionator are not run. I ask only because I know some repos' workloads mostly target VMs and always prepare Xcode and all dependencies from the ground up, while Helix usually treats the on-prem machines more carefully, as they are not VMs.

The only thing I do is install the android and ios workloads into a local copy of dotnet. For the simulators and emulators I use whatever is already installed 😄

dellis1972 avatar Apr 17 '25 10:04 dellis1972

Okay, all 40 machines in the osx.13.amd64.open queue have their owner changed with this:

sudo chown -R helix-runner /Library/Developer/PrivateFrameworks/CoreSimulator.framework
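
To limit the recursive chown to machines that actually need it, the owner can be checked first. The following is only a sketch: the helper names `path_owner` and `needs_chown` are hypothetical, and checking just the top-level owner is a simplification, since nested files could still have a different owner.

```shell
# Print the owning user of a path, using only POSIX ls/awk.
path_owner() {
    ls -ld -- "$1" | awk '{print $3}'
}

# Succeeds (exit 0) when the path is NOT owned by the expected user.
needs_chown() {
    [ "$(path_owner "$1")" != "$2" ]
}

# Hypothetical usage on a Helix machine (path and user taken from the command above):
# dir=/Library/Developer/PrivateFrameworks/CoreSimulator.framework
# if needs_chown "$dir" helix-runner; then
#     sudo chown -R helix-runner "$dir"
# fi
```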

premun avatar Apr 17 '25 10:04 premun

> Okay, all 40 machines in the osx.13.amd64.open queue have their owner changed with this:
>
> sudo chown -R helix-runner /Library/Developer/PrivateFrameworks/CoreSimulator.framework

I'll run up the tests and let you know how it goes 😄

dellis1972 avatar Apr 17 '25 10:04 dellis1972

> Okay, all 40 machines in the osx.13.amd64.open queue have their owner changed with this:

Ah, looks like we run on the osx.13.arm64 queue.

dellis1972 avatar Apr 17 '25 11:04 dellis1972

@dellis1972 you see the same problem there?

premun avatar Apr 17 '25 15:04 premun

I ran the permission update for the arm queue as well

premun avatar Apr 17 '25 16:04 premun

I don't think the bots used by the project I'm working on are in the open pool. I'll check.

dellis1972 avatar Apr 17 '25 17:04 dellis1972

Looks like the simulator issue has been resolved 👍

dellis1972 avatar Apr 17 '25 18:04 dellis1972

Sorry for the late reply. Regarding:

> @ivanpovazan FYI, full passwordless sudo is available, so this can be done via a Helix job with no manual logon needed. We can do it quite quickly, but I don't know whether it would break Xcode.
>
> I understand you've done something similar as part of #5024? Do you know what exactly was done?

We did not fix it per se; we just migrated to newer queues, i.e., osx.15.amd64.open and osx.14.arm64.open.

As for the xharness commands, we were hitting this issue when running the simulator commands tests (https://github.com/dotnet/xharness/blob/026a0d0e53147434f76eb9f420c39234261251a1/tests/integration-tests/Apple/Simulator.Commands.Tests.proj#L35-L45), which test the full simulator reset feature, xharness apple simulators reset-simulator. This resulted in the CoreSimulator folder permission errors: https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-xharness-refs-pull-1400-merge-04d05ce0bd394a58a0/System.Numerics.Vectors.Tests.Attempt.3/1/console.2815ff97.log?helixlogtype=result

We will keep an eye on whether this still occurs on the newer queues and provide feedback. Thanks for looking into it.

ivanpovazan avatar Apr 22 '25 11:04 ivanpovazan

@premun

I'm still seeing this issue on osx.13.arm64 machines.

            00:53.115: 2025-04-24 12:35:14.034 simctl[20934:341193] Error opening log file (/Users/xxxx/Library/Logs/CoreSimulator/CoreSimulator.com.apple.CoreSimulator.simctl.log): Permission denied
            00:53.115: An error was encountered processing the command (domain=NSCocoaErrorDomain, code=513):
            00:53.115: You don’t have permission to save the file “012C3ED0-0BE0-4C66-BDD8-09A89D69E53A” in the folder “CoreSimulator”.
            00:53.115: You don’t have permission.
            00:53.115: To view or change permissions, select the item in the Finder and choose File > Get Info.
            00:53.115: Underlying error (domain=NSPOSIXErrorDomain, code=13):

dellis1972 avatar Apr 24 '25 17:04 dellis1972

As @ivanpovazan writes above, it might be getting broken by the simulator reset. If that's the case, it will keep coming back until we fix the root cause. I won't be able to work on this at this time, though, nor am I sure whether it needs to be fixed in xharness or in the XHarness Helix SDK.

@ivanpovazan do you know when we reset simulators?

premun avatar Apr 24 '25 17:04 premun

I think we have two problems (which might have the same root cause):

  • Problem 1: The one I reported seems to be related to the simulator reset sequence where the issue happens when we erase device content and settings and then try to boot it afterwards (in the example below device ID is: 1F8B7AF1-9004-472A-AFA4-A5DB2E11FEB4):

    [05:29:29] dbug: Running /Applications/Xcode_14.3.app/Contents/Developer/usr/bin/simctl erase 1F8B7AF1-9004-472A-AFA4-A5DB2E11FEB4
    [05:29:29] dbug: Process simctl exited with 0
    [05:29:29] dbug: 
    [05:29:29] dbug: Running /Applications/Xcode_14.3.app/Contents/Developer/usr/bin/simctl boot 1F8B7AF1-9004-472A-AFA4-A5DB2E11FEB4
    [05:29:29] dbug: An error was encountered processing the command (domain=NSCocoaErrorDomain, code=513):
    [05:29:29] dbug: You don’t have permission to save the file “1F8B7AF1-9004-472A-AFA4-A5DB2E11FEB4” in the folder “CoreSimulator”.
    [05:29:29] dbug: You don’t have permission.
    

    The boot tries to write into the simulator device folder at ~/Library/Developer/CoreSimulator/Devices/1F8B7AF1-9004-472A-AFA4-A5DB2E11FEB4, which throws the error.

    NOTE: As for when we run the reset-simulator command: we run it on every xharness PR to test the supported simulator commands. Additionally, we occasionally reset simulators when the number of job retries becomes too high.

  • Problem 2: What @dellis1972 is seeing seems to be related to Logs folder permissions:

    00:53.115: 2025-04-24 12:35:14.034 simctl[20934:341193] Error opening log file (/Users/xxxx/Library/Logs/CoreSimulator/CoreSimulator.com.apple.CoreSimulator.simctl.log): Permission denied
    00:53.115: An error was encountered processing the command (domain=NSCocoaErrorDomain, code=513):
    00:53.115: You don’t have permission to save the file “012C3ED0-0BE0-4C66-BDD8-09A89D69E53A” in the folder “CoreSimulator”.
    00:53.115: You don’t have permission.
    00:53.115: To view or change permissions, select the item in the Finder and choose File > Get Info.
    00:53.115: Underlying error (domain=NSPOSIXErrorDomain, code=13):
    

    This is likely because ~/Library/Logs/CoreSimulator/012C3ED0-0BE0-4C66-BDD8-09A89D69E53A should contain the logs for device ID 012C3ED0-0BE0-4C66-BDD8-09A89D69E53A.
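
A quick way to tell the two problems apart on an affected machine is to check both per-user folders directly. This is a rough diagnostic sketch, not something xharness does today: the function name `check_coresimulator_dirs` is hypothetical, and it only inspects the top-level folders, so individual device subfolders could still carry different permissions.

```shell
# Hypothetical diagnostic: report the state of the two per-user CoreSimulator
# folders mentioned in Problems 1 and 2. The home directory is a parameter so
# the check can be pointed at any user's home (defaults to $HOME).
check_coresimulator_dirs() {
    home_dir=${1:-$HOME}
    for dir in \
        "$home_dir/Library/Developer/CoreSimulator/Devices" \
        "$home_dir/Library/Logs/CoreSimulator"
    do
        if [ ! -d "$dir" ]; then
            echo "missing $dir"
        elif [ -w "$dir" ] && [ -x "$dir" ]; then
            echo "ok $dir"
        else
            echo "not-writable $dir"
        fi
    done
}
```

Run as the user that executes the Helix work item, a `not-writable` line for the Devices folder would point at Problem 1 and one for the Logs folder at Problem 2.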


To come back to what was previously said:

  • Do we just need sudo at the right spot, i.e., for xharness in Problem 1 and for simctl in Problem 2 (not sure which tool/script is executing it)?
  • Can we connect to the Helix machine reporting this problem and manually reproduce the CoreSimulator permission issues, so that we can figure out how to work around this?

ivanpovazan avatar Apr 25 '25 09:04 ivanpovazan

So I'm not using xharness, this is custom code. I can certainly add the sudo fix if I detect the permission issue.
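
One possible shape for that detect-and-fix logic, as a sketch only: the wrapper runs a command, and if the output looks like the CoreSimulator permission error it calls a caller-supplied `fix_permissions` function once and retries. Both `run_with_permission_fix` and `fix_permissions` are hypothetical names; what the fix should actually do (for example a targeted sudo chown) is left to the caller.

```shell
# Succeeds if the text on stdin looks like the permission error from this thread.
is_permission_error() {
    grep -E -q 'permission to save the file.*in the folder.*CoreSimulator|NSPOSIXErrorDomain, code=13'
}

# Hypothetical wrapper: run "$@"; on a permission error, call fix_permissions
# (which the caller must define) once and retry the command.
run_with_permission_fix() {
    if output=$("$@" 2>&1); then
        printf '%s\n' "$output"
        return 0
    fi
    if printf '%s\n' "$output" | is_permission_error; then
        fix_permissions || return 1
        "$@"
        return $?
    fi
    # Unrelated failure: surface the output and fail without retrying.
    printf '%s\n' "$output" >&2
    return 1
}

# Example: run_with_permission_fix xcrun simctl boot "$DEVICE_ID"
```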

dellis1972 avatar Apr 25 '25 09:04 dellis1972

This issue has been migrated to Azure DevOps: https://dev.azure.com/dnceng/internal/_workitems/edit/8510

garath avatar Oct 11 '25 00:10 garath