
Linux Docker - RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=610

Open ghost opened this issue 4 years ago • 7 comments

Background: Running the Linux SQL Server 2019-latest image in Docker on Azure Container Instances for short-lived work.

Issue: During SQL operations, most recently a backup restore, we are hitting a retail assertion error. Details:

This program has encountered a fatal error and cannot continue running at Tue May 12 17:16:29 2020
The following diagnostic information is available:

         Reason: 0x00000004
        Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=610
    Stack Trace:
                 file://package4/windows/system32/sqlpal.dll+0x000000000033D3A7
                 file://package4/windows/system32/sqlpal.dll+0x000000000033CAEF
                 file://package4/windows/system32/sqlpal.dll+0x0000000000206DAA
                 file://package4/windows/system32/sqlpal.dll+0x000000000026CD97
                 file://package4/windows/system32/sqlpal.dll+0x000000000026C7E7
                 file://package4/windows/system32/sqlpal.dll+0x0000000000236C9C
                 file://package4/windows/system32/sqlpal.dll+0x0000000000236B97
                 file://package4/windows/system32/sqlpal.dll+0x0000000000251A0C
                 file://package4/windows/system32/sqlpal.dll+0x000000000024F96C
                 file://package4/windows/system32/sqlpal.dll+0x0000000000203243
                 file://package4/windows/system32/sqlpal.dll+0x000000000037CC08
                 file:///Windows/SYSTEM32/KERNELBASE.dll+0x000000000006F94F
                 file:///binn/sqlmin.dll+0x00000000001AD508
                 file:///binn/sqlmin.dll+0x00000000001AD7E7
                 file:///binn/sqlservr.exe+0x0000000000003C0B
                 file:///binn/sqldk.dll+0x00000000000A3A40
                 file:///binn/sqldk.dll+0x000000000007BE9A
                 file:///binn/sqldk.dll+0x0000000000018E52
                 file:///binn/sqldk.dll+0x00000000000144B2
                 file:///binn/sqldk.dll+0x0000000000013D82
                 file:///binn/sqldk.dll+0x00000000001121C9
                 file:///binn/sqldk.dll+0x0000000000009B33
                 file:///binn/sqldk.dll+0x000000000000A48D
                 file:///binn/sqldk.dll+0x000000000000A295
                 file:///binn/sqldk.dll+0x0000000000027020
                 file:///binn/sqldk.dll+0x0000000000027B2B
                 file:///binn/sqldk.dll+0x0000000000027931
                 file:///Windows/SYSTEM32/KERNEL32.DLL+0x0000000000013424
                 file:///windows/system32/ntdll.dll+0x0000000000073411
                 <unknown>+0x0000000073030C00
        Process: 61 - sqlservr
         Thread: 186 (application thread 0x18c)
    Instance Id: 533c5e1e-8baf-45a0-9c8f-01c6503a48c0
       Crash Id: e64632db-4b68-4243-b0f0-d29accb848e6
    Build stamp: 9d61bcf28d2533f40f3df073a5c55d3c36750b6b1e650db137f069439b440661
   Distribution: Ubuntu 18.04.4 LTS
     Processors: 4
   Total Memory: 16797073408 bytes
      Timestamp: Tue May 12 17:16:29 2020

Statement which caused the assert in this case:

RESTORE DATABASE [2a316a2b-cc2a-4df6-894a-751ee9f5b486]
                          FROM DISK = '/mnt/scratch/2a316a2b-cc2a-4df6-894a-751ee9f5b486.ifi'
                          WITH REPLACE, MOVE 'data' TO '/mnt/scratch/2a316a2b-cc2a-4df6-894a-751ee9f5b486.mdf',
                          MOVE 'log' TO '/mnt/scratch/2a316a2b-cc2a-4df6-894a-751ee9f5b486_log.ldf';

The database in question is around 10 GB.
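The reports in this thread assert at different NtumWaiter.cpp lines (610, 664, 702) on hosts with quite different amounts of memory. To line them up, here is a minimal Python sketch that pulls the "Key: Value" header fields out of a crash log like the one above. `parse_crash_report` and the derived keys (`assert_file`, `assert_line`, `total_memory_gib`) are hypothetical names for illustration, not part of any SQL Server tooling; the parser assumes only the layout shown in these logs, optionally with a docker compose "service |" prefix.

```python
import re

# "Key: Value" header lines such as "Reason: 0x00000004" or
# "Total Memory: 16797073408 bytes". An optional docker compose prefix
# like "sqlserver_1  |" or "db-1     |" is skipped. Stack frames
# ("file://...") never match because their colon is not followed by a space.
FIELD_RE = re.compile(r"^(?:\S+\s*\|)?\s*([A-Za-z][A-Za-z ]*?):\s+(.*\S)\s*$")

# Source file and line of the RETAIL ASSERT, e.g. "File=NtumWaiter.cpp Line=610".
ASSERT_RE = re.compile(r"File=(\S+)\s+Line=(\d+)")


def parse_crash_report(text: str) -> dict:
    """Return the header fields of a crash report plus a few derived values."""
    report = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            report[m.group(1)] = m.group(2)

    assert_match = ASSERT_RE.search(report.get("Message", ""))
    if assert_match:
        report["assert_file"] = assert_match.group(1)
        report["assert_line"] = int(assert_match.group(2))

    mem_match = re.match(r"(\d+) bytes", report.get("Total Memory", ""))
    if mem_match:
        # Bytes -> GiB, to compare hosts at a glance.
        report["total_memory_gib"] = round(int(mem_match.group(1)) / 2**30, 2)
    return report


SAMPLE = """\
sqlserver_1  |         Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=664
sqlserver_1  |     Stack Trace:
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000342857
sqlserver_1  |    Total Memory: 2087837696 bytes
"""

if __name__ == "__main__":
    r = parse_crash_report(SAMPLE)
    print(r["assert_file"], r["assert_line"], r["total_memory_gib"])  # NtumWaiter.cpp 664 1.94
```

Feeding each report through the same function makes it easy to tabulate assert line, distribution, processor count and memory side by side.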

I realize this may not be the right place for support on this issue; please feel free to redirect me if necessary.

ghost, May 12 '20 18:05

I run SQL Server inside Docker on a MacBook Pro. I seem to hit this issue occasionally when my laptop comes back from hibernation (with SQL Server running in the background):

sqlserver_1  |          Reason: 0x00000004
sqlserver_1  |         Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=664
sqlserver_1  |     Stack Trace:
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000342857
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000341C1F
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000206DDA
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x000000000026D363
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x000000000026CD6B
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000236E27
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000251B6C
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x000000000024DC17
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x000000000024DE12
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000266AC2
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000266A32
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000265ED4
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000265F7B
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x0000000000203292
sqlserver_1  |                  file://package4/windows/system32/sqlpal.dll+0x00000000003818E8
sqlserver_1  |                  file:///windows/system32/ntdll.dll+0x0000000000026F20
sqlserver_1  |                  file:///Windows/SYSTEM32/KERNEL32.DLL+0x0000000000014414
sqlserver_1  |                  file:///windows/system32/ntdll.dll+0x0000000000075541
sqlserver_1  |                  <unknown>+0x00000000DF195800
sqlserver_1  |         Process: 24 - sqlservr
sqlserver_1  |          Thread: 3224 (application thread 0x310c)
sqlserver_1  |     Instance Id: 3fb2db20-d0a5-472f-8ff3-8cda23a3a199
sqlserver_1  |        Crash Id: 5fbe32ca-68c7-4517-b380-a046b1ca25a9
sqlserver_1  |     Build stamp: 3205db08d42166afc4b8c820302cbd021253dcab64faaf5f86a0cc6028e1e7be
sqlserver_1  |    Distribution: Ubuntu 18.04.4 LTS
sqlserver_1  |      Processors: 4
sqlserver_1  |    Total Memory: 2087837696 bytes
sqlserver_1  |       Timestamp: Sun Aug 23 02:16:27 2020

mkoppanen, Aug 24 '20 06:08

This might be the virtualization issue fixed in macOS 10.15.6. Perhaps patch your base OS.

Thanks, Anthony

Anthony E. Nocentino, Senior Technical Fellow, Centino Systems, Microsoft MVP | http://www.centinosystems.com/


nocentino, Aug 24 '20 13:08

Also getting this occasionally on Arch Linux.

Edit: I am also getting it upon RESTORE DATABASE.

badeball, Nov 02 '20 15:11

Is it a bug? I also get it! SQL Server hangs and becomes unresponsive during RESTORE DATABASE. The crash.txt file shows the same diagnostic information:

Reason: 0x00000004
Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=610
Stack Trace:
file://package4/windows/system32/sqlpal.dll+0x000000000033D3A7
file://package4/windows/system32/sqlpal.dll+0x000000000033CAEF
file://package4/windows/system32/sqlpal.dll+0x0000000000206DAA
file://package4/windows/system32/sqlpal.dll+0x000000000026CD97
file://package4/windows/system32/sqlpal.dll+0x000000000026C7E7
....

OS: CentOS
Kernel: 4.19
Docker image tag: 2017-CU21-ubuntu-16.04

z-k-q, Jun 03 '21 06:06

I get the same problem using sqlpackage to import a bacpac exported from Azure. It's repeatable with the same database every time, and it seems to fail at the same point: when enabling indexes after the table data has been imported.

This program has encountered a fatal error and cannot continue running at Wed Jun 30 15:10:39 2021
The following diagnostic information is available:

Reason: 0x00000004
Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=702
Stack Trace:
file://package5/windows/system32/sqlpal.dll+0x0000000000296B1A
file://package5/windows/system32/sqlpal.dll+0x0000000000295E57
file://package5/windows/system32/sqlpal.dll+0x0000000000299FAA
file://package5/windows/system32/sqlpal.dll+0x000000000026A0B5
file://package5/windows/system32/sqlpal.dll+0x0000000000269A17
file://package5/windows/system32/sqlpal.dll+0x000000000024D899
file://package5/windows/system32/sqlpal.dll+0x000000000024B4B9
file://package5/windows/system32/sqlpal.dll+0x00000000002AAAAB
file://package5/windows/system32/sqlpal.dll+0x0000000000388998
file:///Windows/SYSTEM32/KERNELBASE.dll+0x0000000000070A5F
file:///binn/sqlmin.dll+0x00000000001E3B88
file:///binn/sqlmin.dll+0x00000000001E3E67
file:///binn/sqlservr.exe+0x0000000000003C9B
file:///binn/sqldk.dll+0x00000000000A6560
file:///binn/sqldk.dll+0x000000000007D7BA
file:///binn/sqldk.dll+0x000000000001A2F1
file:///binn/sqldk.dll+0x0000000000019622
file:///binn/sqldk.dll+0x0000000000018D72
file:///binn/sqldk.dll+0x0000000000114B69
file:///binn/sqldk.dll+0x0000000000009E23
file:///binn/sqldk.dll+0x000000000000A39D
file:///binn/sqldk.dll+0x000000000000A19E
file:///binn/sqldk.dll+0x0000000000038242
file:///binn/sqldk.dll+0x0000000000037E4C
file:///binn/sqldk.dll+0x0000000000038993
file:///Windows/SYSTEM32/KERNEL32.DLL+0x0000000000014414
file:///windows/system32/ntdll.dll+0x0000000000075541
Process: 18 - sqlservr
Thread: 118 (application thread 0x1a8)
Instance Id: 8eb00283-6b6e-4459-8ac5-db6922e837d3
Crash Id: 90828ec0-c095-4f07-8eb5-ea4136ecfd2b
Build stamp: aaa50c081b9257e4e1b207453609194cf6575c0380be3aed816b97d6e1111435
Distribution: Ubuntu 16.04.7 LTS
Processors: 12
Total Memory: 4025675776 bytes
Timestamp: Wed Jun 30 15:10:39 2021

Bacpac files from other databases import fine, but I have no idea what could be causing the issue with this particular database.

I'm having the exact same problem on the latest versions of SQL Server 2017 and 2019 for Linux. This particular stack trace is from:

Server Microsoft SQL Server 2019 (RTM-CU11) (KB5003249) - 15.0.4138.2 (X64)

I'm using the version of sqlpackage from this "evergreen" link (https://aka.ms/sqlpackage-linux):

sqlpackage-linux-x64-en-US-15.0.5084.2.zip

My SQL data folder is mapped back to my host OS (Windows) via a Docker volume.

The command that errors is as follows:

/opt/sqlpackage/sqlpackage /a:Import /tsn:. /tdn:"$fullDbName" /tu:sa /tp:"$SA_PASSWORD" /sf:"$srcFileName" /mp:1 /p:CommandTimeout=120

I've tried setting /mp: to several values (1, 2, 4, 8), and I've tried raising /p:CommandTimeout to over an hour. Nothing works. The import simply stalls: CPU drops to 0 and I/O drops to 0, so it isn't doing anything. Then it times out and throws the error above.
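A hedged Python sketch of that parameter sweep, rebuilding the sqlpackage command from above once per /mp: value (the short form of /MaxParallelism). The database name, password placeholder and bacpac path are hypothetical; CommandTimeout is in seconds, so 3600 is one hour. The script only prints the commands; passing `argv` to `subprocess.run` would actually start each import.

```python
import shlex

SQLPACKAGE = "/opt/sqlpackage/sqlpackage"  # install path from the command above


def build_import_args(db_name: str, password: str, bacpac: str,
                      max_parallelism: int, command_timeout_s: int) -> list:
    """Build the argv for a sqlpackage Import into the local server (/tsn:.)."""
    return [
        SQLPACKAGE,
        "/a:Import",
        "/tsn:.",
        f"/tdn:{db_name}",
        "/tu:sa",
        f"/tp:{password}",
        f"/sf:{bacpac}",
        f"/mp:{max_parallelism}",
        f"/p:CommandTimeout={command_timeout_s}",
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute your own database, password and bacpac.
    for mp in (1, 2, 4, 8):
        argv = build_import_args("MyDb", "<password>", "/tmp/MyDb.bacpac", mp, 3600)
        print(shlex.join(argv))  # subprocess.run(argv, check=True) would run it
```

Building the list explicitly (rather than a single shell string) avoids quoting problems with passwords that contain shell metacharacters.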

UPDATE:

Interestingly, I can import the bacpac from my Windows host OS via SSMS without error, which suggests this could be a bug in sqlpackage for Linux.

UPDATE 2:

I can confirm the problem lies with the multi-platform .NET Core edition of SqlPackage. I ran the same failing command with both the Linux and Windows versions, using the exact same bacpac file each time.

The most recent version of SqlPackage that works is v18.6 (build 15.0.4897.1), which can be found here:

https://docs.microsoft.com/en-us/sql/tools/sqlpackage/release-notes-sqlpackage?view=sql-server-ver15#186-sqlpackage

v18.7 (build 15.0.5084.2) and v18.7.1 (build 15.0.5164.1, the latest as of writing) both exhibit the timeout problem.

UPDATE 3:

Turns out I was hasty. I'm still getting the problem with the older version of sqlpackage, but now on different (larger) databases. I've now concluded that it's related to resource pressure, mainly RAM, I think. I created a separate container to run the sqlpackage command, so it wasn't running in the same container as SQL Server itself, and everything seems to work now.

Very strange, but it has resolved the issue for me.
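To test the RAM hypothesis before (or instead of) splitting sqlpackage into its own container, one option is to log available memory just before the import starts. A minimal sketch, assuming a Linux host: `parse_meminfo` and `gib` are hypothetical helpers that read /proc/meminfo, whose values are reported in kB (KiB). Note that inside a container /proc/meminfo normally shows the host's memory rather than the container's limit, so treat the numbers as a rough signal.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Lines look like "MemAvailable:    2097152 kB"; skip anything else.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def gib(kib: int) -> float:
    """Convert the kB (KiB) units used by /proc/meminfo to GiB."""
    return kib / 2**20


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        info = parse_meminfo(meminfo.read_text())
        print(f"MemAvailable {gib(info['MemAvailable']):.2f} GiB "
              f"of MemTotal {gib(info['MemTotal']):.2f} GiB")
```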

theyetiman, Jun 30 '21 15:06

I have this happening too. I made a copy of this file: https://github.com/microsoft/mssql-docker/blob/master/linux/preview/examples/mssql-customize/configure-db.sh and modified it to install the databases my solution requires onto the server once it has started and is ready.
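The core of that approach is: wait until the server answers, then run the setup. A hedged Python sketch of the wait loop follows. `probe` is a hypothetical callable you would supply (for example, one that runs sqlcmd with a trivial query and checks its exit code), and the 60 attempts at 1-second intervals are illustrative defaults, not values taken from the script.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], attempts: int = 60,
                     delay_s: float = 1.0) -> bool:
    """Poll probe() until it returns True; give up after `attempts` tries.

    An exception from probe() (e.g. connection refused while SQL Server is
    still starting) is treated as "not ready yet" rather than a failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return True
        except Exception:
            pass
        if attempt < attempts:
            time.sleep(delay_s)
    return False


if __name__ == "__main__":
    # Stand-in probe: reports ready on the third call.
    calls = []

    def fake_probe() -> bool:
        calls.append(1)
        return len(calls) >= 3

    print(wait_until_ready(fake_probe, attempts=5, delay_s=0.1), len(calls))  # True 3
```

Returning False on timeout, rather than raising, lets the caller decide whether to abort the container or continue without the setup step.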

After the databases are installed, on nearly EVERY startup I get the following, at Line=702 rather than Line=610 as in your post:

2024-02-01 04:22:46.02 spid12s     Starting up database 'tempdb'.
db-1     | This program has encountered a fatal error and cannot continue running at Thu Feb  1 04:31:30 2024
db-1     | The following diagnostic information is available:
db-1     |
db-1     |          Reason: 0x00000004
db-1     |         Message: RETAIL ASSERT: Expression=(!"A timeout or deadlock was encountered while waiting" " for a thread to terminate/suspend/resume.") File=NtumWaiter.cpp Line=702
db-1     |     Stack Trace:
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000000E7AF
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000000D3B3
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000001453B
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x00000000000A4ADE
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000006CB39
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000005E4E1
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000005E07C
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x000000000005B023
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x0000000000003D1F
db-1     |                  file://package4/windows/system32/sqlpal.dll+0x0000000000205488
db-1     |                  file:///Windows/SYSTEM32/KERNELBASE.dll+0x0000000000070A5F
db-1     |                  file:///binn/sqlmin.dll+0x00000000001F7308
db-1     |                  file:///binn/sqlmin.dll+0x00000000001F75E7
db-1     |                  file:///binn/sqlservr.exe+0x0000000000003DE9
db-1     |                  file:///binn/sqldk.dll+0x00000000000A7DE0
db-1     |                  file:///binn/sqldk.dll+0x000000000007E65B
db-1     |                  file:///binn/sqldk.dll+0x00000000000140FF
db-1     |                  file:///binn/sqldk.dll+0x00000000000136C2
db-1     |                  file:///binn/sqldk.dll+0x0000000000012B12
db-1     |                  file:///binn/sqldk.dll+0x0000000000115E79
db-1     |                  file:///binn/sqldk.dll+0x000000000000A013
db-1     |                  file:///binn/sqldk.dll+0x000000000000A5BF
db-1     |                  file:///binn/sqldk.dll+0x000000000000A38E
db-1     |                  file:///binn/sqldk.dll+0x0000000000024482
db-1     |                  file:///binn/sqldk.dll+0x0000000000023D2F
db-1     |                  file:///binn/sqldk.dll+0x00000000000242B8
db-1     |                  file:///Windows/SYSTEM32/KERNEL32.DLL+0x0000000000014414
db-1     |                  file:///windows/system32/ntdll.dll+0x0000000000075541
db-1     |         Modules:
db-1     |                  file://package4/windows/system32/sqlpal.dll=2E003A22798C212BD27C3124506D77F51
db-1     |                  file:///Windows/SYSTEM32/KERNELBASE.dll=ACB8C8887582458AADBABAF6F2400B2C2
db-1     |                  file:///binn/sqlmin.dll=8EB388AFAAA24391B6E2BC234815D2D42
db-1     |                  file:///binn/sqlservr.exe=977D3BC7D3064C249547F6C996F25EDD2
db-1     |                  file:///binn/sqldk.dll=D517ED2D7ADC4B26BA5FA681422869CC2
db-1     |                  file:///Windows/SYSTEM32/KERNEL32.DLL=C715300FB2664729A6126A3F591E6F302
db-1     |                  file:///windows/system32/ntdll.dll=45137AA3F9814512B3123991067EEE6E2
db-1     |         Process: 15 - sqlservr
db-1     |          Thread: 120 (application thread 0x1c0)
db-1     |     Instance Id: 48c3ceaf-b0c5-49d3-a555-424a05616981
db-1     |        Crash Id: db6369e6-1ffe-4127-9dde-aa562681f889
db-1     |     Build stamp: 8f0a1bbfe6ffccf089ab1a4a9806f7a8776ccbe957963ccb7e593515289382b5
db-1     |    Distribution: Ubuntu 20.04.6 LTS
db-1     |      Processors: 4
db-1     |    Total Memory: 16775962624 bytes
db-1     |       Timestamp: Thu Feb  1 04:31:30 2024
Capturing a dump of 15
db-1     | Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_1_2024_4_31_30.15
db-1     | Executing: /opt/mssql/bin/handle-crash.sh with parameters
db-1     |      handle-crash.sh
db-1     |      /opt/mssql/bin/sqlservr
db-1     |      15
db-1     |      /opt/mssql/bin
db-1     |      /var/opt/mssql/log/
db-1     |
db-1     |      48c3ceaf-b0c5-49d3-a555-424a05616981
db-1     |      db6369e6-1ffe-4127-9dde-aa562681f889
db-1     |
db-1     |      /var/opt/mssql/log/core.sqlservr.2_1_2024_4_31_30.15
db-1     |
db-1     | Ubuntu 20.04.6 LTS
db-1     | Capturing core dump and information to /var/opt/mssql/log...
db-1     | /bin/cat: /proc/15/maps: Permission denied

This is the public repo and branch where this is happening: https://github.com/acnicholls/NSW/tree/62-automate-db-creation

You can run compose-start.sh (or compose-start.ps1) to start the solution.

Workaround 1

To get back to a "working" state, I have to restart Docker, sometimes my whole machine. Then the solution starts again, but it eventually crashes as above.

Workaround 2

Comment out the configure script in the db service's entrypoint.sh file. However, I haven't tested this for any meaningful length of time, and I have other problems in the container with 3 images building from the same folders. I'll edit later.

acnicholls, Feb 01 '24 04:02

Same problem on Ubuntu 24.04 LTS (kernel 6.8.0-38-generic) with the mssql-docker 2019-latest and 2022-latest images. We use snapshot restores in our backend tests. Restoring works about 150 times; after that, the same exception occurs ("A timeout or deadlock was encountered while waiting"). There are no memory or CPU restrictions, there is enough disk space, and nothing suspicious shows up in SQL Profiler. It's not a timing problem either, because it still happens with a 5-second delay after every test. It only happens in the Linux VM where the GitLab runner runs, not in WSL: locally, with the same Docker image, it works like a charm!
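To check whether the failure always lands near the same count (about 150 restores here), a small harness can run the restore repeatedly and report the first run that fails. This is a hypothetical Python sketch: `restore` stands for whatever performs one snapshot restore in your tests, and `pause_s` mirrors the 5-second delay mentioned above.

```python
import time
from typing import Callable, Optional


def first_failure(restore: Callable[[int], None], max_runs: int,
                  pause_s: float = 0.0) -> Optional[int]:
    """Call restore(run) for run = 1..max_runs.

    Returns the number of the first run that raised, or None if all succeeded.
    """
    for run in range(1, max_runs + 1):
        try:
            restore(run)
        except Exception as exc:
            print(f"run {run} failed: {exc!r}")
            return run
        time.sleep(pause_s)
    return None


if __name__ == "__main__":
    def fake_restore(run: int) -> None:
        # Simulates the reported behaviour: fine until roughly run 150.
        if run == 150:
            raise RuntimeError("RETAIL ASSERT ... NtumWaiter.cpp")

    print(first_failure(fake_restore, max_runs=200))
```

Running it several times on the GitLab runner VM and in WSL would show whether the threshold is stable, which helps separate a cumulative resource leak from a random hang.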

euregon, Jul 17 '24 15:07