mssql-docker

Latest docker image crashes on startup

Open plattenschieber opened this issue 7 years ago • 6 comments

Hello folks,

Neither the Docker image nor the apt-get installation results in a successful start of SQL Server. I get the following error message:

~/Downloads » docker logs 79d3cca0dd07
This is an evaluation version.  There are [118] days left in the evaluation period.
This program has encountered a fatal error and cannot continue running.
The following diagnostic information is available:

       Reason: 0x00000003
      Message: processorId < static_cast<int>(GetProcessorCount())
   Stacktrace: 00005629ab92d993 00005629ab92dade 00005629ab92ae6a 
               00005629ab949557 00005629ab9220eb 00007f0a3e135830 
               00005629ab91fcb9 
      Process: 9 - sqlservr
       Thread: 9
  Instance Id: ef01361f-f023-4285-815a-a46fce92c9a0
     Crash Id: ecaaaeed-9189-414d-a812-60004f26cc83
  Build stamp: a37664e45e4156e76a53fa282fd694cb49f70c2037515f5684e3ce6dfa7549bc

Capturing core dump and information...
No journal files were found.
No journal files were found.
Attempting to capture a dump with paldumper
WARNING: Capture attempt failure detected
Attempting to capture a filtered dump with paldumper
WARNING: Attempt to capture dump failed.  Reference /var/opt/mssql/log/core.sqlservr.9.temp/log/paldumper-debug.log for details
Attempting to capture a dump with gdb
WARNING: Unable to capture crash dump with GDB. You may need to
allow ptrace debugging, enable the CAP_SYS_PTRACE capability, or
run as root.

docker info
Containers: 26
 Running: 0
 Paused: 0
 Stopped: 26
Images: 44
Server Version: 17.06.2-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 143
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local nvidia-docker
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.10.0-32-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 31.32GiB
...
...
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
...
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Any suggestions what this could be?

plattenschieber avatar Sep 27 '17 12:09 plattenschieber

What does your docker run command look like?

twright-msft avatar Oct 06 '17 03:10 twright-msft

This looks a lot like #126. Probably a duplicate?

andrewnicols avatar Oct 17 '17 01:10 andrewnicols

This issue is plaguing me when I try to start mssql-server inside a container inside a VM. The VM has 17 cores allocated to it, but /proc/cpuinfo has this:

sh-5.1# grep ^processor /proc/cpuinfo
processor       : 0
processor       : 1
processor       : 3
processor       : 5
processor       : 8
processor       : 9
processor       : 11
processor       : 12
processor       : 14
processor       : 16

One representative vCPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
stepping        : 7
microcode       : 0x5003103
cpu MHz         : 2199.998
cache size      : 16384 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs taa
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

RobertKrawitz avatar Feb 24 '22 20:02 RobertKrawitz

It appears that the exact contents of /proc/cpuinfo vary from run to run. The same thing happens if I use a VM with 16 cores. Occasionally I get the normal, contiguous list of CPUs, which may be why mssql server occasionally doesn't crash.
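
A quick way to check a given run for this condition is to compare each processor ID in /proc/cpuinfo against the number of entries listed. The sketch below is only an illustration: `processor_ids` and `ids_out_of_range` are hypothetical helper names, and it assumes (this is a guess, not something confirmed from the SQL Server source) that the failed assertion `processorId < static_cast<int>(GetProcessorCount())` trips when an ID is greater than or equal to the count of visible processors.

```python
import re

# Matches lines such as "processor       : 12" in /proc/cpuinfo on x86.
_PROCESSOR_LINE = re.compile(r"^processor[ \t]*:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def processor_ids(cpuinfo_text):
    """Return the processor IDs found in /proc/cpuinfo-style text, in file order."""
    return [int(m.group(1)) for m in _PROCESSOR_LINE.finditer(cpuinfo_text)]


def ids_out_of_range(cpuinfo_text):
    """Return (count, bad_ids), where bad_ids are the IDs >= count.

    Assumption: an ID that is >= the number of listed processors is what
    would violate `processorId < GetProcessorCount()`.
    """
    ids = processor_ids(cpuinfo_text)
    count = len(ids)
    return count, [cpu for cpu in ids if cpu >= count]


def check_host(path="/proc/cpuinfo"):
    """Read the given cpuinfo file and print a one-line summary."""
    with open(path) as f:
        count, bad = ids_out_of_range(f.read())
    if bad:
        print(f"{count} processors listed, but IDs {bad} are >= {count}")
    else:
        print(f"{count} processors listed, IDs are contiguous from 0")
```

Calling `check_host()` inside the container before starting sqlservr would show whether that particular run got a sparse CPU list.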

RobertKrawitz avatar Feb 24 '22 20:02 RobertKrawitz

Upon closer inspection, it turns out that the runtime we're using is offlining the processors that don't show up (as seen by lscpu). For example, from a different run:

# grep ^processor /proc/cpuinfo 
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11
processor       : 12
processor       : 14
processor       : 15

# lscpu
...
CPU(s):                          16
On-line CPU(s) list:             0-12,14,15
Off-line CPU(s) list:            13
...
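
The same on-line/off-line information is exposed by the kernel in /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline, using the same range syntax that lscpu prints (for example `0-12,14,15`). A minimal sketch for expanding such a list follows; `parse_cpu_list` and `read_cpu_list` are hypothetical helper names introduced here for illustration.

```python
def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '0-12,14,15' into a sorted list of ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # an empty file (no offline CPUs) yields no entries
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name, base="/sys/devices/system/cpu"):
    """Read and expand e.g. 'online' or 'offline' from sysfs."""
    with open(f"{base}/{name}") as f:
        return parse_cpu_list(f.read())
```

On a host like the one above, `read_cpu_list("offline")` would be expected to return the offlined CPU numbers, which makes it easy to log them alongside a crash.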

RobertKrawitz avatar Feb 28 '22 16:02 RobertKrawitz

Actually, the real cluster is running 1.1; with 1.0.2 it works fine.

RobertKrawitz avatar Feb 28 '22 18:02 RobertKrawitz