azure-cosmos-db-emulator-docker

Fatal error when running image

Open andeliero opened this issue 1 year ago • 32 comments

To Reproduce

Steps to reproduce the behavior:

  1. Run the image as follows: docker run -p 8081:8081 -p 10250:10250 --name azure-cosmos-emulator mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator

Output

% docker run -p 8081:8081 -p 10250:10250 --name azure-cosmos-emulator mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
This is an evaluation version.  There are [67] days left in the evaluation period.
Starting
This program has encountered a fatal error and cannot continue running at Wed Aug  7 10:47:25 2024
The following diagnostic information is available:

         Reason: Fatal Signal (0x00000001)
         Signal: SIGABRT - Aborted (6)
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007fb06d335360 0000564ab69d4bfa <unknown>
                 00007fb06d3363c0 0000564ab69d45cf <unknown>
                 00007fb06d336620 0000564ab69d3a61 <unknown>
                 00007fb06d336640 00007fb06ef1b090 killpg+0x40
                 00007fb06d336bd0 00007fb06ef1b00b gsignal+0xcb
                 00007fb06d336cf0 00007fb06eefa859 abort+0x12b
                 00007fb06d336e20 0000564ab696c332 <unknown>
                 00007fb06d336ed0 0000564ab69f5b14 <unknown>
                 00007fb06d336ef0 0000564ab6a24f38 <unknown>
                 00007fb06d336fa0 0000564ab6a24d1a <unknown>
                 00007fb06d337000 0000564ab69781ba <unknown>
                 00007fb06d337080 0000564ab6977e0d <unknown>
                 00007fb06d337160 0000564ab69ed361 <unknown>

        Process: 19 - cosmosdb-emulator
         Thread: 128 (application thread 0x1d4)
    Instance Id: dd7ec0de-670a-47df-bd23-ec5dadee17d9
       Crash Id: 0b1ea45a-a7ec-4505-b8df-30d08125e4b4
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 4
   Total Memory: 8326389760 bytes
      Timestamp: Wed Aug  7 10:47:25 2024
     Last errno: -34938881
Last errno text: Unknown error -34938881
*********** PAL PANIC CORE DUMP GENERATION FAILED **********
Unable to locate handle-crash.sh. Error: File: signals.cpp:483 [Status: 0xC0000034 Object name not found errno = 0x2(2) No such file or directory]
Executing: /usr/local/bin/cosmos/handle-crash.sh with parameters
     handle-crash.sh
     /usr/local/bin/cosmos/cosmosdb-emulator
     19
     /usr/local/bin/cosmos
     /tmp/cosmos/appdata/log/

     dd7ec0de-670a-47df-bd23-ec5dadee17d9
     0b1ea45a-a7ec-4505-b8df-30d08125e4b4

*********** PANIC CORE DUMP GENERATION FAILED **********
Attempt to launch handle-crash.sh failed.
This program has encountered a fatal error and cannot continue running at Wed Aug  7 10:47:25 2024
The following diagnostic information is available:

         Reason: Host Extension RTL_ASSERT (0x00000003)
         Status: STATUS_WAIT_2 (0x00000002)
        Message: !killTheTarget
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007fb06cdfbdf0 0000564ab69d4bfa <unknown>
                 00007fb06cdfce50 0000564ab69d45cf <unknown>
                 00007fb06cdfd0b0 0000564ab693eb66 <unknown>
                 00007fb06cdfd0e0 0000564ab69d80e4 <unknown>
                 00007fb06cdfe310 0000564ab69d77a9 <unknown>
                 00007fb06cdfe440 00007fb06f576609 start_thread+0xd9
                 00007fb06cdfe500 00007fb06eff7353 clone+0x43

        Process: 17 - cosmosdb-emulator
         Thread: 18
    Instance Id: dd7ec0de-670a-47df-bd23-ec5dadee17d9
       Crash Id: 0b1ea45a-a7ec-4505-b8df-30d08125e4b4
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 4
   Total Memory: 8326389760 bytes
      Timestamp: Wed Aug  7 10:47:25 2024
     Last errno: 2
Last errno text: No such file or directory
Aborted

Desktop:

  • OS: macOS 14.5 (x86)
  • Docker Desktop: 4.33.0 (160616)
  • Docker Engine: 27.1.1

andeliero avatar Aug 06 '24 09:08 andeliero

Could it be a compatibility clash with the latest version of Docker (4.33.0)? I ran into the same error this morning and tried several things (pruning, restarting the Docker client, etc.); so far the only fix that works is installing an older version of the Docker client. I couldn't remember which version I had before installing 4.33.0, so I'm currently on 4.29.0 and the emulator image works as expected.

rtasalem avatar Aug 08 '24 10:08 rtasalem

Seeing the exact same error here using

  • OS: Ubuntu 22.04
  • Docker: 27.1.2, build d01f264

tomvater avatar Aug 16 '24 11:08 tomvater

I have the same issue. Ubuntu 22.04 and I tried on Docker Engine 27.1.2 and 25.0.4

patruvlad avatar Aug 20 '24 13:08 patruvlad

For me too, on Ubuntu 22.04 on Docker 27.1.2

kraussjo avatar Aug 21 '24 11:08 kraussjo

Same issue here: Ubuntu 22.04 on Docker 24.0.4

alexfandos avatar Aug 21 '24 14:08 alexfandos

This might be related: https://github.com/Azure/azure-cosmos-db-emulator-docker/issues/84. My kernel updated to 6.8.0-40-generic last Friday; I'm not sure which kernel I had before.

I just booted up my PC with an older kernel and it worked just fine

alexfandos avatar Aug 21 '24 16:08 alexfandos

Been running this container for a while, but today is the first day I got this error and am unable to start it.

~> lsb_release -a && uname -r && docker --version
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy
6.8.0-40-generic
Docker version 27.1.2, build d01f264

Docker packages (among other things) were updated yesterday, and the kernel a few days ago:

Start-Date: 2024-08-21  08:53:13
Commandline: aptdaemon role='role-commit-packages' sender=':1.133'
Upgrade:
containerd.io:amd64 (1.7.19-1, 1.7.20-1),
linux-tools-common:amd64 (5.15.0-118.128, 5.15.0-119.129),
docker-ce-cli:amd64 (5:27.1.1-1~ubuntu.22.04~jammy, 5:27.1.2-1~ubuntu.22.04~jammy),
mongodb-mongosh:amd64 (2.2.15, 2.3.0),
google-chrome-stable:amd64 (127.0.6533.99-1, 127.0.6533.119-1),
apport-gtk:amd64 (2.20.11-0ubuntu82.5, 2.20.11-0ubuntu82.6),
docker-buildx-plugin:amd64 (0.16.1-1~ubuntu.22.04~jammy, 0.16.2-1~ubuntu.22.04~jammy),
docker-ce:amd64 (5:27.1.1-1~ubuntu.22.04~jammy, 5:27.1.2-1~ubuntu.22.04~jammy),
ubuntu-desktop:amd64 (1.481.2, 1.481.3),
docker-ce-rootless-extras:amd64 (5:27.1.1-1~ubuntu.22.04~jammy, 5:27.1.2-1~ubuntu.22.04~jammy),
ubuntu-standard:amd64 (1.481.2, 1.481.3),
ubuntu-desktop-minimal:amd64 (1.481.2, 1.481.3),
python3-apport:amd64 (2.20.11-0ubuntu82.5, 2.20.11-0ubuntu82.6),
code:amd64 (1.92.1-1723066302, 1.92.2-1723660989),
python3-problem-report:amd64 (2.20.11-0ubuntu82.5, 2.20.11-0ubuntu82.6),
apport:amd64 (2.20.11-0ubuntu82.5, 2.20.11-0ubuntu82.6),
ubuntu-minimal:amd64 (1.481.2, 1.481.3),
linux-libc-dev:amd64 (5.15.0-118.128, 5.15.0-119.129)
End-Date: 2024-08-21  08:53:47

Start-Date: 2024-08-20  09:09:08
Commandline: /usr/bin/unattended-upgrade
Install:
linux-tools-common:amd64 (5.15.0-118.128, automatic),
linux-hwe-6.8-headers-6.8.0-40:amd64 (6.8.0-40.40~22.04.3, automatic),
linux-hwe-6.8-tools-6.8.0-40:amd64 (6.8.0-40.40~22.04.3, automatic),
linux-modules-6.8.0-40-generic:amd64 (6.8.0-40.40~22.04.3, automatic),
hwdata:amd64 (0.357-1, automatic),
linux-image-6.8.0-40-generic:amd64 (6.8.0-40.40~22.04.3, automatic),
linux-headers-6.8.0-40-generic:amd64 (6.8.0-40.40~22.04.3, automatic),
linux-tools-6.8.0-40-generic:amd64 (6.8.0-40.40~22.04.3, automatic),
linux-modules-extra-6.8.0-40-generic:amd64 (6.8.0-40.40~22.04.3, automatic)
Upgrade: 
linux-image-generic-hwe-22.04:amd64 (6.5.0.45.45~22.04.1, 6.8.0-40.40~22.04.3),
linux-headers-generic-hwe-22.04:amd64 (6.5.0.45.45~22.04.1, 6.8.0-40.40~22.04.3),
linux-generic-hwe-22.04:amd64 (6.5.0.45.45~22.04.1, 6.8.0-40.40~22.04.3)
End-Date: 2024-08-20  09:09:36

nettum avatar Aug 22 '24 11:08 nettum

I tried running it on macOS with Docker Desktop and ran into this same issue. I then tried my local server with Podman and hit the same issue.

# podman run --publish 9091:8081 --publish 10250-10255:10250-10255 --name test-container mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest
This is an evaluation version.  There are [52] days left in the evaluation period.
Starting
This program has encountered a fatal error and cannot continue running at Thu Aug 22 21:54:01 2024
The following diagnostic information is available:

         Reason: Fatal Signal (0x00000001)
         Signal: SIGABRT - Aborted (6)
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007f9db2ad2360 0000557c9dbfcbfa <unknown>
                 00007f9db2ad33c0 0000557c9dbfc5cf <unknown>
                 00007f9db2ad3620 0000557c9dbfba61 <unknown>
                 00007f9db2ad3640 00007f9db6462090 killpg+0x40
                 00007f9db2ad3bd0 00007f9db646200b gsignal+0xcb
                 00007f9db2ad3cf0 00007f9db6441859 abort+0x12b
                 00007f9db2ad3e20 0000557c9db94332 <unknown>
                 00007f9db2ad3ed0 0000557c9dc1db14 <unknown>
                 00007f9db2ad3ef0 0000557c9dc4cf38 <unknown>
                 00007f9db2ad3fa0 0000557c9dc4cd1a <unknown>
                 00007f9db2ad4000 0000557c9dba01ba <unknown>
                 00007f9db2ad4080 0000557c9db9fe0d <unknown>
                 00007f9db2ad4160 0000557c9dc15361 <unknown>

        Process: 19 - cosmosdb-emulator
         Thread: 127 (application thread 0x1d0)
    Instance Id: 528d8477-ad57-45b8-8a2e-0b800cafaf85
       Crash Id: 50516bb9-3fb2-44cd-b985-d4b303bc1fbc
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 16
   Total Memory: 16697229312 bytes
      Timestamp: Thu Aug 22 21:54:01 2024
     Last errno: -34938881
Last errno text: Unknown error -34938881
*********** PAL PANIC CORE DUMP GENERATION FAILED **********
Unable to locate handle-crash.sh. Error: File: signals.cpp:483 [Status: 0xC0000034 Object name not found errno = 0x2(2) No such file or directory]
Executing: /usr/local/bin/cosmos/handle-crash.sh with parameters
     handle-crash.sh
     /usr/local/bin/cosmos/cosmosdb-emulator
     19
     /usr/local/bin/cosmos
     /tmp/cosmos/appdata/log/
     
     528d8477-ad57-45b8-8a2e-0b800cafaf85
     50516bb9-3fb2-44cd-b985-d4b303bc1fbc
     
*********** PANIC CORE DUMP GENERATION FAILED **********
Attempt to launch handle-crash.sh failed.
This program has encountered a fatal error and cannot continue running at Thu Aug 22 21:54:01 2024
The following diagnostic information is available:

         Reason: Host Extension RTL_ASSERT (0x00000003)
         Status: STATUS_WAIT_2 (0x00000002)
        Message: !killTheTarget
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007f9db41fbdf0 0000557c9dbfcbfa <unknown>
                 00007f9db41fce50 0000557c9dbfc5cf <unknown>
                 00007f9db41fd0b0 0000557c9db66b66 <unknown>
                 00007f9db41fd0e0 0000557c9dc000e4 <unknown>
                 00007f9db41fe310 0000557c9dbff7a9 <unknown>
                 00007f9db41fe440 00007f9db6abd609 start_thread+0xd9
                 00007f9db41fe500 00007f9db653e353 clone+0x43

        Process: 17 - cosmosdb-emulator
         Thread: 18
    Instance Id: 528d8477-ad57-45b8-8a2e-0b800cafaf85
       Crash Id: 50516bb9-3fb2-44cd-b985-d4b303bc1fbc
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 16
   Total Memory: 16697229312 bytes
      Timestamp: Thu Aug 22 21:54:01 2024
     Last errno: 2
Last errno text: No such file or directory
Aborted (core dumped)
# cat /etc/redhat-release && uname -r && podman --version
Rocky Linux release 8.10 (Green Obsidian)
6.10.6-1.el8.elrepo.x86_64
podman version 4.9.4-rhel

rowens-an avatar Aug 22 '24 21:08 rowens-an

To copy my own comment from the other issue/thread, because it might help someone:

Booting into an older kernel works for me (older docker version did not).

On my KDE Neon machine (Ubuntu based) it fails for the "6.8.0-40-generic" kernel but works on the "6.5.0-45-generic" kernel.
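
For anyone trying to work out which bucket their host falls into, here is a minimal sketch that compares a kernel release string against the reports in this thread so far (6.5.0-45-generic works; 6.8.0-40-generic and 6.10.6 fail). The 6.8 cutoff and the function names are assumptions drawn only from those reports, not a confirmed root cause, and the check relies on `sort -V` being available. On macOS the relevant kernel is the one inside the Docker Desktop VM, not the host's.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if kernel version $1 is >= $2 under version sort.
kernel_at_least() {
  local current="$1" threshold="$2"
  [ "$(printf '%s\n%s\n' "$threshold" "$current" | sort -V | head -n1)" = "$threshold" ]
}

# Assumption: 6.8 is used as the cutoff because of the reports in this thread.
classify_kernel() {
  if kernel_at_least "$1" "6.8"; then
    echo "reported-failing"
  else
    echo "reported-working"
  fi
}

# Classify the kernel of the machine this runs on.
classify_kernel "$(uname -r)"
```

If this prints "reported-failing", booting an older kernel is the workaround described above.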

untitled-confused-goose avatar Aug 25 '24 14:08 untitled-confused-goose

I got the same issue on Ubuntu 22.04. I downgraded my kernel to 5.15.0-119-generic and it worked fine for me.

Ref: https://askubuntu.com/questions/1404722/downgrade-kernel-for-ubuntu-22-04-lts

RushikeshMarkad16 avatar Aug 28 '24 10:08 RushikeshMarkad16

We are getting the same issue. Downgrading the kernel to 6.5 worked for us. This is basically blocking us from upgrading any of our machines now

adrian-gheorghe avatar Sep 02 '24 15:09 adrian-gheorghe

We are experiencing the same issue on macOS 14.6.1 with Docker Desktop 4.34.0.

docker run --rm -it mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
This is an evaluation version.  There are [38] days left in the evaluation period.
Starting
This program has encountered a fatal error and cannot continue running at Thu Sep  5 08:45:31 2024
The following diagnostic information is available:

         Reason: Fatal Signal (0x00000001)
         Signal: SIGABRT - Aborted (6)
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007fffeee726d0 00005555556a1bfa <unknown>
                 00007fffeee73730 00005555556a15cf <unknown>
                 00007fffeee73990 00005555556a0a61 <unknown>
                 00007fffeee739b0 00007fffff154090 killpg+0x40
                 00007fffeee74e60 00007fffff15400b gsignal+0xcb
                 00007fffeee74f80 00007fffff133859 abort+0x12b
                 00007fffeee750b0 0000555555639332 <unknown>
                 00007fffeee75160 00005555556bb667 <unknown>

        Process: 19 - cosmosdb-emulator
         Thread: 127 (application thread 0x1cc)
    Instance Id: aa5b4ac9-1854-40b7-82be-62fa4dbafefc
       Crash Id: 5e90e7af-4ea3-43ad-8671-88acfbd0eba5
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 12
   Total Memory: 8219254784 bytes
      Timestamp: Thu Sep  5 08:45:31 2024
     Last errno: -34938881
Last errno text: Unknown error -34938881
*********** PAL PANIC CORE DUMP GENERATION FAILED **********
Unable to locate handle-crash.sh. Error: File: signals.cpp:483 [Status: 0xC0000034 Object name not found errno = 0x2(2) No such file or directory]
Executing: /usr/local/bin/cosmos/handle-crash.sh with parameters
     handle-crash.sh
     /usr/local/bin/cosmos/cosmosdb-emulator
     19
     /usr/local/bin/cosmos
     /tmp/cosmos/appdata/log/

     aa5b4ac9-1854-40b7-82be-62fa4dbafefc
     5e90e7af-4ea3-43ad-8671-88acfbd0eba5

*********** PANIC CORE DUMP GENERATION FAILED **********
Attempt to launch handle-crash.sh failed.
This program has encountered a fatal error and cannot continue running at Thu Sep  5 08:45:31 2024
The following diagnostic information is available:

         Reason: Host Extension RTL_ASSERT (0x00000003)
         Status: STATUS_WAIT_2 (0x00000002)
        Message: !killTheTarget
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007ffffd655df0 00005555556a1bfa <unknown>
                 00007ffffd656e50 00005555556a15cf <unknown>
                 00007ffffd6570b0 000055555560bb66 <unknown>
                 00007ffffd6570e0 00005555556a50e4 <unknown>
                 00007ffffd658310 00005555556a47a9 <unknown>
                 00007ffffd658440 00007fffff7af609 start_thread+0xd9
                 00007ffffd658500 00007fffff230353 clone+0x43

        Process: 17 - cosmosdb-emulator
         Thread: 18
    Instance Id: aa5b4ac9-1854-40b7-82be-62fa4dbafefc
       Crash Id: 5e90e7af-4ea3-43ad-8671-88acfbd0eba5
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 12
   Total Memory: 8219254784 bytes
      Timestamp: Thu Sep  5 08:45:31 2024
     Last errno: 2
Last errno text: No such file or directory
Aborted

Oleexo avatar Sep 05 '24 08:09 Oleexo

So how can we resolve this without downgrading anything? It should run at least on macOS with Intel processors and the latest Docker. Azure Cosmos DB team, please release a patch for it.

Thanks

shahnawazk avatar Sep 05 '24 21:09 shahnawazk

Well since this doesn't involve AI I don't think M$ is going to fix it.

rowens-an avatar Sep 06 '24 14:09 rowens-an

It seemed to work earlier, but not now with the latest versions of either macOS or Docker. You can see it in this video: https://www.youtube.com/watch?v=NUfK6n8UXi8&t=470s&pp=ygUYY29zbW9zIGRiIGVtdWxhdG9yIG1hY29z

shahnawazk avatar Sep 07 '24 03:09 shahnawazk

Hi everyone, we're investigating this issue and will update you ASAP.

sajeetharan avatar Sep 11 '24 06:09 sajeetharan

Ubuntu 24.04.01, Docker Version 27.2.1

docker run --rm -it mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest
This is an evaluation version.  There are [31] days left in the evaluation period.
Starting
This program has encountered a fatal error and cannot continue running at Thu Sep 12 21:48:22 2024
The following diagnostic information is available:

         Reason: Fatal Signal (0x00000001)
         Signal: SIGABRT - Aborted (6)
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007d6a8400c5e0 00005a3e5cff3bfa <unknown>
                 00007d6a8400d640 00005a3e5cff35cf <unknown>
                 00007d6a8400d8a0 00005a3e5cff2a61 <unknown>
                 00007d6a8400d8c0 00007d6a903d0090 killpg+0x40
                 00007d6a8400de60 00007d6a903d000b gsignal+0xcb
                 00007d6a8400df80 00007d6a903af859 abort+0x12b
                 00007d6a8400e0b0 00005a3e5cf8b332 <unknown>
                 00007d6a8400e160 00005a3e5d00d667 <unknown>

        Process: 19 - cosmosdb-emulator
         Thread: 124 (application thread 0x1c0)
    Instance Id: a6022574-02d0-4efb-ace2-b02d55d74b21
       Crash Id: f63e7603-7b09-4bfc-9541-8437c173ba07
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 8
   Total Memory: 31429009408 bytes
      Timestamp: Thu Sep 12 21:48:22 2024
     Last errno: -34938881
Last errno text: Unknown error -34938881
*********** PAL PANIC CORE DUMP GENERATION FAILED **********
Unable to locate handle-crash.sh. Error: File: signals.cpp:483 [Status: 0xC0000034 Object name not found errno = 0x2(2) No such file or directory]
Executing: /usr/local/bin/cosmos/handle-crash.sh with parameters
     handle-crash.sh
     /usr/local/bin/cosmos/cosmosdb-emulator
     19
     /usr/local/bin/cosmos
     /tmp/cosmos/appdata/log/

     a6022574-02d0-4efb-ace2-b02d55d74b21
     f63e7603-7b09-4bfc-9541-8437c173ba07

*********** PANIC CORE DUMP GENERATION FAILED **********
Attempt to launch handle-crash.sh failed.
This program has encountered a fatal error and cannot continue running at Thu Sep 12 21:48:22 2024
The following diagnostic information is available:

         Reason: Host Extension RTL_ASSERT (0x00000003)
         Status: STATUS_WAIT_2 (0x00000002)
        Message: !killTheTarget
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 00007d6a8e3fbdf0 00005a3e5cff3bfa <unknown>
                 00007d6a8e3fce50 00005a3e5cff35cf <unknown>
                 00007d6a8e3fd0b0 00005a3e5cf5db66 <unknown>
                 00007d6a8e3fd0e0 00005a3e5cff70e4 <unknown>
                 00007d6a8e3fe310 00005a3e5cff67a9 <unknown>
                 00007d6a8e3fe440 00007d6a90a2b609 start_thread+0xd9
                 00007d6a8e3fe500 00007d6a904ac353 clone+0x43

        Process: 17 - cosmosdb-emulator
         Thread: 18
    Instance Id: a6022574-02d0-4efb-ace2-b02d55d74b21
       Crash Id: f63e7603-7b09-4bfc-9541-8437c173ba07
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 8
   Total Memory: 31429009408 bytes
      Timestamp: Thu Sep 12 21:48:22 2024
     Last errno: 2
Last errno text: No such file or directory
Aborted (core dumped)

(This would clearly fall into the "newer kernel" bucket...)

Clockwork-Muse avatar Sep 12 '24 21:09 Clockwork-Muse

We have released a new version of the emulator with the fix. Pull the latest image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest

niteshvijay1995 avatar Sep 19 '24 13:09 niteshvijay1995

Thanks Nitesh & entire CosmosDB team, it is working fine as expected.

shahnawazk avatar Sep 19 '24 19:09 shahnawazk

Still dies for us:

linux-emulator_logs.txt

adamzest avatar Sep 19 '24 19:09 adamzest

Thanks. Solved it for me.

cp-Michal-Klich avatar Sep 19 '24 19:09 cp-Michal-Klich

@adamzest Please share the dump file that is generated at the end of the run. You can find it under /tmp/cosmos/appdata/log/
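
In case it helps, a minimal sketch for getting that directory out of the container with `docker cp`. The function name and the default destination are hypothetical; the default container name is the one from the original repro command. This assumes the container was started without `--rm`, since a container started with `--rm` is removed on exit and there is nothing left to copy from.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy the emulator log/dump directory to the host.
# $1 = container name (default taken from the repro command in this issue)
# $2 = destination directory on the host (hypothetical default)
copy_emulator_logs() {
  local container="${1:-azure-cosmos-emulator}" dest="${2:-./cosmos-logs}"
  mkdir -p "$dest" || return 1
  # The trailing "/." copies the directory contents; docker cp also works
  # on a stopped container as long as it has not been removed.
  docker cp "${container}:/tmp/cosmos/appdata/log/." "$dest"
}
```

Usage would be something like `copy_emulator_logs azure-cosmos-emulator ./cosmos-logs` after the crash.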

niteshvijay1995 avatar Sep 20 '24 02:09 niteshvijay1995

It's too big to upload, so I've put it here: https://1drv.ms/u/s!AjeTrQ40Ae3vh6M5-IYXZ8lXND20Lg?e=yruBfd

adamzest avatar Sep 20 '24 06:09 adamzest

Still dies for us too (Ubuntu 22.04.4 LTS, kernel 6.8.0-40-generic, Docker version 27.2.0, build 3ab4256):

docker run --rm -it mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest

The only workaround we found is to use an older kernel version. It works with the previous kernel, 6.5.0-45-generic.

We are also suffering on our CI because the kernel has been updated there too. I also want to point out that there is no way to access previous versions of the azure-cosmos-emulator image (only latest), which forces us to pin older kernel versions instead of pinning a previous version of the emulator.

Starting
This program has encountered a fatal error and cannot continue running at Mon Sep 23 14:34:17 2024
The following diagnostic information is available:

         Reason: Fatal Signal (0x00000001)
         Signal: SIGABRT - Aborted (6)
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 0000745e715eaea0 000057152d22bbfa <unknown>
                 0000745e715ebf00 000057152d22b5cf <unknown>
                 0000745e715ec160 000057152d22aa61 <unknown>
                 0000745e715ec180 0000745e7d9a9090 killpg+0x40
                 0000745e715ece60 0000745e7d9a900b gsignal+0xcb
                 0000745e715ecf80 0000745e7d988859 abort+0x12b
                 0000745e715ed0b0 000057152d1c3332 <unknown>
                 0000745e715ed160 000057152d245667 <unknown>

        Process: 19 - cosmosdb-emulator
         Thread: 125 (application thread 0x1c4)
    Instance Id: 75f7a966-51f2-49df-9504-e04d0db41969
       Crash Id: b002b7cb-e694-47ff-996a-8df05bd9206d
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 8
   Total Memory: 33362911232 bytes
      Timestamp: Mon Sep 23 14:34:17 2024
     Last errno: -34938881
Last errno text: Unknown error -34938881
*********** PAL PANIC CORE DUMP GENERATION FAILED **********
Unable to locate handle-crash.sh. Error: File: signals.cpp:483 [Status: 0xC0000034 Object name not found errno = 0x2(2) No such file or directory]
Executing: /usr/local/bin/cosmos/handle-crash.sh with parameters
     handle-crash.sh
     /usr/local/bin/cosmos/cosmosdb-emulator
     19
     /usr/local/bin/cosmos
     /tmp/cosmos/appdata/log/
     
     75f7a966-51f2-49df-9504-e04d0db41969
     b002b7cb-e694-47ff-996a-8df05bd9206d
     
*********** PANIC CORE DUMP GENERATION FAILED **********
Attempt to launch handle-crash.sh failed.
This program has encountered a fatal error and cannot continue running at Mon Sep 23 14:34:17 2024
The following diagnostic information is available:

         Reason: Host Extension RTL_ASSERT (0x00000003)
         Status: STATUS_WAIT_2 (0x00000002)
        Message: !killTheTarget
          Stack:
                 SP               IP               Function
                 ---------------- ---------------- ----------------
                 0000745e7b9fbdf0 000057152d22bbfa <unknown>
                 0000745e7b9fce50 000057152d22b5cf <unknown>
                 0000745e7b9fd0b0 000057152d195b66 <unknown>
                 0000745e7b9fd0e0 000057152d22f0e4 <unknown>
                 0000745e7b9fe310 000057152d22e7a9 <unknown>
                 0000745e7b9fe440 0000745e7e004609 start_thread+0xd9
                 0000745e7b9fe500 0000745e7da85353 clone+0x43

        Process: 17 - cosmosdb-emulator
         Thread: 18
    Instance Id: 75f7a966-51f2-49df-9504-e04d0db41969
       Crash Id: b002b7cb-e694-47ff-996a-8df05bd9206d
    Build stamp: (null)
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 8
   Total Memory: 33362911232 bytes
      Timestamp: Mon Sep 23 14:34:17 2024
     Last errno: 2
Last errno text: No such file or directory
Aborted (core dumped)

jonbaine avatar Sep 23 '24 14:09 jonbaine

For those who still encounter the initial error, just make sure you've pulled the most recent version of the image by explicitly calling docker pull mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest. It helped me.
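
A minimal sketch of that step, for anyone scripting it. The function name is hypothetical; `docker pull` and `docker image inspect` are standard subcommands. Printing the image ID and creation time is just one way to confirm that the local `latest` tag actually changed rather than being reused from cache.

```shell
#!/usr/bin/env bash

IMAGE="mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest"

# Hypothetical helper: refresh the local copy of the emulator image.
refresh_emulator_image() {
  # Ask the registry for the current :latest instead of trusting the cached tag.
  docker pull "$IMAGE" || return 1
  # Print the local image ID and creation time to check what is now in use.
  docker image inspect --format '{{.Id}} {{.Created}}' "$IMAGE"
}
```

After calling `refresh_emulator_image`, remove any old container and start a new one so it is created from the refreshed image.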

CPmusiak avatar Sep 25 '24 11:09 CPmusiak

@niteshvijay1995 is there a chance that the container.CreateItemAsync(item, partitionKey) method has been affected by the fix? It used to work like a charm, but since the service startup problem was resolved, my bulk update code no longer works.

When I send the data in batches, the first batch is sent successfully, but the next one always fails with the following error:

                                                                {
                                                                    "name": "Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request",
                                                                    "duration in milliseconds": 14.0089,
                                                                    "data": {
                                                                        "Client Side Request Stats": {
                                                                            "Id": "AggregatedClientSideRequestStatistics",
                                                                            "ContactedReplicas": [],
                                                                            "RegionsContacted": [],
                                                                            "FailedReplicas": [],
                                                                            "AddressResolutionStatistics": [],
                                                                            "StoreResponseStatistics": [],
                                                                            "HttpResponseStats": [
                                                                                {
                                                                                    "StartTimeUTC": "2024-09-25T09:13:26.5463127Z",
                                                                                    "DurationInMs": 11.208,
                                                                                    "RequestUri": "https://172.21.0.11:8081/dbs/clpdb/colls/2bdd3c6e-242f-4cf9-a2b3-ec4b9de4b463/docs",
                                                                                    "ResourceType": "Document",
                                                                                    "HttpMethod": "POST",
                                                                                    "ActivityId": "b0676839-ffde-4499-bb5f-bec335e1ba1f",
                                                                                    "StatusCode": "ServiceUnavailable",
                                                                                    "ReasonPhrase": "Service Unavailable"
                                                                                }
                                                                            ]
                                                                        },
                                                                        "Point Operation Statistics": {
                                                                            "Id": "PointOperationStatistics",
                                                                            "ActivityId": "b0676839-ffde-4499-bb5f-bec335e1ba1f",
                                                                            "ResponseTimeUtc": "2024-09-25T09:13:26.5603912Z",
                                                                            "StatusCode": 503,
                                                                            "SubStatusCode": 20006,
                                                                            "RequestCharge": 0,
                                                                            "RequestUri": "dbs/clpdb/colls/2bdd3c6e-242f-4cf9-a2b3-ec4b9de4b463",
                                                                            "ErrorMessage": "Microsoft.Azure.Documents.DocumentClientException: Channel is closed\r\nActivityId: b0676839-ffde-4499-bb5f-bec335e1ba1f, \r\nRequestStartTime: 2024-09-25T09:13:26.5449333Z, RequestEndTime: 2024-09-25T09:13:26.5531115Z,  Number of regions attempted:1\r\n{\"systemHistory\":[{\"dateUtc\":\"2024-09-25T09:12:23.0246237Z\",\"cpu\":100.000,\"memory\":36080076.000,\"threadInfo\":{\"isThreadStarving\":\"False\",\"threadWaitIntervalInMs\":0.4246,\"availableThreads\":32765,\"minThreads\":20,\"maxThreads\":32767},\"numberOfOpenTcpConnection\":2},{\"dateUtc\":\"2024-09-25T09:12:33.0253040Z\",\"cpu\":100.000,\"memory\":36184024.000,\"threadInfo\":{\"isThreadStarving\":\"False\",\"threadWaitIntervalInMs\":0.2936,\"availableThreads\":32765,\"minThreads\":20,\"maxThreads\":32767},\"numberOfOpenTcpConnection\":2},{\"dateUtc\":\"2024-09-25T09:12:43.0248109Z\",\"cpu\":100.000,\"memory\":36270416.000,\"threadInfo\":{\"isThreadStarving\":\"False\",\"threadWaitIntervalInMs\":0.0806,\"availableThreads\":32765,\"minThreads\":20,\"maxThreads\":32767},\"numberOfOpenTcpConnection\":2},{\"dateUtc\":\"2024-09-25T09:12:53.0243452Z\",\"cpu\":100.000,\"memory\":36090108.000,\"threadInfo\":{\"isThreadStarving\":\"False\",\"threadWaitIntervalInMs\":0.1227,\"availableThreads\":32765,\"minThreads\":20,\"maxThreads\":32767},\"numberOfOpenTcpConnection\":2},{\"dateUtc\":\"2024-09-25T09:13:13.0228358Z\",\"cpu\":100.000,\"memory\":35999584.000,\"threadInfo\":{\"isThreadStarving\":\"False\",\"threadWaitIntervalInMs\":0.1182,\"availableThreads\":32765,\"minThreads\":20,\"maxThreads\":32767},\"numberOfOpenTcpConnection\":2},{\"dateUtc\":\"2024-09-25T09:13:23.0250190Z\",\"cpu\":100.000,\"memory\":35967192.000,\"threadInfo\":{\"isThreadStarving\":\"False\",\"threadWaitIntervalInMs\":0.0499,\"availableThreads\":32765,\"minThreads\":20,\"maxThreads\":32767},\"numberOfOpenTcpConnection\":2}]}\r\nRequestStart: 
2024-09-25T09:13:26.5449333Z; ResponseTime: 2024-09-25T09:13:26.5531115Z; StoreResult: StorePhysicalAddress: rntbd://172.21.0.11:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 503, SubStatusCode: 20006, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: A client transport error occurred: The connection failed. (Time: 2024-09-25T09:13:26.5531115Z, activity ID: b0676839-ffde-4499-bb5f-bec335e1ba1f, error code: ConnectionBroken [0x0012], base error: HRESULT 0x80131500, URI: rntbd://172.21.0.11:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, connection: 172.21.0.11:42229 -> 172.21.0.11:10253, payload sent: True), BELatencyMs: , ActivityId: b0676839-ffde-4499-bb5f-bec335e1ba1f, RetryAfterInMs: , ReplicaHealthStatuses: [(port: 10253 | status: Connected | lkt: 09/25/2024 08:49:34)], TransportRequestTimeline: {\"requestTimeline\":[{\"event\": \"Created\", \"startTimeUtc\": \"2024-09-25T09:13:26.5449333Z\", \"durationInMs\": 0.0105},{\"event\": \"ChannelAcquisitionStarted\", \"startTimeUtc\": \"2024-09-25T09:13:26.5449438Z\", \"durationInMs\": 7.3619},{\"event\": \"Pipelined\", \"startTimeUtc\": \"2024-09-25T09:13:26.5523057Z\", \"durationInMs\": 0.1566},{\"event\": \"Transit Time\", \"startTimeUtc\": \"2024-09-25T09:13:26.5524623Z\", \"durationInMs\": 0.5794},{\"event\": \"Failed\", \"startTimeUtc\": \"2024-09-25T09:13:26.5530417Z\", \"durationInMs\": 0}],\"serviceEndpointStats\":{\"inflightRequests\":1,\"openConnections\":1},\"connectionStats\":{\"waitforConnectionInit\":\"True\",\"callsPendingReceive\":0,\"lastSendAttempt\":\"2024-09-25T09:13:26.5531115Z\",\"lastSend\":\"2024-09-25T09:13:26.5531115Z\",\"lastReceive\":\"2024-09-25T09:13:26.5531115Z\"},\"requestSizeInBytes\":19633,\"requestBodySizeInBytes\":19158};\r\n ResourceType: Document, 
OperationType: Batch\r\n, Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Linux/12 cosmos-netstandard-sdk/3.31.5\n   at Microsoft.Azure.Cosmos.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request)\n   at Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.Handlers.TransportHandler.ProcessMessageAsync(RequestMessage request, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)",
                                                                            "RequestSessionToken": null,
                                                                            "ResponseSessionToken": null,
                                                                            "BELatencyInMs": null
                                                                        }
                                                                    }
                                                                }

CPmusiak avatar Sep 25 '24 11:09 CPmusiak

Yes, the fix for the latest Linux kernel and this error are related. We are looking into it and will provide an update soon.

niteshvijay1995 avatar Sep 25 '24 11:09 niteshvijay1995

For those who still encounter the initial error, just make sure you've pulled the most recent version of the image by explicitly calling docker pull mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest. It helped me.

For the avoidance of doubt, and so as not to hold up any possible fix, I have pulled the latest Cosmos emulator image. You can tell because:

  • There is encouraging log output before it dies (which it didn't get to before), and the trial period has been extended again:
2024-09-19T19:09:51.568792510Z This is an evaluation version.  There are [179] days left in the evaluation period.
2024-09-19T19:09:56.517131099Z 2.14.20.0 (728f9251)
2024-09-19T19:09:56.517289324Z Copyright (C) Microsoft Corporation. All rights reserved.
2024-09-19T19:10:00.636639872Z Starting
  • The logs now correctly identify the 22.x distro (Distribution: Ubuntu 22.04.5 LTS)
  • The error is different from before (Reason: OS RTL_ASSERT (0x00000004))
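
The checks in the list above can be sketched as a small log parser. This is only an illustration: `summarize_emulator_log` is a hypothetical helper, and the regular expressions assume the startup log keeps the exact shape of the four lines quoted above (timestamp prefix, evaluation banner, then a `version (build)` line).

```python
import re

# Sample log copied from the lines quoted above (timestamped container output).
SAMPLE_LOG = """\
2024-09-19T19:09:51.568792510Z This is an evaluation version.  There are [179] days left in the evaluation period.
2024-09-19T19:09:56.517131099Z 2.14.20.0 (728f9251)
2024-09-19T19:09:56.517289324Z Copyright (C) Microsoft Corporation. All rights reserved.
2024-09-19T19:10:00.636639872Z Starting
"""

# Assumed formats, inferred only from the sample above.
EVAL_RE = re.compile(r"There are \[(\d+)\] days left in the evaluation period")
VERSION_RE = re.compile(r"\b(\d+\.\d+\.\d+\.\d+) \(([0-9a-f]{8})\)")


def summarize_emulator_log(log_text):
    """Return evaluation days, version and build hash found in the log (None if absent)."""
    summary = {"evaluation_days": None, "version": None, "build": None}
    for line in log_text.splitlines():
        match = EVAL_RE.search(line)
        if match:
            summary["evaluation_days"] = int(match.group(1))
        match = VERSION_RE.search(line)
        if match:
            summary["version"], summary["build"] = match.group(1), match.group(2)
    return summary


if __name__ == "__main__":
    print(summarize_emulator_log(SAMPLE_LOG))
```

Comparing the `build` value against the build hash a maintainer names for a fix is one way to confirm which image is actually running, since the version number alone (2.14.20.0) may stay the same across builds.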

Hoping @niteshvijay1995 @sajeetharan are aware and looking into this please.

adamzest avatar Sep 25 '24 11:09 adamzest

@niteshvijay1995 is there a chance that the container.CreateItemAsync(item, partitionKey) method has been affected by the fix? It used to work like a charm, but since the service startup problem was resolved, my bulk update code no longer works.

@CPmusiak We have fixed this issue in the latest version, 2.14.20.0 (dd7750b6).

niteshvijay1995 avatar Sep 27 '24 11:09 niteshvijay1995

@adamzest Thanks for sharing the dump file. We are looking into this and will update you soon with a resolution.

niteshvijay1995 avatar Sep 27 '24 11:09 niteshvijay1995