azure-cosmos-db-emulator-docker

Cosmos DB Linux Emulator fails to start on some Intel chips

Open milismsft opened this issue 3 years ago • 58 comments

Related to: https://github.com/actions/virtual-environments/issues/5036#issuecomment-1044270895

The Cosmos DB Linux Emulator fails to start on some Intel chips.

lscpu output:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: [46]
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping: 7
CPU MHz: 2593.907
BogoMIPS: 87.81
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 2 MiB
L3 cache: 35.8 MiB
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear

/proc/cpuinfo content: /proc/cpuinfo


milismsft avatar Feb 18 '22 18:02 milismsft

Is there a way to add a constraint on the Azure Pipeline to use the CPU model that works? I am hitting this issue in Azure DevOps Pipelines, and I always get model 85, which always fails. I have tried specifying "ubuntu-latest", "ubuntu-20.04", and "ubuntu-18.04", but none have worked.

The below CPU also fails.

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
stepping	: 7
microcode	: 0xffffffff
cpu MHz		: 2593.905
cache size	: 36608 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 5187.81
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

This one with ubuntu-18.04 did work:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
stepping	: 4
microcode	: 0xffffffff
cpu MHz		: 2095.078
cache size	: 36608 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4190.15
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
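For anyone who wants to log which CPU a hosted agent landed on before starting the emulator, here is a minimal sketch. It only parses /proc/cpuinfo text; the function names `parse_cpuinfo` and `describe_cpu` are made up for illustration. Note that both dumps above report model 85 and differ in model name (8272CL vs 8171M) and stepping (7 vs 4), so the sketch prints all three fields rather than assuming which one matters.

```python
def parse_cpuinfo(text: str) -> dict[str, str]:
    """Return the key/value pairs of the first processor entry in /proc/cpuinfo text."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_cpu(text: str) -> str:
    """Summarize model name, model, and stepping (hypothetical helper)."""
    info = parse_cpuinfo(text)
    name = info.get("model name", "unknown")
    model = info.get("model", "?")
    stepping = info.get("stepping", "?")
    return f"{name} (model {model}, stepping {stepping})"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            print(describe_cpu(handle.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

Running this as an early pipeline step would at least make it obvious in the logs which chip a failing run was scheduled on.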

aressler38 avatar Apr 26 '22 17:04 aressler38

I am also facing this issue. Are there any workarounds, or has any progress been made?

rrr-michael-aquilina avatar May 30 '22 12:05 rrr-michael-aquilina

@rrr-michael-aquilina The workaround that worked for me was moving to the Cosmos emulator (PowerShell) that's baked into the Windows pipeline.

High traffic times can cause the emulator to start slowly (more than 5 minutes). I had to make some modifications to its timeout and such, but it's been pretty stable since.

It's definitely better than not using the emulator at all in the DevOps pipeline.

soenneker avatar Jun 09 '22 16:06 soenneker

I also faced this issue using the Linux Docker image. It cost me a day of investigating network issues just to find out that the container is immediately shutting down. Using ubuntu-18.04 as suggested in the other GitHub ticket worked for me, but a fix for 20.04 would be great.

christian-be avatar Jul 22 '22 05:07 christian-be

The Ubuntu 18.04 agent is being deprecated, so the issue needs to be fixed before that day. https://github.com/actions/runner-images/issues/6002

waszak avatar Aug 09 '22 09:08 waszak

We are seeing the exact same problem. It runs fine on Azure DevOps agents running ubuntu-18.04 but fails on ubuntu-20.04 and ubuntu-22.04. Would someone please look into a fix for this, as the ubuntu-18.04 image is being deprecated on 4/1/2023, as @waszak mentioned: https://github.com/actions/runner-images/issues/6002

mj-rittermann avatar Sep 09 '22 06:09 mj-rittermann

I ran a test today and it works on 20.04, see https://github.com/eddumelendez/testcontainers-cosmodb-gha-test/actions/runs/3153248862/jobs/5129495371

Can someone else confirm?

eddumelendez avatar Sep 30 '22 00:09 eddumelendez

@eddumelendez Not yet. I'm using Cosmos DB container as Service container on GitHub Actions, but "Connection Refused" error still occurs.

https://github.com/ddradar/ddradar/pull/1002 https://github.com/ddradar/ddradar/actions/runs/3155822253/jobs/5134896830
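To tell "the emulator is still starting" apart from "the container died", a test job could poll the port before running tests. This is a rough sketch, not anything from the emulator's own tooling: `wait_for_port` is a hypothetical helper, and port 8081 in the usage note is assumed to be the port your container publishes. An accepted TCP connection only shows that something is listening, not that the emulator has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) on failure
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Usage would be something like `wait_for_port("localhost", 8081)` at the start of the test step, failing the job with a clear message when it returns False instead of letting the tests hit "Connection Refused".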

nogic1008 avatar Sep 30 '22 02:09 nogic1008

I think it is flaky: I ran it two more times, and the first run failed but the last one succeeded.

eddumelendez avatar Sep 30 '22 02:09 eddumelendez

Yes it's flaky. I continue to see random failures as well.

mmoayyed avatar Sep 30 '22 07:09 mmoayyed

@milismsft do you have any updates on this issue?

waszak avatar Oct 04 '22 14:10 waszak

+1 for working on ubuntu 20/22

We run it as part of integration testing, and it only starts (sometimes) on Ubuntu 18. On anything higher it just hangs at the "Starting" message in the container logs forever.

Our devs use a docker-compose stack for local dependencies, which includes Cosmos DB, so we would like to just spin up the same stack in ADO pipelines.

Ubuntu 18 deprecation date was pushed back to April '23 so we have a bit more time...

dankarmyy avatar Oct 12 '22 14:10 dankarmyy

Very relevant during the current unscheduled brownout for 18! 20 doesn't work.

LevYas avatar Oct 14 '22 14:10 LevYas

This repo doesn't look active, so I posted a question here:

https://learn.microsoft.com/en-us/answers/questions/1057083/cosmos-db-linux-emulator-doesn39t-work-on-some-int.html

waszak avatar Oct 25 '22 13:10 waszak

Any news on this?

Meandron avatar Dec 07 '22 14:12 Meandron

Someone asked again today, but all we got was the same answer:

We don't have a public facing ETA we can share for now, but we will share on Azure updates when this will be available.

waszak avatar Jan 26 '23 22:01 waszak

While we wait on this, is there a workaround? I am using a Windows agent to get around this problem, but the emulator on the Windows agent randomly takes too long to start.

asos-gurpreetsingh avatar Feb 15 '23 22:02 asos-gurpreetsingh

While we wait on this, is there a workaround? I am using a Windows agent to get around this problem, but the emulator on the Windows agent randomly takes too long to start.

Use a retry with your PowerShell task.
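The retry idea in general form, as a hedged illustration rather than the actual PowerShell task: `retry` is a hypothetical helper, and `action` stands for whatever starts the emulator and raises on failure. The `sleep` parameter exists only so the delay can be stubbed out in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, re-raising the last error after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)  # fixed pause before the next try
    raise AssertionError("unreachable")
```

A fixed delay keeps the sketch short. With start times over five minutes, as mentioned earlier in the thread, the per-attempt timeout inside `action` matters more than the pause between attempts.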

soenneker avatar Feb 15 '23 23:02 soenneker

The next scheduled brownout of the Ubuntu 18.04 GHA runners happened today, and we are getting closer to EOL for those runners. Are there any updates or workarounds, especially for GHA users?

kiview avatar Feb 21 '23 16:02 kiview

We are blocked on this issue too. Any update?

eli-fin avatar Mar 20 '23 10:03 eli-fin

We are blocked on this issue too. Any update?

Our current workaround is to self-host agents with a different chipset. See my answer here: #56

DSpirit avatar Mar 20 '23 10:03 DSpirit

We are blocked on this issue too. Any update?

Our current workaround is to self-host agents with a different chipset. See my answer here: #56

Our org has strict policies regarding self-hosted agents, so it's not as straightforward. But thanks.

eli-fin avatar Mar 20 '23 10:03 eli-fin

We are blocked on this issue too. Any update?

Our current workaround is to self-host agents with a different chipset. See my answer here: #56

Our org has strict policies regarding self-hosted agents, so it's not as straightforward. But thanks.

We moved this job to a Windows agent, and the rest are on Ubuntu, to get around this issue.

asos-gurpreetsingh avatar Mar 20 '23 10:03 asos-gurpreetsingh

We moved this job to a Windows agent, and the rest are on Ubuntu, to get around this issue.

OK. How can your code running on Ubuntu access the DB running on the Windows agent?

eli-fin avatar Mar 20 '23 10:03 eli-fin

@milismsft any update here? Ubuntu 18.04 isn't available anymore, so this essentially prevents us from using Linux agents.

soenneker avatar May 21 '23 03:05 soenneker

any news?

aomegax avatar Aug 23 '23 13:08 aomegax

Any updates on this?

guibranco avatar Oct 06 '23 15:10 guibranco

Hi, it is not supported yet, but we are actively exploring options to support this.

sajeetharan avatar Oct 12 '23 09:10 sajeetharan

This is blocking our project from running integration tests on GitHub with Cosmos DB, so I hope this will be fixed soon. No update since Oct 12 is not particularly encouraging. Since this involves a very basic use case for two flagship products, I would hope this would get prompt attention.

tpischke avatar Dec 01 '23 08:12 tpischke

@sajeetharan will the new version you mentioned in #79 also fix this issue? (We're trying to use the emulator as part of our integration tests in an Azure DevOps Pipeline.)

razvangoga avatar Dec 04 '23 20:12 razvangoga