Ord index terminating every few thousand blocks
Reported by @VirtuallyThere:
ord index is terminating every few thousand to ten thousand blocks for no apparent reason and without an error message, so I need to run it in a loop from the shell. So far I have had to restart it 71 times while indexing from the genesis block to block 665500.
This data is from running commit 325201f0840ba4e7113ce5ea8ef840c6e5f89a49.
In total, it required 98 runs of the index command. The total run time was from 2022-11-15T21:51:35Z to 2022-11-18T15:32:06Z, so roughly 66 hours. Time and block height of each stoppage are attached: issue-813.txt
Running with the latest master from today. Interestingly, the first block heights where it terminates are identical to the previous run, but the third one is not:
index-1668801599.log
[2022-11-18T20:37:53Z INFO ord::index::updater] Committing at block height 365034, 300237 outputs traversed, 253718 in map, 168704319 cached
index-1668803877.log
[2022-11-18T20:41:44Z INFO ord::index::updater] Committing at block height 370068, 47832 outputs traversed, 32856 in map, 9528618 cached
index-1668804106.log
[2022-11-18T20:44:22Z INFO ord::index::updater] Committing at block height 375069, 13904422 outputs traversed, 2794084 in map, 11110342 cached
The second run with the latest master from Friday (fa7999524f986f421b5c86a3f454e8ef933ba8d5) finished. The command I am using:
export RUST_LOG=info ; while true ; do target/release/ord --cookie-file /opt/node/bitcoin/data/.cookie --data-dir data index 2>&1 | tee data/index-$(date +%s).log ; done
In total, it required 86 runs of the index command. The total run time was from 2022-11-18T20:37:53Z to 2022-11-20T01:38:11Z, so roughly 29 hours.
Time and block height of each stoppage are attached.
It's a JSON-RPC error; somehow our retry does not work:
[2022-11-20T16:12:43Z ERROR ord::index::updater] Failed to fetch block 255034: JSON-RPC error: transport error: Couldn't connect to host: Connection reset by peer (os error 54)
Turns out this is just https://github.com/casey/ord/issues/747 and we should probably resurrect https://github.com/casey/ord/pull/748
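For context, here is a rough sketch of the kind of fetch-with-retry loop that resurrecting that PR would give us. Everything here is a hypothetical stand-in (the function name, attempt limit, and backoff), not ord's actual code:

```rust
use std::thread;
use std::time::Duration;

// Retry a block fetch a few times with exponential backoff before giving up.
// `fetch` is whatever closure performs the actual RPC call.
fn fetch_block_with_retry<F, T, E: std::fmt::Display>(height: u64, mut fetch: F) -> Result<T, E>
where
    F: FnMut(u64) -> Result<T, E>,
{
    let mut delay = Duration::from_secs(2);
    let mut attempts = 0;
    loop {
        match fetch(height) {
            Ok(block) => return Ok(block),
            Err(err) if attempts < 5 => {
                eprintln!("failed to fetch block {height}, retrying in {delay:?}: {err}");
                thread::sleep(delay);
                delay *= 2; // back off so a busy bitcoind can catch up
                attempts += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Hypothetical usage: pretend the first two attempts fail with a transport error.
    let mut failures = 2;
    let result = fetch_block_with_retry(365034, |height| {
        if failures > 0 {
            failures -= 1;
            Err(format!("transport error while fetching block {height}"))
        } else {
            Ok(format!("block {height}"))
        }
    });
    println!("{result:?}");
}
```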
This is no longer fatal: with the retrying added in #820, we continue even if we get an error. However, the errors are still happening, and the pattern is so regular that I think we must be doing something weird:
[2022-11-23T01:32:49Z ERROR ord::index::updater] failed to fetch block 505034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T01:44:27Z ERROR ord::index::updater] failed to fetch block 510034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T01:53:09Z ERROR ord::index::updater] failed to fetch block 515034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T01:58:12Z ERROR ord::index::updater] failed to fetch block 520034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:03:21Z ERROR ord::index::updater] failed to fetch block 525034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:09:05Z ERROR ord::index::updater] failed to fetch block 530034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:15:03Z ERROR ord::index::updater] failed to fetch block 535034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:19:44Z ERROR ord::index::updater] failed to fetch block 540034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:24:18Z ERROR ord::index::updater] failed to fetch block 545034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:38:54Z ERROR ord::index::updater] failed to fetch block 555034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:47:13Z ERROR ord::index::updater] failed to fetch block 560034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T02:54:43Z ERROR ord::index::updater] failed to fetch block 565034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T03:01:49Z ERROR ord::index::updater] failed to fetch block 570034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T03:10:07Z ERROR ord::index::updater] failed to fetch block 575034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T03:17:05Z ERROR ord::index::updater] failed to fetch block 580034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T03:26:25Z ERROR ord::index::updater] failed to fetch block 585034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[2022-11-23T03:36:03Z ERROR ord::index::updater] failed to fetch block 590034, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
It starts around block 365034, and from there we get an error every time block_number == x * 5000 + 34. The updater writes to the database every 5000 blocks, but the 34 is a mystery.
This is low priority, since it's not actually causing any problems, so pushing it back to the production inscriptions milestone.
The updater writes to the database every 5000 blocks, but the 34 is a mystery.
Maybe not so much of a mystery. The MPSC channel that the background block fetcher uses to send blocks to the main thread can queue up 32 blocks, so the sequence would be something like this (see the sketch after the list):
- Updater flushes to database
- Meanwhile, 32 blocks are fetched and sent into the channel, and the block fetcher blocks when it tries to insert the 33rd
- Updater finishes flushing
- Something bad happens
- Request for block fails
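If that is right, the arithmetic roughly fits the offset: blocks commit+1 through commit+32 fill the channel, the fetcher blocks trying to send the 33rd, and the first request it makes once the flush finally completes (block commit+33 or commit+34, depending on where it was when it blocked) goes out over a connection that has sat idle for the whole flush. Below is a minimal sketch of that backpressure using a standard-library bounded channel; it is an illustration only, not ord's actual fetcher code:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    // Bounded channel with the 32-slot capacity described above.
    let (tx, rx) = sync_channel::<u64>(32);

    let fetcher = thread::spawn(move || {
        for height in 0u64..100 {
            // Once 32 blocks are queued, this send blocks until the consumer
            // drains the channel; in ord, the fetcher's RPC connection then
            // sits idle for the whole database flush.
            tx.send(height).unwrap();
            println!("fetched and queued block {height}");
        }
    });

    // Simulate the updater: index a few blocks, then pause to "flush".
    for _ in 0..5 {
        let height = rx.recv().unwrap();
        println!("indexed block {height}");
    }
    println!("flushing to database (consumer paused)...");
    thread::sleep(Duration::from_secs(2));

    // Drain the rest so the fetcher can finish.
    while let Ok(height) = rx.recv() {
        println!("indexed block {height}");
    }
    fetcher.join().unwrap();
}
```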
My guess is that this is related to the 15 second timeout in jsonrpc::SimpleHttpTransport. Once flushing to the database starts taking long enough, it starts hitting the timeout.
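If the timeout is the culprit, one obvious mitigation is to build the transport with a longer timeout than the 15 second default. A rough sketch, assuming the rust-jsonrpc simple_http builder API as re-exported by bitcoincore-rpc; the URL, credentials, and 120 second value are placeholders, and this is not ord's actual client setup:

```rust
use std::time::Duration;

use bitcoincore_rpc::jsonrpc::simple_http::SimpleHttpTransport;
use bitcoincore_rpc::{Client, RpcApi};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder URL and credentials; adjust for your node. The default
    // transport timeout is 15 seconds, which a long flush can exceed.
    let transport = SimpleHttpTransport::builder()
        .url("localhost:8332")?
        .timeout(Duration::from_secs(120))
        .auth("rpcuser", Some("rpcpassword"))
        .build();

    let rpc = Client::from_jsonrpc(bitcoincore_rpc::jsonrpc::Client::with_transport(transport));
    println!("chain tip height: {}", rpc.get_block_count()?);
    Ok(())
}
```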
FYI, I'm seeing this on my Linux system on latest master.
I suspect latest master is particularly bad here because of #1166: flushes are taking a lot longer, so we're more likely to hit the timeout.
I am facing this issue with 0.4.2 on macOS. Below are sample logs:
$ ord --first-inscription-height 767429 server
Listening on http://0.0.0.0:80
[indexing blocks] ██████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 565000/774595[2023-02-01T16:00:28Z ERROR ord::index::updater] failed to fetch block 565033, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[indexing blocks] ███████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 570000/774595[2023-02-01T16:08:03Z ERROR ord::index::updater] failed to fetch block 570033, retrying in 2s: JSON-RPC error: transport error: HTTP response too short: length 0, needed 12.
[indexing blocks] ████████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 575000/774595[2023-02-01T16:18:47Z ERROR ord::index::updater] failed to fetch block 575033, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: Broken pipe (os error 32)
[indexing blocks] ████████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 578298/774595[2023-02-01T16:25:07Z ERROR ord::index::updater] failed to fetch block 578331, retrying in 2s: JSON-RPC error: transport error: HTTP response too short: length 0, needed 12.
[indexing blocks] █████████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 580000/774595
Then I interrupted it and ran the server again.
My Mac, with a Xeon processor and ECC RAM, crashed while indexing blocks after being stuck at 580000 for about 18 minutes. Below is the panic log:
panic(cpu 10 caller 0xffffff800dcd7941): userspace watchdog timeout: no successful checkins from logd in 120 seconds
service returned not alive with context : unresponsive dispatch queue(s): com.apple.firehose.io-wl
logd has not exited since first loaded
service: logd, total successful checkins in 131117 seconds: 13097, last successful checkin: 120 seconds ago
service: WindowServer, total successful checkins in 131081 seconds: 13094, last successful checkin: 120 seconds ago
service: remoted, total successful checkins in 131117 seconds: 13110, last successful checkin: 0 seconds ago
service: opendirectoryd, total successful checkins in 131117 seconds: 13111, last successful checkin: 0 seconds ago
service: configd, total successful checkins in 131117 seconds: 13111, last successful checkin: 0 seconds ago
Panicked task 0xffffff907e6849d8: 4 threads: pid 114: watchdogd
Backtrace (CPU 10), panicked thread: 0xffffff8bb2514b30, Frame : Return Address
0xfffffff586627590 : 0xffffff800a9eb31d
0xfffffff5866275e0 : 0xffffff800ab59236
0xfffffff586627620 : 0xffffff800ab48480
0xfffffff586627670 : 0xffffff800a985951
0xfffffff586627690 : 0xffffff800a9eb5fd
0xfffffff586627780 : 0xffffff800a9eaca9
0xfffffff5866277e0 : 0xffffff800b1e0993
0xfffffff5866278d0 : 0xffffff800dcd7941
0xfffffff5866278e0 : 0xffffff800dcd75e0
0xfffffff586627900 : 0xffffff800dcd66c9
0xfffffff586627a30 : 0xffffff800b153f0a
0xfffffff586627b90 : 0xffffff800aaffe89
0xfffffff586627ca0 : 0xffffff800a9c2c69
0xfffffff586627d50 : 0xffffff800a9dbb62
0xfffffff586627dc0 : 0xffffff800a9dc26a
0xfffffff586627ef0 : 0xffffff800ab2aa7a
0xfffffff586627fa0 : 0xffffff800a985db6
Kernel Extensions in backtrace:
com.apple.driver.watchdog(1.0)[92D3D6F6-4641-3F89-B20E-AE9C7FA0BA06]@0xffffff800dcd5000->0xffffff800dcd7fff
Process name corresponding to current thread (0xffffff8bb2514b30): watchdogd
Mac OS version:
22C65
Kernel version:
Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64
Kernel UUID: 6B3374E9-7FD7-354D-886F-E995FAAC2CA3
roots installed: 0
KernelCache slide: 0x000000000a600000
KernelCache base: 0xffffff800a800000
Kernel slide: 0x000000000a6dc000
Kernel text base: 0xffffff800a8dc000
__HIB text base: 0xffffff800a700000
System model name: iMacPro1,1 (Mac-7BA5B2D9E42DDD94)
System shutdown begun: NO
Hibernation exit count: 0
System uptime in nanoseconds: 131117753756239
Last Sleep: absolute base_tsc base_nano
Uptime : 0x000077403b22ed28
Sleep : 0x0000000000000000 0x0000000000000000 0x0000000000000000
Wake : 0x0000000000000000 0x0005fbea02a1d000 0x0000000000000000
Compressor Info: 2% of compressed pages limit (OK) and 4% of segments limit (OK) with 2 swapfiles and OK swap space
Zone info:
Zone map: 0xffffff807d024000 - 0xffffffa07d024000
. PGZ : 0xffffff807d024000 - 0xffffff8081025000
. VM : 0xffffff8081025000 - 0xffffff854d358000
. RO : 0xffffff854d358000 - 0xffffff86e69be000
. GEN0 : 0xffffff86e69be000 - 0xffffff8bb2cf1000
. GEN1 : 0xffffff8bb2cf1000 - 0xffffff907f024000
. GEN2 : 0xffffff907f024000 - 0xffffff954b357000
. GEN3 : 0xffffff954b357000 - 0xffffff9a1768a000
. DATA : 0xffffff9a1768a000 - 0xffffffa07d024000
Metadata: 0xffffff8031014000 - 0xffffff8051014000
Bitmaps : 0xffffff8051014000 - 0xffffff805d014000
last started kext at 125095566278442: >usb.cdc.acm 5.0.0 (addr 0xffffff7fa0545000, size 12288)
last stopped kext at 125747247159303: >usb.serial 6.0.0 (addr 0xffffff7fa05c4000, size 20480)
loaded kexts:
@macos.driver.!UEthernetHost 8.1.1
>AudioAUUC 1.70
>usb.!UHostBillboardDevice 1.0
@filesystems.cd9660 1.4.4
@filesystems.smbfs 5.0
>!ATopCaseHIDEventDriver 6000.22
>AGPM 131
>X86PlatformShim 1.0.0
>!APlatformEnabler 2.7.0d0
@filesystems.autofs 3.0
>!AHIDALSService 1
>!AUpstreamUserClient 3.6.9
@kext.AMDFramebuffer 4.0.9
@kext.AMDRadeonX5000 4.0.9
@kext.AMDRadeonServiceManager 4.0.9
>!AGFXHDA 200.2
>!ABridgeAudio!C 300.10
>!AGraphicsDevicePolicy 7.1.18
@AGDCPluginDisplayMetrics 7.1.18
>pmtelemetry 1
@filesystems.nfs 1
|IOUserEthernet 1.0.1
>usb.!UUserHCI 1
>!AHV 1
>!ADiskImages2 198.40.3
>!A!IMCEReporter 115
@kext.AMD10000!C 4.0.9
>BridgeAudioCommunication 300.10
>!AMCCSControl 1.16
>!AAVEBridge 6.1
>!A!IPCHPMC 2.0.1
>!AThunderboltIP 4.0.3
>!A!ISlowAdaptiveClocking 4.0.0
|SCSITaskUserClient 476
>!UCardReader 547
@!AEthernetAquantiaAqtionFirmware 1.0.36
>!AFileSystemDriver 3.0.1
@filesystems.tmpfs 1
>BCMWLANFirmware4387.Hashstore 1
>BCMWLANFirmware4378.Hashstore 1
>BCMWLANFirmware4377.Hashstore 1
>BCMWLANFirmware4364.Hashstore 1
>BCMWLANFirmware4355.Hashstore 1
@filesystems.lifs 1
@filesystems.hfs.kext 627.40.1
@BootCache 40
@!AFSCompression.!AFSCompressionTypeZlib 1.0.0
@!AFSCompression.!AFSCompressionTypeDataless 1.0.0d1
@filesystems.apfs 2142.61.2
>!ABCMWLANBusInterfacePCIeMac 1
@private.KextAudit 1.0
>!AACPIButtons 6.1
>!ASMBIOS 2.1
>!AACPIEC 6.1
>!AAPIC 1.7
$!AUserConsent 1
@!ASystemPolicy 2.0.0
@nke.applicationfirewall 403
|IOKitRegistryCompatibility 1
|EndpointSecurity 1
@Dont_Steal_Mac_OS_X 7.0.0
@kec.Compression 1
@kec.!AEncryptedArchive 1
>!AHIDKeyboard 231
>!AActuatorDriver 6400.44
>!AMultitouchDriver 6400.44
>!AInputDeviceSupport 6400.40
>!AHS!BDriver 6000.22
>IO!BHIDDriver 9.0.0
>!AAudioClockLibs 200.5
@kext.AMDRadeonX5100HWLibs 1.0
|IOAccelerator!F2 475
@kext.AMDRadeonX5000HWServices 4.0.9
>usb.IOUSBHostHIDDevice 1.2
|IOAudio!F 400.3
@vecLib.kext 1.2.0
@kext.triggers 1.0
>IOHIDPowerSource 1
|IO!BSerialManager 9.0.0
|IO!BPacketLogger 9.0.0
|IO!BHost!CUSBTransport 9.0.0
|IO!BHost!CUARTTransport 9.0.0
|IO!BHost!CTransport 9.0.0
>IO!BHost!CPCIeTransport 9.0.0
|IOAVB!F 1120.2
|CSR!BHost!CUSBTransport 9.0.0
|Broadcom!BHost!CUSBTransport 9.0.0
|Broadcom!B20703USBTransport 9.0.0
>!ARSMChannel 1
|IORSM!F 1
>!AIPAppender 1.0
>!A!ILpssUARTv1 3.0.60
>!A!ILpssUARTCommon 3.0.60
>!AOnboardSerial 1.0
>!AGraphicsControl 7.1.18
|IONDRVSupport 597
>!UMergeNub 900.4.2
@kext.AMDSupport 4.0.9
>!ASMBus!C 1.0.18d1
@!AGPUWrangler 7.1.18
@!AGraphicsDeviceControl 7.1.18
|IOGraphics!F 597
@plugin.IOgPTPPlugin 1110.13
>driverkit.serial 6.0.0
>!AThunderboltDPOutAdapter 8.5.1
>X86PlatformPlugin 1.0.0
>IOPlatformPlugin!F 6.0.0d8
|IOSlowAdaptiveClocking!F 1.0.0
>usb.cdc.ecm 5.0.0
>usb.cdc.ncm 5.0.0
>usb.cdc 5.0.0
>usb.networking 5.0.0
>usb.!UHostCompositeDevice 1.2
>!AThunderboltDPInAdapter 8.5.1
>!AThunderboltDPAdapter!F 8.5.1
>!AThunderboltPCIDownAdapter 4.1.1
>!AHPM 3.4.4
>!A!ILpssI2C!C 3.0.60
>!A!ILpssI2C 3.0.60
>!A!ILpssDmac 3.0.60
>!ABSDKextStarter 3
|IOSurface 334.0.1
@filesystems.hfs.encodings.kext 1
>!ASyntheticGame!C 10.3.5
>!AThunderboltNHI 7.2.81
|IOThunderbolt!F 9.3.3
>!AXsanScheme 3
>!AEthernetAquantiaAqtion 1.0.64
>!ABCMWLANCoreMac 1.0.0
|IO80211!F 1200.13.0
>IOImageLoader 1.0.0
>!AOLYHALMac 1
|IOSerial!F 11
>corecapture 1.0.4
>usb.!UVHCIBCE 1.2
>usb.!UVHCICommonBCE 1.0
>usb.!UVHCI 1.2
>usb.!UVHCICommon 1.0
>!AEffaceableNOR 1.0
|IOBufferCopy!C 1.1.0
|IOBufferCopyEngine!F 1
|IONVMe!F 2.1.0
>usb.!UHostPacketFilter 1.0
|IOUSB!F 900.4.2
>usb.!UXHCIPCI 1.2
>usb.!UXHCI 1.2
>!AEFINVRAM 2.1
>!ASMCRTC 1.0
>!AEFIRuntime 2.1
|IOSMBus!F 1.1
|IOHID!F 2.0.0
|IOTimeSync!F 1110.13
|IOSkywalk!F 1.0
>mDNSOffloadUserClient 1.0.1b8
|IONetworking!F 3.4
>DiskImages 493.0.0
|IO!B!F 9.0.0
|IOReport!F 47
$quarantine 4
$sandbox 300.0
@kext.!AMatch 1.0.0d1
>!ASSE 1.0
>!AKeyStore 2
>!UTDM 547
|IOUSBMass!SDriver 232
|IOSCSIBlockCommandsDevice 476
|IO!S!F 2.1
|IOSCSIArchitectureModel!F 476
>!AMobileFileIntegrity 1.0.5
$!AImage4 5.0.0
@kext.CoreTrust 1
>!AFDEKeyStore 28.30
>!AEffaceable!S 1.0
>!ACredentialManager 1.0
|CoreAnalytics!F 1
>KernelRelayHost 1
|IOUSBHost!F 1.2
>!UHostMergeProperties 1.2
>usb.!UCommon 1.0
>!ABusPower!C 1.0
>!ASEPManager 1.0.1
>IOSlaveProcessor 1
>!AACPIPlatform 6.1
>!ASMC 3.1.9
|IOPCI!F 2.9
|IOACPI!F 1.4
>watchdog 1
@kec.pthread 1
@kec.Libm 1
@kec.corecrypto 12.0
Until that point, indexing hung at the following blocks for less than 30 seconds each: 577934/774682 578195/774682 578402/774682 578620/774682 579121/774682 579975/774682
After the crash, the indexing started from ~577K again.
I have ord 0.4.2 running on macOS Ventura.
I am getting similar issues on macOS 12.6.3, on a Mac Pro 7,1 tower (2019 cheese grater) with plenty of horsepower and 250 GB of internal SSD space left.
When using ord index, it gets to 659956 and just stops doing anything; I left it for hours, nothing. There are many RPC errors along the way:
[2023-02-02T05:28:09Z ERROR ord::index::updater] failed to fetch block 659819, retrying in 2s: JSON-RPC error: transport error: HTTP response too short: length 0, needed 12.
I have been trying for a week, and it is incredibly frustrating. "ord index" restarts from 655000 and continues to hang at 659956; it repeats the same thing, and I have repeated it 20+ times...
I have tried disconnecting and restarting bitcoind, and also a bitcoind reindex (it took forever but completed successfully); I am running out of options...
I have no idea what to do here... incredibly frustrating. I would appreciate any advice...
Crashed with a panic again. 0.4.2 won't fool me again. Moving on to build from the master branch.
10f8469a09f601d2ae4939a2d74b0f607782a26b is the first bad commit
This definitely seems like a bad bisect. That commit is super harmless.
Yes, that's a bad bisect. I meant to write that to https://github.com/casey/ord/issues/1467. Please ignore.
Seeing similar behavior on an AWS-hosted Windows machine. It was smooth sailing until block 335,000; after that I am getting the below error consistently every 5000 blocks:
[2023-02-07T06:04:59Z ERROR ord::index::updater] failed to fetch block 400033, retrying in 2s: JSON-RPC error: transport error: Couldn't connect to host: An established connection was aborted by the software in your host machine. (os error 10053)
Index still running 13 hours later with the same behavior (error every 5000 blocks) on the ☝🏾 system.
I started this process on a locally hosted machine and got the error much earlier (block 29400, block 122629, block 129636). No discernible pattern on when the error is occurring on this system. Hope it helps!
Should be fixed by https://github.com/casey/ord/pull/1516.
Haven't seen this issue come up since February.
Still seeing this issue.
Command: ord --bitcoin-data-dir=D:\Bitcoin --index-sats index run
Issue: Index failing every 5000 blocks exactly.
Error: error: JSON-RPC error: transport error: Couldn't connect to host: An established connection was aborted by the software in your host machine. (os error 10053)
Ord version: 0.9.0
I am suffering from it now. Every 5000 blocks, it comes up with a rocks error.
I'll log a new issue and ref this one then. @DrJingLee
Maybe you should try downloading it from https://github.com/yanowo/ORD-index.redb-Share
I stopped reindexing a whole week ago and just use the new version of index.redb.