
Bladebit 1.2.2 closing without errors - Windows Server 2019

Open potatoFive opened this issue 3 years ago • 10 comments

I'm trying to get bladebit running on an R720 with 768GB of RAM and Windows Server 2019. When I run bladebit it shows "Allocating buffers." for ~10 seconds, then just closes with no errors. I was hoping someone might be able to point me in the right direction.

Here's what I'm seeing:

D:\Chia_Plotter\bladebit-v1.2.2-windows-x86-64>bladebit.exe -t 36 -n 1 -v -c xch[TRUNCATED] -f 948[TRUNCATED] F:\Chia
Creating 1 plots:
Output path : F:\Chia
Thread count : 36
Warm start enabled : false
Farmer public key : 948[TRUNCATED]
Pool contract address : xch[TRUNCATED]
System Memory: 543/767 GiB.
Memory required: 416 GiB.
Allocating buffers.
D:\Chia_Plotter\bladebit-v1.2.2-windows-x86-64>

potatoFive avatar Nov 10 '21 02:11 potatoFive

Thanks for moving the issue over here.

Can you check whether any relevant log appears in the Windows Event Viewer? Search for "Event Viewer" in the Start menu and open the Event Viewer application. Then, under the "Windows Logs" folder, check "Application" and see if you find an entry that's Bladebit related.

Otherwise, you might try restarting the machine and retrying. Since it's crashing while allocating its buffers, it could be that it is somehow not showing the error (it should) when it can't allocate the buffers due to page fragmentation.

harold-b avatar Nov 10 '21 03:11 harold-b

I tried restarting earlier, but after the restart I was seeing exactly the same behaviour.

I've run Bladebit several times in the last 30 min. in an attempt to generate an event log entry. Nothing new is showing up under system or application under Windows Logs.

I can run Bladebit using Windows Subsystem for Linux, but plot times were terrible: ~70 min per plot with 40 threads on dual 10-core v2 Xeons, so I'm hoping to get the Windows binary running.

I've also disabled DEP (not that I would expect that to change anything) and tried running with elevated privileges.

potatoFive avatar Nov 10 '21 04:11 potatoFive

I tried running Bladebit through the Chia GUI and this did generate a logged error.

Creating 1 plots:
 Output path           : F:\Chia
 Thread count          : 20
 Warm start enabled    : false
 Farmer public key     : 948[TRUNCATED]
 Pool contract address : xch[TRUNCATED]

System Memory: 671/767 GiB.
Memory required: 416 GiB.
Allocating buffers.
STDERR: Fatal Error:
STDERR:   Error: Failed to allocate required buffers.

Here's the Windows Event Log entry:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Application Error" />
    <EventID Qualifiers="0">1000</EventID>
    <Level>2</Level>
    <Task>100</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2021-11-10T05:10:29.189095100Z" />
    <EventRecordID>46361</EventRecordID>
    <Channel>Application</Channel>
    <Computer>R720.domain.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>bladebit.exe</Data>
    <Data>0.0.0.0</Data>
    <Data>6171c0df</Data>
    <Data>ntdll.dll</Data>
    <Data>10.0.17763.2145</Data>
    <Data>a211e4d0</Data>
    <Data>c0000374</Data>
    <Data>00000000000fa979</Data>
    <Data>3dd4</Data>
    <Data>01d7d5f1323236bc</Data>
    <Data>C:\Users\gcogar\AppData\Local\chia-blockchain\app-1.2.11\resources\app.asar.unpacked\daemon\bladebit\bladebit.exe</Data>
    <Data>C:\Windows\SYSTEM32\ntdll.dll</Data>
    <Data>a353cb91-eadc-4b6e-94b9-364ef0d8ad7e</Data>
    <Data />
    <Data />
  </EventData>
</Event>

potatoFive avatar Nov 10 '21 05:11 potatoFive
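The event record above stores its details as positional `<Data>` fields, which are hard to read at a glance. The sketch below labels them and decodes the exception code. The field order is an assumption based on the usual "Application Error" (event ID 1000) message layout, the key names in `FIELDS` are made up for illustration, and the `NTSTATUS` table lists only a few well-known values.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Assumed positional meaning of the <Data> fields in an event ID 1000 record.
# The key names are illustrative, not official.
FIELDS = [
    "app_name", "app_version", "app_timestamp",
    "module_name", "module_version", "module_timestamp",
    "exception_code", "fault_offset", "process_id",
    "app_start_time", "app_path", "module_path", "report_id",
]

# A few well-known NTSTATUS values; this is not an exhaustive table.
NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}

def parse_app_error(xml_text):
    """Label the positional <Data> fields and name the exception code."""
    root = ET.fromstring(xml_text)
    values = [d.text or "" for d in root.findall("e:EventData/e:Data", NS)]
    record = dict(zip(FIELDS, values))
    code = int(record["exception_code"], 16)
    record["exception_name"] = NTSTATUS.get(code, "unknown")
    return record

# Trimmed copy of the record posted above (first eight <Data> fields only).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data>bladebit.exe</Data>
    <Data>0.0.0.0</Data>
    <Data>6171c0df</Data>
    <Data>ntdll.dll</Data>
    <Data>10.0.17763.2145</Data>
    <Data>a211e4d0</Data>
    <Data>c0000374</Data>
    <Data>00000000000fa979</Data>
  </EventData>
</Event>"""

record = parse_app_error(SAMPLE)
print(record["module_name"], record["exception_name"])
# prints: ntdll.dll STATUS_HEAP_CORRUPTION
```

Under these assumptions, the code `c0000374` reported against `ntdll.dll` corresponds to the value Windows documents as STATUS_HEAP_CORRUPTION.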

Thanks for following up with more data. So it is indeed failing during calls to VirtualAlloc. I'll set up a different branch that outputs the error generated during this call to see if it helps diagnose the problem. My initial assumption is that there is a system configuration that perhaps doesn't let you allocate that much memory in the system, or perhaps across nodes.

harold-b avatar Nov 10 '21 17:11 harold-b

Thanks, I really appreciate you taking the time to look into this.

potatoFive avatar Nov 10 '21 17:11 potatoFive

I have the same problem.

Creating 1 plots:
 Output path           : Q:\Q1
 Thread count          : 40
 Warm start enabled    : false
 Farmer public key     : 8d4***
 Pool contract address : xch1***

System Memory: 439/447 GiB.
Memory required: 416 GiB.
Allocating buffers.

It seems something goes wrong when allocating buffers.

System OS: Windows Server 2019 Standard
CPU: Intel® Xeon® Silver 4214R @2.40GHz 2.40GHz (2 CPUs)

NotThree avatar Nov 11 '21 02:11 NotThree
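Both reports so far print a "System Memory" line and a "Memory required" line just before the exit. A minimal sketch for reading the headroom out of such a log follows. It assumes the first number in `System Memory: X/Y GiB` is the memory currently available and the second is the total; `memory_headroom` is a hypothetical helper, not part of bladebit.

```python
import re

def memory_headroom(log_text):
    """Return (available, total, required, headroom) in GiB from a bladebit log.

    Assumes "System Memory: X/Y GiB" means X available out of Y total.
    """
    mem = re.search(r"System Memory:\s*(\d+)/(\d+)\s*GiB", log_text)
    req = re.search(r"Memory required:\s*(\d+)\s*GiB", log_text)
    if not mem or not req:
        raise ValueError("memory lines not found in log")
    available, total = int(mem.group(1)), int(mem.group(2))
    required = int(req.group(1))
    return available, total, required, available - required

log = "System Memory: 439/447 GiB. Memory required: 416 GiB. Allocating buffers."
print(memory_headroom(log))  # (439, 447, 416, 23)
```

With the figures from the first report (543/767 GiB, 416 GiB required) the same helper gives 127 GiB of headroom, so in both cases the numbers alone do not show a plain shortage of memory.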

Bladebit version: 1.2.4
System Memory: 448 GiB DDR3 1866MHz
System OS: Win 10 Professional Workstation
Disk: HS-SSD-C2000Pro 1024G
CPU: Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz 2.40 GHz (2 CPUs)

Optional arguments:
 Warm start enabled      : false
 disabled numa enabled   : false
 no cpu affinity enabled : false

Here's what I'm seeing:

Creating 1 plots:
Output path           : F:\plot
Thread count          : 40
Warm start enabled    : false
Farmer public key : 94a***
Pool contract address : xch1***

Optional arguments:
 Warm start enabled      : false
 disabled numa enabled   : true
 no cpu affinity enabled : true

They all fail at Phase 3

Here's what I'm seeing:

Creating 1 plots:
 Output path           : F:\plot
 Thread count          : 40
 Warm start enabled    : false
 Farmer public key     : 94a***
 Pool contract address : xch1***

System Memory: 440/447 GiB.
Memory required: 416 GiB.
Allocating buffers.
Generating plot 1 / 1: 53c5fe139eb852f5477d55a35b2c42cb5a1c779370eb253bd65a92582f6fecbd

Running Phase 1
Generating F1...
Finished F1 generation in 9.04 seconds.
Sorting F1...
Finished F1 sort in 40.65 seconds.
Progress update: 0.01
Forward propagating to table 2...
  Pairing L/R groups...
  Finished pairing L/R groups in 21.0100 seconds. Created 4294967296 pairs.
  Average of 236.1406 pairs per group.
  Computing Fx...
  Finished computing Fx in 21.7710 seconds.
  Sorting entries...
  Finished sorting in 85.25 seconds.
Finished forward propagating table 2 in 129.04 seconds.
Progress update: 0.06
Forward propagating to table 3...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.9670 seconds. Created 4294960572 pairs.
  Average of 236.1403 pairs per group.
  Computing Fx...
  Finished computing Fx in 22.3690 seconds.
  Sorting entries...
  Finished sorting in 84.61 seconds.
Finished forward propagating table 3 in 126.95 seconds.
Progress update: 0.12
Forward propagating to table 4...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.7410 seconds. Created 4294967296 pairs.
  Average of 236.1406 pairs per group.
  Computing Fx...
  Finished computing Fx in 22.6930 seconds.
  Sorting entries...
  Finished sorting in 83.22 seconds.
Finished forward propagating table 4 in 125.67 seconds.
Progress update: 0.2
Forward propagating to table 5...
  Pairing L/R groups...
  Finished pairing L/R groups in 19.2460 seconds. Created 4294967296 pairs.
  Average of 236.1406 pairs per group.
  Computing Fx...
  Finished computing Fx in 22.8360 seconds.
  Sorting entries...
  Finished sorting in 84.09 seconds.
Finished forward propagating table 5 in 127.18 seconds.
Progress update: 0.28
Forward propagating to table 6...
  Pairing L/R groups...
  Finished pairing L/R groups in 19.0510 seconds. Created 4294904884 pairs.
  Average of 236.1372 pairs per group.
  Computing Fx...
  Finished computing Fx in 22.4020 seconds.
  Sorting entries...
  Finished sorting in 82.25 seconds.
Finished forward propagating table 6 in 124.73 seconds.
Progress update: 0.36
Forward propagating to table 7...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.8150 seconds. Created 4294906176 pairs.
  Average of 236.1373 pairs per group.
  Computing Fx...
  Finished computing Fx in 20.8320 seconds.
Finished forward propagating table 7 in 40.67 seconds.
Progress update: 0.42
Finished Phase 1 in 723.96 seconds.
Running Phase 2
  Prunning table 6...
  Finished prunning table 6 in 1.13 seconds.
Progress update: 0.43
  Prunning table 5...
  Finished prunning table 5 in 39.85 seconds.
Progress update: 0.48
  Prunning table 4...
  Finished prunning table 4 in 38.12 seconds.
Progress update: 0.51
  Prunning table 3...
  Finished prunning table 3 in 37.37 seconds.
Progress update: 0.55
  Prunning table 2...
  Finished prunning table 2 in 37.08 seconds.
Progress update: 0.58
Finished Phase 2 in 154.74 seconds.
Running Phase 3
  Compressing tables 1 and 2...
  Finished compressing tables 1 and 2 in 89.93 seconds
Progress update: 0.66
  Table 1 now has 3429392911 / 4294967296 entries ( 79.85% ).
  Compressing tables 2 and 3...
STDERR: Fatal Error:


STDERR:   Failed to write table 2 to disk.

Optional arguments:
 Warm start enabled      : true
 disabled numa enabled   : true
 no cpu affinity enabled : true

It starts with 3 successes, and then every run fails. They all fail at Phase 3 or Phase 4.

Here's what I'm seeing:

Creating 1 plots:
 Output path           : F:\plot
 Thread count          : 40
 Warm start enabled    : false
 Farmer public key     : 94a***
 Pool contract address : xch1***

System Memory: 440/447 GiB.
Memory required: 416 GiB.
Allocating buffers.
Generating plot 1 / 1: d5ad85c3e68294fa53db15299064562d0ae9061c7b28ed00dc941d85672f04f2

Running Phase 1
Generating F1...
Finished F1 generation in 6.43 seconds.
Sorting F1...
Finished F1 sort in 36.66 seconds.
Progress update: 0.01
Forward propagating to table 2...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.4430 seconds. Created 4294967296 pairs.
  Average of 236.1406 pairs per group.
  Computing Fx...
  Finished computing Fx in 20.8640 seconds.
  Sorting entries...
  Finished sorting in 83.28 seconds.
Finished forward propagating table 2 in 123.63 seconds.
Progress update: 0.06
Forward propagating to table 3...
  Pairing L/R groups...
  Finished pairing L/R groups in 17.9460 seconds. Created 4294967296 pairs.
  Average of 236.1406 pairs per group.
  Computing Fx...
  Finished computing Fx in 23.6910 seconds.
  Sorting entries...
  Finished sorting in 84.00 seconds.
Finished forward propagating table 3 in 126.68 seconds.
Progress update: 0.12
Forward propagating to table 4...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.2090 seconds. Created 4294823561 pairs.
  Average of 236.1327 pairs per group.
  Computing Fx...
  Finished computing Fx in 44.7500 seconds.
  Sorting entries...
  Finished sorting in 85.91 seconds.
Finished forward propagating table 4 in 149.92 seconds.
Progress update: 0.2
Forward propagating to table 5...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.0790 seconds. Created 4294777197 pairs.
  Average of 236.1302 pairs per group.
  Computing Fx...
  Finished computing Fx in 43.3470 seconds.
  Sorting entries...
  Finished sorting in 85.24 seconds.
Finished forward propagating table 5 in 147.75 seconds.
Progress update: 0.28
Forward propagating to table 6...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.3930 seconds. Created 4294735528 pairs.
  Average of 236.1279 pairs per group.
  Computing Fx...
  Finished computing Fx in 22.1680 seconds.
  Sorting entries...
  Finished sorting in 82.45 seconds.
Finished forward propagating table 6 in 124.09 seconds.
Progress update: 0.36
Forward propagating to table 7...
  Pairing L/R groups...
  Finished pairing L/R groups in 17.5810 seconds. Created 4294430377 pairs.
  Average of 236.1111 pairs per group.
  Computing Fx...
  Finished computing Fx in 21.8740 seconds.
Finished forward propagating table 7 in 40.50 seconds.
Progress update: 0.42
Finished Phase 1 in 755.66 seconds.
Running Phase 2
  Prunning table 6...
  Finished prunning table 6 in 0.84 seconds.
Progress update: 0.43
  Prunning table 5...
  Finished prunning table 5 in 38.88 seconds.
Progress update: 0.48
  Prunning table 4...
  Finished prunning table 4 in 37.08 seconds.
Progress update: 0.51
  Prunning table 3...
  Finished prunning table 3 in 36.38 seconds.
Progress update: 0.55
  Prunning table 2...
  Finished prunning table 2 in 36.02 seconds.
Progress update: 0.58
Finished Phase 2 in 150.37 seconds.
Running Phase 3
  Compressing tables 1 and 2...
  Finished compressing tables 1 and 2 in 92.87 seconds
Progress update: 0.66
  Table 1 now has 3429307458 / 4294967296 entries ( 79.84% ).
  Compressing tables 2 and 3...
  Finished compressing tables 2 and 3 in 93.71 seconds
Progress update: 0.73
  Table 2 now has 3439856394 / 4294967296 entries ( 80.09% ).
  Compressing tables 3 and 4...
  Finished compressing tables 3 and 4 in 93.12 seconds
Progress update: 0.79
  Table 3 now has 3466009513 / 4294823561 entries ( 80.70% ).
  Compressing tables 4 and 5...
  Finished compressing tables 4 and 5 in 95.72 seconds
Progress update: 0.85
  Table 4 now has 3532817670 / 4294777197 entries ( 82.26% ).
  Compressing tables 5 and 6...
STDERR: Fatal Error:


STDERR:   Failed to write table 5 to disk.

or

Creating 1 plots:
 Output path           : F:\plot
 Thread count          : 40
 Warm start enabled    : true
 Farmer public key     : 94a***
 Pool contract address : xch1***

System Memory: 440/447 GiB.
Memory required: 416 GiB.
Allocating buffers.
Generating plot 1 / 1: 6d70941b654d1df2ed9fc50205f7525b2f1709b0ee43bb3fa62e90f4dd50216a

Running Phase 1
Generating F1...
Finished F1 generation in 6.31 seconds.
Sorting F1...
Finished F1 sort in 36.83 seconds.
Progress update: 0.01
Forward propagating to table 2...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.9310 seconds. Created 4294653697 pairs.
  Average of 236.1234 pairs per group.
  Computing Fx...
  Finished computing Fx in 21.5730 seconds.
  Sorting entries...
  Finished sorting in 80.82 seconds.
Finished forward propagating table 2 in 122.43 seconds.
Progress update: 0.06
Forward propagating to table 3...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.6820 seconds. Created 4294282094 pairs.
  Average of 236.1030 pairs per group.
  Computing Fx...
  Finished computing Fx in 22.1750 seconds.
  Sorting entries...
  Finished sorting in 82.98 seconds.
Finished forward propagating table 3 in 124.96 seconds.
Progress update: 0.12
Forward propagating to table 4...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.4420 seconds. Created 4293666537 pairs.
  Average of 236.0691 pairs per group.
  Computing Fx...
  Finished computing Fx in 52.3360 seconds.
  Sorting entries...
  Finished sorting in 79.81 seconds.
Finished forward propagating table 4 in 151.68 seconds.
Progress update: 0.2
Forward propagating to table 5...
  Pairing L/R groups...
  Finished pairing L/R groups in 17.7820 seconds. Created 4292349760 pairs.
  Average of 235.9967 pairs per group.
  Computing Fx...
  Finished computing Fx in 55.3540 seconds.
  Sorting entries...
  Finished sorting in 75.29 seconds.
Finished forward propagating table 5 in 149.55 seconds.
Progress update: 0.28
Forward propagating to table 6...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.0950 seconds. Created 4289764653 pairs.
  Average of 235.8546 pairs per group.
  Computing Fx...
  Finished computing Fx in 23.3120 seconds.
  Sorting entries...
  Finished sorting in 66.71 seconds.
Finished forward propagating table 6 in 109.23 seconds.
Progress update: 0.36
Forward propagating to table 7...
  Pairing L/R groups...
  Finished pairing L/R groups in 18.1270 seconds. Created 4284649654 pairs.
  Average of 235.5734 pairs per group.
  Computing Fx...
  Finished computing Fx in 21.4910 seconds.
Finished forward propagating table 7 in 40.73 seconds.
Progress update: 0.42
Finished Phase 1 in 741.73 seconds.
Running Phase 2
  Prunning table 6...
  Finished prunning table 6 in 0.83 seconds.
Progress update: 0.43
  Prunning table 5...
  Finished prunning table 5 in 40.42 seconds.
Progress update: 0.48
  Prunning table 4...
  Finished prunning table 4 in 38.22 seconds.
Progress update: 0.51
  Prunning table 3...
  Finished prunning table 3 in 37.43 seconds.
Progress update: 0.55
  Prunning table 2...
  Finished prunning table 2 in 37.16 seconds.
Progress update: 0.58
Finished Phase 2 in 155.29 seconds.
Running Phase 3
  Compressing tables 1 and 2...
  Finished compressing tables 1 and 2 in 89.84 seconds
Progress update: 0.66
  Table 1 now has 3428787068 / 4294653697 entries ( 79.84% ).
  Compressing tables 2 and 3...
  Finished compressing tables 2 and 3 in 91.56 seconds
Progress update: 0.73
  Table 2 now has 3438786506 / 4294282094 entries ( 80.08% ).
  Compressing tables 3 and 4...
  Finished compressing tables 3 and 4 in 92.95 seconds
Progress update: 0.79
  Table 3 now has 3464203398 / 4293666537 entries ( 80.68% ).
  Compressing tables 4 and 5...
  Finished compressing tables 4 and 5 in 92.09 seconds
Progress update: 0.85
  Table 4 now has 3529596785 / 4292349760 entries ( 82.23% ).
  Compressing tables 5 and 6...
  Finished compressing tables 5 and 6 in 98.53 seconds
Progress update: 0.92
  Table 5 now has 3707816390 / 4289764653 entries ( 86.43% ).
  Compressing tables 6 and 7...
  Finished compressing tables 6 and 7 in 112.18 seconds
Progress update: 0.98
  Table 6 now has 4284649654 / 4284649654 entries ( 100.00% ).
Finished Phase 3 in 577.15 seconds.
Running Phase 4
  Writing P7.
  Finished writing P7 in 1.25 seconds.
  Writing C1 table.
  Finished writing C1 table in 0.00 seconds.
  Writing C2 table.
  Finished writing C2 table in 0.00 seconds.
  Writing C3 table.
  Finished writing C3 table in 0.92 seconds.
Finished Phase 4 in 2.17 seconds.
Writing final plot tables to disk

wehnhew avatar Nov 19 '21 01:11 wehnhew
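Logs this long are easier to compare once reduced to the phases that completed and the fatal message. The sketch below does that with two regular expressions written against the lines quoted above; `summarize` is a hypothetical helper, and the patterns assume the log wording stays exactly as shown in these runs.

```python
import re

# Patterns match the log lines quoted in the comment above.
PHASE_RE = re.compile(r"Finished Phase (\d) in ([\d.]+) seconds")
ERROR_RE = re.compile(r"STDERR:\s+(Failed .+)")

def summarize(log_text):
    """Return ({phase number: seconds}, [fatal error messages])."""
    phases = {int(n): float(s) for n, s in PHASE_RE.findall(log_text)}
    errors = ERROR_RE.findall(log_text)
    return phases, errors

# Excerpt from the second failing run above.
excerpt = """Finished Phase 1 in 755.66 seconds.
Finished Phase 2 in 150.37 seconds.
Running Phase 3
  Compressing tables 5 and 6...
STDERR: Fatal Error:
STDERR:   Failed to write table 5 to disk.
"""

phases, errors = summarize(excerpt)
print(max(phases), errors)
# prints: 2 ['Failed to write table 5 to disk.']
```

Run over each pasted log, this shows at a glance that the last completed phase was 2 in the failing runs and which table the write error named.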

[Screenshot 2021-11-21, 10:48:11 AM]

I have the same no-error exits with Linux (Ubuntu 20.04)/WSL/WinSRV2019/WinSRV2022.

I think the OS caches too much RAM after you have moved a lot of plots to other HDDs/SSDs or finished lots of plots (>50) with bladebit, and the OS can't release it.

So a system reboot is needed before starting a new bladebit task.

And the -w flag is very important ;-)

VanVanWang avatar Nov 21 '21 02:11 VanVanWang

Adding a datapoint:

Relevant specs: 2x E5-2697 v2, 512G RAM (256G per NUMA node), 16x 32G DDR3 LRDIMMs running at 1866 MT/s. OS is Windows Server 2019.

  • Started bladebit via Chia GUI release 1.3.1. First run was fine, and generated a k32 plot.

  • Second run -> no dice. Process crashed almost immediately, with a few lines added to the log.

  • Rebooted the server -> tried to start bladebit via the Chia GUI again -> no dice && same result.

  • Downloaded latest binary from the repo. Ran it the same way Chia GUI would have executed it. -> Same result: process crash + entry in event log.

  • git-cloned the source code and compiled on the plotter. Tried to run it -> no dice / same result as above attempts.

  • git-cloned to an Ubuntu 20.04 WSL1 env and tried again. Different result:

  1. initial memory reservation/alloc felt much slower than on the one Windows native run.
  2. I was able to create a plot. Was able to start the plotter a second time, after the first run. No issues there, no reboot necessary, and it was churning through plots fine with the "-w" and "-n" switches.
  3. plot creation time in WSL1 was significantly faster than using the native Windows env. (The system was running some other low res tasks, but they would have been fairly consistent between the initial plotting process and the subsequent WSL tests)

Windows / Chia GUI: Finished writing tables to disk in 30.48 seconds. Finished plotting in 1638.91 seconds (27.32 minutes).

WSL1 example: Finished writing tables to disk in 51.50 seconds. Finished plotting in 1372.12 seconds (22.87 minutes).

centrd avatar Mar 31 '22 06:03 centrd
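The two totals quoted above can be put side by side with a little arithmetic. This is only a worked calculation on the posted numbers; `compare` is a hypothetical helper name.

```python
def compare(baseline_s, other_s):
    """Return (seconds saved, percent saved) of other_s relative to baseline_s."""
    saved = baseline_s - other_s
    return saved, 100.0 * saved / baseline_s

windows_s = 1638.91  # Windows / Chia GUI total plotting time, from the post above
wsl1_s = 1372.12     # WSL1 total plotting time, from the post above

saved, pct = compare(windows_s, wsl1_s)
print(f"{saved:.2f} s saved ({pct:.1f}% less wall time)")
# prints: 266.79 s saved (16.3% less wall time)
```

So the WSL1 run finished roughly 16% sooner overall, even though its final write to disk took longer (51.50 s versus 30.48 s).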

Some additional info:

Here is an example of an event log entry I see after trying to initiate bladebit from Chia GUI 1.3.1:

Faulting application name: bad_module_info, version: 0.0.0.0, time stamp: 0x00000000
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x00007ffbe3e83620
Faulting process id: 0x7338
Faulting application start time: 0x01d844c1235c03ca
Faulting application path: bad_module_info
Faulting module path: unknown
Report Id: [redacted]
Faulting package full name:
Faulting package-relative application ID:

centrd avatar Mar 31 '22 07:03 centrd

Any update? Same with bb 2.0.

Maladon0815 avatar Feb 26 '23 09:02 Maladon0815

I still have the same issue; however, it's worth noting that GPU plotting with the alpha works great. Someone else messaged me a while ago, and the conclusion we came to is that this is likely a Dell BIOS-specific issue.

Nonemu avatar Feb 27 '23 02:02 Nonemu

I still have the issue with the CUDA plotter. After allocating buffers, nothing happens (Dell 720, Windows Server 2019).

Maladon0815 avatar Feb 27 '23 06:02 Maladon0815

This is a known issue on Windows 10 and Windows Server 2019; the workaround right now is to use Windows 11.

jmhands avatar Aug 03 '23 00:08 jmhands