edac-utils
Add labels for Intel Corporation S1200RP Server Board Family
The server board family is used by the German Web hosting provider and data center operator Hetzner in their PX60-SSD offer [1].
According to the Intel Server Board S1200V3RP Technical Product Specification [2], the board has two memory channels and four DIMM slots.
The four slots are displayed vertically and are labeled in the order below.
DIMM_A2, DIMM_A1, DIMM_B2, DIMM_B1
The output of DMI table decoder dmidecode confirms this.
$ sudo dmidecode -t Memory | grep Locator
Locator: ChannelA-DIMM1
Bank Locator: BANK 1
Locator: ChannelA-DIMM2
Bank Locator: BANK 0
Locator: ChannelB-DIMM1
Bank Locator: BANK 3
Locator: ChannelB-DIMM2
Bank Locator: BANK 2
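To make the slot-to-bank pairing in the output above easier to read, the two fields can be joined onto one line. This is only a sketch: `pair_locators` is a hypothetical helper name, and it assumes the `Locator:` / `Bank Locator:` layout shown above.

```shell
# Hypothetical helper: read dmidecode output on stdin and print
# "<Locator> <Bank Locator>" once per memory device.
pair_locators() {
    awk -F': ' '
        /^[[:space:]]*Locator:/      { loc = $2 }      # remember the slot name
        /^[[:space:]]*Bank Locator:/ { print loc, $2 } # emit slot and bank together
    '
}

# Usage (needs root, as in the commit message):
#   sudo dmidecode -t Memory | pair_locators
```

With the output quoted above, the first line would read `ChannelA-DIMM1 BANK 1`.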
PS: Support for the Intel E3-1200 series DRAM controller was added in Linux 3.17 in commit 7ee40b89 (ie31200_edac: Introduce the driver) [3].
[1] https://www.hetzner.de
[2] http://download.intel.com/support/motherboards/server/sb/g84364004_s1200v3rp_tps_r2_0.pdf Intel reference number G84364-004, Revision 2.0, June 2015
[3] http://git.kernel.org/linus/7ee40b897d18ab03111eda9a6a0550e98166eada
- I am not sure what exactly "row" means. For now I treated rows as banks.
- Please comment on the formatting of the labels and on whether each channel should be put on one line.
I amended the patch to fix a typo in DIMM_B1.
Below is the output, which looks wrong.
$ sudo edac-ctl --print-labels
LOCATION CONFIGURED LABEL SYSFS CONTENTS
mc0/csrow1/ch0_dimm_label DIMM_A2 mc#0csrow#1channel#0
mc0/csrow2/ch0_dimm_label DIMM_A1 mc#0csrow#2channel#0
mc0/csrow3/ch1_dimm_label DIMM_B2 mc#0csrow#3channel#1
mc0/csrow4/ch1_dimm_label DIMM_B1 Missing
I changed the row numbering to start at 0. I do not know whether it is correct now.
$ sudo edac-ctl --print-labels
LOCATION CONFIGURED LABEL SYSFS CONTENTS
mc0/csrow0/ch0_dimm_label DIMM_A2 mc#0csrow#0channel#0
mc0/csrow1/ch0_dimm_label DIMM_A1 mc#0csrow#1channel#0
mc0/csrow2/ch1_dimm_label DIMM_B2 mc#0csrow#2channel#1
mc0/csrow3/ch1_dimm_label DIMM_B1 mc#0csrow#3channel#1
<mc>.<row>.<channel> in the labels file corresponds directly to mcX/csrowY/ch[0,1]* in /sys/devices/system/edac/mc
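As a rough illustration of that mapping, the sketch below turns one `<mc>.<row>.<channel>` triple into the matching sysfs file name. `label_path` is a hypothetical helper, not part of edac-utils, and the `chN_dimm_label` file name is taken from the `edac-ctl --print-labels` output earlier in this thread.

```shell
# Hypothetical helper: map "<mc>.<row>.<channel>" to the sysfs label file.
label_path() {
    mc=${1%%.*}        # text before the first dot
    rest=${1#*.}       # everything after the first dot
    row=${rest%%.*}    # text between the two dots
    ch=${rest#*.}      # text after the second dot
    printf '/sys/devices/system/edac/mc/mc%s/csrow%s/ch%s_dimm_label\n' \
        "$mc" "$row" "$ch"
}

label_path 0.3.1
# prints /sys/devices/system/edac/mc/mc0/csrow3/ch1_dimm_label
```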
What does something like the following show on this system?
ls /sys/devices/system/edac/mc/*/csrow*
Typically, DIMM labels are verified by moving a bad DIMM between slots on the motherboard. If supported, you can sometimes move a single DIMM between slots and see which csrow/channel is populated under sysfs.
Unfortunately, the MC/csrow/channel mapping for DIMMs is getting less and less applicable for new architectures. I think upstream is working on a new EDAC interface, and at that point edac-utils will probably no longer be needed.
@grondo, on the board the following is shown under /sys.
/sys/devices/system/edac/mc/mc0$ ls csrow*
csrow0:
ce_count ch0_ce_count ch0_dimm_label ch1_ce_count ch1_dimm_label dev_type edac_mode mem_type power size_mb subsystem ue_count uevent
csrow1:
ce_count ch0_ce_count ch0_dimm_label ch1_ce_count ch1_dimm_label dev_type edac_mode mem_type power size_mb subsystem ue_count uevent
csrow2:
ce_count ch0_ce_count ch0_dimm_label ch1_ce_count ch1_dimm_label dev_type edac_mode mem_type power size_mb subsystem ue_count uevent
csrow3:
ce_count ch0_ce_count ch0_dimm_label ch1_ce_count ch1_dimm_label dev_type edac_mode mem_type power size_mb subsystem ue_count uevent
Looks like the driver registered two channels for each csrow. You may want to populate those labels and move a bad DIMM around to see which label corresponds to each channel within the csrow.
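To see what each of those label files currently contains, a loop over the sysfs tree is enough. This is a minimal sketch: `print_labels` is a hypothetical name, and the optional root argument is only an assumption added so the function can be pointed at a copy of the tree.

```shell
# Hypothetical helper: print "<mcX/csrowY/chZ_dimm_label> <contents>" for
# every label file below an EDAC sysfs root.
print_labels() {
    root=${1:-/sys/devices/system/edac/mc}
    for f in "$root"/mc*/csrow*/ch*_dimm_label; do
        [ -e "$f" ] || continue    # glob matched nothing, or file vanished
        printf '%s %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
}

print_labels
```

Running it before and after moving a DIMM, together with the `size_mb` files listed above, should show which csrow/channel pair belongs to which slot.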
Unfortunately, I have no physical access to the board. I’ll try to contact Hetzner to get that information. Maybe Intel is even the right contact.