edac-utils

Add labels for Intel Corporation S1200RP Server Board Family

Open paulmenzel opened this issue 10 years ago • 9 comments

The server board family is used by the German Web hosting provider and data center operator Hetzner in their PX60-SSD offer [1].

According to the Intel Server Board S1200V3RP Technical Product Specification [2], the board has two memory channels and four DIMM slots.

The four slots are displayed vertically and are labeled in the order below.

DIMM_A2, DIMM_A1, DIMM_B2, DIMM_B1

The output of the DMI table decoder dmidecode confirms this.

$ sudo dmidecode -t Memory | grep Locator
    Locator: ChannelA-DIMM1
    Bank Locator: BANK 1
    Locator: ChannelA-DIMM2
    Bank Locator: BANK 0
    Locator: ChannelB-DIMM1
    Bank Locator: BANK 3
    Locator: ChannelB-DIMM2
    Bank Locator: BANK 2

PS: Support for the Intel E3-1200 series DRAM controller was added in Linux 3.17 in commit 7ee40b89 (ie31200_edac: Introduce the driver) [3].

[1] https://www.hetzner.de
[2] http://download.intel.com/support/motherboards/server/sb/g84364004_s1200v3rp_tps_r2_0.pdf Intel reference number G84364-004, Revision 2.0, June 2015
[3] http://git.kernel.org/linus/7ee40b897d18ab03111eda9a6a0550e98166eada

paulmenzel Jun 11 '15 17:06

  1. I am not sure what exactly "row" means. For now I treated rows as banks.
  2. Please comment on the formatting of the labels and whether each channel should be put on one line.

paulmenzel Jun 11 '15 17:06

I amended the patch to fix a typo in DIMM_B1.

paulmenzel Jun 11 '15 17:06

Below is the output, which looks wrong.

$ sudo edac-ctl --print-labels
LOCATION                            CONFIGURED LABEL     SYSFS CONTENTS      
mc0/csrow1/ch0_dimm_label           DIMM_A2              mc#0csrow#1channel#0
mc0/csrow2/ch0_dimm_label           DIMM_A1              mc#0csrow#2channel#0
mc0/csrow3/ch1_dimm_label           DIMM_B2              mc#0csrow#3channel#1
mc0/csrow4/ch1_dimm_label           DIMM_B1              Missing

paulmenzel Jun 11 '15 17:06

I started the row numbering with 0. No idea if it’s correct now.

$ sudo edac-ctl --print-labels
LOCATION                            CONFIGURED LABEL     SYSFS CONTENTS      
mc0/csrow0/ch0_dimm_label           DIMM_A2              mc#0csrow#0channel#0
mc0/csrow1/ch0_dimm_label           DIMM_A1              mc#0csrow#1channel#0
mc0/csrow2/ch1_dimm_label           DIMM_B2              mc#0csrow#2channel#1
mc0/csrow3/ch1_dimm_label           DIMM_B1              mc#0csrow#3channel#1

paulmenzel Jun 11 '15 17:06

The <mc>.<row>.<channel> triple in the labels file corresponds directly to mcX/csrowY/ch[0,1]* under /sys/devices/system/edac/mc.
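As a minimal sketch of that mapping, the hypothetical shell helper below (label_path is not part of edac-utils) splits an <mc>.<row>.<channel> triple and prints the matching sysfs file. The optional second argument, an alternative root directory, is an assumption added only so the helper can be tried outside a live system.

```shell
# Hypothetical helper, not part of edac-utils:
# turn an "<mc>.<row>.<channel>" triple into the sysfs label file path.
label_path() {
    loc=$1
    root=${2:-/sys/devices/system/edac/mc}   # assumed default EDAC sysfs root
    mc=${loc%%.*}      # text before the first dot
    rest=${loc#*.}     # text after the first dot
    row=${rest%%.*}    # middle field
    ch=${rest#*.}      # last field
    printf '%s/mc%s/csrow%s/ch%s_dimm_label\n' "$root" "$mc" "$row" "$ch"
}

label_path 0.3.1
# prints /sys/devices/system/edac/mc/mc0/csrow3/ch1_dimm_label
```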

What does something like

ls /sys/devices/system/edac/mc/*/csrow*

show on this system?

grondo Jun 11 '15 17:06

Typically, DIMM labels are verified by moving a bad DIMM between slots on the motherboard. If supported, you can sometimes move a single DIMM between slots and see which csrow/channel is populated under sysfs.
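A rough sketch of that check: the hypothetical function below (list_csrows is an illustrative name) prints the size and corrected-error count of every csrow, so the output can be compared before and after a DIMM is moved. It assumes each csrow directory exposes size_mb and ce_count files; the optional root argument is an assumption added for trying it against a copy of the tree.

```shell
# Hypothetical helper: show which csrows are populated and which report errors.
list_csrows() {
    root=${1:-/sys/devices/system/edac/mc}   # assumed default EDAC sysfs root
    for dir in "$root"/mc*/csrow*; do
        # Skip the unexpanded pattern when no csrow directory exists.
        [ -r "$dir/size_mb" ] || continue
        printf '%s size_mb=%s ce_count=%s\n' \
            "${dir#"$root"/}" "$(cat "$dir/size_mb")" "$(cat "$dir/ce_count")"
    done
}

list_csrows
```

A csrow whose size_mb changes from 0 to a non-zero value after a DIMM is inserted is the one backing that slot; a rising ce_count points at the csrow holding the bad DIMM.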

Unfortunately, the MC/csrow/channel mapping for DIMMs is becoming less and less applicable to new architectures. I think upstream is working on a new EDAC interface, and at that point edac-utils will probably no longer be needed.

grondo Jun 11 '15 17:06

@grondo, on the board the following is shown under /sys.

/sys/devices/system/edac/mc/mc0$ ls csrow*
csrow0:
ce_count  ch0_ce_count  ch0_dimm_label  ch1_ce_count  ch1_dimm_label  dev_type  edac_mode  mem_type  power  size_mb  subsystem  ue_count  uevent

csrow1:
ce_count  ch0_ce_count  ch0_dimm_label  ch1_ce_count  ch1_dimm_label  dev_type  edac_mode  mem_type  power  size_mb  subsystem  ue_count  uevent

csrow2:
ce_count  ch0_ce_count  ch0_dimm_label  ch1_ce_count  ch1_dimm_label  dev_type  edac_mode  mem_type  power  size_mb  subsystem  ue_count  uevent

csrow3:
ce_count  ch0_ce_count  ch0_dimm_label  ch1_ce_count  ch1_dimm_label  dev_type  edac_mode  mem_type  power  size_mb  subsystem  ue_count  uevent

paulmenzel Jun 11 '15 17:06

Looks like the driver registered two channels for each csrow. You may want to populate those labels and move a bad DIMM around to see which labels correspond to each channel within the csrow.
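One way to populate a single label by hand could look like the sketch below. It assumes the chN_dimm_label files are writable by root, and set_label is a hypothetical helper, not an edac-utils command. Which physical slot sits behind a given csrow/channel pair on this board is exactly what is still unknown, so any label written this way is a guess until it has been verified.

```shell
# Hypothetical helper: write a label into one chN_dimm_label file.
# Arguments: <root> <mc> <row> <channel> <label>
set_label() {
    root=$1 mc=$2 row=$3 ch=$4 label=$5
    file="$root/mc$mc/csrow$row/ch${ch}_dimm_label"
    # Refuse to continue if the file is missing or not writable.
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    printf '%s\n' "$label" > "$file"
}

# Example call (needs root; the slot assignment is an unverified assumption):
#   set_label /sys/devices/system/edac/mc 0 0 1 DIMM_A1
```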

grondo Jun 11 '15 17:06

Unfortunately, I have no physical access to the board. I’ll try to contact Hetzner to get that information. Maybe Intel is even the right contact.

paulmenzel Jun 11 '15 17:06