Updating your BIOS firmware

The BIOS firmware that shipped with my Sapphire FS-FP5V motherboard could not enable ECC.

However, I recently contacted Sapphire and obtained their latest BIOS (BIOS v22).

In my previous post, I had installed unbuffered ECC SODIMMs (KTL-TN424E/16G) on the motherboard.

With the new BIOS firmware installed, I began to test whether ECC was properly enabled.

Does my operating system detect the enabled ECC?

Booting into stock Ubuntu 18.04 LTS was not enough; I had to compile the latest stable Linux kernel (4.17.13 at the time) to pick up the more recent EDAC amd64 updates.

When I booted into Linux kernel 4.17.13, I began to check the DMI tables for ECC detection.

embedded@embedded-pc:~$ sudo dmidecode --type memory
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
# SMBIOS implementations newer than version 3.0 are not
# fully supported by this version of dmidecode.

Handle 0x0009, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: Multi-bit ECC
	Maximum Capacity: 32 GB
	Error Information Handle: 0x0008
	Number Of Devices: 2

Handle 0x0010, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: 0x000F
	Total Width: 128 bits
	Data Width: 64 bits
	Size: 16384 MB
	Form Factor: SODIMM
	Set: None
	Locator: DIMM 0
	Bank Locator: P0 CHANNEL A
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 1200 MHz
	Manufacturer: Kingston
	Serial Number: 5734E1CB
	Asset Tag: Not Specified
	Part Number: 9965657-005.A00G    
	Rank: 2
	Configured Clock Speed: 1200 MHz
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Handle 0x0013, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: 0x0012
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMM 0
	Bank Locator: P0 CHANNEL B
	Type: Unknown
	Type Detail: Unknown
	Speed: Unknown
	Manufacturer: Unknown
	Serial Number: Unknown
	Asset Tag: Not Specified
	Part Number: Unknown
	Rank: Unknown
	Configured Clock Speed: Unknown
	Minimum Voltage: Unknown
	Maximum Voltage: Unknown
	Configured Voltage: Unknown

Notice that the “Physical Memory Array” lists “Multi-bit ECC” rather than “None”.

However, I ran into problems when I tried to check how many correctable and uncorrectable errors had been detected.

embedded@embedded-pc:~$ edac-util
edac-util: Error: No memory controller data found.

This error indicates that the EDAC driver did not register a memory controller, so Linux cannot report ECC errors. The output of edac-util is not much to work with, so I referred to the kernel messages.

embedded@embedded-pc:~$ dmesg | grep "EDAC"
[    0.073383] EDAC MC: Ver: 3.0.0
[    0.073383] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    2.900785] EDAC amd64: Node 0: DRAM ECC enabled.
[    2.900786] EDAC amd64: F17h detected (node 0).
[    2.900796] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    2.900797] EDAC amd64: Error: Error probing instance: 0
[    2.943600] EDAC amd64: Node 0: DRAM ECC enabled.
[    2.943604] EDAC amd64: F17h detected (node 0).
[    2.943614] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    2.943655] EDAC amd64: Error: Error probing instance: 0
[    2.960538] EDAC amd64: Node 0: DRAM ECC enabled.
[    2.960540] EDAC amd64: F17h detected (node 0).
[    2.960550] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    2.960589] EDAC amd64: Error: Error probing instance: 0
[    2.982500] EDAC amd64: Node 0: DRAM ECC enabled.
[    2.982503] EDAC amd64: F17h detected (node 0).
[    2.982512] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    2.982550] EDAC amd64: Error: Error probing instance: 0
[    3.001432] EDAC amd64: Node 0: DRAM ECC enabled.
[    3.001434] EDAC amd64: F17h detected (node 0).
[    3.001443] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    3.001475] EDAC amd64: Error: Error probing instance: 0
[    3.019564] EDAC amd64: Node 0: DRAM ECC enabled.
[    3.019566] EDAC amd64: F17h detected (node 0).
[    3.019578] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    3.019621] EDAC amd64: Error: Error probing instance: 0
[    3.042528] EDAC amd64: Node 0: DRAM ECC enabled.
[    3.042530] EDAC amd64: F17h detected (node 0).
[    3.042540] EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)
[    3.042578] EDAC amd64: Error: Error probing instance: 0

Here, the EDAC kernel module (which is responsible for setting memory scrub rates and decoding ECC errors) reports that DRAM ECC is enabled, but then fails while probing the memory controller.

Upon close inspection, the kernel error messages state (broken BIOS?).

These errors appear regardless of whether one or both DIMM slots are occupied.

I contacted Sapphire for their input, and they informed me that ECC should work properly with a module seated in P0 CHANNEL A alone or with modules in both slots.

At this point, the purchase of unbuffered ECC SODIMMs seemed wasteful.

However, ECC should be officially supported as stated in the product brief for the AMD Ryzen Embedded V1000 Processor Family.

Debugging the EDAC kernel module errors

Looking back at the error EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?), I was able to locate the section of code it came from.

That error was generated from reserve_mc_sibling_devs() in linux-4.17.13/drivers/edac/amd64_edac.c.

/*
 * Use pvt->F3 which contains the F3 CPU PCI device to get the related
 * F1 (AddrMap) and F2 (Dct) devices. Return negative value on error.
 * Reserve F0 and F6 on systems with a UMC.
 */
static int
reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
{
	if (pvt->umc) {
		pvt->F0 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3);
		if (!pvt->F0) {
			amd64_err("F0 not found, device 0x%x (broken BIOS?)\n", pci_id1);
			return -ENODEV;
		}

		pvt->F6 = pci_get_related_function(pvt->F3->vendor, pci_id2, pvt->F3);
		if (!pvt->F6) {
			pci_dev_put(pvt->F0);
			pvt->F0 = NULL;

			amd64_err("F6 not found: device 0x%x (broken BIOS?)\n", pci_id2);
			return -ENODEV;
		}

		edac_dbg(1, "F0: %s\n", pci_name(pvt->F0));
		edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));
		edac_dbg(1, "F6: %s\n", pci_name(pvt->F6));

		return 0;
	}

	/* Reserve the ADDRESS MAP Device */
	pvt->F1 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3);
	if (!pvt->F1) {
		amd64_err("F1 not found: device 0x%x (broken BIOS?)\n", pci_id1);
		return -ENODEV;
	}

	/* Reserve the DCT Device */
	pvt->F2 = pci_get_related_function(pvt->F3->vendor, pci_id2, pvt->F3);
	if (!pvt->F2) {
		pci_dev_put(pvt->F1);
		pvt->F1 = NULL;

		amd64_err("F2 not found: device 0x%x (broken BIOS?)\n", pci_id2);
		return -ENODEV;
	}

	edac_dbg(1, "F1: %s\n", pci_name(pvt->F1));
	edac_dbg(1, "F2: %s\n", pci_name(pvt->F2));
	edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));

	return 0;
}

The comment explains that devices F0 and F6 are reserved on systems with a unified memory controller (UMC).

Since the AMD Ryzen Embedded V1000 Processor Family is based on the Zen microarchitecture (AMD family 17h), we can find where the F0 and F6 device ids are set for it in linux-4.17.13/drivers/edac/amd64_edac.c.

	[F17_CPUS] = {
		.ctl_name = "F17h",
		.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
		.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
		.ops = {
			.early_channel_count	= f17_early_channel_count,
			.dbam_to_cs		= f17_base_addr_to_cs_size,
		}
	},

In this particular section of the code, we find PCI_DEVICE_ID_AMD_17H_DF_F0 and PCI_DEVICE_ID_AMD_17H_DF_F6.

These macros are not defined within the file ‘amd64_edac.c’; however, they are found in linux-4.17.13/drivers/edac/amd64_edac.h.

/*
 * PCI-defined configuration space registers
 */
#define PCI_DEVICE_ID_AMD_15H_NB_F1	0x1601
#define PCI_DEVICE_ID_AMD_15H_NB_F2	0x1602
#define PCI_DEVICE_ID_AMD_15H_M30H_NB_F1 0x141b
#define PCI_DEVICE_ID_AMD_15H_M30H_NB_F2 0x141c
#define PCI_DEVICE_ID_AMD_15H_M60H_NB_F1 0x1571
#define PCI_DEVICE_ID_AMD_15H_M60H_NB_F2 0x1572
#define PCI_DEVICE_ID_AMD_16H_NB_F1	0x1531
#define PCI_DEVICE_ID_AMD_16H_NB_F2	0x1532
#define PCI_DEVICE_ID_AMD_16H_M30H_NB_F1 0x1581
#define PCI_DEVICE_ID_AMD_16H_M30H_NB_F2 0x1582
#define PCI_DEVICE_ID_AMD_17H_DF_F0	0x1460
#define PCI_DEVICE_ID_AMD_17H_DF_F6	0x1466

At last, here are the definitions of our F0 and F6 device ids, written as 16-bit unsigned integers in hexadecimal notation.

Here, I realized that the device ids are enumerated in step with the function number: F0 is 0x1460 and F6 is 0x1466.

I can also use lspci to output my motherboard’s device ids.

embedded@embedded-pc:~$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d0]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d1]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
00:01.6 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
00:01.7 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15db]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15dc]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e8]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e9]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ea]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15eb]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ec]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ed]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ee]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ef]
01:00.0 Non-Volatile memory controller [0108]: Kingston Technologies Device [2646:5008] (rev 01)
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega [Radeon Vega 8 Mobile] [1002:15dd] (rev 81)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:15de]
04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:15df]
04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e0]
04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e1]
04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e2]
04:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e3]
04:00.7 Non-VGA unclassified device [0000]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e6]
05:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 61)

After careful inspection, I confirmed that neither [1022:1460] nor [1022:1466] appears in the terminal output.
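To make the failure concrete, here is a small userspace model of the lookup. It is only a sketch, not the kernel's code: the struct, the table and find_function() are hypothetical names of mine, and the table simply copies the 00:18.x entries from the lspci output above. As far as I can tell, the driver's pci_get_related_function() does something comparable by searching for a vendor:device pair in the same slot as F3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One PCI function as listed by `lspci -nn` (bus 00 only in this model). */
struct pci_function {
	uint8_t slot;
	uint8_t func;
	uint16_t vendor;
	uint16_t device;
};

/* The eight 00:18.x functions, copied from the lspci output. */
static const struct pci_function slot18[] = {
	{ 0x18, 0, 0x1022, 0x15e8 },
	{ 0x18, 1, 0x1022, 0x15e9 },
	{ 0x18, 2, 0x1022, 0x15ea },
	{ 0x18, 3, 0x1022, 0x15eb },
	{ 0x18, 4, 0x1022, 0x15ec },
	{ 0x18, 5, 0x1022, 0x15ed },
	{ 0x18, 6, 0x1022, 0x15ee },
	{ 0x18, 7, 0x1022, 0x15ef },
};

static_assert(sizeof(slot18) / sizeof(slot18[0]) == 8,
	      "expected functions 0 through 7");

/*
 * Simplified stand-in for the driver's lookup: return the function number
 * in `slot` whose vendor:device pair matches, or -1 if nothing matches.
 */
static int find_function(uint8_t slot, uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(slot18) / sizeof(slot18[0]); i++) {
		const struct pci_function *f = &slot18[i];

		if (f->slot == slot && f->vendor == vendor && f->device == device)
			return f->func;
	}
	return -1;
}
```

Calling find_function(0x18, 0x1022, 0x1460) returns -1 in this model, which corresponds to the "F0 not found" branch, while an id that is actually present in the table is found at its function number.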

The only devices with sequentially enumerated PCI device ids were the host bridges at 00:18.0 through 00:18.7.

00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e8]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15e9]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ea]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15eb]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ec]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ed]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ee]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:15ef]

Next, I added PCI device id definitions for F17h M11h (AMD codename Great Horned Owl), based on these host bridge ids.
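The definitions themselves are not listed in this post, so the following is a sketch under an assumption: that the new F0 and F6 ids are the ones lspci reports for 00:18.0 and 00:18.6. The macro names are the ones used in the family_types entry further down; the values are my reading of the lspci output.

```c
#include <assert.h>

/*
 * Assumed values: the device ids that lspci shows for 00:18.0 and 00:18.6.
 * These are inferred from the lspci output, not copied from the patch.
 */
#define PCI_DEVICE_ID_AMD_17H_M11H_DF_F0	0x15e8
#define PCI_DEVICE_ID_AMD_17H_M11H_DF_F6	0x15ee

/* Sanity check: F6 sits six ids after F0, just like 0x1460 and 0x1466. */
static_assert(PCI_DEVICE_ID_AMD_17H_M11H_DF_F6 -
	      PCI_DEVICE_ID_AMD_17H_M11H_DF_F0 == 6,
	      "F6 id should be F0 id + 6");
```

The compile-time check only guards against a typo in one of the two values; it does not prove that these are the right ids for a given board.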

For the driver to use these new definitions for this processor model, I had to add an entry to enum amd_families {}.

enum amd_families {
	K8_CPUS = 0,
	F10_CPUS,
	F15_CPUS,
	F15_M30H_CPUS,
	F15_M60H_CPUS,
	F16_CPUS,
	F16_M30H_CPUS,
	F17_CPUS,
	F17_M11H_CPUS,
	NUM_FAMILIES,
};

I separated the F17h_M11h device ids from the F17h device ids.

	[F17_CPUS] = {
		.ctl_name = "F17h",
		.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
		.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
		.ops = {
			.early_channel_count	= f17_early_channel_count,
			.dbam_to_cs		= f17_base_addr_to_cs_size,
		}
	},
	[F17_M11H_CPUS] = {
		.ctl_name = "F17h_M11h",
		.f0_id = PCI_DEVICE_ID_AMD_17H_M11H_DF_F0,
		.f6_id = PCI_DEVICE_ID_AMD_17H_M11H_DF_F6,
		.ops = {
			.early_channel_count	= f17_early_channel_count,
			.dbam_to_cs		= f17_base_addr_to_cs_size,
		}
	},
};

Additional logic was necessary to select the new fam_type->f0_id and fam_type->f6_id according to the CPU model rather than the CPU family alone.

	case 0x17:
		if (pvt->model == 0x11) {
			fam_type = &family_types[F17_M11H_CPUS];
			pvt->ops = &family_types[F17_M11H_CPUS].ops;
			break;
		}
		fam_type	= &family_types[F17_CPUS];
		pvt->ops	= &family_types[F17_CPUS].ops;
		break;
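For readers wondering where the family and model numbers come from, here is a userspace sketch of how they can be decoded from the CPUID function 1 EAX signature, following the AMD convention as I understand it. The function names are hypothetical, and the signature 0x00810F10 used in the note below is only an example value I am assuming for a family 17h model 11h part.

```c
#include <assert.h>
#include <stdint.h>

/* Display family from a CPUID function 1 EAX signature. */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int base = (eax >> 8) & 0xf;
	unsigned int ext = (eax >> 20) & 0xff;

	/* The extended family is only added when the base family is 0xf. */
	return base == 0xf ? base + ext : base;
}

/* Display model from the same signature. */
static unsigned int cpuid_model(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base = (eax >> 4) & 0xf;
	unsigned int ext = (eax >> 16) & 0xf;

	/* The extended model is only used when the base family is 0xf. */
	return base_family == 0xf ? (ext << 4) | base : base;
}

/* Hypothetical helper mirroring the case above: 1 for F17h M11h, else 0. */
static int is_f17_m11h(uint32_t eax)
{
	return cpuid_family(eax) == 0x17 && cpuid_model(eax) == 0x11;
}
```

With the example signature 0x00810F10, cpuid_family() yields 0x17 and cpuid_model() yields 0x11, so the model check in the case statement would pick the F17_M11H_CPUS entry.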

I compiled Linux kernel 4.17.13 with these changes and then rebooted.

After logging back in, I tried checking the kernel messages for the EDAC module.

embedded@embedded-pc:~$ dmesg | grep "EDAC"
[    0.144310] EDAC MC: Ver: 3.0.0
[    0.144310] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    9.337213] EDAC amd64: Node 0: DRAM ECC enabled.
[    9.337215] EDAC amd64: F17h_M11h detected (node 0).
[    9.337221] EDAC DEBUG: reserve_mc_sibling_devs: F0: 0000:00:18.0
[    9.337222] EDAC DEBUG: reserve_mc_sibling_devs: F3: 0000:00:18.3
[    9.337222] EDAC DEBUG: reserve_mc_sibling_devs: F6: 0000:00:18.6
[    9.337223] EDAC DEBUG: read_mc_regs:   TOP_MEM:  0x00000000e0000000
[    9.337224] EDAC DEBUG: read_mc_regs:   TOP_MEM2: 0x0000000820000000
[    9.337250] EDAC DEBUG: read_dct_base_mask:   DCSB0[0]=0x00000001 reg: 0x50000
[    9.337253] EDAC DEBUG: read_dct_base_mask:   DCSB1[0]=0x00000001 reg: 0x150000
[    9.337256] EDAC DEBUG: read_dct_base_mask:   DCSB0[1]=0x00000201 reg: 0x50004
[    9.337259] EDAC DEBUG: read_dct_base_mask:   DCSB1[1]=0x00000201 reg: 0x150004
[    9.337263] EDAC DEBUG: read_dct_base_mask:   DCSB0[2]=0x00000000 reg: 0x50008
[    9.337266] EDAC DEBUG: read_dct_base_mask:   DCSB1[2]=0x00000000 reg: 0x150008
[    9.337269] EDAC DEBUG: read_dct_base_mask:   DCSB0[3]=0x00000000 reg: 0x5000c
[    9.337272] EDAC DEBUG: read_dct_base_mask:   DCSB1[3]=0x00000000 reg: 0x15000c
[    9.337275] EDAC DEBUG: read_dct_base_mask:   DCSB0[4]=0x00000000 reg: 0x50010
[    9.337278] EDAC DEBUG: read_dct_base_mask:   DCSB1[4]=0x00000000 reg: 0x150010
[    9.337280] EDAC DEBUG: read_dct_base_mask:   DCSB0[5]=0x00000000 reg: 0x50014
[    9.337283] EDAC DEBUG: read_dct_base_mask:   DCSB1[5]=0x00000000 reg: 0x150014
[    9.337286] EDAC DEBUG: read_dct_base_mask:   DCSB0[6]=0x00000000 reg: 0x50018
[    9.337289] EDAC DEBUG: read_dct_base_mask:   DCSB1[6]=0x00000000 reg: 0x150018
[    9.337292] EDAC DEBUG: read_dct_base_mask:   DCSB0[7]=0x00000000 reg: 0x5001c
[    9.337295] EDAC DEBUG: read_dct_base_mask:   DCSB1[7]=0x00000000 reg: 0x15001c
[    9.337298] EDAC DEBUG: read_dct_base_mask:     DCSM0[0]=0x03fffdfe reg: 0x50020
[    9.337301] EDAC DEBUG: read_dct_base_mask:     DCSM1[0]=0x03fffdfe reg: 0x150020
[    9.337304] EDAC DEBUG: read_dct_base_mask:     DCSM0[1]=0x00000000 reg: 0x50024
[    9.337307] EDAC DEBUG: read_dct_base_mask:     DCSM1[1]=0x00000000 reg: 0x150024
[    9.337310] EDAC DEBUG: read_dct_base_mask:     DCSM0[2]=0x00000000 reg: 0x50028
[    9.337313] EDAC DEBUG: read_dct_base_mask:     DCSM1[2]=0x00000000 reg: 0x150028
[    9.337316] EDAC DEBUG: read_dct_base_mask:     DCSM0[3]=0x00000000 reg: 0x5002c
[    9.337319] EDAC DEBUG: read_dct_base_mask:     DCSM1[3]=0x00000000 reg: 0x15002c
[    9.337320] EDAC DEBUG: read_mc_regs:   DIMM type: Unbuffered-DDR4
[    9.337321] EDAC DEBUG: __dump_misc_regs_df: UMC0 DIMM cfg: 0x1
[    9.337322] EDAC DEBUG: __dump_misc_regs_df: UMC0 UMC cfg: 0x80001200
[    9.337322] EDAC DEBUG: __dump_misc_regs_df: UMC0 SDP ctrl: 0xb0408083
[    9.337323] EDAC DEBUG: __dump_misc_regs_df: UMC0 ECC ctrl: 0x401
[    9.337326] EDAC DEBUG: __dump_misc_regs_df: UMC0 ECC bad symbol: 0x0
[    9.337329] EDAC DEBUG: __dump_misc_regs_df: UMC0 UMC cap: 0x30
[    9.337330] EDAC DEBUG: __dump_misc_regs_df: UMC0 UMC cap high: 0x40000030
[    9.337331] EDAC DEBUG: __dump_misc_regs_df: UMC0 ECC capable: yes, ChipKill ECC capable: no
[    9.337331] EDAC DEBUG: __dump_misc_regs_df: UMC0 All DIMMs support ECC: yes
[    9.337332] EDAC DEBUG: __dump_misc_regs_df: UMC0 x4 DIMMs present: no
[    9.337333] EDAC DEBUG: __dump_misc_regs_df: UMC0 x16 DIMMs present: no
[    9.337333] EDAC MC: UMC0 chip selects:
[    9.337334] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x1, AddrMask: 0x3fffdfe
[    9.337335] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x201, AddrMask: 0x3fffdfe
[    9.337336] EDAC amd64: MC: 0: 16383MB 1: 16383MB
[    9.337337] EDAC amd64: MC: 2:     0MB 3:     0MB
[    9.337338] EDAC amd64: MC: 4:     0MB 5:     0MB
[    9.337338] EDAC amd64: MC: 6:     0MB 7:     0MB
[    9.337339] EDAC DEBUG: __dump_misc_regs_df: UMC1 DIMM cfg: 0x1
[    9.337340] EDAC DEBUG: __dump_misc_regs_df: UMC1 UMC cfg: 0x80001200
[    9.337341] EDAC DEBUG: __dump_misc_regs_df: UMC1 SDP ctrl: 0xb0408083
[    9.337341] EDAC DEBUG: __dump_misc_regs_df: UMC1 ECC ctrl: 0x401
[    9.337344] EDAC DEBUG: __dump_misc_regs_df: UMC1 ECC bad symbol: 0x0
[    9.337347] EDAC DEBUG: __dump_misc_regs_df: UMC1 UMC cap: 0x30
[    9.337348] EDAC DEBUG: __dump_misc_regs_df: UMC1 UMC cap high: 0x40000030
[    9.337349] EDAC DEBUG: __dump_misc_regs_df: UMC1 ECC capable: yes, ChipKill ECC capable: no
[    9.337349] EDAC DEBUG: __dump_misc_regs_df: UMC1 All DIMMs support ECC: yes
[    9.337350] EDAC DEBUG: __dump_misc_regs_df: UMC1 x4 DIMMs present: no
[    9.337351] EDAC DEBUG: __dump_misc_regs_df: UMC1 x16 DIMMs present: no
[    9.337351] EDAC MC: UMC1 chip selects:
[    9.337352] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x1, AddrMask: 0x3fffdfe
[    9.337353] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x201, AddrMask: 0x3fffdfe
[    9.337354] EDAC amd64: MC: 0: 16383MB 1: 16383MB
[    9.337354] EDAC amd64: MC: 2:     0MB 3:     0MB
[    9.337355] EDAC amd64: MC: 4:     0MB 5:     0MB
[    9.337356] EDAC amd64: MC: 6:     0MB 7:     0MB
[    9.337357] EDAC DEBUG: __dump_misc_regs_df: F0x104 (DRAM Hole Address): 0xe0000001, base: 0xe0000000
[    9.337357] EDAC DEBUG: dump_misc_regs:   DramHoleValid: yes
[    9.337358] EDAC amd64: using x4 syndromes.
[    9.337358] EDAC amd64: MCT channel count: 2
[    9.337360] EDAC DEBUG: edac_mc_alloc: allocating 1992 bytes for mci data (16 ranks, 16 csrows/channels)
[    9.337393] EDAC DEBUG: init_csrows: MC node: 0, csrow: 0
[    9.337394] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x1, AddrMask: 0x3fffdfe
[    9.337395] EDAC DEBUG: get_csrow_nr_pages: csrow: 0, channel: 0, DBAM idx: 0
[    9.337396] EDAC DEBUG: get_csrow_nr_pages: nr_pages/channel: 4194048
[    9.337396] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x1, AddrMask: 0x3fffdfe
[    9.337397] EDAC DEBUG: get_csrow_nr_pages: csrow: 0, channel: 1, DBAM idx: 0
[    9.337398] EDAC DEBUG: get_csrow_nr_pages: nr_pages/channel: 4194048
[    9.337399] EDAC DEBUG: init_csrows: Total csrow0 pages: 8388096
[    9.337399] EDAC DEBUG: init_csrows: MC node: 0, csrow: 1
[    9.337400] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x201, AddrMask: 0x3fffdfe
[    9.337401] EDAC DEBUG: get_csrow_nr_pages: csrow: 1, channel: 0, DBAM idx: 0
[    9.337402] EDAC DEBUG: get_csrow_nr_pages: nr_pages/channel: 4194048
[    9.337402] EDAC DEBUG: f17_base_addr_to_cs_size: BaseAddr: 0x201, AddrMask: 0x3fffdfe
[    9.337403] EDAC DEBUG: get_csrow_nr_pages: csrow: 1, channel: 1, DBAM idx: 0
[    9.337404] EDAC DEBUG: get_csrow_nr_pages: nr_pages/channel: 4194048
[    9.337405] EDAC DEBUG: init_csrows: Total csrow1 pages: 8388096
[    9.337406] EDAC DEBUG: edac_mc_add_mc_with_groups: 
[    9.337407] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc0
[    9.337418] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc0
[    9.337451] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm0, located at csrow 0 channel 0 
[    9.337473] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank0
[    9.337473] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm1, located at csrow 0 channel 1 
[    9.337495] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank1
[    9.337496] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm2, located at csrow 1 channel 0 
[    9.337516] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank2
[    9.337516] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm3, located at csrow 1 channel 1 
[    9.337541] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank3
[    9.337554] EDAC MC0: Giving out device to module amd64_edac controller F17h_M11h: DEV 0000:00:18.3 (INTERRUPT)
[    9.337555] EDAC DEBUG: edac_pci_alloc_ctl_info: 
[    9.337556] EDAC DEBUG: edac_pci_add_device: 
[    9.337557] EDAC DEBUG: add_edac_pci_to_global_list: 
[    9.337557] EDAC DEBUG: find_edac_pci_by_dev: 
[    9.337558] EDAC DEBUG: edac_pci_create_sysfs: idx=0
[    9.337559] EDAC DEBUG: edac_pci_main_kobj_setup: 
[    9.337571] EDAC DEBUG: edac_pci_main_kobj_setup: Registered '.../edac/pci' kobject
[    9.337571] EDAC DEBUG: edac_pci_create_instance_kobj: 
[    9.337575] EDAC DEBUG: edac_pci_create_instance_kobj: Register instance 'pci0' kobject
[    9.337577] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[    9.337578] AMD64 EDAC driver v3.5.0

The device ids did the trick: the EDAC kernel module now detects the memory controller properly.
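As a quick sanity check on the numbers in that log, the page counts line up with the reported chip select sizes. The sketch below assumes 4 KiB pages (PAGE_SHIFT of 12, which I believe is the case on x86-64); the helper names are mine.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12	/* assumption: 4 KiB pages */

/* Convert a size in MB to a number of pages. */
static uint64_t mb_to_pages(uint64_t size_mb)
{
	return size_mb << (20 - PAGE_SHIFT_4K);
}

/* Convert a number of pages back to a size in MB. */
static uint64_t pages_to_mb(uint64_t pages)
{
	return pages >> (20 - PAGE_SHIFT_4K);
}
```

mb_to_pages(16383) gives 4194048, which matches the nr_pages/channel value, and doubling it for two channels gives the 8388096 pages reported per csrow.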

Back to checking the number of corrected/uncorrected errors detected:

embedded@embedded-pc:~$ edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.

ECC error decoding is now properly working in Linux!
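If you would rather poll the counters from your own program than call edac-util, the EDAC core exposes them as small text files in sysfs. The following is a minimal sketch, assuming the usual layout under /sys/devices/system/edac/mc/mc0/ (files such as ce_count and ue_count); read_counter() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single non-negative decimal counter from a sysfs-style text file.
 * Returns -1 if the file cannot be opened or does not start with a number.
 */
static long read_counter(const char *path)
{
	assert(path != NULL);

	FILE *fp = fopen(path, "r");
	if (fp == NULL)
		return -1;

	long value = -1;
	if (fscanf(fp, "%ld", &value) != 1 || value < 0)
		value = -1;

	fclose(fp);
	return value;
}
```

For example, read_counter("/sys/devices/system/edac/mc/mc0/ce_count") should return the corrected error count on a system where mc0 is registered, and -1 where it is not.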

I’ve submitted my fix for ECC with the AMD Ryzen Embedded V1000 Processor Family to the mainline Linux kernel (see my patch on the mailing list).

You can obtain the mainline Linux kernel 4.18.0-r8 with the patch applied from a zip file (here) or clone the repository at https://github.com/mikhail-j/linux.