
After connecting, am unable to actually perform any operations...

Open JakeMartin99 opened this issue 1 year ago • 33 comments

Hi, I recently followed your schematic to build the hardware for your SPD-Reader-Writer, uploaded your firmware to the Arduino Nano, and downloaded the latest software release to my computer. However, from both the app and the CLI, the system can find and connect on COM3, but then cannot actually read any data.

image image

Also of note, while the hardware is hooked up to my computer, the RGB on my stick of RAM lights up, so at the very least power is actually getting all the way to the RAM.

For context, my RAM is DDR5-6000 from G.Skill (F5-6000J3038F16GX2-TZ5NR), and I am attempting to access the SPD in order to fix corruption that I believe was caused by RGB software causing issues over the SMBus.

Any help or suggestions would be much appreciated!

Thanks!

JakeMartin99 avatar Dec 16 '23 05:12 JakeMartin99

Also, I just measured voltages, and for some reason my +5V rail is 9V above ground, and my +3.3V rail is at ~6V. My +9V rail is at 9V as expected.

I had measured my resistances prior to turning it on for the first time and everything seemed to check out, so this is surprising to me.

JakeMartin99 avatar Dec 16 '23 05:12 JakeMartin99

The Arduino does not detect any RAM connected to it. Make sure the RAM is getting power and the SDA/SCL lines are not mixed up. Make sure the voltage levels are correct before connecting RAM to your Arduino. For DDR5, you only need a 5V supply. (The 9V rail is for DDR3 and DDR4.)

1a2m3 avatar Dec 16 '23 05:12 1a2m3

The SDA/SCL lines may have been messed up, but I've been doing tinkering without the RAM stick installed since and I have not been able to resolve the voltage discrepancies, so I haven't wanted to retry with the stick. Having removed the RAM stick, I still get the following odd voltage readings:

  • GND to +3.3V: 6.5V
  • GND to +5V: 9.4V
  • GND to +9V: 9.0V
  • +3.3V to +5V: 2.0V
  • +3.3V to +9V: 1.7V
  • +5V to +9V: 0V

I've measured the resistances of all of the paths I can think of, and everything seems to match the schematic properly. Although one measurement that seems a bit odd to me is that the resistance between the +5V and +9V rails is just under 200 ohms.

I included the 9V rail to be thorough and fully follow the schematic, but is there any reason that that would be causing the difference? I don't think I have any wires soldered in the wrong places nor any unintentional connections between places that shouldn't be, so I'm somewhat lost as to what to even check.

JakeMartin99 avatar Dec 18 '23 02:12 JakeMartin99

What Arduino model are you using, and how is it powered? Via USB or with an external source applied to the VIN input? What 9V supply are you using, and have you tested it separately?

Eliminate all non-DDR5 required modules from your setup (HV_CTL, HV_SRC and SA1_CTL), and try with just Arduino and DDR5 circuitry. A picture of your setup would be handy.

1a2m3 avatar Dec 18 '23 03:12 1a2m3

image

Some of the connecting wires are soldered to the underside, such as those connecting the 9V booster (Vo to 9V rail, GND to GND rail, and Vi to 5V rail), those connecting the Arduino GND, 3.3V, and 5V pins to their respective rails, the one connecting A1 to the Schottky diode, those connecting A4 and A5 to their respective resistors and RAM wires, and some of the wires that go over to the RAM adapter itself.

To answer your questions, it is powered just from the USB connection, and I haven't really tested the boost component (aside from validating that it does actually produce 9V on the 9V rail).

JakeMartin99 avatar Dec 18 '23 05:12 JakeMartin99

I'll try your suggestion of disconnecting those sections you mentioned and see what happens

JakeMartin99 avatar Dec 18 '23 05:12 JakeMartin99

Kinda hard to understand what's connected to what, but to me it seems the 3.3V rail is connected to ground, is that so?

image

1a2m3 avatar Dec 18 '23 05:12 1a2m3

Sorry, kinda hard to get a good picture that represents everything that's happening. But no, GND and +3.3V have appreciable resistance between them, and as far as I can tell the only direct connection between them is "through" the capacitor. The wire you highlighted is hooked up to the 2k resistor, not the GND pin

JakeMartin99 avatar Dec 20 '23 04:12 JakeMartin99

Please draw a diagram of your setup in fritzing, and post it here. That way it will be easier for me to see what's wrong and what needs to be fixed.

1a2m3 avatar Dec 20 '23 12:12 1a2m3

SPD_bb_nowires SPD_bb

One with wires and one without, since the wires kinda start obstructing a lot of stuff. I couldn't find an exact part match for my voltage booster (https://www.amazon.com/dp/B084YS7FZ8?psc=1&ref=ppx_yo2ov_dt_b_product_details), so I just used something that was vaguely related and had the right number of pins.

JakeMartin99 avatar Dec 21 '23 05:12 JakeMartin99

Everything appears to be correct component- and connection-wise.

I would start by checking each component individually, at least the ones that are needed for DDR5 operation, and ensure the resistor values match their labeled values.

I would eliminate parts which are not needed for DDR5, if you aren't planning to use any other type of memory.

jake-pcb

Make sure there are no solder bridges or poor solder joints. Check the resistance between the SDA/SCL pins on the DDR5 adapter and the corresponding I2C pins on the Arduino.

If nothing helps, then I can offer to have a look at your board and adapter in person for a donation, if you are willing to send me your board and cover shipping both ways (I'm in Canada). If you want to go that route, leave your email, and I'll contact you to discuss details in private.

1a2m3 avatar Dec 21 '23 07:12 1a2m3

Hey, I know it's been a while, but I've only recently had the time to get back working on this. I was able to remove the non-DDR5 components you X-ed out, and fix up some of my soldering for the remaining pieces, and it is working now to be able to read data from my disconnected RAM sticks! So thank you!

However, since you seem to be a bit of a RAM expert, I was wondering if you could help or point me in the direction of documentation to help me. The reason I built this is that my RAM stopped working, and I could only get my computer to boot up again by removing 2 of my RAM sticks. I believe it was because OpenRGB corrupted the SPD on at least one of the sticks that I took out. So, I want to use your invention to rewrite the SPD on whichever sticks are corrupt, and see if that fixes it when I put them back in.

However, while I can use your program to get the dump from one of my sticks, and I can use Thaiphoon Burner to get the dump from the still-installed sticks, and I can manually compare the differences, I really have no idea what I'm looking for, in terms of which bytes are different for legitimate reasons vs. which are encoding something bad (such as an overclocking setting well beyond what is physically possible). I'm additionally confused by the fact that there seem to be differences in some bytes between my two installed, functional sticks, despite them both being set to use the exact same overclocking settings....

For reference, my ram sticks are DDR5 AMD EXPO enabled G.Skill Trident Neo Z5 RGB F5-6000J3038F16G.

Any insights would be most appreciated! Also, I would be more than happy to send a donation your way to support your work.

JakeMartin99 avatar Jun 01 '24 06:06 JakeMartin99

Hi, post your SPD dumps (attach them as files), and I'll see what the differences are.

The differences could be in serial numbers, but as far as I know G.Skill leaves serial number fields blank on their DIMMs.

To make a donation, use the PayPal link: https://paypal.me/mik4rt3m, or send Bitcoin (get the wallet address in the program's About window).

1a2m3 avatar Jun 01 '24 06:06 1a2m3

So these are the dumps from Thaiphoon burner for my installed RAM: INSTALLED_SMBus-0-EEPROM-51h.txt INSTALLED_SMBus-0-EEPROM-53h.txt

They seem almost identical, except for the 0x200 row, where the former vs latter comparison is:

04 CD 00 23 33 7F 52 03 F0 46 35 2D 36 30 30 30 -> in ...51h
04 CD 00 23 33 CC 54 A0 F5 46 35 2D 36 30 30 30 -> in ...53h

Then, here's the first removed stick, read via SPD-Reader-Writer (I made mild formatting changes to match the other files to make comparison easier): REMOVED_SPD-RW-stick1.txt. It seems identical to the installed ones except for the 0x200 row, where it has:

04 CD 00 23 33 24 5B DE F2 46 35 2D 36 30 30 30

And the second removed stick, read and reformatted similarly: REMOVED_SPD-RW-stick2.txt. It also seems identical except for the 0x200 row, where it has:

04 CD 00 23 33 0B 5A F0 F3 46 35 2D 36 30 30 30

So, in general the 0x200 row seems to be the only place with any difference across the 4 sticks, and the difference is localized to the 4 bytes 0x205 through 0x208, with the following values:

            stick  0x205  0x206  0x207  0x208
Installed:  51h    7F     52     03     F0
Installed:  53h    CC     54     A0     F5
Removed:    1      24     5B     DE     F2
Removed:    2      0B     5A     F0     F3

Unless I missed something, this suggests that the problem isn't actually with the SPD like I had thought, since there don't seem to be any differences that appear in one or both removed sticks but not in the installed ones. Do you have any other thoughts?
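
The manual comparison above can be reproduced with a few lines of Python. This is only a sketch, not part of SPD-Reader-Writer: the helper name `differing_offsets` and the dictionary labels are made up for illustration, and the data is just the four 0x200 rows quoted above.

```python
# The four 0x200 rows quoted in this comment (labels are illustrative only).
ROWS_0X200 = {
    "installed 51h": "04 CD 00 23 33 7F 52 03 F0 46 35 2D 36 30 30 30",
    "installed 53h": "04 CD 00 23 33 CC 54 A0 F5 46 35 2D 36 30 30 30",
    "removed 1":     "04 CD 00 23 33 24 5B DE F2 46 35 2D 36 30 30 30",
    "removed 2":     "04 CD 00 23 33 0B 5A F0 F3 46 35 2D 36 30 30 30",
}


def differing_offsets(rows, base):
    """Return absolute offsets where the rows do not all hold the same byte."""
    parsed = [bytes.fromhex(text) for text in rows.values()]
    # zip(*parsed) walks the rows column by column.
    return [base + i for i, column in enumerate(zip(*parsed)) if len(set(column)) > 1]


offsets = differing_offsets(ROWS_0X200, base=0x200)
print([hex(o) for o in offsets])  # ['0x205', '0x206', '0x207', '0x208']

# The bytes right after the differing ones are plain ASCII text.
tail = bytes.fromhex(ROWS_0X200["removed 1"])[9:].decode("ascii")
print(tail)  # F5-6000
```

The ASCII tail `F5-6000` matches the start of the part number, which is consistent with the differing bytes sitting in a per-module identification area rather than in the timing data.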

JakeMartin99 avatar Jun 02 '24 06:06 JakeMartin99

All of your SPD dumps are fine; the differences are in serial numbers only.

1a2m3 avatar Jun 02 '24 07:06 1a2m3

Hmmm, well at least that's eliminated as the source of the problems... would you happen to have any thoughts of anything else that could cause new (purchased new and installed / working for a couple months) RAM sticks to stop working, in a manner that was unaffected by both resetting bios and fully powering down the computer (multiple times each)? As far as I'm aware, the SPD is the only thing on the stick that has persistent data which could be corrupted, but maybe that's wrong? Also I recognize you might not know, but figured I'd see if anything obvious stuck out to you to check as well.

JakeMartin99 avatar Jun 09 '24 01:06 JakeMartin99

Memory can be undetectable for many reasons. The CPU's memory controller (MC) may not be able to handle more than 2 sticks at higher frequencies. Chips can go bad, BGA solder contacts under the chips can come loose, and the contacts on the DIMM itself or in the motherboard slot can get dirty or bent. Bent or dirty CPU socket pins can also cause RAM to not be detected.

I would start by testing each stick individually, one by one, on a known-working board to eliminate CPU-motherboard-RAM compatibility issues and to isolate working sticks from non-working ones.

The part number you provided is for a dual-channel kit. DDR5 is known to struggle at high frequencies when multiple DIMMs per channel are used. If you need 64GB, instead of running 4x16GB it's better to get a 2x32GB kit; that's the configuration I'm currently running @ 6800MT/s CL32.

Either way, G.Skill RAM comes with a lifetime warranty. If the RAM is completely dead and it wasn't your fault, they'll replace it.

1a2m3 avatar Jun 09 '24 02:06 1a2m3

Ya, I may have to try a few more things then, and just RMA if it still seems unresolvable. It's the strangest thing though, because it had worked fine for weeks with all 4 sticks (brand new build, so all brand new components), but then one time I turned on my RGB control software (openRGB) and my computer hung, crashed, and then could not be booted back up (even to BIOS, and even after power cycling and resetting BIOS) until I took out the 2 sticks, so it seems unlikely that it would be soldering contacts or dirt or anything like that (which is why I suspected the SPD initially). But, to your point maybe something on the CPU memory controller or one of the sticks just totally and completely died for some reason at that time?

JakeMartin99 avatar Jun 09 '24 04:06 JakeMartin99

CPU or its MC are unlikely to die from using OpenRGB.

Try reading the PMIC0 registers from a working stick and a dead stick, then compare them or post them here.

To read the PMIC0 registers, use the command-line version with the following command:

spdrwcli.exe /read COM5 72

Replace COM5 with your Arduino's port. 72 is the PMIC0 address for a DIMM with its EEPROM at address 80. If your DIMM is at a different I2C address, subtract 8 from it to get its PMIC0 address.
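
The "subtract 8" rule can be written out as a small helper. This is a sketch only: `pmic_address` is a hypothetical name, and the 80 to 87 range check is an assumption about the usual SPD address range, not something stated above.

```python
def pmic_address(eeprom_address: int) -> int:
    """Return the PMIC0 I2C address for a DIMM whose EEPROM is at eeprom_address.

    Addresses are decimal, as used on the spdrwcli.exe command line above.
    """
    # ASSUMPTION: SPD EEPROMs sit at 80..87 (0x50..0x57); reject anything else.
    if not 80 <= eeprom_address <= 87:
        raise ValueError(f"unexpected EEPROM address: {eeprom_address}")
    return eeprom_address - 8


for eeprom in (80, 81, 83):
    pmic = pmic_address(eeprom)
    print(f"EEPROM {eeprom} (0x{eeprom:02X}) -> PMIC0 {pmic} (0x{pmic:02X})")
```

For the EEPROM at address 80 this gives 72, the value used in the command above.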

Edit: PMIC0, not SPD5 hub

1a2m3 avatar Jun 09 '24 07:06 1a2m3

Here's files for the two currently removed sticks: REMOVED_SPD-RW-stick1-PMIC0.txt REMOVED_SPD-RW-stick2-PMIC0.txt

I ran the queries against both address 72 and address 80 for each.

For 72 there appears to be no difference between them...

Is there any way to get the results for 72 out of my installed ram without having to take them out, like I did for 80 using Thaiphoon Burner? With the way my rig is built, physically getting RAM in and out is a bit of a chore, so I'd like to minimize how much I have to do it if at all possible.

JakeMartin99 avatar Jun 16 '24 22:06 JakeMartin99

SPD-RW also supports SMBus, just like Thaiphoon Burner. What motherboard and chipset is your system based on?

Mainstream Intel platforms are fully supported, but AMD needs some tuning and testing.

If you have an AMD-based system, I'll give you a beta version to run some tests for me before SMBus on AMD is fully supported.

1a2m3 avatar Jun 16 '24 22:06 1a2m3

Ah, ya I am on AMD, but happy to try your beta version. I'm on MSI MEG X670E ACE w/ AMD Ryzen™ 9 7950X3D, so AM5 / Zen4 architecture X670E chipset

JakeMartin99 avatar Jun 16 '24 22:06 JakeMartin99

Excellent, here you go: ~20240616-1.zip~

Extract the files to a directory, then open an elevated command prompt (cmd.exe), navigate to the folder where you extracted the files using the cd command, and run the following commands, one after another:

spdrwcli.exe /find smbus > find.txt
spdrwcli.exe /scan 0 > scan0.txt
spdrwcli.exe /scan 1 > scan1.txt

Then post your *.txt files, but compress them into a zip archive first, as they will be huge.

The first command will scan for available SMBus controllers, the second command will scan for EEPROMs on bus 0 (default), and the third command will scan for EEPROMs on bus 1. Bus 1 is not typically used for EEPROMs, so it will fail, but I still need you to run it to make sure it fails properly.

1a2m3 avatar Jun 16 '24 23:06 1a2m3

Well, it did return a lot of stuff...

find-scan0-scan1.zip

JakeMartin99 avatar Jun 17 '24 05:06 JakeMartin99

Thanks, SMBus discovery works fine, but scanning the buses finds false positives.

Stay tuned while I prepare a new test build.

1a2m3 avatar Jun 17 '24 05:06 1a2m3

Here you go:

20240616-2.zip

Run the same commands as above and post the results. The log files will be smaller, but still compress them before attaching.

The false positives occurred because, after checking the HostBusy flag (bit 0) of the status register (0x00), the code checked the SMBusInterrupt flag (bit 1) next. In your case, the SMBusInterrupt flag is set when an SMBus transaction is complete, even if the transaction ends with an error. (On Intel systems, the equivalent Interrupt flag is set after a successful transaction only.)

I rearranged the order in which the status register flags are checked: the busy flag first, then the error flags, and only then the interrupt flag. This should resolve the false positives.

I also replaced CPUIDAPI.dll with a non-debug version, which will reduce the amount of debug output. It works fine, so for now debug output from spdrwcore.dll alone should be enough.
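
To illustrate why the check order matters, here is a rough sketch. It is not the actual spdrwcore.dll code. The HostBusy and SMBusInterrupt bit positions come from the description above; the error mask (bits 2 to 4) and the function names are assumptions made purely for illustration.

```python
HOST_BUSY = 1 << 0        # bit 0, as described above
SMBUS_INTERRUPT = 1 << 1  # bit 1, as described above
ERROR_MASK = 0b0001_1100  # ASSUMPTION: bits 2-4 are treated as error flags here


def classify_old(status: int) -> str:
    """Old order: busy, then interrupt, then errors (prone to false positives)."""
    if status & HOST_BUSY:
        return "busy"
    if status & SMBUS_INTERRUPT:
        return "ok"
    if status & ERROR_MASK:
        return "error"
    return "pending"


def classify_new(status: int) -> str:
    """New order: busy, then errors, then interrupt."""
    if status & HOST_BUSY:
        return "busy"
    if status & ERROR_MASK:
        return "error"
    if status & SMBUS_INTERRUPT:
        return "ok"
    return "pending"


# A transaction that completed with an error: interrupt and an error bit both set.
failed = SMBUS_INTERRUPT | (1 << 2)
print(classify_old(failed))  # ok     (the false positive)
print(classify_new(failed))  # error
```

With the old order, a failed probe of an empty address looks like a successful one whenever the controller raises the interrupt flag on completion regardless of outcome.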

1a2m3 avatar Jun 17 '24 07:06 1a2m3

Here you go: find-scan0-scan1-v2.zip

JakeMartin99 avatar Jun 23 '24 04:06 JakeMartin99

Thanks! Everything works properly and fails properly now. 👍

To read PMIC data off your RAM via SMBus, you can use the same beta version and save the data directly to a binary file using these commands:

For the first DIMM at address 81 on bus 0:

spdrwcli.exe /read 0 73 PMIC1.bin

And for the second one at address 83:

spdrwcli.exe /read 0 75 PMIC2.bin

The program will still output debug data while running, but the binary files will be clean.
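
To compare a saved binary against the text dumps posted in this thread, something like the following can render a .bin file in the same 16-bytes-per-row layout. It is only a sketch: `hexdump` is a hypothetical helper, not part of SPD-Reader-Writer, and `PMIC1.bin` is assumed to be the file written by the command above.

```python
from pathlib import Path


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of 'OFFS: XX XX ...' under a column header."""
    header = " " * 6 + " ".join(f"{i:02X}" for i in range(width))
    rows = [header]
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:04X}: " + " ".join(f"{b:02X}" for b in chunk))
    return "\n".join(rows)


if __name__ == "__main__":
    path = Path("PMIC1.bin")  # assumed output file from the /read command
    if path.exists():
        print(hexdump(path.read_bytes()))
    else:
        print(f"{path} not found; run the /read command first")
```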

1a2m3 avatar Jun 23 '24 05:06 1a2m3

First one:

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0010: 00 00 00 00 00 2C 20 00 00 04 00 05 60 00 60 60
0020: CF DC 63 00 00 DC 63 B4 63 80 88 42 20 22 B4 5E
0030: 00 00 80 00 0E 00 00 00 00 00 00 12 8A 8C 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Second one:

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0010: 00 00 00 00 00 2C 20 00 00 04 00 05 60 00 60 60
0020: CF DC 63 00 00 DC 63 B4 63 80 88 42 20 22 B4 5E
0030: 00 00 80 00 0E 00 00 00 00 00 00 12 8A 8C 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Files: PMIC-installed.zip

JakeMartin99 avatar Jun 23 '24 05:06 JakeMartin99

Seems like the difference between removed vs installed is all in row 0020:

  • 0020: CF 78 63 00 00 78 63 78 63 80 88 42 20 22 B4 06
  • 0020: CF DC 63 00 00 DC 63 B4 63 80 88 42 20 22 B4 5E

0x78 vs 0xDC vs 0xB4 looks to be 120 vs 220 vs 180.

0x06 vs 0x5E looks to be 6 vs 94.

Unclear to me what these differences represent for the RAM though, or if they would plausibly be breaking something.
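
The same row comparison, with register offsets and decimal values, as a minimal Python sketch. The name `diff_row` is made up for illustration; the two strings are the 0020 rows quoted above, and nothing here interprets what the registers mean.

```python
REMOVED_0020 = "CF 78 63 00 00 78 63 78 63 80 88 42 20 22 B4 06"
INSTALLED_0020 = "CF DC 63 00 00 DC 63 B4 63 80 88 42 20 22 B4 5E"


def diff_row(first: str, second: str, base: int):
    """Return (offset, first_byte, second_byte) for every differing position."""
    pairs = zip(bytes.fromhex(first), bytes.fromhex(second))
    return [(base + i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


diffs = diff_row(REMOVED_0020, INSTALLED_0020, base=0x20)
for offset, removed, installed in diffs:
    print(f"0x{offset:02X}: removed 0x{removed:02X} ({removed}) "
          f"vs installed 0x{installed:02X} ({installed})")
```

This narrows the difference down to registers 0x21, 0x25, 0x27 and 0x2F.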

JakeMartin99 avatar Jun 23 '24 05:06 JakeMartin99