Saturday, September 22, 2012

mdadm external bitmap boot hack

Update 3/1/2014: I have updated my procedure for Debian 7.4.

Linux's mdadm lets you use a write-intent bitmap, which speeds up resync when a drive fails and is then re-added. There are two bitmap storage options: internal and external. Internal storage keeps the bitmap on the raid array's own disks and can slow writes significantly. External storage uses a file on an ext2 or ext3 file system. It can be quicker than internal storage for writes but causes big problems during boot.

In order for the bitmap to function, it must be writable at the time mdadm assembles the array. If it is not writable, mdadm will fail to start the array. At that point in the boot process, typically no partitions are writable. My solution was to move the array assembly to some point after mountall (which mounts the file systems found in fstab) has executed.

First, I prevent any raid partitions from mounting during boot by adding the noauto option to their entries in fstab.

I copied /etc/mdadm/mdadm.conf to /etc/mdadm/mdadm.manual.conf. Then I edited /etc/mdadm/mdadm.conf, changing DEVICE partitions to DEVICE /dev/null. This makes mdadm scan for raid partitions on the null device, where it will find none, so raid devices will no longer assemble during the boot process.

Then I created a new script in /etc/rcS.d/ named S02mountRaid (don't forget execute permissions). The script contains the following lines.
mdadm --assemble --scan --config=/etc/mdadm/mdadm.manual.conf
mount /mnt/raid5
This will cause mdadm to scan the copy of our mdadm.conf which we did not modify. Mdadm will correctly assemble the raid devices found in that file along with assigning the external bitmap to the proper array. The raid device is then mounted.

This script runs at every run level, and runs before any other init.d script. As future work, I still need to find a way to make init.d wait for the script to complete.

Friday, May 4, 2012

DRAM Testing

For the past few months I have been working on restoring an old Commodore 64 that someone gave to me. It was missing a few obvious pieces, and after making a new AV cable and obtaining a new power supply I found that it wouldn't boot properly. So far I have replaced the PLA, VIC, capacitors, and a few other components. But this post is not about those things. This is about testing the Commodore 64's DRAM chips.

My particular C64 uses 8 individual RAM chips; most are D4164C-15 and a couple are D4164C-2. I replaced all the RAM, but I wanted to know whether both the original and replacement chips were functioning properly. To test each chip, I needed some hardware to test with.

I decided to use a MEGA8U2 AVR microprocessor, mainly because I have one that can plug into a breadboard and it has enough pins to drive the D4164C chips.

I wired the test setup to reduce the instruction count where I could, so it is a little messy. Both the wiring and the instruction count could be greatly improved if I didn't need the programming header, but it gets the job done.



At this point I realized I had no idea how to control DRAM. My friend CNLohr gave me a quick explanation of how DRAM works (basically you have to refresh it within a certain time period, and this RAM appears to be 256 rows by 256 columns). Ok...
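To make the 256 by 256 layout concrete: a 4164 holds 65536 one-bit cells but has only eight address pins, so a 16-bit cell address is presented in two halves, a row latched by RAS and a column latched by CAS. Here is a minimal sketch of that split. The type and function names are hypothetical and are not taken from my test program, and using the upper byte as the row is just an assumption (any consistent split works).

```c
#include <stdint.h>

/* Hypothetical helper type: one cell address split into the two 8-bit
   halves that share the chip's eight address pins. */
typedef struct {
    uint8_t row;    /* latched by the chip on the falling edge of RAS */
    uint8_t column; /* latched by the chip on the falling edge of CAS */
} dram_addr;

/* Assumption: upper byte is the row, lower byte is the column. */
static dram_addr split_address(uint16_t cell)
{
    dram_addr a;
    a.row = (uint8_t)(cell >> 8);
    a.column = (uint8_t)(cell & 0xFFu);
    return a;
}
```

For example, cell 0x12AB would be sent as row 0x12 followed by column 0xAB.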

The first thing to do was to try to write and read just one bit. The datasheet for the DRAM provided timing charts for each step required to perform every operation the memory supports. After a few hours of stepping through the charts, coding, re-coding, reviewing the charts, and sometimes just trial and error, I was finally able to write and read 1 bit from memory. After a couple more days of work I was reading and writing to the entire memory module. (I forget exactly what made this take so long to accomplish. Some kind of bug in my program.)

I constructed a couple of routines to test both the wiring and the memory. The tests are largely based on information found at http://www.ganssle.com/testingram.htm. There's a lot of good information there for developing RAM tests.

To test the wires, I wrote a 0 to the first bit of the memory, followed by a 1 to a power-of-2 memory location (an address that is high on just 1 wire). I then read memory location zero; if the value was no longer 0, that indicated a failure on that specific address wire.
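The address wire test can be sketched against a simulated memory. This is not the code from my test program: the simulated cell array, the fault mask, and all function names are assumptions made for illustration, and real hardware access would replace the two read and write helpers. Note that on the real chip each of the eight address pins carries two of these sixteen address bits (one for the row, one for the column).

```c
#include <stdint.h>

#define DRAM_CELLS 65536u
#define ADDRESS_BITS 16

/* Simulated 64K x 1 memory (hypothetical stand-in for the real chip). */
static uint8_t sim_cells[DRAM_CELLS];
/* Address bits forced to 0, to simulate a broken address wire. */
static uint16_t sim_stuck_low_mask;

static void dram_write_bit(uint16_t addr, uint8_t bit)
{
    sim_cells[addr & (uint16_t)~sim_stuck_low_mask] = (uint8_t)(bit & 1u);
}

static uint8_t dram_read_bit(uint16_t addr)
{
    return sim_cells[addr & (uint16_t)~sim_stuck_low_mask];
}

/* Returns -1 if every address bit works, otherwise the index of the
   first address bit whose write landed on location zero. */
static int test_address_lines(void)
{
    for (int line = 0; line < ADDRESS_BITS; line++) {
        dram_write_bit(0, 0);                      /* clear location zero   */
        dram_write_bit((uint16_t)(1u << line), 1); /* one address bit high  */
        if (dram_read_bit(0) != 0) {
            return line;                           /* write aliased onto 0  */
        }
    }
    return -1;
}
```

With the mask left at zero the test passes; setting a single bit in the mask makes the matching power-of-2 write land on location zero, which the test reports as a failure on that bit.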

I used a walking one algorithm with a bit inversion to test all the memory cells. The goal is to toggle as many bits as possible.
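One possible reading of a walking one test with inversion, sketched against a small simulated memory: walk a single 1 through a field of 0s, checking every cell after each step, then repeat with the values inverted so each cell is exercised in both states. This is an illustration under my own assumptions, not the code from my test program. The cell array, the stuck-cell fault, and the function names are all hypothetical, and the full check after every step is quadratic, so the simulated memory is kept small.

```c
#include <stddef.h>
#include <stdint.h>

#define WALK_CELLS 64u

/* Small simulated 1-bit-wide memory (hypothetical stand-in). */
static uint8_t walk_cells[WALK_CELLS];
/* Index of a cell that is stuck at 0, or -1 for a healthy memory. */
static int walk_stuck_cell = -1;

static void cell_write(size_t i, uint8_t bit)
{
    walk_cells[i] = ((int)i == walk_stuck_cell) ? 0u : (uint8_t)(bit & 1u);
}

static uint8_t cell_read(size_t i)
{
    return walk_cells[i];
}

/* Walk `mark` through a field of the opposite value.
   Returns 0 on success, 1 on the first mismatch. */
static int walk_pass(size_t cells, uint8_t mark)
{
    uint8_t background = (uint8_t)(mark ^ 1u);

    for (size_t i = 0; i < cells; i++) {
        cell_write(i, background);
    }
    for (size_t pos = 0; pos < cells; pos++) {
        cell_write(pos, mark);
        for (size_t i = 0; i < cells; i++) {
            uint8_t expected = (i == pos) ? mark : background;
            if (cell_read(i) != expected) {
                return 1;
            }
        }
        cell_write(pos, background); /* restore before moving on */
    }
    return 0;
}

/* Walking one, then the inverted pattern (a walking zero). */
static int walking_one_test(size_t cells)
{
    return walk_pass(cells, 1u) || walk_pass(cells, 0u);
}
```

A healthy simulated memory returns 0; marking any cell as stuck makes the first pass fail when the walking 1 reaches it.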

In either test, if there is an error, the red LED turns off and stays off. While the test is running, the LED blinks at the end of each complete cycle.

I was able to test all the memory modules I had replaced. They were all functioning properly.

Update 5/5/2012:
The source code can be found at https://github.com/axlecrusher/AvrProjects/tree/master/avr_dramTest