My Toy COW box

(That's Cluster of Off the Shelf Workstations.)

A while ago, we found a big pile of Sun IPC motherboards at a hamfest. With RAM even. And they were for sale cheap. So we attempted to escape. Then we tried to escape with only a few... Unfortunately the entire box eventually followed us home. At least they were cheap...

The eventual census was about 15 live motherboards, each with a 25 MHz SPARC CPU and 8 MB of RAM in 1 MB 30-pin SIMMs. So then the challenge became to do something with them.

So, first we hooked up the first five to a PC power supply. The first problem was 10 Mbit Ethernet: all five CPUs could collide with each other firmly enough to prevent any of them from booting. The next problem was the death of the power supply. (I can't imagine what killed it... :-)

Anyway, here are some pictures of the first five CPUs running, in my old apartment at White Oak Towers. The power connection is complicated by the sheer number of conductors involved... (4 each of +5V and GND, then 2 12V, per CPU). Those screw array things are actually ground bus bars for breaker boxes.

They are currently running NetBSD 1.4.2, disklessly off a FreeBSD box. Perhaps I'll put a disk on one of them, and let it boot the rest.

So, time passes (a great deal of time... :-P). I get a house, and 100 Mbit Ethernet. And at long last the nebula project once again rises from the dusty deck.

So, first problem: switched Ethernet. However, my main (24 port) Ethernet switch does not buffer packets when it goes from 100 to 10 Mbit, the result being that all NFS operations fail. I use my old 8 port switch, and it does buffer. So we're good. (Funny that old hardware works better... so much for progress.)

Next problem: how about only two motherboards per power supply? It's been running for a couple of days now, and the power supplies don't seem to be having any overheating problems. The smaller number of conductors allows me to use wire nuts instead of grounding bus bars. I should tie all the grounds together....

Then, let's upgrade to NetBSD 1.6.1...

And third: how about something a bit more robust than cardboard and electrical tape for the card cage? The new case is made from 1 by 3's and drywall screws. Since I'm building it, there must be some plexiglass somewhere in it: the cards are held by plexiglass "combs". And three thumbtacks (yes, they are actually important... the combs cannot be long enough to get to the last board).

I suppose I should post the plans for the case. However, they were done with a pencil, as opposed to some CAD program.

As an added benefit, the new case also holds 8 boards instead of the original 5.

A few friends gave me a big pile of 10baseT transceivers... I've been hoarding old PC power supplies... The old 8 port Ethernet switch is on the spare pile... the cable pile has a serial cable for console... eBay for some more RAM (it would be extremely nice to get them up to 48 MB each)...

Unfortunately, every single NVRAM chip is bad. Luckily, someone else has already solved this problem... the Sun FAQ has a whole section. I've written a kermit script to automate it, which can be found here. (I'm lazy, I don't want to type that series of commands 8 times...) Replacement NVRAM chips are about $15 each; that's about $250 to replace them all, which is a bit steep.

So, at long last, the nebula lives once more. Or at least cpus 00 to 07 do...

bobdbob       up  43+00:42,     0 users,  load 0.00, 0.00, 0.00
neb00         up   1+01:50,     0 users,  load 0.10, 0.09, 0.08
neb01         up   1+02:05,     0 users,  load 0.16, 0.10, 0.08
neb02         up   1+02:01,     0 users,  load 0.12, 0.09, 0.08
neb03         up   1+02:01,     0 users,  load 0.07, 0.07, 0.08
neb04         up   1+01:58,     0 users,  load 0.11, 0.10, 0.08
neb05         up   1+01:58,     0 users,  load 0.06, 0.07, 0.07
neb06         up   1+01:58,     0 users,  load 0.11, 0.09, 0.08
neb07         up   1+01:55,     0 users,  load 0.08, 0.08, 0.08
stereo        up   5+14:41,     0 users,  load 0.00, 0.00, 0.00
teryx         up  63+10:44,     1 user,   load 0.00, 0.00, 0.00
zarquon       up  63+01:10,     0 users,  load 0.04, 0.11, 0.17

So, how about some software? I currently have pmake set up, which will happily execute makefiles on all the CPUs. In the first trial, compiling bash, the entire herd went 3.7 times faster than a single member. Unfortunately, most makefiles either do not actually define all their dependencies, or call make recursively. So pmake is of only marginal usefulness...
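As a rough sanity check on that 3.7x figure, Amdahl's law can be inverted to estimate how much of the build actually ran in parallel. This is only a sketch: it assumes the trial used the 8 boards of the first shelf (the text only says "all the CPUs"), it ignores NFS and network overhead, and the function names are made up for illustration.

```python
def parallel_fraction(speedup, n_cpus):
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / n), to solve for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cpus)


def amdahl_speedup(p, n_cpus):
    """Speedup predicted by Amdahl's law for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n_cpus)


# Assumption: the 3.7x bash build ran on 8 CPUs.
p = parallel_fraction(3.7, 8)
print(f"estimated parallel fraction: {p:.2f}")
print(f"projected speedup on 16 CPUs: {amdahl_speedup(p, 16):.1f}x")
```

Under those assumptions only about 83% of the build parallelizes, which fits the complaint about recursive make and missing dependencies: the serial part dominates quickly, so doubling the CPU count would buy well under a 2x improvement.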

I have an old Mandelbrot set calculating program which is implemented with MPI. I should get that set up.

I googled a bit, and have found that the going rate for 4MB 30 pin SIMMs is $4 each. That sounds reasonable, but after multiplying everything out, fully populating the nebula would cost $768. That's a bit steep... Especially considering exactly how much CPU horsepower this machine can really deliver...
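The multiplication behind the $768 can be spelled out. The sketch below assumes 12 SIMMs per board (12 x 4 MB gives the 48 MB target mentioned earlier) and 16 boards; neither count is stated outright next to the price, they are simply the values that reproduce the total.

```python
SIMM_PRICE_USD = 4     # quoted going rate for one 4 MB 30-pin SIMM
SIMMS_PER_BOARD = 12   # assumption: 12 x 4 MB = 48 MB per board
BOARDS = 16            # assumption: two shelves of 8 boards


def upgrade_cost(boards=BOARDS, simms=SIMMS_PER_BOARD, price=SIMM_PRICE_USD):
    """Total cost in dollars of filling every board with 4 MB SIMMs."""
    return boards * simms * price


print(f"${upgrade_cost()} for {BOARDS} boards")  # 16 * 12 * 4 = 768
```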

I have duplicated the first nebula shelf, and so CPUs 10 to 17 (they are numbered in octal... first digit is shelf number, second digit is board number) are now running. I had to get another Ethernet switch, another power strip, and I ran out of 10baseT transceivers too (so there is one 10base2). However, it is now up to 16 CPUs.
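The octal numbering is handy because the two digits of a host name read straight back as a CPU index. A small sketch of the scheme (the helper names are hypothetical, and the "neb" prefix is taken from the uptime listing above):

```python
def node_names(shelves, boards_per_shelf=8, prefix="neb"):
    """Host names in shelf-major order: first digit shelf, second digit board."""
    return [
        f"{prefix}{shelf}{board}"
        for shelf in range(shelves)
        for board in range(boards_per_shelf)
    ]


def node_index(name, prefix="neb"):
    """Zero-based CPU index, obtained by reading the two digits as octal."""
    return int(name[len(prefix):], 8)


names = node_names(2)
print(names[0], "...", names[-1])
print("neb17 is CPU number", node_index("neb17"))
```

This only works while each shelf holds at most 8 boards and there are at most 8 shelves, since every digit has to stay in the range 0 to 7.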

I have been playing with a beta version of MPI 2, and it's almost there. It has issues with enumerating the network interfaces. mpdringtest says it can loop the mpd ring in 0.18 seconds.

The full nebula:


Some attempts at benchmarking... I have (found on my software page) an old Mandelbrot program lying around. It seems like a reasonably valid benchmark...

So, the nebula, consisting of 16 CPUs, each with 12MB of RAM, running NetBSD 1.6.1, computes the entire set (-2, -2, 4, 2000 iterations max, image size of 600 pixels square, also known as the defaults... :-) in 80.1028, 73.1207, 74.9027 seconds (3 replications).
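For readers who have not seen the program, the per-row work looks roughly like the sketch below. It is not the actual benchmark source: it is plain serial Python with no MPI, it assumes work is divided by image row, and it reads the defaults "-2, -2, 4" as the lower-left corner (x, y) and the width of the square region, which is a guess.

```python
def mandel_row(row, size=600, x0=-2.0, y0=-2.0, width=4.0, max_iter=2000):
    """Iteration counts for one image row (the unit a worker would return)."""
    step = width / size
    ci = y0 + row * step
    counts = []
    for col in range(size):
        cr = x0 + col * step
        zr = zi = 0.0
        n = 0
        # Iterate z = z*z + c until it escapes |z| > 2 or hits the cap.
        while n < max_iter and zr * zr + zi * zi <= 4.0:
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            n += 1
        counts.append(n)
    return counts


# Small image and low cap so the serial demo finishes quickly.
image = [mandel_row(r, size=60, max_iter=200) for r in range(60)]
capped = sum(n == 200 for line in image for n in line)
print(capped, "of", 60 * 60, "pixels hit the iteration cap")
```

Note how uneven the rows are: rows near the edge of the region escape after a handful of iterations, while rows through the middle of the set run to the cap for many pixels.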

There is a reported fix in NetBSD-current which enables an IPC to run with the cache enabled, which should make a significant performance improvement.

So, let's see if NetBSD-current from January 5 or so makes a difference... reboot with a new bootloader and kernel... (the 1.12 bootloader cannot load a -current kernel, it gets an unaligned memory access error.) The times are 82.30, 78.34, 79.01. So much for that idea...


Some sanity checking... How about if we only run with 8 CPUs (instead of 16)? We should observe 1/2 the speed. So, the three replications are: 71.60, 72.89, 72.56. Seems we have a bottleneck on the leader thread...

So, new metric... the image is just solid max iterations... to make sure it's CPU-bound. The time is 279.84 sec on 8 CPUs (only 1 rep, I got bored. :-). Or on 16 CPUs, 147.54, 146.43. That's better... Seems it gets IO bound when the kids can complete rows fast enough.
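Putting the two workloads side by side makes the difference obvious. This just averages the replications quoted above (NetBSD 1.6.1 numbers for the default image) and compares 8 CPUs against 16; the variable names are mine.

```python
from statistics import mean

# Wall-clock seconds from the runs above, keyed by CPU count.
default_image = {16: [80.1028, 73.1207, 74.9027], 8: [71.60, 72.89, 72.56]}
solid_image = {16: [147.54, 146.43], 8: [279.84]}


def speedup_8_to_16(times):
    """Ratio of mean 8-CPU time to mean 16-CPU time (2.0 would be ideal)."""
    return mean(times[8]) / mean(times[16])


for label, times in (("default image", default_image),
                     ("solid max-iteration image", solid_image)):
    print(f"{label}: {speedup_8_to_16(times):.2f}x going from 8 to 16 CPUs")
```

The default image comes out at about 0.95x, meaning the extra 8 CPUs made it slightly slower, while the solid image comes out at about 1.90x, close to the ideal 2x.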

I suspect this invalidates the finding above with the difference between 1.6.1 and -current...


I moved the nebula into a rack, to clear off the workbench...