Outage today

Apologies for the outages today and on Friday. We upgraded the RAM in the ftp.leg.uct.ac.za server (thank you, Chem Eng).

Unfortunately, there were a few complications (to do with bad kernels), which resulted in a lot more downtime than there should have been. Everything is now working correctly again, and we aren’t likely to run out of RAM within the lifespan of this server.

Comments

Kernel problems?

Hi Stefano,

More details on the kernel issues?

Insert long tale of woe here

Firstly, I thought that dreamcoat contained non-ECC RAM, so we bought new non-ECC RAM. The mismatch stopped it from booting until we removed the old ECC RAM. That was why Friday’s outage was longer than the 5 minutes it takes to install RAM.

Then on the weekend, I noticed that it was only registering ~800 MB of RAM, because the kernel it was running didn’t have HIGHMEM enabled. We run a custom kernel on dreamcoat, because when we bought it, standard Debian kernels wouldn’t work properly on the motherboard. Of course, we are now a Debian release down the line, but we stick with what works unless there is a good reason to change.
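For anyone wanting to check for the same symptom, here is a minimal sketch in Python. It assumes a 32-bit x86 kernel, where high memory support is controlled by the CONFIG_HIGHMEM4G / CONFIG_HIGHMEM64G options (CONFIG_NOHIGHMEM when it is off), and it assumes you can get hold of the kernel's config text. The function names are made up for this illustration and are not part of any tool used on dreamcoat.

```python
import re


def parse_kernel_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line.

    Lines such as '# CONFIG_X is not set' are comments and are skipped.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def highmem_enabled(config_text):
    """True if any of the x86 high memory options is built in."""
    options = parse_kernel_config(config_text)
    highmem_options = ("CONFIG_HIGHMEM", "CONFIG_HIGHMEM4G", "CONFIG_HIGHMEM64G")
    return any(options.get(name) == "y" for name in highmem_options)


def mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text, in whole MB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1)) // 1024
```

On a running machine, the memory figure comes from the contents of /proc/meminfo. Where the config text lives depends on how the kernel was built: stock Debian kernels install it under /boot, but a custom kernel may only have the .config file in its build tree.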

I didn’t replace the kernel on the weekend, because I wouldn’t have had physical access. Just as well: when I did replace it, it didn’t reboot properly, and I had to drive to campus to sort it out.

Then I found that the kernel config was wrong and didn’t support dreamcoat’s network controller (I had used tonic’s config rather than dreamcoat’s). So I had to compile it again.
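A mix-up like this can be caught before compiling by comparing the config about to be built against the one known to work. The following is only a rough sketch in Python of that idea, not what was actually done here. The function names are invented for illustration, and the driver options in the usage note are placeholders, since the controller model is not named above.

```python
def parse_kernel_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line.

    Lines such as '# CONFIG_X is not set' are comments and are skipped,
    so an unset option simply has no entry.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_differences(known_good_text, candidate_text):
    """Map each differing option to (known_good_value, candidate_value).

    None means the option is not set in that config.
    """
    known_good = parse_kernel_config(known_good_text)
    candidate = parse_kernel_config(candidate_text)
    differences = {}
    for name in sorted(set(known_good) | set(candidate)):
        if known_good.get(name) != candidate.get(name):
            differences[name] = (known_good.get(name), candidate.get(name))
    return differences
```

Feeding it the text of the two .config files returns only the options that differ. For example, if the working config had CONFIG_E1000=y (a placeholder driver) and the candidate did not, the result would contain an entry mapping CONFIG_E1000 to ('y', None), which is a hint that the network driver has gone missing.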

Now, we are still getting network controller resets (as described here). Unfortunately, nobody has fixed this for our model of network card yet.
