segfaults and solitude

stuff about linux and coding and stuff

the joy of getting one’s laptop to not crash all the time

Okay, so a month and a half-ish ago, I happened to receive an HP dm1z netbook-ish laptop thing. The 4000 model to be precise; most things I’d found online pertaining to it refer to the 3000 model. It’s pretty nifty, gets fairly decent battery life (6.5ish hours, but I wouldn’t be surprised if you could get at least 1-2 more out of it), and was not terribly expensive. Also, quite a lot faster than my old Asus 1000HE. It’s got an AMD E-450 CPU or APU or whatever it is they’re calling it with Radeon 6320 graphics. I think it’s also from their Fusion line? I dunno. Model numbers are a lot more complicated than they used to be. Granted, they’ve always been arbitrary and meaningless, so I guess it’s not much of a loss.

It needed a name, so I dubbed him imhotep.

As I tend to do, I immediately set out to install Arch Linux on it. And as I also tend to do, I immediately ran into a problem: the wireless was extremely wonky. I eventually found a workaround. Unlike most workarounds I’ve come across in the past, this one cost 20 bucks: I bought a USB wifi dongle thing. I’m still investigating the built-in wireless, and hopefully I’ll address that in another post soon.

With that problem resolved, or rather stuffed into the closet, I was finally able to install Arch. And lo, it was good. Until I realized it freeze-crashed all the time. That is, it would totally freeze. Everything would become unresponsive. A small fraction of the time, it would respond to some of the magic SysRq keys, but I was never able to restore the virtual console, so the most I could do was a soft reboot. Most of the time, though, nothing at all. A hard reboot would be necessary.

The problem was very hard to trigger predictably. It would usually happen under a heavy CPU load, but very inconsistently. I could run a stress test for 30 minutes with no issue. Then I’d start up Firefox (sidenote: at any given time I tend to have a ridiculous number of tabs open, so starting Firefox can be resource-intensive) and it would die after two minutes.

All my system logs were conspicuously free of any messages of the form:

Feb 12 20:22:37 localhost kernel: [1234.5678] some_component: oh god something bad happened and now i am going to die

I wasn’t entirely sure which component, be it CPU, GPU, or something else, was responsible, so I tried all manner of things: using AMD’s proprietary Catalyst driver instead of xf86-video-radeon, disabling the higher clock speeds of the CPU, trying KDE rather than gnome-shell (I didn’t consider it likely that such a thing could be responsible, but I didn’t have much else to go on), and several other things I can’t remember. No luck.

Eventually, I came to realize I had mistakenly written off one of the kernel messages I would get upon booting up:

Jan 29 14:35:56 localhost kernel: [   18.244096] SP5100 TCO timer: mmio address 0xb80430 already in use
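In hindsight, a quick filter over the boot messages might have made that line harder to overlook. Here is a minimal sketch in Python of the sort of thing I mean. The function name and the list of patterns are my own made-up assumptions, not anything official, and a warning that matters on your machine may well use different wording.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list):
# phrases that tend to show up in kernel warnings worth a second look.
SUSPICIOUS = re.compile(r"already in use|failed|error|conflict", re.IGNORECASE)


def suspicious_kernel_lines(log_text):
    """Return the lines of log_text that match one of the patterns above."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]
```

Fed the text of a system log, this returns only the lines that match, the SP5100 message above among them. It will also turn up plenty of harmless noise, so treat the result as a reading list rather than a diagnosis.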

SP5100 is some sort of watchdog chip, though I have to say I don’t know much about it. In fact, I don’t know whether imhotep’s motherboard even has that chip on it, or whether the driver for it was loaded by mistake. If it does lack the chip, then it’s easy to imagine that periodically writing to some register or memory location that belongs to something that is most definitely not an SP5100 chip could lead to problems. Regardless, it was a fairly easy fix when all was said and done: just blacklist the module. For those not familiar with doing such a thing, you just need to create a file in /etc/modprobe.d/ with a name along the lines of sp5100.conf. The name isn’t important, but modprobe will complain if it doesn’t end in “.conf”. The contents of the file should just be:

blacklist sp5100_tco
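If you want to double-check that the blacklist took effect after a reboot, something like the following Python sketch should do. It is a rough helper under a few assumptions: the function names are invented for illustration, it only looks at *.conf files directly inside the directory you give it, and it relies on /proc/modules listing one loaded module per line with the name in the first column.

```python
from pathlib import Path


def blacklisted_modules(conf_dir="/etc/modprobe.d"):
    """Collect module names from 'blacklist <name>' lines in *.conf files."""
    names = set()
    for conf in sorted(Path(conf_dir).glob("*.conf")):
        for raw in conf.read_text().splitlines():
            line = raw.strip()
            if line.startswith("#"):
                continue  # skip comment lines
            parts = line.split()
            if len(parts) == 2 and parts[0] == "blacklist":
                names.add(parts[1])
    return names


def loaded_modules(proc_modules="/proc/modules"):
    """Return the names of currently loaded modules (first column)."""
    text = Path(proc_modules).read_text()
    return {line.split()[0] for line in text.splitlines() if line.strip()}
```

After rebooting, "sp5100_tco" should show up in blacklisted_modules() and be absent from loaded_modules(). If it is still loaded, something other than the usual automatic loading is probably pulling it in.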

A remarkably simple solution for something that took me way too long to figure out. Since I wasted three or four weeks on this, I figure I ought to leave a record of my ordeal so that any troubled souls haunted by the same infernal dæmons may also resolve it.
