After a decent trial run, I've decided that the blogging feature in Drupal, at least as of the 5.x stream, is not especially conducive to the writing process. Partially for my own edification about the platform, and partially because it looks like fun, I've started a Tumblelog at dailyfix.tumblr.com. The idea is pretty simple: as often as I can, I will post a fix for some problem I've encountered, in the hope that others will find these fixes useful, and to track them for my own future use, since many of these problems recur and I have to dig up the previous fix from Google or from memory.

I will continue to post here occasionally, but will focus on essays more than blog posts since I find that format more natural. Enjoy!

I have been fighting with jQuery for the last few hours, and almost forgot to post. I'm still on the fence about whether jQuery is a good thing or a very evil thing. For now, suffice it to say that it's nearly impossible to step through with the JavaScript debugger, and that in and of itself is really a hassle, despite the ultraslick syntax. I will post more about my experience and complaints tomorrow when I've had some time to cool down.

Debugging for my Life

The title is a bit overdramatic, but I have just moved into a new office space and have been working frantically to resolve lots of small to medium problems. Basically, the office is totally wireless. No ethernet or phone cables are strung anywhere, and I want everything to just work, so I bought a couple of new pieces of equipment to make that possible.

Phones were the simple part. Between cordless phones and wireless phone jacks, it's a snap to make all the phones talk to one another. The network was not as forgiving. My general idea is to have one primary access point where the cable service comes in, and then put WDS-enabled access points in every office so that a person's computers can talk to one another over the wired connections, and route all outbound traffic through the main link. That provides a nice blanket network so that wireless is available throughout the office, while still providing most of the niceties of wired connections. I picked up some new hardware to make this possible, including a WRTSL54g, which is a fancier version of the standard WRT54g in that it has a USB port and so can have better storage capacity. I also picked up a new standard WRT54gl to act as a WDS point.

I had this system set up at home, but I used WEP to send the data over WDS, and I wanted to go to WPA for better security and ease of use (WPA2 may be my next project). I also wanted everything to run the latest OpenWRT release, Kamikaze. Of course, they went and changed all the WDS stuff since the previous version (White Russian), so I had to repuzzle out a lot of it. I don't have the energy to link all the posts that I used to finally get it all working. Suffice it to say, it took a lot longer than I had hoped, but I am currently writing this post while sending my data out over a WPA-encrypted WDS link, so I am feeling pretty good. I really didn't want to walk into the office tomorrow still staring this same problem in the face.

Isolation of a problem can be a tricky thing. If you have the luxury of being able to actually change the system and observe the effects of your changes, an easy way to approach the problem is to start from the known bad system run, turn that into a test case that you can repeatedly run, and then start lopping away stuff that you either assume is not related to the problem, or that you have already verified is not related. You can tell how well you understand the problem by how good your choices are with regard to what you slice off. If you repeatedly make cuts that either break the application or that make the problem unexpectedly go away, then you should probably do some more digging before you do this kind of isolation.

Cutting away can take two forms: either you can simply skip a step that seems irrelevant, or you can replace something with a mock object (a topic that is too broad for this post) that gives you back some necessary data. Eventually, the goal is to cut away everything that is not relevant to illustrating the problem. This is usually some setup to get the right data structures, some actions to make them change in the correct way, and a comparison with the correct result (which will fail).

For instance, let's say that you have a PHP script that is generating bad output. The page you get back lays out in a completely unexpected way, and eyeballing the HTML source, it's clear that you are not generating the expected HTML. First of all, your debugging the problem is hindered by the fact that if you are omitting tags or adding strange attributes to things, the browser is covering a lot of that in its attempt to be fault-tolerant. So you can cut away the browser and replace it with an HTML DOM parser that verifies the output is what you expect.
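To make the "replace the browser with a parser" step a bit more concrete, here is a rough sketch in Python using only the standard library's html.parser. It is not a full DOM validator, just a strict tag-balance check, and the names TagBalanceChecker and find_markup_problems are made up for this illustration. It is deliberately pickier than a browser: tags that HTML allows you to leave unclosed (like <p> or <li>) will be reported too.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags and records closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags seen but not yet closed
        self.problems = []    # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_markup_problems(html_text):
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.open_tags]
```

Feeding the script's output to find_markup_problems gives you a pass/fail answer that does not depend on how forgiving a particular browser happens to be.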

Next, all the pages have headers and footers and sidebars, which all lay out fine, as they do on every other page. So you cut the includes and such that create those parts. The DOM is still not what you would expect. Finally, you cut away some of the calls that insert database data into the layout and replace them with hardcoded text that approximates what you think is coming back from the database. Suddenly, the DOM is fine. You hadn't anticipated that the database data was the issue, but clearly it's relevant and you've cut too much.

So you then cut HTML layout generation from the testing completely, and just write tests to look at the values coming back from the database. It becomes clear that HTML has crept into certain fields unexpectedly and that is throwing off the layout. At this point, you have isolated the problem, and you have a simple, quick test that only does the thing necessary to illustrate the problem. This will greatly aid your Repair process, since it will be easy to verify any attempted fixes, and you can easily run the whole real world test case again when you get to the Validation stage.
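The final isolated test in this story might look something like the sketch below. It assumes the database rows have already been fetched as a list of dictionaries (how you fetch them is up to you), and the function name fields_containing_html and the tag pattern are invented for this example. The pattern is intentionally crude: it only looks for things shaped like an opening or closing tag.

```python
import re

# Rough pattern for something that looks like an HTML tag, e.g. <b> or </div>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def fields_containing_html(rows):
    """Return (row_index, field_name) pairs whose string value contains tag-like text."""
    hits = []
    for index, row in enumerate(rows):
        for name, value in row.items():
            if isinstance(value, str) and TAG_PATTERN.search(value):
                hits.append((index, name))
    return hits
```

A test built on this runs in a fraction of a second and fails for exactly one reason, which is what makes it useful for checking attempted fixes.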

I noticed some people winding up here because they searched for "WordPress 2.1" and "broken" and "Tiga" (the theme that I adapted for my site) or some combination of those terms. I wanted to provide a slightly more detailed explanation of what I did to fix it. In the Tiga theme, the links list is generated by a foreach that loops over the links, printing them out in category blocks. The problematic call in WordPress 2.1 is the one that assigns the original variable to iterate over:

$link_cats = $wpdb->get_results("SELECT cat_id, cat_name FROM $wpdb->linkcategories");

As I mentioned before, linkcategories no longer exists, so instead you need to query the categories table.

$link_cats = $wpdb->get_results("SELECT cat_id, cat_name FROM $wpdb->categories");

Except if you do that, you get all of your post categories showing up as well. So you want to filter on only the categories that contain links and that was the tricky part. By looking at the MySQL table structures that are printed in the WordPress Codex, I noticed that there is a link_count field. I changed the query to filter on anything that has a link_count > 0, like this:

$link_cats = $wpdb->get_results("SELECT cat_id, cat_name FROM $wpdb->categories WHERE link_count > 0");

That did the trick, and it doesn't suffer from the potential problems that I described in the original post where I was filtering on link_id. If this helps you, or if you have a better solution, would you mind leaving a comment? I'll eventually post this somewhere more permanent if it seems to be helping people. Thanks!
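If you want to see the effect of the filter without touching a live blog, here is a small stand-in using Python's built-in sqlite3 module. The table below is a heavily simplified assumption (only the three columns the query touches, with made-up rows), not the real WordPress schema, and build_demo_db and link_categories are names invented for the demo.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with only the columns the query uses (simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories ("
        "cat_id INTEGER PRIMARY KEY, cat_name TEXT, link_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?, ?)",
        [
            (1, "Uncategorized", 0),  # post-only category
            (2, "Blogroll", 3),       # holds links
            (3, "Debugging", 0),      # post-only category
            (4, "Tools", 1),          # holds links
        ],
    )
    return conn


def link_categories(conn):
    """Same filter as the theme fix: keep only categories that contain links."""
    cursor = conn.execute(
        "SELECT cat_id, cat_name FROM categories "
        "WHERE link_count > 0 ORDER BY cat_id"
    )
    return cursor.fetchall()
```

Running link_categories(build_demo_db()) leaves out the two post-only categories, which is the behavior the theme needs.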

I'm going to veer off topic a bit again (sorry for those looking for debugging stuff, back tomorrow) and talk about a different kind of debugging, trying to figure out why some couples don't work out. I have a theory called "BS Filter Compatibility" that I think explains a lot. Have you ever met a couple that goes to a movie together and one of them thinks it was really deep and powerful and the other one thinks it was fluffy nonsense? To me, those couples never last, or if they do, they aren't ever that happy. The problem is their BS filters are calibrated differently. This ultimately means that one person starts to lose respect for the other, which is death for a relationship.

The origin of the BS filter theory came from stories that I've heard many long-term couples tell about their meeting and initial bonding. For example, my wife and I originally found a connection because we both felt that a student organization that we had both joined independently was totally pointless and overblown (and we quit soon after). We felt like we were the only ones who could see it, and it gave us mutual respect for each other right from the start. Oddly, the only "romantic" movie I've seen play up this angle is Wedding Crashers, specifically the initial wedding scene where the sister-of-the-bride (on the altar as a bridesmaid) can't stop laughing as the bride and groom read their ultra-cheesy wedding vows laden with horrible sailing puns. I loved the fact that Owen Wilson's character basically chooses her at that moment because he realizes that she finds it as ridiculous as he does.

The problem is, how close do people's BS filters really need to be for compatibility? Based on the incredibly unscientific survey of myself and my wife of seven years, I would say you have to see the value in about 75% of things that the other person is interested in, and you can feel smug about the other 25% without things disintegrating. For instance, if I look at our musical tastes, we certainly don't overlap 100%, and very few people do. Instead, we both find common ground in classic soul and rock, 70s disco and funk, hip-hop, and some modern pop music. I enjoy a lot of electronic and indie rock (think Stereolab and Geggy Tah), modern folk/singer-songwriter stuff that she can't stand, and lots of jazz to which she is mostly indifferent. She enjoys 80s and 90s punk and other hard rock, and some contemporary rock that I can't stand. However, the 75% of music that we do like in common is enough to always find a mutually agreeable song on the radio, or to put together a mix CD that both can enjoy.

I don't pretend to be an expert, but if you are in a relationship where you just don't feel compatible, it might help to sit down and make a short list of music, movies, books, etc. that you think are good and meaningful, and think about the same set for your partner. If you find yourself sneering at his or her selections and at the end you only have a few things in common, it might be time to reevaluate.

So it looks like I am getting a fair number of people coming around from the NaBloPoMo Randomizer. Having viewed many of the other blogs, it is clear that I am sort of out of the mainstream here with my geeky technobabble. So I thought I might write something less boring or more useful to the blogging community (or hopefully both) in a first post, and continue my other topic for those who are interested in a second post.

Today my topic is Sitemeter. Sitemeter is how I know people are using the randomizer to get to me and not that I suddenly have a readership. I recommend it. It's free and it answers useful questions like, "does anyone care what I have to say?" by showing you who visits (just by generic info, so I don't really know who you are) and how long they stay around. For example, I know that this morning, I've gotten 4 randomizer hits, but no one chose to click beyond the first page. Maybe I need more alluring titles for old posts.

Sitemeter has a nice feature where you can block your own browser or your own IP address so that you don't show up in your own statistics. I stumbled across a funny sort-of bug (you know I couldn't stay off the topic that long) in the way they block you. I have it set up to block my IP address since I work out of the house and am at home 95% of the time. One morning I was actually working out of a coffee shop and I pulled up the page without thinking about the fact that I would show up. No big deal, I could ignore that 1 random hit from myself.

The funny part was after I got back home. I check in every so often during the day to see if anyone has commented (they haven't). On the third or fourth visit of the day I pulled up my stats. I was astounded! Someone was visiting the blog like every hour! They must be desperately visiting the blog in a vain hope that I might have posted something that they can read. I must quickly satisfy their hunger! So I posted, and checked back an hour or so later to see if they had visited again.

Looking at the stats, I saw that yes, they had been back! Then I noticed something odd. The mystery visitor's referral page was one of the administrative pages on my blog, something that only I have access to. Further investigation revealed that indeed, the visitor could only be me. How was this possible given that I have myself blocked? Well, it turns out that once I had visited in the morning from an unblocked location, Sitemeter must have sent me back a cookie. Then, when I visited later from home, rather than looking me up again to see where I was visiting from, it just associated that same data with my browser, so I appeared to be coming from the coffee shop even though I was at home.

I was disappointed that no one was obsessively checking my blog, but at least I got to investigate and solve an interesting problem. To fix it, I just went to Sitemeter and had them block my browser, in addition to my IP address. I probably could have also just gone in and deleted the cookie. If someone else out there has experienced the same thing and was totally baffled, I hope this helps.
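Here is a toy model of what I think was going on. To be clear, this is a guess at the behavior from the outside, not how Sitemeter is actually implemented; the VisitTracker class, the cookie name origin_ip, and the example addresses are all invented for illustration.

```python
class VisitTracker:
    """Toy visit counter; cache_origin_in_cookie=True mimics the guessed behavior."""

    def __init__(self, blocked_ips, cache_origin_in_cookie):
        self.blocked_ips = set(blocked_ips)
        self.cache_origin_in_cookie = cache_origin_in_cookie
        self.recorded = []  # origins of visits that were counted

    def visit(self, request_ip, cookies):
        # cookies is a plain dict standing in for the browser's cookie jar.
        if self.cache_origin_in_cookie and "origin_ip" in cookies:
            # Trust the origin remembered from an earlier visit.
            origin = cookies["origin_ip"]
        else:
            # Look at where the request is really coming from.
            origin = request_ip
        if self.cache_origin_in_cookie:
            cookies["origin_ip"] = origin
        if origin not in self.blocked_ips:
            self.recorded.append(origin)


HOME_IP = "203.0.113.10"   # blocked address (documentation range)
CAFE_IP = "198.51.100.7"   # unblocked address (documentation range)

browser_cookies = {}
tracker = VisitTracker(blocked_ips=[HOME_IP], cache_origin_in_cookie=True)
tracker.visit(CAFE_IP, browser_cookies)  # morning at the coffee shop
tracker.visit(HOME_IP, browser_cookies)  # later, back at home
```

With the cookie cache on, both visits get counted as coming from the coffee shop. Clearing browser_cookies before the second visit, or turning the cache off, makes the home visit disappear from the stats again, which matches the two fixes mentioned above.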

I wanted to break out of the Distance Debugging discussion for a moment (to return later today) to highlight a new feature I just added up top, the Distance Debugging Help Desk. In brief, I want to help you fix your problems with computer hardware and software in order to get practice debugging. The only thing I ask in return is that I can post about it, and if you have a blog, that you link back to me if I fix your problem. Anyway, there are a few caveats that are listed in full in the page linked at the top (basically, I can't fix everything, I'm very busy, and please don't sue me), so check that out and then email me at helpdesk (at) distancedebugging.com if you are so inclined.

For one very brief example, I stumbled across Alex Hopmann's Blog a few days ago and he had noted a problem where hard drive accesses caused his computer to slow way down. It just so happened that recently I was investigating problems with DMA on Linux, and although it was a Windows computer, the symptoms were exactly the same. I emailed him and with that clue, as he explains in this post, he was able to resolve it with no trouble and got his laptop much more functional again.

Anyway, I can't guarantee a fix for you would be that simple, but it can't hurt, can it?

SATA Saga

Returning to the subject of Sunday's post, I have gotten to the bottom of all of the issues regarding SATA vs. PATA and the missing DVD drive. I am grateful to those on the web who have posted various solutions regarding these issues. I'm going to go one step further and document my thought process a bit in order to assist others who don't even know where to begin looking at these issues.

1) I started from the issue that my DVD drive was not being enumerated by the ata_piix driver. I got hooked on the notion that I needed to enable ATAPI support in the driver. It's experimental so it's not enabled by default. Depending on whether it's compiled into the kernel or loaded as a module, you have to either pass a boot parameter (libata.atapi_enabled=1) or do the same via the modprobe.conf file.

2) I spent an inordinate amount of time working on this solution: trying it each way, compiling new kernels, rebuilding my initrd, etc., all to get it to accept the parameter. One thing that really hindered me was the fact that there is no way that I know of to query the loaded modules and ask "what parameters did you receive at the time you were loaded?" to sanity check that I was doing the right thing.

3) After reading a few dozen web pages, I noticed one mentioned that you should set your BIOS to AHCI if possible to get this to work. While I think this is actually in error (since if you were set to AHCI it would use a different driver), it led me to the eventual correct conclusion. My BIOS (on the Abit AW8D) supports a bunch of different modes, and I for some reason had it set to Combined Mode. I now know that Combined Mode has issues, but I didn't know that at the time I set things up. What I really wanted was Enhanced Mode, and to set the IDE mode to AHCI. Since I have an ICH7 chipset, this should be no big deal, and Fedora will use the ahci driver instead of ata_piix. This is exactly how it works on my laptop with the same chipset.

4) However, I couldn't get the system to boot in that configuration. I assumed that I was doing something totally wrong and that it wasn't recognizing the drives. I was totally stumped because it would boot and then print out

"GRUB GRUB GRUB" or something like that, as if Grub was trying to load and getting screwed up.

5) I tried booting in AHCI mode with the rescue disk. No problem. System comes up, drives can be mounted.

6) Now I'm starting to suspect that it's just an issue with the grub configuration. Sure enough, I run:

grub-install --recheck /dev/sda (since my MBR and boot partition are on the SATA drive)

and notice that previously, the device map looked like:

hd0 /dev/sda (the SATA drive)

hd1 /dev/hda (the PATA drive)

and now it's swapped. Apparently the switch from Combined to Enhanced (required for AHCI) caused the drives to be enumerated in a different order. This wouldn't be such a big deal, except that grub has to be installed on the first drive, or you have to add a chainloader directive. But what's with the GRUB GRUB GRUB stuff? Then I remember, I used the PATA drive as the main drive on a previous installation, so it has an old (and now totally corrupted) grub installation. That was the big red herring this whole time.

7) So I grub-install onto /dev/hda and then change all the hd0s to hd1 (since the boot partition is still on the second drive) in the grub.conf file. System boots, but hangs since I still have the ide0=noprobe directive. Take that out, try again, system boots to login.

8) It now comes up with the ahci driver (I guess kudzu on Red Hat figures this all out), and the PATA drive and the DVD drive are handled by the ide subsystem. I look at the drives, and everything is golden. DMA is enabled on everything, and I'm seeing hdparm timing results consistent with what I was seeing with the ata_piix driver.
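The "change all the hd0s to hd1" edit in step 7 is easy to get wrong by hand, so here is a small sketch of doing it mechanically. It only rewrites text; it does not touch any real file or run grub. The function name remap_grub_drives and the sample config lines in the usage note are made up for illustration.

```python
import re


def remap_grub_drives(config_text, mapping):
    """Rewrite (hdN...) references according to mapping, e.g. {0: 1}.

    All references are rewritten in a single pass, so a swap such as
    {0: 1, 1: 0} does not clobber itself.
    """
    def replace(match):
        old_number = int(match.group(1))
        return "(hd%d" % mapping.get(old_number, old_number)

    return re.sub(r"\(hd(\d+)", replace, config_text)
```

For example, calling remap_grub_drives on a config containing "root (hd0,0)" and "splashimage=(hd0,0)/grub/splash.xpm.gz" with the mapping {0: 1} turns both references into (hd1,0) and leaves everything else alone. I would still diff the result against the original before rebooting.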

So that ends the saga of the slow or missing drives. What is frustrating is how much time I spent on the problem of trying to keep the ata_piix driver from fighting with the ide-generic driver, rather than noticing that my real problem was that I was just in the wrong mode. The ahci driver doesn't have this problem, so it's not even an issue. I was so focused on thinking that I was clever with the initial solution of setting the combined_mode flag that I ignored all the evidence that was telling me that I was doing things wrong from the beginning. One last note: I think that everyone using Linux on SATA drives owes Jeff Garzik a note of thanks for all his hard work, and if you'd like to learn more about this stuff, his webpages have a wealth of information.
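One follow-up on the sanity check I was wishing for in step 2: on kernels that export module parameters through sysfs, the current values show up as small files under /sys/module/<module>/parameters/. Whether a given parameter appears there depends on the kernel version and on how the driver declares it, so treat this as a sketch under that assumption rather than a guaranteed answer. The helper name read_module_parameters is my own.

```python
import os


def read_module_parameters(module, sysfs_root="/sys/module"):
    """Return {parameter_name: value} for a loaded module, or {} if none are exposed."""
    params_dir = os.path.join(sysfs_root, module, "parameters")
    values = {}
    if not os.path.isdir(params_dir):
        return values
    for name in sorted(os.listdir(params_dir)):
        try:
            with open(os.path.join(params_dir, name)) as handle:
                values[name] = handle.read().strip()
        except OSError:
            # Some parameters are not readable; skip them.
            continue
    return values
```

Calling read_module_parameters("libata") after boot would then show whether something like atapi_enabled actually took effect, assuming the driver exposes it. The sysfs_root argument is only there so the function can be pointed at a test directory.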

The Magic of DMA

I have posted previously about my new Linux server, which started somewhat inauspiciously with a bad motherboard. I've had some other odd problems which I have mostly solved (or am in the process of solving) that I thought might be of use to others.

To start with, my server runs in so-called "Combined Mode", where the SATA and PATA channels are separated on the motherboard. This is the only mode in which I can boot my server currently. This may have something to do with the fact that the boot partition (/boot) and the MBR are on the SATA drive, and I would have to build the kernel differently since SATA support is built as a module rather than built into the kernel. This is just speculation though. Mostly I am taking an "if it ain't broke" position and leaving it alone.

I actually have two HDs in the machine: a 160GB SATA drive that holds most of my real data, the boot partition, and the swap area, and an 80GB PATA drive that holds the Fedora Core (now 6) installation. The Abit mobo only has a single IDE connector, so I have the PATA drive and my DVD writer on that cable as master-slave, and the SATA drive is separate. I noticed right away that the server seemed blazing fast for a lot of things, but it was jerky and slow for others. I immediately thought about the mishmash of drives and figured something was up. So I did a little hdparm analysis:

hdparm -tT /dev/hdc (the PATA drive)

Timing cached reads: 3496 MB in 2.00 seconds = 1748.62 MB/sec
Timing buffered disk reads: 7 MB in 3.02 seconds = 2.31 MB/sec

Ugh. There's the problem. I should be getting in the range of 40-50 MB/sec at least. So I checked the status, and of course there was no DMA or anything, so I tried the obvious:

hdparm -d1 /dev/hdc

And I get the common "Operation not Permitted" error that usually means that the specific driver for your motherboard chipset is not available in the kernel. However, I verified that my chipset (ICH7) was there. Long story short, I found my way once again to the incredibly useful ThinkWiki, to the page about Linux and SATA:

http://www.thinkwiki.org/wiki/Problems_with_SATA_and_Linux#No_DMA_on_DVD...

along with a Red Hat Bugzilla report along the same lines. It turns out that the issue is that you want the libata driver to grab the PATA drives as well, but usually the regular ide driver grabs them, and in combined mode that means you are unable to do anything like manipulate DMA (I am paraphrasing here; the reality is much more complex). Anyway, by adding the flag combined_mode=libata to the kernel boot parameters, I now get:

Timing cached reads: 3496 MB in 2.00 seconds = 1748.62 MB/sec
Timing buffered disk reads: 168 MB in 3.02 seconds = 55.64 MB/sec

So roughly 24x faster on buffered reads (55.64 MB/sec versus 2.31 MB/sec) with a simple boot-parameter change. The whole system is just so much faster now, it's hard to believe. Unfortunately, this had the side-effect of making the DVD drive disappear. It looks like I somehow am not enabling ATAPI in the libata driver, so that's what I'm working on. I'll post again when I've got that solved.
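If you want to track this kind of before-and-after comparison without squinting at numbers, a few lines of Python will pull the buffered read rate out of the hdparm output and compute the ratio. The two output samples are copied from this post; the function name buffered_read_rate and the regular expression are my own, and they assume the output format shown above.

```python
import re

# Matches the final "= NN.NN MB/sec" figure on the buffered reads line.
RATE_PATTERN = re.compile(r"Timing buffered disk reads:.*=\s*([\d.]+)\s*MB/sec")


def buffered_read_rate(hdparm_output):
    """Extract the buffered disk read rate in MB/sec from hdparm -tT output."""
    match = RATE_PATTERN.search(hdparm_output)
    if match is None:
        raise ValueError("no buffered disk read line found")
    return float(match.group(1))


before = """Timing cached reads: 3496 MB in 2.00 seconds = 1748.62 MB/sec
Timing buffered disk reads: 7 MB in 3.02 seconds = 2.31 MB/sec"""

after = """Timing cached reads: 3496 MB in 2.00 seconds = 1748.62 MB/sec
Timing buffered disk reads: 168 MB in 3.02 seconds = 55.64 MB/sec"""

speedup = buffered_read_rate(after) / buffered_read_rate(before)
print("buffered reads are %.1fx faster" % speedup)
```

The cached read figure mostly measures memory and bus throughput rather than the drive, so the buffered number is the one that reflects the DMA change.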
