mod-wsgi segfault fix...


...or how to improve your Google-Fu

This morning I was doing some coding on a simple WSGI app with web.py and mod_wsgi under Ubuntu Saucy. Saucy now ships Apache 2.4, and after starting a simple Apache 2.4 config with mod_wsgi in daemon mode, I saw a strange log message like this:

[Sat Aug 10 14:06:18.945155 2013] [core:notice] [pid 19099:tid 140698944931712] AH00051: child pid 19105 exit signal Segmentation fault (11), possible coredump in /etc/apache2

At first I thought it was my simple WSGI application, but after digging deeper I was pretty sure it wasn't my app.

First I looked on Launchpad to see whether a bug had been filed about this issue, but I couldn't find anything related. Next I looked on bugs.debian.org, but honestly, I didn't find anything that matched my bug there either.

Finally, Google:

Searching for: apache 2.4 mod_wsgi process has died

And some results popped up.

Looking not only at the titles but also correlating the dates, I found my bug as the fifth entry on the search results page: Immediate Segmentation Fault in Daemon Mode

Well, the last entry in this thread gave me a hint: mod_wsgi 3.5. That version is not released yet, so I could only guess that the patch is somewhere in the tree. It looks like the author bisected, or just guessed right, so I checked the upstream source tree on Google Code and found the mentioned changeset:

Scoreboard handle in daemon mode should be set to NULL for Apache 2.4 to avoid crash in lingering close.

Ok, now the work starts.

  • dget the source package from Launchpad
  • dpkg-source -x it
  • quilt push -a
  • quilt new
  • quilt add mod_wsgi.c
  • find the right place in the source
  • apply the fix
  • quilt refresh
  • debuild -S
  • build the package on my local buildserver
  • install it, restart apache and check the logfiles
  • segfault's gone

Perfect!

Now for the paperwork:

  1. file a bug on Launchpad (this is the bug tracker for my distro of choice)
  2. create a debdiff between the original package in the distro and my new one
  3. attach the debdiff to the bug report
  4. write an email to submit@bugs.debian.org with a similar bug report, attach the debdiff to the mail, and send it

Now let's wait until this gets fixed.

But why does this bug me so much that I have to write a post about it?

Because Apache and mod_wsgi are so commonly used, and one of the tasks in developing a distro is testing such critical packages much better. This segfault is very easy to hit, but perhaps not so easy to catch with automated tests. Fixing it also takes some work: cherry-picking commits from upstream, applying them to the package, rebuilding it, and finally doing some real-world testing.

Perhaps this would be a good exercise in leveraging a CI system like Jenkins much more. If I have the time to play around, it might make sense to use LXC on a build slave: create Saucy containers, install Apache and its modules, and run a simple script task that watches for segfaults in certain combinations.
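As a rough idea of what such a script task could look like, here is a minimal Python sketch. It only matches the AH00051 message format quoted above; the function names and the default log path (/var/log/apache2/error.log, the usual location on Debian and Ubuntu) are my assumptions, not part of any existing tool.

```python
import re

# Matches the Apache 2.4 notice quoted in this post, for example:
# "AH00051: child pid 19105 exit signal Segmentation fault (11), ..."
SEGFAULT_RE = re.compile(
    r"AH00051: child pid (?P<pid>\d+) exit signal Segmentation fault \(11\)"
)


def find_segfaults(lines):
    """Return the PIDs of child processes reported as segfaulted."""
    pids = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            pids.append(int(match.group("pid")))
    return pids


def scan_log(path="/var/log/apache2/error.log"):
    """Scan an Apache error log file (path is an assumed default)."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        return find_segfaults(log)
```

A CI job could call scan_log() after restarting Apache inside the container and sending a few test requests, and mark the build as failed whenever the returned list is not empty.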

But still, we need more human-triggered tests, meaning real-world testing. That alone doesn't give us any bug reports or fixed packages, though, right? So we need more people who work as SysAdmins, do real-world tests, and have a clue about how to fix software.

Every SysAdmin should know his/her OS of choice and should be able to fix bugs in it. People managers should consider making that a requirement when writing job descriptions.