Not long ago, I started work on a project that we decided to build on top of an existing application instead of reinventing the wheel ourselves. It’s something I’ve done countless times before, and I was confident in the decision and the reasoning behind it: re-use existing components, and leverage the power of open source software.
In this particular case, there was really only one existing open source solution for what we wanted to do, so our re-use strategy was locked into it. This is a fairly unusual situation, and because of the nature of this post, I don’t want to mention the particular package or the type of site we were building (which would also identify the package we used).
Suffice it to say, we were limited in our options when it came to evaluating the software. In a future post (possibly the next one), I want to explore the aspect of choice in software further. We in the open source community sometimes seem embarrassed when we have multiple packages that seem to solve the same problem, and we get into religious battles over which is the best content management system, the best integrated development environment, even the best language. I believe this thinking is completely backward.
Now that we have launched the site, I still think that overall our reasoning was sound and the decision was correct. However, unlike in the past, I had significant unforeseen difficulties implementing the particular piece of software we decided to build upon. Basically, its lead developer was the Anti Joel On Software, doing such classically unprofessional things as:
- not using source control,
- not having planned releases,
- adding new features before fixing existing bugs, and
- not using automated tests,
… among others. My first inkling that I was going to have problems came when the lead developer announced a new release stating that it was “fully tested and completely free of bugs.” Now, this was a large piece of software, so I knew it wouldn’t be “completely” tested in the absence of disciplined, automated tests. Also, no software is ever “completely bug free” … period. This statement hinted at a fundamentally different philosophy of software development, or more likely, a lack of such a philosophy. He makes similar statements with every new release, and invariably the user forums are instantly flooded with cries of people not being able to install the new release, or people who have tried to upgrade and whose sites are broken.
We tried to be good members of the open source community around this software, and in retrospect, I am grateful that it at least had a fairly active community, something that even some more professional projects don’t fully facilitate. An active community of users is extremely important when evaluating software and can make the difference between effectively implementing a package or hitting the wall with it. We began by submitting detailed bug reports as we found them, not only with detailed recipes for reproducing the bugs, but often with detailed recipes for patching them. We offered support on the forums as we could, we wrote a small plugin for the software and distributed it under the GPL, and eventually became full-fledged contributors to the software.
Unfortunately, even as contributors, our ability to positively influence development practices was limited. The lead developer was ultimately the benign dictator who ran the project as he saw fit, and he didn’t see much value in either unit tests or other testing frameworks like FIT or Selenium, for example. To be fair, he was a consultant like us with his own pressures; he was using the software with his customers who were demanding new features, and because he was the lead developer, he could smooth over any rough spots in the software for his customers in a way that we couldn’t for ours. He was probably doing the best he could to support the community, and I am very grateful for his generosity in releasing the software under the GPL and for the support he offered on the forums.
As someone who was committed to using the software, however, I found that upgrades became so onerous that we eventually had to stop upgrading and effectively run a private fork of the software, one that prioritized fewer bugs over new features. Because the lead developer added feature enhancements before bug fixes, the bug fixes that we had made in our own installation and contributed to the project would get wiped out in an upgrade and not appear in the software until several versions later.
To make matters worse, the lead developer provided no upgrade scripts, or even an attempt at upgrade instructions beyond saying, for example, that some changes had been made to the database schema and that it might be a good idea to run diff on the files. I pointed out that when deciding to upgrade, it helped to know what bugs had been fixed and what features had been added, and that at a bare minimum, a changelog should be kept to record this information. Looking at diff output just doesn’t give you this information, which sits at a level of abstraction above changes in code; many small changes over a number of files can add up to a new feature or a bug fix, and you can’t easily make those connections with diff output. Diff output is intentionally dumb, strictly literal, and intended to be understood by software, not people. In our case, running diff and patch would have been counterproductive anyway, because it would still wipe out the bug fixes we made.
I suspect that this situation is not entirely unheard of in the open source community at large, as not every piece of software is going to have the community and momentum of, say, a WordPress, and offer the same level of support. And as my company has become a fixture in this particular community, and people still seek help from us, we did try to come up with a DIY approach to upgrading this software. The database is the foundation of this and most other Web-based applications, and changes to it ripple throughout the application, so we began by finding a thorough and reliable way to upgrade an evolving database schema. The recipe we used follows.
In a database sandbox, probably on your development machine, install the source database (the current database that will be upgraded), and the target database (the latest version of the database). For each, only the table definitions matter, not the data.
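To get the source database, run this command on the current installation to be upgraded (adding mysqldump’s --no-data option limits the dump to table definitions, which is all this process needs):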
mysqldump -h localhost -uusername -ppassword database_name > backup_file_name
For most applications, you can find the target database creation script easily by browsing the installation files or looking at a README or INSTALL file.
Next, get this software:
http://www.mysqldiff.org/
MySQLDiff is a freeware PHP application. Unpack and install it on a Web server, go to its URL, and follow the instructions you find there.
When running the final step, I received small errors that needed to be corrected in the MySQLDiff code. In the version I ran (1.5.0), I added this entry:
$row["Type"] == "InnoDB"
to
library/database.lib.php
Furthermore, by adding such constructs as:
( isset($row["Type"]) && $row["Type"] == "InnoDB" )
and commenting out code that was irrelevant for my MySQL version, I got the script to run without errors. So, basically, I ran the software several times, each time fixing the errors indicated in the PHP error messages.
Once it could be run without errors, the output needed to be massaged manually. For example, the script doesn’t recognize a table modification if the table name AND fields within the table are changed in some way. It will drop the existing table and add a new definition, resulting in data loss if you use it. So, investigate any DROP TABLE commands in the output closely for this.
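The search for suspicious DROP TABLE commands can be partly automated. What follows is a minimal sketch in Python, not part of MySQLDiff: it lists the tables that a diff script drops and the tables it creates, so that each pair can be inspected by hand for a rename in disguise. The function name review_drops is made up for illustration, and the regular expressions assume that each statement starts on its own line and uses simple table names.

```python
import re

# "DROP TABLE [IF EXISTS] name", with or without backticks around the name.
DROP_TABLE = re.compile(
    r"^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?`?(\w+)`?",
    re.IGNORECASE | re.MULTILINE,
)

# "CREATE TABLE [IF NOT EXISTS] name", with or without backticks.
CREATE_TABLE = re.compile(
    r"^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?",
    re.IGNORECASE | re.MULTILINE,
)


def review_drops(diff_sql):
    """Return (dropped, created) table names found in a schema diff script.

    A dropped table that appears next to a newly created one is a candidate
    for a rename that the diff tool failed to recognize.
    """
    dropped = DROP_TABLE.findall(diff_sql)
    created = CREATE_TABLE.findall(diff_sql)
    return dropped, created
```

If the first list is not empty, every name in it deserves a manual look before the script goes anywhere near real data.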
Also, it erroneously put “DEFAULT 0” in auto_increment fields and added strange character set specifications after both field and table definitions, both of which I deleted wherever they occurred.
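These deletions are mechanical enough to script. The sketch below is a rough Python illustration rather than the procedure actually used: clean_diff_line is a hypothetical helper, and the exact spelling of the stray character set clauses (CHARACTER SET x on fields, DEFAULT CHARSET=x on tables) is an assumption about what the tool emitted.

```python
import re


def clean_diff_line(line):
    """Clean one line of a generated schema diff (assumed formats only)."""
    # Drop "DEFAULT 0" only on lines that define an auto_increment field.
    if re.search(r"\bauto_increment\b", line, re.IGNORECASE):
        line = re.sub(
            r"\s+DEFAULT\s+'?0'?(?=[\s,)]|$)", "", line, flags=re.IGNORECASE
        )
    # Assumed spelling of a character set clause after a field definition.
    line = re.sub(r"\s+CHARACTER\s+SET\s+\w+", "", line, flags=re.IGNORECASE)
    # Assumed spelling of a character set clause after a table definition.
    line = re.sub(
        r"\s+DEFAULT\s+CHARSET\s*=\s*\w+", "", line, flags=re.IGNORECASE
    )
    return line
```

Ordinary fields that legitimately default to 0 are left alone, because the first rule only fires when auto_increment appears on the same line.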
Finally, I needed to test the resulting diff. I ran the file against the old database definition to check for MySQL errors, and I also ran the MySQLDiff program AGAIN to compare the upgraded source against the target, making sure that no new diff output was generated and that the massaged output from the first run was therefore correct. This was a somewhat involved process, but I believe it was quicker and less error-prone than a completely manual inspection of the database.
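The idea behind this round trip is that the old schema plus the upgrade script should be indistinguishable from the new schema. As a hedged illustration, here is that check in Python, using an in-memory SQLite database as a stand-in for MySQL so the example runs anywhere. The function names are invented for this sketch, and a real check against MySQL would need that server’s own metadata queries.

```python
import sqlite3


def schema_snapshot(ddl_script):
    """Run a DDL script in a throwaway database and return {table: columns}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl_script)
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        snapshot = {}
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, default, pk).
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            # Sort by column name so that column order does not matter.
            snapshot[table] = sorted(
                (c[1], c[2].upper(), c[3], c[4], c[5]) for c in cols
            )
        return snapshot
    finally:
        conn.close()


def upgrade_matches_target(source_ddl, upgrade_sql, target_ddl):
    """True if source schema plus upgrade script equals the target schema."""
    upgraded = schema_snapshot(source_ddl + "\n" + upgrade_sql)
    return upgraded == schema_snapshot(target_ddl)
```

A result of False plays the same role as new diff output on the second MySQLDiff run: something in the massaged script still does not match the target.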
As a good community member, I post the final output of this process to the forums with each upgrade, because upgrading an evolving database schema is especially ill-suited to vanilla diff. But of course, this is just the first step in what is still going to be a lengthy, error-prone and largely manual process of upgrading.
Certain aspects of software allow it to be more easily upgraded. Of course, there is no substitute for good design, and even if you don’t know any other software design pattern, you should know Model-View-Controller (MVC), a pattern that anyone can understand and whose value should be immediately apparent. In this context, it means that application changes are localized so that we can change code more easily and reliably. Going one step further and using an object relational mapping (ORM) strategy radically simplifies upgrading an application with an evolving schema. Most platforms have at least one ORM option, and it usually boils down to describing the database schema to your code in a declarative fashion, minimizing the introduction of new bugs as you upgrade.
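To make the declarative idea concrete, here is a toy Python sketch. It is not the API of any real ORM: Column, Table and added_columns are names invented for this example. The point is only that when the schema is described as data, the difference between two versions can be computed instead of hunted for by hand. It handles added columns and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple  # tuple of Column objects


def added_columns(old, new):
    """Return ALTER TABLE statements for columns in `new` but not in `old`."""
    known = {col.name for col in old.columns}
    statements = []
    for col in new.columns:
        if col.name not in known:
            null_sql = "NULL" if col.nullable else "NOT NULL"
            statements.append(
                f"ALTER TABLE {new.name} ADD COLUMN "
                f"{col.name} {col.sql_type} {null_sql};"
            )
    return statements
```

Given an old declaration of a users table with only an id column and a new one that adds an email column, added_columns returns a single ALTER TABLE statement for email.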
I am occasionally still amazed when I read about those who to this day claim not to see the benefit of MVC and aren’t embarrassed to say so publicly. I hate to say this, because I don’t think it reflects on the larger community that has overwhelmingly embraced MVC in its frameworks and best practices, but it does seem like these anti-MVC statements tend to come from PHP folks. I usually read with interest, thinking that maybe they will tell us about a whole new way to think about developing applications that will turn conventional wisdom on its head, but I am always disappointed. They seem to invariably be people who have never done more than sprinkle a little PHP on top of their static sites, and don’t have the slightest understanding of how difficult it is to develop real applications, in PHP or anything else, and don’t represent the talent and professionalism abundant in the PHP community.
I would argue that the final piece of the puzzle that can make upgrading easier and more reliable is a complete testing strategy, using unit testing at the very least, but in the case of Web-based applications also utilizing functional and/or acceptance testing that tests closer to the interface or through the interface. My personal feeling is that using a framework like Selenium in addition to the use of an xUnit framework is a good trade-off between completely thorough testing and not testing at all. Running these automated tests will pinpoint areas that need changing and allow you to upgrade with confidence.
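As a small, hedged example of what an xUnit-style upgrade test could look like, the following uses Python’s unittest module with an in-memory SQLite database standing in for the real one. The schema, the upgrade statement and the test names are all invented for illustration. A real suite would load the application’s actual schema and upgrade script.

```python
import sqlite3
import unittest

# Hypothetical old schema and upgrade script, for illustration only.
OLD_SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);"
UPGRADE = "ALTER TABLE users ADD COLUMN email TEXT;"


class UpgradeTest(unittest.TestCase):
    def setUp(self):
        # Build the old schema, add a row, then apply the upgrade.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(OLD_SCHEMA)
        self.conn.execute("INSERT INTO users (name) VALUES ('alice')")
        self.conn.executescript(UPGRADE)

    def tearDown(self):
        self.conn.close()

    def test_existing_rows_survive(self):
        # An upgrade that dropped and recreated the table would fail here.
        rows = self.conn.execute("SELECT name FROM users").fetchall()
        self.assertEqual(rows, [("alice",)])

    def test_new_column_exists(self):
        cols = [row[1] for row in self.conn.execute("PRAGMA table_info(users)")]
        self.assertIn("email", cols)
```

Tests like these run in well under a second, so they can be repeated after every change to the upgrade script.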