Working with Legacy Code: A Masterclass in Cleaning up Other People’s Shit
You like working in a legacy environment?
No you don't.
No one likes working in a legacy environment. However, whether you're a freelance programmer, part of an in-house development team, or working for a software house or product agency, at some point you're going to have to deal with the stuff of nightmares: “legacy code”.
Messy code. Thousands of lines of the stuff. Barely legible “write-only” code. Old programming patterns and practices, ancient libraries and, on top of it all, a bunch of coding techniques and approaches left by countless (hopefully not) anonymous developers with wildly varying skill-sets.
But look, we deal in solutions over here. So focusing on website and web app development, I'll try to describe the things I find important when dealing with this specific circle of hell from the perspective of our backend developers.
- The scope of work and quality of the project determines the amount of pain and confusion you will have to deal with.
- Make sure the person that owns the project knows about all your concerns and has a proper, reliable estimation of the time that will need to be spent on the project.
- If the project has to be fully set up on your local machine, consider improving the versioning system and creating an isolated development environment, if one isn't set up yet.
- Stick to one workflow and be consistent.
- Try to follow best coding practices, independently of the quality of the existing codebase.
- Try to leave the project in better condition than it was before - increase the code quality wherever you can afford to, considering the overall profit.
- Do not over-refactor the code. Don't rewrite its integral parts and don't waste time on upgrading libraries if it's not 100% necessary.
- Write the code bearing in mind that, as a person who already knows the codebase, you will probably be working on the project in the future.
Does it Have to be Painful?
The answer to that question depends on several things. First of all, what is the scope of the work you have to do on that legacy codebase? It could be a full feature or just a bug. This affects how big the potential change set will be.
The second issue is the condition of the code – how old is it, has it been maintained by someone, what is the known quality of the code written, has it been well documented, are any kind of tests/continuous integration tools present in the environment, has it been done using known patterns and best practices, according to an existing plan or just ad hoc?
The third factor is time pressure. Is the change you have to make critical from a business perspective and will you be able to comfortably spend some time on getting a broad understanding of the logic that is already implemented?
Of course, the more time you have (which will allow you to understand what the existing codebase actually does), and the better maintained the code is, the more freedom and comfort you will have working on new features. If you are in that position, remember not to abuse the freedom you've been given. It's quite easy to screw it up and go from a lucid dream back to a nightmare again.
Let's consider the worst scenarios:
- You don't know the codebase
- You have little or no time to implement the bug fix/feature
- The estimation was done without a proper level of understanding of the problem, quality, age and complexity of the existing code
- The code is of soul-destroyingly poor quality
In these cases the most important thing is to make sure that the people above you (either your Project Manager or the Product Owner, or even the Partner/Client themselves if you don't have any project methodology implemented in your company) can understand the situation and how it looks from the programmer's point of view – your point of view. If you can achieve that, you can at least keep stress levels to a minimum.
Try to have a proper meeting or chat with the person responsible for the project. Re-estimate the feature and give her/him a clear understanding of what actually needs to be done and why it will take more time than previously estimated. Transparency is the key!
Just DO IT, Damn it!
After jumping on the project you wouldn't want to jump on, try to focus only on what you need to do, without overcomplicating things. You have to remember that some or even most parts of the code in a legacy environment simply cannot be improved, or their improvement is simply not profitable. When working on a legacy project I usually think about the following issues:
Depending on the scope of changes, you will have to decide if you need to fully set up the project on your local machine. If you predict more work on the project in the future, it would be better to have a fully functional setup on your local machine. Then it's worth spending some time to make it run with a single click whenever you need to work on a local version of the working product or, for example, to run some tests (if the project has them).
On the other hand, you might just be making some cosmetic changes that will involve minimal input (changing some static text, adding some third-party code snippets). These do not require test coverage and are not time consuming, so from the business perspective a full local setup may simply be a waste of time and resources. If there is a development environment running somewhere in the ecosystem, you can just use that to test your changes. This will work in cases where the project doesn't have fully functioning continuous integration implemented. It is really good practice to always have a copy of the files to modify on your local disk. The thing that you have to avoid is making your changes directly in the remote repository. Working from a local copy requires more steps when a change is urgent, but will make you feel more comfortable.
Nowadays there aren't many companies that don't use a version control system (VCS), although sometimes some very old code may be kept in a production environment without any copy or with just a simple file backup. In such cases consider using one.
I'm going to make a guess that you're using a distributed VCS in your company (like Git). But what if the particular project you've been assigned to uses some old-school, centralised version control system like SVN or even CVS? Consider migrating to Git or any modern VCS common in your company if it is possible. Seriously, it doesn't take much time and it will help you a bunch in the future.
If the project is very old and no one has been developing anything new on it for ages, there is a strong possibility that it doesn't contain any virtualisation tools for creating an isolated environment. Again, the decision you make has to be based on whether or not you really need it. If you have all the services the project relies on running in the right versions on your local machine, you can omit this step and run the project directly on your local machine. In my experience, though, this is rarely the case.
If not, it is worth considering using a tool like Vagrant or Docker (seriously, who comes up with these names?). Even if setting up the configuration for the project takes some extra time, the benefits will be noticeable later. An isolated environment will solve the problem of conflicting software versions you would otherwise need to install on your local machine. Besides, the number of existing virtual images, config files and tutorials available across the Internet makes the task much more trivial than it appears.
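Whether your local machine really has everything "in the right versions" is easy to check before you decide. Here is a minimal Python sketch of such a check. The tool names and version numbers in `REQUIRED` are made up for illustration, and the script assumes each tool answers `--version` and prints its version number first, which is not true of every tool.

```python
import re
import shutil
import subprocess

# Hypothetical requirements of an imaginary legacy project; replace them
# with whatever the real project depends on.
REQUIRED = {
    "php": (5, 6),
    "node": (8, 17),
}


def parse_version(text):
    """Return the first 'major.minor' pair found in text as a tuple, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_version(tool):
    """Ask a tool for its version via '<tool> --version'; None if not installed."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "--version"], capture_output=True, text=True, check=False
    )
    # Some older tools print their version to stderr instead of stdout.
    return parse_version(result.stdout or result.stderr)


def mismatches(required, lookup=installed_version):
    """Map every tool whose major.minor differs from the requirement to what was found."""
    problems = {}
    for tool, wanted in required.items():
        found = lookup(tool)
        if found != wanted:
            problems[tool] = found
    return problems
```

Calling `mismatches(REQUIRED)` returns an empty dictionary when the machine matches the list. Anything else is a hint that an isolated environment will save you time.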
Choosing the Workflow
The next thing worth considering is the methodology you will use when developing the code in the legacy project. Choose something that fits your needs and try to stick to it. If you choose proper Git Flow, use it and be consistent. It is sometimes tempting to make your changes directly in the master branch, but that will just cause problems in the long term.
Sticking to development patterns requires patience and a few more steps when publishing the code, but will help keep the process clear. You will find out that in case of any potential failures it will be much easier to solve the problem and recover.
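If the temptation to commit straight to master is strong, you can let Git say no for you. The sketch below shows the logic of a pre-commit hook written in Python. The set of protected branch names is an assumption, so adjust it to whatever workflow you picked.

```python
import subprocess

# Assumed names of the branches nobody should commit to directly.
PROTECTED_BRANCHES = {"master", "main"}


def current_branch():
    """Return the checked-out branch name ('HEAD' when detached)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_protected(branch, protected=PROTECTED_BRANCHES):
    """True when a commit on this branch should be refused."""
    return branch in protected


def main():
    """Return a non-zero exit code, which makes Git abort the commit."""
    branch = current_branch()
    if is_protected(branch):
        print(f"Refusing to commit directly to '{branch}'; use a feature branch.")
        return 1
    return 0
```

To use it as a real hook, save it as an executable `.git/hooks/pre-commit` with a Python shebang line and end the file with `sys.exit(main())` (after `import sys`). Hooks live in your local clone only, so every developer has to install it separately.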
Remember that even working with messy codebases doesn't excuse messy code on your part. Always try to keep to best practice when adding your lines, and document your code, even if you are not able to fully refactor the current part of the code as a whole.
Ahhh, refactoring! The biggest temptation when working with legacy code. Refactoring is a necessary and unavoidable part of coding. One of the basic rules of writing clean code, the Boy Scout Rule, is:
“Always leave the campground cleaner than you found it”.
This is obviously a key requirement when working with legacy code. On the other hand though, it cannot become the sole aim. A common fault among programmers is that they tend to over-refactor legacy code. There is yet another rule in the programming world, which is as important as the last:
“Working production code is much more valuable than any code, even perfect code, that doesn't exist yet”.
From the business perspective, even if the code is of poor quality, it can still do the job. It has been tested in the live environment, and most, if not all, of its issues and edge cases have probably been fixed along the way.
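One way to protect that hard-won behaviour before touching anything is to record what the code does today and compare against the record after every change (often called a characterization test). A minimal Python sketch follows. `legacy_shipping_cost` is an invented stand-in for whatever ugly function you are about to edit, and the recorded values are simply what that stand-in returns.

```python
def legacy_shipping_cost(weight_kg, country):
    """Hypothetical stand-in for a messy legacy function we must not break."""
    cost = 5
    if country != "PL":
        cost = cost + 10
    if weight_kg > 20:
        cost = cost * 2
    if weight_kg == 0:
        cost = 0  # looks odd, but assume production relies on it
    return cost


# Inputs worth pinning down, paired with the outputs the current code produces.
RECORDED_BEHAVIOUR = {
    (1, "PL"): 5,
    (1, "DE"): 15,
    (25, "PL"): 10,
    (25, "DE"): 30,
    (0, "DE"): 0,
}


def check_behaviour_unchanged(func, recorded=RECORDED_BEHAVIOUR):
    """Return the inputs for which func no longer matches the recorded output."""
    return [args for args, expected in recorded.items() if func(*args) != expected]
```

An empty list from `check_behaviour_unchanged(legacy_shipping_cost)` means the behaviour is intact. Run the same check against your refactored version before it goes anywhere near production.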
There are situations, though, where refactoring some serious part of the code is simply unavoidable, for example when an ancient version of some third-party payment gateway API library is used and contains deprecated endpoint calls or vulnerabilities that have been fixed in more up-to-date versions. Upgrading the library to a newer version may require changing some parts of the code that use the old programmatic API.
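A forced library upgrade hurts less when the rest of the code talks to the library through one thin wrapper. The Python sketch below illustrates the idea. Both client classes, their method names and the currency are invented for the example and do not describe any real payment gateway.

```python
class OldGatewayClient:
    """Hypothetical stand-in for the outdated third-party payment library."""

    def do_payment(self, amount_cents, card_token):
        return {"status": "OK", "id": "old-123"}


class NewGatewayClient:
    """Hypothetical stand-in for the upgraded library with a changed API."""

    def charge(self, *, amount, currency, source):
        return {"succeeded": True, "charge_id": "new-456"}


class OldGatewayAdapter:
    """Wraps the old client behind the pay() method the application uses."""

    def __init__(self, client):
        self._client = client

    def pay(self, amount_cents, card_token):
        response = self._client.do_payment(amount_cents, card_token)
        return response["status"] == "OK", response["id"]


class NewGatewayAdapter:
    """Same pay() method, implemented on top of the new client."""

    def __init__(self, client, currency="EUR"):  # currency is an assumption
        self._client = client
        self._currency = currency

    def pay(self, amount_cents, card_token):
        response = self._client.charge(
            amount=amount_cents, currency=self._currency, source=card_token
        )
        return response["succeeded"], response["charge_id"]


def checkout(gateway, amount_cents, card_token):
    """Application code: it knows about pay() only, not about the library."""
    success, transaction_id = gateway.pay(amount_cents, card_token)
    return transaction_id if success else None
```

With this in place, the upgrade means writing one new adapter and swapping it in, while `checkout` and everything else that calls `pay()` stays untouched.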
Your main aim should be to keep the right balance between refactoring and feature development and to calculate the benefits gained from the code rework properly.
Make sure that your work increases the overall quality of the code and think about the people that will have to deal with it in the future. Finally, remember that, like the one kid in class who can use Excel, if you've been thrown a legacy project once, you will most probably be chosen to work on future changes.