The Perils of Duplication

A few weeks ago, the stream I was working with was tasked with creating two new C# solutions for upcoming work. The task involved creating the two solutions and then deploying them through demo and live, ready for the projects to be worked on. I created the first solution while my colleague worked on speeding up our internal builds. As I had never written a deploy script before, I knew this task would be a steep learning curve for me. There was no documentation outlining how to create a new solution within Codeweavers, so I decided to invest time in writing some whilst creating the solution. The first solution required research, as I had no idea how much work was involved in getting a solution up and running. I decided to start with a single solution and write the documentation alongside it. I would then follow this documentation when producing the second solution, which would allow me to fine-tune anything that was unclear and add any little steps that I had missed.

Whilst creating the first solution, I found that in a large number of places I was simply copying and pasting existing code into a new file for my project, especially when it came to our build scripts. Rather than being able to add the new solution name to a list and have the build scripts take care of everything else internally, I was changing the solution name multiple times within each build script. It was clear to me that this duplication was unnecessary and that eradicating it would make producing future build scripts much simpler. This is the next waste task that I will take on when the chance arises.

Copying and pasting a file and changing the solution name in ten or so places is not a big issue and does not take long, especially if you are only creating a single solution. Because I was doing this task twice, I tried to think about everything that would have to be duplicated again when the second solution was created. Because these files are all duplicated, changing something across all of the build scripts means making the same change in every one of them: around 25 changes in 25 separate files. As the content of these files is duplicated, it makes sense to encapsulate it in a single place. Once it is encapsulated, any future change to the build scripts means changing only a single file.
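
To make the idea concrete, here is a minimal sketch in C# of a single list of solution names feeding one shared template. It is not our real build tooling: the solution names, the `BuildScripts` class and the build steps are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuildScripts
{
    // Hypothetical solution names. Adding a new solution means adding one entry here.
    public static readonly IReadOnlyList<string> Solutions = new[] { "SolutionA", "SolutionB" };

    // The one shared template (illustrative steps only). A change made here
    // reaches every solution, instead of being repeated in every copied file.
    public static string BuildStepsFor(string solution) =>
        $"compile {solution}.sln; test {solution}.Tests; package {solution}";

    // Produces the build steps for every solution in the list.
    public static IEnumerable<string> AllBuildSteps() => Solutions.Select(BuildStepsFor);
}
```

With something along these lines, the solution name appears once per solution rather than ten or so times per copied script.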

We are very careful at Codeweavers not to duplicate anything within our core C# code base, as this is not good code design and can lead to major problems. However, we have been lax in the past when it comes to duplication in other code that we write, such as the build scripts and stored procedures. This duplication is something that we now want to remove. The deploy scripts and stored procedures were written a long time ago when we did not know any better, but now we do!

Along with the deploy scripts, I also came across a bunch of files that were required for deploying the solution to our demo and live servers. This time the duplication was extreme: there were around 80 duplicated files, each containing the same information for MS Deploy. The files that we use for MS Deploy are identical and are not application dependent, so why have we got 80 different files containing the same information? Moving the information into a single place produces a clear benefit: from now on, if we need to change anything for MS Deploy, we only have to change a single file rather than 80.

My goal was to produce documentation that could be used by myself and the rest of the team, and to reduce the number of manual steps involved in creating a new solution to as few as possible. The only way to do this is by reducing duplication. When I had completed the documentation, there were eleven manual steps to get the solution from a local machine through to the demo and live servers. During the production of the second solution I looked for ways to cut this number down. The process is now down to nine manual steps, and I will continue to look into reducing this number further when opportunities arise.

Now don’t get me wrong, duplication can be useful. The work that I am currently doing is creating a new web service for a new client of ours. This web service is an exact copy of one used by an existing client. We are having to duplicate the web service because the existing client’s web service was written around seven years ago, when we did not know better than to hard code the web service to that client. The team decided to duplicate the web service because this meant that the existing web service would continue untouched while we created a generic web service for the new client. Once the new web service has been pushed out to live, we will remove this duplication by combining both web services into a single web service that can be used by any number of clients. It is important that when duplication like this is added, it is removed as soon as possible; otherwise it will get forgotten and cause problems later on.

Duplication in your code will come back to haunt you and will cost you and your business time and money. Just remember: the longer duplication is left, the harder it becomes to remove. More and more code is built on top of the duplication, so pull it out sooner rather than later or you will regret it.

Software deployment logging and the unexpected benefits

Codeweavers, as readers of previous blog posts will know, is an agile software house. As an agile software house we offer our clients a fast turnaround, and as part of this we have to deploy multiple times a day. While some people believe that this is a bad thing, we see it as hugely positive. We cannot deny that we have been caught out by it once or twice, but the advantages to the client of deploying often outweigh the rare occasions on which we have issues. When we do get caught out, we sit down and come up with a solution to try to prevent the problem ever reoccurring. During the last issue we had with a deploy, we found it difficult to track down the cause as it did not show itself immediately. It turned out that a number of deploys had gone out not long before the issue showed itself, and we did not know which services had been deployed around that time, so we struggled to work out which of them contained the problem.

A few weeks before this issue I had watched a great video about Facebook’s deploys and their disaster recovery (definitely worth a watch). The video contained a number of good ideas that we could implement at Codeweavers, the main one being a log that Facebook keeps of every major event that takes place (deploys, data imports etc). After watching it I wanted to introduce a similar log at Codeweavers. Alas, I never found the time, and this ended up biting us: a deploy log would have helped us greatly in tracing which services had been deployed around the time the issue showed itself. After we had resolved the issue, the deployment log became my number one priority.

The deployment log is stored in a database, and we have a small web page that lets us filter the deploys effectively. We felt that it would be useful to be able to filter and sort the deploys by environment, date and application. Below is a screenshot of the web page, which gives us quick access to the data rather than having to query the database every time.


Deployment log web page
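
The filtering behind a page like this can be sketched in a few lines of C#. This is only an illustration under assumptions: the `Deploy` record, its field names and the `DeployLog.Filter` method are hypothetical and not our actual schema or code, and the sketch filters an in-memory collection rather than querying a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row in the deployment log (records need C# 9 or later).
public record Deploy(string Application, string Environment, DateTime DeployedAt);

public static class DeployLog
{
    // Every filter is optional; passing null means "do not filter on this".
    // Results come back newest first.
    public static IEnumerable<Deploy> Filter(
        IEnumerable<Deploy> deploys,
        string environment = null,
        string application = null,
        DateTime? since = null,
        DateTime? until = null)
    {
        return deploys
            .Where(d => environment == null || d.Environment == environment)
            .Where(d => application == null || d.Application == application)
            .Where(d => since == null || d.DeployedAt >= since)
            .Where(d => until == null || d.DeployedAt <= until)
            .OrderByDescending(d => d.DeployedAt);
    }
}
```

A call such as `DeployLog.Filter(log, environment: "live", since: someDate)` would then answer the question we could not answer during the incident: which services went out to live around a given time.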

Now that we have been logging the deploys for over a month and a half, we are finding unexpected benefits. As well as being useful for seeing when each of our services has been deployed, the log has let us pull some statistics about our deploys. It turns out that in the 23 working days of May we deployed to our live servers a total of 143 times, which is around 6 times a day. This is a much larger number than anyone within the business thought (the majority of estimates were around the 50 mark). On top of these, we also deployed to our demo servers 314 times in May. This shows that we are changing our code base a huge amount each day and pushing this functionality out to our customers at a great pace.

This information can now be used to monitor our deploy rate and we are now recording any issues that we have during deploys, so that we can see if there is a correlation between the number of times we deploy and the number of issues we experience. We hope that through the use of this log and further tools that we are continuing to develop around the deploys, we will be able to shrink the amount of time that it takes to fix an issue. All of this is being done with the view of getting our deploys to a place where we know that there will never be an issue when we deploy.

Why are you not using Design by Contract?

When learning to program I distinctly remember coming across the concept of placing asserts within your code. Assert statements are primarily used for “things that cannot happen”, but in my early days I was too focused on the stuff that was supposed to happen!

“Defensive programming” was also introduced. Principles such as “Never trust the user” and “80% of your code will be validation and verification” were highlighted. Despite these introductions many years ago, the concept of asserts never stuck with me. Yet I program defensively like there is no tomorrow.

The use of asserts can be extended into “Design by Contract” or DBC. In DBC the developer makes use of pre-conditions, post-conditions and invariants. Some languages such as Eiffel have taken DBC as a core feature, while other languages leave DBC up to libraries.

One of my favourite programming books is The Pragmatic Programmer. It has stood up to many re-reads, and each time I found myself intrigued by the idea of DBC. Yet I never followed this interest through, at least not in a production environment.

Our team recently came across a bug in which part of the system was using a component in a way that was deemed invalid. We had a suite of tests to accompany this feature, but these tests were unable to highlight the problem. When the object was sent across the wire, the JavaScript front end was firing a null reference across, which was out of our control in the back end of the application. As the feature crossed a boundary and spoke to another system, defensive programming would have been difficult. All we could do was error and inform the developer what was wrong. Even without defensive programming, the system was already doing this. We had little to gain.

Here I decided to experiment with code contracts for the first time in my programming career. A contract was applied that said the collection sent into the system must not be null or empty. If it was, the second system would blow up, informing the developer what was wrong. This contract was a very primitive example of a pre-condition: something that must be true in order for the code that follows to execute.
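
A pre-condition of this kind might look something like the sketch below. It is hand-rolled for illustration and is not the library or code we used: `Require`, `ContractViolationException` and `Receiver.Accept` are all hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type. It signals a bug in the caller, so it is not meant to be caught.
public sealed class ContractViolationException : Exception
{
    public ContractViolationException(string message) : base(message) { }
}

// A tiny hand-rolled contract helper (an assumption, not a real library API).
public static class Require
{
    public static void That(bool condition, string message)
    {
        if (!condition) throw new ContractViolationException(message);
    }
}

// Hypothetical stand-in for the second system that receives the collection.
public static class Receiver
{
    public static int Accept(IReadOnlyCollection<string> items)
    {
        // Pre-condition: must hold before any of the code below runs.
        Require.That(items != null && items.Count > 0, "items must not be null or empty");

        // From here on the code can rely on having at least one item.
        return items.Count;
    }
}
```

Calling `Receiver.Accept(null)` or passing an empty list blows up straight away with a message that tells the integrating developer what they got wrong.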

The benefit here came from just a few lines of code. Had we tried to program defensively, the second system’s code base would have suffered for little gain. We would have needed to report the error, add error codes, introduce exception handling and so on, all for a simple defect that could be fixed immediately and would potentially never occur again once the integrating developer had configured the components correctly.

One important factor to consider with DBC is that contract violations should never be caught or handled. Every single contract that is violated is a bug. To stop the violation you need to fix the code that is breaking the contract. Likewise, contracts make little sense when dealing with a public API. On the edge of the system you should presume your users will make mistakes and “do the wrong thing”; here you must use defensive programming.

Since that day I’ve liberally applied code contracts whenever we cross system boundaries or interact with the infrastructural aspects of our code, e.g. database helpers. This has increased my confidence that the system as a whole has been correctly “glued together”. Another benefit is that several bugs have been thwarted thanks to the contracts: unlike unit tests, contracts are always present when enabled, meaning missed boundary conditions can easily be detected.

Hand in hand with our automated test suite, code contracts make a great companion. Never alone will one suffice, but when used in conjunction they can be extremely powerful. So the question is, why aren’t you using them?

3 years at Codeweavers

Having written about the top ten things I discovered in my first year at Codeweavers, I figured it was time for a follow up after the past two years. In no particular order, here is a collection of the biggest lessons I have learned.

  • Design by Contract
  • Test Driven Development (TDD) is a tool
  • Design is Important
  • Don’t tie yourself to a Framework
  • The Importance of Tools
  • Acceptance Testing need not use the Full Stack
  • Program for Change (Open/Closed Principle)
  • Reinvent the Wheel, Often
  • Do it right – violate YAGNI
  • Practice, Practice, Practice

I’ll expand on these topics over time in future posts.

The Anti If Campaign

Firstly, if you are unaware of what the Anti If Campaign is, I advise you to take a look before coming back. My first impression a few years ago was that the site must have been some sort of spoof. Programming without “if” statements? This was crazy nonsense. After all, the “if” statement is one of the core constructs of any language. If you look deeper, however, the campaign is not advocating the abolition of “if” statements; it is simply encouraging cleaner code by removing the likes of type checking and control coupling. This can be achieved through the use of polymorphism and by abiding by the Single Responsibility Principle (SRP).

The Anti If Campaign is relevant as I have recently had first hand experience of what the supporters are campaigning against. I was working on one of our greenfield projects where I had violated SRP for an easy win. We had a class which would look up a quote based on some input criteria. I allowed this input to control how the lookup was performed. In some scenarios the input would be in a different form, meaning the lookup would need to be carried out in a different manner. An “if” check was introduced to handle this logic. In pseudo code:

public Result Lookup(Request request)
{
    if (request.Id != null)
    {
        // look up an existing quote
        // code specific to an existing quote...
    }
    else
    {
        // look up a new quote
        // code specific to a new quote...
    }
}

The code in question had supporting methods for both paths.

Fast forward a few months and something terrible had happened. Like a plague, this simple conditional I had introduced was spreading. Code that was executed much later on was beginning to perform the same conditional check! At the same time as I discovered this problem, I was asked to make a trivial change as the requirements had evolved. What should have been a five minute job turned into a few hours of paying back technical debt.

The fix was well overdue at this point. I had to push the conditional statements as high as I could. The closer they were to the edge of the system, the better. The by-product of this refactor is that the code is now a lot clearer. Each class and method does just one thing, and does it well. It turned out I was actually able to push the conditional statement so far up that it effectively disappeared into the routing of the system. It was up to the caller to “do the right thing”.

After the refactor:

public Result LookupExisting(Request request)
{
    // look up an existing quote
    // code specific to an existing quote...
}

public Result LookupNew(Request request)
{
    // look up a new quote
    // code specific to a new quote...
}
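
How the caller picks a path depends on the system. As a rough sketch of the conditional disappearing into routing, the example below maps each route straight to one lookup method. It rests on assumptions: the route strings, the `Routes` class and the stand-in `Request`, `Result` and `QuoteLookup` types are hypothetical and only there to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types so the sketch compiles on its own (records need C# 9 or later).
public record Request(string Id);
public record Result(string Description);

public class QuoteLookup
{
    public Result LookupExisting(Request request) => new Result($"existing quote {request.Id}");
    public Result LookupNew(Request request) => new Result("new quote");
}

public static class Routes
{
    // Each route points at exactly one lookup method, so neither method
    // needs to inspect the request to work out which case it is handling.
    public static IDictionary<string, Func<Request, Result>> Build(QuoteLookup lookup) =>
        new Dictionary<string, Func<Request, Result>>
        {
            ["/quotes/existing"] = lookup.LookupExisting, // hypothetical route
            ["/quotes/new"] = lookup.LookupNew,           // hypothetical route
        };
}
```

The decision still exists, but it is made once, at the edge, by whichever route the caller chooses.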

As each part of the code complies with SRP, I know exactly where to go if there is a problem. For example, if we have any problems with the retrieval of new quotes, I can easily debug and fix the issue. Likewise if we wish to extend the lookup of existing quotes, I can confidently change the code without the fear of breaking the retrieval of new quotes. The other side effect is that I can easily reason about and test the code in question.