The Perils of Duplication

A few weeks ago, the stream I was working with was tasked with creating two new C# solutions for upcoming work. The task involved creating the two solutions and then deploying them through demo and live, ready for the projects to be worked on. I created the first solution while my colleague worked on speeding up our internal builds. As I had never written a deploy script before, I knew this task would be a great learning experience for me. There was no documentation outlining how to create a new solution within Codeweavers, so I decided to invest time in writing some whilst creating the solution. The first solution required research, as I had no idea how much work was involved in getting a solution up and running. I decided to start with a single solution and write the documentation alongside it. This documentation would then be followed when producing the second solution, which would allow me to fine tune anything that was unclear and add any little steps that I had missed.

Whilst creating the first solution, I found that in quite a large number of places I was simply copying and pasting existing code into a new file for my project, especially when it came to our build scripts. Rather than being able to add the new solution name to a list and let the build scripts take care of everything else internally, I was changing the solution name multiple times within each build script. It was clear to me that this duplication was unnecessary and that eradicating it would make producing future build scripts much simpler. This is the next waste task that I will take on when the chance arises.

Copying and pasting a file and changing the solution name in ten places or so is not a big issue and does not take long, especially if you are only creating a single solution. Because I am doing this task twice, I am trying to think of the things that will have to be duplicated when the second solution is created. Because these files are all duplicated, any change that applies across the build scripts has to be made in every one of them, resulting in around 25 changes in 25 separate files. As the content of these files is duplicated, it makes sense to encapsulate it in a single place. Once it is encapsulated, any future change to the build scripts means changing only a single file.
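As a rough illustration of what "add the solution name to a list" could look like, here is a minimal Python sketch. It is not our actual build tooling: the solution names, the template text and the `render_build_scripts` function are all hypothetical, and the three build steps are only stand-ins for whatever a real build script contains.

```python
from string import Template

# Hypothetical solution names; the real ones are not given in this post.
SOLUTIONS = ["SolutionA", "SolutionB"]

# Hypothetical stand-in for the build script content shared by every solution.
# This is the single place that would change if the build steps changed.
BUILD_TEMPLATE = Template(
    "build $name.sln\n"
    "test $name.Tests\n"
    "package $name\n"
)


def render_build_scripts(solutions):
    """Return a mapping of solution name to its rendered build script."""
    return {name: BUILD_TEMPLATE.substitute(name=name) for name in solutions}


if __name__ == "__main__":
    for name, script in render_build_scripts(SOLUTIONS).items():
        print(f"--- {name} ---")
        print(script)
```

With this shape, adding a solution is a one-line change to the list, and a change to the build steps is a one-line change to the template rather than an edit in every file.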

We are very careful at Codeweavers not to duplicate anything within our core C# code base, as this is not good code design and can lead to major problems. However, we have been lax in the past when it comes to duplication in other code that we write, such as the build scripts and stored procedures. This duplication is something that we now want to remove. The deploy scripts and stored procedures were written a long time ago when we did not know any better, but now we do!

Along with the deploy scripts, I also came across a bunch of files that were required for deploying the solution to our demo and live servers. This time the duplication was extreme: there were around 80 duplicated files, each containing the same information for MS Deploy. The files that we use for MS Deploy are identical and are not application dependent, so why have we got 80 different files containing the same information? Moving the information into a single place has a clear benefit: from now on, if we need to change anything for MS Deploy, we only have to change a single file rather than 80.

My goal was to produce documentation that I and the rest of the team could use, and to reduce the number of manual steps needed to create a new solution to as few as possible. The only way to do this is through the reduction of duplication. When I had completed the documentation there were eleven manual steps that had to be taken to get the solution from a local machine through to the demo and live servers. During the production of the second solution I worked to reduce that number. The process is now down to nine manual steps, and I will continue to look into reducing this number further when opportunities arise.

Now don’t get me wrong, duplication can be useful. The work that I am currently doing is creating a new web service for a new client of ours, and this web service is an exact copy of one used by an existing client. We are having to duplicate the web service because the existing client’s web service was written around seven years ago, when we did not know better than to hard code the web service to that client. The team made the decision to duplicate the web service because it meant that the existing web service would continue untouched while we created a generic web service for the new client. Once the new web service has been pushed out to live, we will remove this duplication by combining both web services into a single web service that can be used by any number of clients. It is important that when duplication like this is added, it is removed as soon as possible; otherwise it will get forgotten and cause problems later on.

Duplication in your code will come back to haunt you and will cost you and your business time and money. Just remember that the longer duplication is left, the harder it becomes to remove. More and more code is built on top of the duplication, so pull it out sooner rather than later or you will regret it.

Software deployment logging and the unexpected benefits

Codeweavers, as any readers of previous blog posts will know, is an agile software house. As an agile software house we offer our clients a fast turnaround, and as part of this we have to deploy multiple times a day. While some people believe that this is a bad thing, we see it as hugely positive. We cannot deny that we have been caught out by it once or twice, but the advantages to the client of deploying often outweigh the rare occasions on which we have issues. When we do get caught by it, we sit down and come up with a solution to try to prevent the problem ever reoccurring. During the last issue we had with a deploy, we found the cause difficult to track down as it did not show itself immediately. It turned out that a number of deploys had gone out not long before the issue appeared, and we could not easily tell which of the deployed services contained the problem. The main difficulty was not knowing which services had been deployed around the time that the issue occurred.

A few weeks before this issue I had watched a great video about Facebook’s deploys and their disaster recovery (definitely worth a watch). The video contained a number of good ideas that we could implement at Codeweavers, the main one being a log that Facebook keeps of every major event which takes place (deploys, data imports etc). After watching the video I wanted to introduce a similar log at Codeweavers. Alas, I never found the time, and this ended up biting us: a deploy log would have helped us greatly in tracing which services had been deployed around the time the issue showed itself. After we had resolved the issue, the deployment log became my number one priority.

The deployment log is stored in a database, and we have a small web page that lets us filter the deploys effectively. We felt that it would be useful to be able to sort the deploys by environment, date and application. Below is a screenshot of the web page, which gives us quick access to the data rather than having to query the database every time.


Deployment log web page

Now that we have been logging the deploys for over a month and a half, we are finding unexpected benefits. As well as being useful for seeing when each of our services has been deployed, the log has let us pull some statistics about our deploys. It turns out that in the 23 working days of May we deployed to our live servers a total of 143 times, which is around 6 times a day. This is a much larger number than anyone within the business thought (the majority of estimates were around the 50 mark). As well as these deploys, we also deployed to our demo servers 314 times in the month of May. This shows that we are changing our code base a huge amount each day and pushing this functionality out to our customers at a great pace.
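To make the idea concrete, here is a minimal sketch of a deploy log in Python using SQLite. It is an illustration rather than our real implementation: the table layout, the column names and every function name are assumptions, and the application names in the example are made up. Only the May figures come from our log.

```python
import sqlite3


def create_log(conn):
    """Create the (hypothetical) deploys table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deploys ("
        "deployed_at TEXT NOT NULL, "   # ISO 8601 timestamp, sorts as text
        "environment TEXT NOT NULL, "   # e.g. 'demo' or 'live'
        "application TEXT NOT NULL)"
    )


def record_deploy(conn, deployed_at, environment, application):
    """Append one deploy event to the log."""
    conn.execute(
        "INSERT INTO deploys VALUES (?, ?, ?)",
        (deployed_at, environment, application),
    )


def find_deploys(conn, environment=None, application=None, since=None, until=None):
    """Return deploys matching the given filters, oldest first."""
    clauses, params = [], []
    if environment is not None:
        clauses.append("environment = ?")
        params.append(environment)
    if application is not None:
        clauses.append("application = ?")
        params.append(application)
    if since is not None:
        clauses.append("deployed_at >= ?")
        params.append(since)
    if until is not None:
        clauses.append("deployed_at <= ?")
        params.append(until)
    sql = "SELECT deployed_at, environment, application FROM deploys"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY deployed_at"
    return conn.execute(sql, params).fetchall()


def deploys_per_working_day(total_deploys, working_days):
    """Average number of deploys per working day."""
    return total_deploys / working_days


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_log(conn)
    record_deploy(conn, "2012-05-01T09:00:00", "live", "ExampleService")
    record_deploy(conn, "2012-05-01T10:30:00", "demo", "ExampleService")
    print(find_deploys(conn, environment="live"))
    # The May figures quoted above: 143 live and 314 demo deploys in 23 days.
    print(round(deploys_per_working_day(143, 23), 1))  # 6.2
    print(round(deploys_per_working_day(314, 23), 1))  # 13.7
```

Filtering by a time window is what would have helped during the incident described earlier: asking for every live deploy in the hours before the issue appeared narrows the list of suspects immediately.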

This information can now be used to monitor our deploy rate. We are also recording any issues that we have during deploys, so that we can see whether there is a correlation between the number of times we deploy and the number of issues we experience. We hope that through the use of this log, and the further tools that we are continuing to develop around the deploys, we will be able to shrink the amount of time that it takes to fix an issue. All of this is being done with a view to getting our deploys to a place where we know that there will never be an issue when we deploy.

3 years at Codeweavers

Having written about the top ten things I discovered in my first year at Codeweavers, I figured it was time for a follow-up after the past two years. In no particular order, here is a collection of the biggest lessons I have learned.

  • Design by Contract
  • Test Driven Development (TDD) is a tool
  • Design is Important
  • Don’t tie yourself to a Framework
  • The Importance of Tools
  • Acceptance Testing need not use the Full Stack
  • Program for Change (Open/Closed Principle)
  • Reinvent the Wheel, Often
  • Do it right – violate YAGNI
  • Practice, Practice, Practice

I’ll expand on these topics over time in future posts.

The Problem with Auto Updating Browsers

At the time of writing, the latest version of Firefox (version 13) has just been released. Bear in mind that a week ago I updated our Selenium bindings so that we could use Firefox 9+ for running our browser tests.

The latest release is another great release from the Firefox team, except that there is software out there that it will break. The software in question is any code that uses Selenium 2.22.0, which was released on 2012-05-29. It turns out the bindings only work with Firefox 12 or earlier.

For whatever reason, any tests that used Selenium this morning just stopped working for us, and for others. The tests in question caused the runner to hang as no window could be opened. I’m not sure what causes this, as the browser is essentially the same to the end user, bar some new features. Not being a Selenium developer, I cannot comment on how or why this has happened, nor can I suggest that the Selenium team should be version agnostic.

Our solution in the end was simple: turn off the auto updating and downgrade the browser. I’ve blogged about this in the past, but since Firefox 10 the team has been adopting a “silent” update process. This is great for end users. Imagine the countless man hours saved if IE6 had shipped with an auto update feature. The problem now seems to be in the hands of developers.

To make this problem more obvious, we have also added a check that runs before our tests to ensure that the browser can open a window. If this fails or hangs, we display a useful error message indicating that the browser in question is not compatible. We did this because it is not immediately obvious what the problem is, and the confusion only grows when some machines execute the tests with no problems at all.
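A check along these lines could be sketched as follows. This is not our actual test harness code: `check_browser_can_open`, `BrowserNotCompatibleError` and the `open_window` callable are hypothetical names, and `open_window` stands in for whatever your Selenium setup does to start the browser. The attempt runs on a daemon thread so that a hung browser cannot hang the whole test run.

```python
import threading


class BrowserNotCompatibleError(RuntimeError):
    """Raised when the browser cannot open a window before the tests run."""


def check_browser_can_open(open_window, timeout_seconds=30.0):
    """Call open_window() and fail with a clear message if it errors or hangs.

    open_window is a hypothetical callable supplied by the caller; in a real
    suite it would start the browser through the Selenium bindings.
    """
    outcome = {}

    def attempt():
        try:
            open_window()
            outcome["ok"] = True
        except Exception as error:  # report any failure as incompatibility
            outcome["error"] = error

    # A daemon thread will not keep the process alive if the browser hangs.
    worker = threading.Thread(target=attempt, daemon=True)
    worker.start()
    worker.join(timeout_seconds)

    if worker.is_alive():
        raise BrowserNotCompatibleError(
            f"The browser did not open a window within {timeout_seconds} seconds. "
            "It may not be compatible with the installed Selenium bindings."
        )
    if "error" in outcome:
        raise BrowserNotCompatibleError(
            "The browser failed to open a window. "
            "It may not be compatible with the installed Selenium bindings."
        ) from outcome["error"]
```

Running this once before the suite turns a silent hang into a single readable failure, which is most useful on the machines where the browser has quietly updated itself.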

Tools -> Options -> Advanced -> Update Tab

how to turn off updates in Firefox

So if you use Selenium and Firefox – ditch the auto updating. Manually update your bindings and check compatibility for now…

Recursively building a Web Service – using the same Web Service

The definition of recursion

During the latter part of 2011 there was a common theme in our retrospectives each week: how can we replicate our live environment as closely as possible?

We took steps to achieve this goal by creating a single machine image to ensure all our machines were configured correctly. Another quick win was to ensure certain aspects of our live data were restored to our local development databases during the night. This enabled us to take stack traces from our logs, quite literally paste them into our IDE, and replicate the user’s problem instantly. Without the same data set we could have seen different results. Despite these positive steps, there was a missing link in our replication process: how do we simulate the traffic of our live environment? As an example, we average anywhere from four to five thousand calculations per minute with our current web services, while our local and demo environments are nowhere near this figure.

During 2011 I found myself involved in many deployments about which, despite heavy testing, I was uneasy. On our demo environments we could throw the same amount of load against our services, yet some time after deploying, our service would fall over. We would quickly have to revert and go back to the drawing board. The problem was that, although we were mimicking our traffic in terms of volume, the load was not real. Our customers send many more variations of requests than we were predicting. The other obvious issue was that during local development the service may well handle the same volume of traffic, yet once it is live and the process has been running for a few hours, things might go bump, with factors such as memory or timeouts being the culprits.

Collectively we had a few ideas on how to solve this. We looked into low level solutions such as directing traffic from IIS/Apache towards other servers. We examined other load testing tools, and we even contemplated creating our own load creator. This internal tool would go over our database and fire off a number of requests at our demo environment. I felt uneasy with all these solutions. They were not “real” enough. I wanted the real time traffic to be submitted to our demo services; only then could we have full confidence in our work.

My idea was rather radical in the sense that it was so easy, yet dangerous enough that it might just work. I proposed that we integrate our own service into itself. In other words, just before our service returns the results of the calculation, it takes the user’s request and submits it again, against our demo environment. The same service would be recursively submitting into itself. To ensure we did not affect the speed of the service, the submission is performed via an async call, meaning that if this second call were to die, the live service would be unaffected. The obvious downside was that in order to test this, we needed to deploy the changes to our live service. We managed this with a feature toggle, meaning that at any time we could turn the feature on or off without affecting any customers.
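The shape of this can be sketched in a few lines of Python. This is an illustration under assumptions, not our production code: `FeatureToggle`, `handle_request`, `mirror_in_background`, `calculate` and `submit_to_demo` are all hypothetical names, and a background thread stands in for the async call.

```python
import threading


class FeatureToggle:
    """Hypothetical on/off switch that can be flipped while the service runs."""

    def __init__(self, enabled=False):
        self.enabled = enabled


def mirror_in_background(request, submit_to_demo):
    """Resubmit the request to the demo environment without blocking the caller.

    submit_to_demo is a hypothetical callable that posts the request to the
    demo service. Any failure is swallowed so it can never reach live users.
    """

    def attempt():
        try:
            submit_to_demo(request)
        except Exception:
            pass  # the demo environment is allowed to fail

    worker = threading.Thread(target=attempt, daemon=True)
    worker.start()
    return worker


def handle_request(request, calculate, submit_to_demo, toggle):
    """Serve the live request and, if the toggle is on, mirror it to demo."""
    result = calculate(request)
    if toggle.enabled:
        mirror_in_background(request, submit_to_demo)
    return result
```

The live result is computed before the mirror is started and is returned regardless of what happens on the demo side, which is the property that makes it safe to ship this to live behind a toggle.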

The end result is that when the feature is enabled, the traffic on our live service is also sent to our demo service. This allows us to deploy experimental or new features and changes to the demo environment and check them under real load, with real time data. If all goes well after a period of time, we can deploy to our live service; if not, we roll back and no one is the wiser.