The following is a review of a presentation given at the FLOSSUK Spring DevOps Conference. The event was held at Mary Ward House in London from 15th to 17th March 2016. Although this is a description of a presentation, with some dissection of what was discussed, it is not a verbatim account and will contain personal impressions and interpretation. The content therefore does not reflect the quality of the original presentation and should be considered a review and personal opinion.
This is one of a series of reviews of the talks I saw at the event.
The talk from Matt was first presented two years ago, but Matt recently updated it and delivered it for us at FLOSS Spring. It is still relevant because it sets out a series of steps for good practice when you come to automate a system, rather than discussing a recent piece of software.
When you’re up to your ass in alligators it’s hard to remember you were supposed to be draining the swamp
Most of the talk was delivered as an homage to, and dedicated to, a man called Wild Bill Walton, originator of the above quote, who in Matt’s words:
Managed to deliver a metaphor that both managers and technical people could understand…[and was a]...master of the folksy metaphor
Automating for the Blindfolded
The premise of this talk is that you may have to automate a server in your existing company, or a new company, or an acquisition, where there has been no accurate record of what has been placed on it. Even if you have a good development environment, use virtual machines and containerisation for segregation of services, it is still possible that you may have to do forensic analysis of someone else’s work. It is also possible that you are working as a start-up and don’t have the option to do things ‘the right way’ in the first instance:
- Build customers first
- Build technical debt - this is inevitable and is not ‘bad’
- Refactor when you have the time and funds
- Don’t 2nd System yourself up-front
The ideal situation is to be in profit and at a period of relative stability. You may have rotated through some developers and iterations of the codebase, and have systems that are now starting to block your evolution. Now you need to:
- Figure out what systems you have
- Figure out what services are where
- Figure out what’s installed
- Figure out what’s currently running
To do this we start by asking the Operating System and then work out the custom code. One thing to be wary of is development that was done directly on the production machine and isn’t replicated anywhere else. You can start by taking a snapshot of the running binaries over 24 hours; this is ‘dirty’ but very revealing. Use:
- ps ax
- netstat
- All daemons, all files
- All services
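The 24-hour snapshot of running binaries could be sketched roughly as follows. This is a minimal illustration rather than anything shown in the talk: the function names, the sample count and the polling interval are all assumptions, and it only covers `ps ax` (the same pattern would apply to `netstat`).

```python
import subprocess
import time


def parse_ps_commands(ps_output):
    """Return the distinct executable names found in `ps ax` output."""
    commands = set()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) == 5:
            # Keep only the executable, dropping its arguments.
            commands.add(fields[4].split()[0])
    return commands


def snapshot_processes(samples, interval_seconds):
    """Poll `ps ax` repeatedly and accumulate every command seen.

    Hypothetical helper: the polling scheme is an assumption,
    not something prescribed in the talk.
    """
    seen = set()
    for i in range(samples):
        result = subprocess.run(
            ["ps", "ax"], capture_output=True, text=True, check=True
        )
        seen |= parse_ps_commands(result.stdout)
        if i < samples - 1:
            time.sleep(interval_seconds)
    return seen
```

Calling `snapshot_processes(samples=288, interval_seconds=300)` would sample every five minutes for roughly a day. Short-lived processes that start and exit between samples will be missed, which is part of why this approach is ‘dirty’.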
You can then export the data to a text-based format for inspection. A good technique is to use a system such as MediaWiki or git, which have import and formatting tools that help you compile and format the text output.
Grep everything for IP addresses
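As a rough sketch of what grepping for IP addresses might look like, the snippet below scans text for IPv4 addresses and reports the line each one appears on. The function name and the regular expression are my own assumptions; it handles IPv4 only and will also match four-part version strings such as `1.2.3.4`, so the results need a human eye.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer
# run of digits and dots.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def find_ipv4(text):
    """Return (line_number, address) pairs for each valid IPv4 address."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in IPV4_CANDIDATE.finditer(line):
            candidate = match.group(0)
            try:
                # Rejects out-of-range octets such as 999.1.1.1.
                ipaddress.IPv4Address(candidate)
            except ValueError:
                continue
            hits.append((line_number, candidate))
    return hits
```

Running this over every config file and script you collected gives a list of hard-coded addresses to chase down later.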
Don't Be Clever
We should by now have some idea of how the whole of the system is composed, what files we have, what is running and where. The next stage is to set up a new, clean staging server, which gives you a clean, controlled environment. Another good reason is that there are likely to be updates, at least security fixes, to apply to the system, and a clean install will help with that.
Firewalls are not just for security, they control what connects and force transparency
It is essential that you use configuration management even on small systems; some automation is better than no automation. Matt’s recommendation is that pull-based systems are preferred:
don’t try to be clever, systems should not be smarter than you
- eliminate IP-based configs.
The next step is to look at the DNS. If you are lucky this is only going to be a small mess. Getting DNS right is going to save you a lot of issues in the refactor:
- Backup everything
- Check that it is backed up
- Clean all machines and make sure the config management is clean
- Install backup
- Point the dev servers to the correct staging
- Make a change on dev, push to staging
- Check slave systems
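One way the final check could be automated, assuming that ‘slave systems’ here means secondary DNS servers (my reading, not something stated in the talk), is to compare the SOA serial each server reports for a zone. The sketch below shells out to `dig +short`; the function names, zone and server names are hypothetical.

```python
import subprocess


def parse_soa_serial(dig_short_output):
    """Extract the serial from `dig +short <zone> SOA` output."""
    # Expected fields: mname rname serial refresh retry expire minimum
    fields = dig_short_output.split()
    if len(fields) < 7:
        raise ValueError(f"unexpected SOA answer: {dig_short_output!r}")
    return int(fields[2])


def soa_serial(zone, server):
    """Ask one name server for the zone's SOA serial (requires `dig`)."""
    result = subprocess.run(
        ["dig", "+short", zone, "SOA", f"@{server}"],
        capture_output=True, text=True, check=True,
    )
    return parse_soa_serial(result.stdout)


def out_of_sync(zone, primary, secondaries):
    """Return the secondaries whose serial differs from the primary's."""
    expected = soa_serial(zone, primary)
    return [s for s in secondaries if soa_serial(zone, s) != expected]
```

For example, `out_of_sync("example.org", "ns1.example.org", ["ns2.example.org"])` would return an empty list once the change pushed to staging has propagated.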
Make sure you have a written, repeatable and known migration strategy.
Make sure it is Predictable, Repeatable, Stupid and Understandable: don’t try to be clever, systems should not be smarter than you. Don’t trust DNS TTL to be honoured by other systems. Keep it simple, keep it stupid, keep it one alligator at a time.