The Config Bomb

For some time now I've been using the term "Config Bomb", and I've managed to find a number of former colleagues who are familiar with the term.

However, none of us can recall where we first heard the term, and search engines don't appear to have any record of such a term existing.

To try and help make the term more widely known, I decided I'd write something down. Hopefully someone sees this and is able to provide us with a primary source.

Config Bomb - a modification to a configuration file that has yet to take effect. At some later time a restart or reload will cause the bomb to go off and the change will kick in - sometimes with surprising or even disastrous effects.

This can happen when someone makes a manual change, or when configuration management tooling isn't set up to automatically restart services when the files they depend on are modified.

If I were coming up with this term today I'd probably go with a less war-based metaphor. Maybe something like a coiled spring?

Anyway, this is now written down for (I hope) posterity.

Adding custom TypeScript 3 definitions into your local project

Sometimes the @types package for an npm module you're using isn't up to scratch for some reason. Maybe you're waiting for a pull request to be merged, or you simply don't like the one that's up there. I was trying to figure out the easiest way to add module definitions to my local project.

Lots of examples out there on the internet have you use the declare module "xxx" syntax, but this doesn't allow you to reference relative files, so you can't just copy-paste from many @types packages.

After a bit of trial and error, I found this github issue which succinctly describes how to do exactly what I was after: https://github.com/Microsoft/TypeScript/issues/20421

  1. npm remove @types/module-name
  2. Create typings/module-name/index.d.ts
  3. Populate the type definition file with what you want, no need for a module wrapper
  4. Configure the baseUrl and paths options in compilerOptions to add typings/* as a second lookup location (see the sketch after this list).
  5. Success!
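
For reference, the compilerOptions change from step 4 looks something like this - a sketch which assumes tsconfig.json sits at the project root, so baseUrl is ".":

{
  "compilerOptions": {
    "baseUrl": ".",
    "paths": {
      "*": ["*", "typings/*"]
    }
  }
}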

Shortly after writing this I stumbled across a blog post titled Maintaining overridden type definitions for a dependency with TypeScript which covers the same topic. I'm linking to it here in the hope that this helps improve the search result rankings for either of us.

Generating a secure sha512 crypt() / htpasswd / passwd hash

The /etc/passwd hash system and the htpasswd files used by Apache and nginx all rely on an underlying library function called crypt() to generate and verify secure password hashes.

Attempting to generate these hashes programmatically is a bit of a nightmare for some reason - and googling mostly gets you terrible results.

Here is the simplest portable approach I'm aware of to generate these hashes; it requires the passlib Python package.

python -c "from passlib.hash import sha512_crypt; import getpass; print(sha512_crypt.encrypt(getpass.getpass('clear-text password: ')))"

Depending on where this is being checked, you might need to alter the number of rounds. The default setting is suitable for a unix password, but not great for an HTTP basic auth password as it takes around 500ms to check.

print(sha512_crypt.encrypt(getpass.getpass('clear-text password: '), rounds=5000))
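
If you want to double-check a hash you've generated, passlib can verify it too. A quick sketch, with PASTE-HASH-HERE standing in for the actual hash:

python -c "from passlib.hash import sha512_crypt; import getpass; print(sha512_crypt.verify(getpass.getpass('clear-text password: '), 'PASTE-HASH-HERE'))"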


Credit goes to Danny for finding this.


Names, Values, Identities, States and Time

“No man can cross the same river twice, because neither the man nor the river are the same.”

Heraclitus

The following post is extracted & paraphrased from Rich Hickey's excellent Are We There Yet? - specifically the section of the talk that focuses on the model for discussing and thinking about the titular concepts. These concepts are in turn taken from the philosophical writings of Alfred North Whitehead (a co-author of Principia Mathematica).

I often find myself wanting to explain this core concept to people who are new to Clojure, and particularly to people who I am trying to make into people who are new to Clojure. While I think I have a good handle on this concept in my head, I sometimes struggle to explain it succinctly; hopefully this post achieves that goal.

Definitions

These definitions are not really globally applicable, but they represent the precise meaning I try to capture when discussing values changing over time in the context of software development and programming.

Value

A value is some measurable quantity, magnitude or amount for which its equivalence to some other value can be determined. A value can also be some immutable composite of other values. The number 42 is a value, the string "Hello World" is a value, the list (1, 2, 3) is also a value.

Identity

An identity is some composite psychological concept we apply to a succession of values over time where they are in some way causally related. You are an identity, I am an identity, a river is an identity (see below).

Name

A name is simply a label that we apply to an identity or a value to make it easier to refer to. The same identity can have many names. "Glen", "Glen Mailer", "Mr Mailer", "Glenjamin" and "@glenathan" are all names which could legitimately be considered to refer to the identity that is myself in the appropriate context. Likewise the "Answer to the Ultimate Question of Life, The Universe, and Everything" is a name for the value 42.

State

A state is simply the value of an identity at a particular time. A snapshot, if you will. Under this definition state does not change, and thus there is no such thing as "mutable state".

Time

Purely relative, it has no dimensions - it can only tell us whether something happened before or after some other thing (or at the same time).

The River

Let us consider the title quote in the context of these definitions. To help us examine the proverbial river under this light, we shall give ourselves the same powers as when running a computer program but in the real world - which requires us to sidestep some fairly fundamental physics - hopefully this will not cause any lasting damage.

The third-longest river in Asia runs through China. Depending on context it is known as the "Yellow River", "Huang He", "རྨ་ཆུ།", "the cradle of Chinese civilization" and "China's Sorrow". All of these are names for the same river, which itself is an identity.

If we were to freeze an instant in time into a snapshot of our proverbial river crossing, this state would contain a value composed of a large number of atomic (in the irreducible sense) smaller values. For simplicity, let's assume that water molecules are immutable. The state of the river we are crossing, then, is the current arrangement of all these immutable water molecule values.

At some point in the future when returning for our second crossing, we take another snapshot of the river as our new state. The river's value is again the arrangement of all the immutable water molecules - but this time they are all different molecules with different values.

The state of the identity which is the river named "Huang He" at this later point in time is measurably different from the value we took during the first crossing.

In Clojure

Since immutability is at Clojure's core, we'll start here for some code examples.

The following code should work correctly when pasted into a running Clojure REPL.
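
Something along these lines illustrates the idea - a minimal sketch, with names like river and first-crossing chosen purely for illustration:

;; 42 is a value, and answer is a name for it
(def answer 42)

;; an atom models an identity: a succession of causally-related values over time
(def river (atom #{:molecule-a :molecule-b :molecule-c}))

;; a state is the value of that identity at a particular time
(def first-crossing @river)

;; "changing" the river swaps in a brand new value - the old value is untouched
(reset! river #{:molecule-d :molecule-e :molecule-f})

(def second-crossing @river)

(= first-crossing second-crossing) ;; => false
first-crossing                     ;; => #{:molecule-a :molecule-b :molecule-c}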

In JavaScript

JavaScript doesn't have the same set of immutable primitives, but we can achieve a similar effect with a little sleight of hand.

The following code should work correctly when pasted into the browser console or a Node.js REPL line-by-line.
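
Again a minimal sketch, using the same illustrative names as the Clojure version:

// Object.freeze gives us (shallowly) immutable values
var firstCrossing = Object.freeze(["molecule-a", "molecule-b", "molecule-c"]);

// the variable river plays the role of an identity: it refers to a
// succession of values over time
var river = firstCrossing;

// "changing" the river means pointing the identity at a brand new value,
// never mutating the old one
river = Object.freeze(["molecule-d", "molecule-e", "molecule-f"]);

var secondCrossing = river;

firstCrossing === secondCrossing; // false
firstCrossing;                    // still ["molecule-a", "molecule-b", "molecule-c"]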

Summary

The ancient Greeks knew about the perils of mutable state a long time ago - we're only now rediscovering this for ourselves.

In a language like Clojure, that was designed from the ground up with this in mind, it's easy to take back control and tease apart the separate concepts I've described. Even in a language like JavaScript, designed in a week at a time when mutability was commonplace, we can achieve a similar effect with a measure of self-control. There are also libraries like mori and immutable-js which provide much fuller implementations of the data-types required to avoid mutability.

If you remain unconvinced, I recommend watching Are We There Yet?. If you're still not sure after that, you're either a far better programmer than me, or you're yet to experience the pain of trying to reason about a large codebase riddled with unchecked mutability.

Addendum

As well as the above definitions, Are We There Yet? contains this gem, which is Rich visualising the idea of obvious complexity while saying "Grrrrr".


Stuff to Follow Up From EuroClojure 2014

So I've just spent the last two days at EuroClojure, which was excellent. I met plenty of really great, really friendly and really smart people. It gave me plenty to think about, and inspired me to try and write and share more about my own experiences with Clojure.

To kick this off, for my own benefit as well as any readers, here follows a list of everything I made a note of to look into further, read or watch.

For a fuller set of notes from the conference, be sure to check out Phil Potter's notes.

Day One

  • Fergal Byrne: Real Machine Intelligence with Neuroscience
  • Logan Campbell: Using Clojure at a Post Office (Australia Post)
    • Vlad for validations
    • core.typed for optional type checking
    • "Show people code to sell the benefits" - on Clojure vs Scala (or vs Anything else)
    • http-kit's async http requests + core.async for simple lightweight concurrency when calling downstream services
    • On a new project, engage very early with the Systems/Ops team
    • Get something delivered end-to-end as early as possible: pave the way for more
    • App Dynamics
    • metrics-clojure
  • Tommy Hall: Escaping DSL Hell
    • If you're writing a language, be sure to actually design it
    • When you invent a DSL, people will want loops
    • You're better off embedding in a real language than inventing your own
    • Geomlab - DSL for learning, but a dead-end as the skills aren't directly transferable
    • CLJSFiddle
    • Incanter - example of a great DSL embedded in clojure
    • Try doing SICP in ClojureScript
    • ClojureScript (and Clojure) need a much better day zero experience
  • Paul Ingles: multi-armed bandit optimisations
    • It's far easier to compare relative values than to evaluate absolute values
    • Multi-armed bandit is about exploration vs exploitation
    • Thompson sampling models results so far into a probability distribution used to select the next value
  • Tommi Reiman: JSON APIs
  • Rich Hickey: Some core.async internals
    • Channels are just a conveyor belt
    • Use put! to throw stuff into a channel from the main thread
    • There are no unbounded buffers allowed. at all. ever.
    • Channels have a hard limit of 1024 queued items without buffers
  • Hallway Track

Day Two

  • David Nolen: Invention & Innovation
  • Phil Potter: test.check
  • Chris Ford: The Hitchhiker's Guide to the Curry-Howard Correspondence
  • Anna Pawlicka: Reactive data visualisations with Om
  • Malcolm Sparks: Assembling secure Clojure applications from re-usable parts
    • Liberator for building web APIs
    • bidi for declarative routing - as opposed to function composition, which cannot be reversed into URLs
    • Modular: an experiment in meta-architecture for more reusable components
    • There seems to be an inherent tension between dynamism and late binding, and composable modularity (which involves some form of encapsulation)
  • Hallway Track


If nothing else, I think I've realised I need to be more consistent with my note taking - apologies to any speakers I didn't make notes about!

Data Provider Docstrings

For a tl;dr, skip to the final gist.

I'm a big believer in BDD-style testing. And by that I mean testing the behaviour of your code, and expressing your tests in those terms.

In PHPUnit, this tends to mean that instead of having a test outline like this:

  • EventMatcherTest
    • testConstructor
    • testSetEvents
    • testGetEvents
    • testMatchEvent
    • providerMatchEvent

You have something that looks more like this:

  • EventMatcherTest
    • test_matchEvent_does_exact_name_match
    • test_matchEvent_does_alias_name_match
    • test_matchEvent_does_shortname_match
    • test_matchEvent_falls_back_to_closest_match
    • test_matchEvent_ignores_different_dates
    • test_matchEvent_doesnt_match_wrong_way_around
    • test_matchEvent_doesnt_match_if_nothing_similar

However, this week I was editing a test someone else had written that used a dataProvider. I tend to stay away from these because it can often be a bit opaque what each data set is actually testing. That said, I don't dislike them enough to merit completely re-writing someone else's reasonably good passing test.

I needed to add an additional case to this data provider, but it wasn't entirely clear which cases had already been covered. I went through adding a comment to each one, so they looked like this:
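
Roughly like this - the provider name matches the test in the failure output below, but the second data set is an invented example:

public function providerMatchEventsSucceeds()
{
    return array(
        // Exact match
        array(array('Man Utd', 'Arsenal'), 'Man Utd vs Arsenal'),
        // Alias match
        array(array('Manchester United', 'Arsenal'), 'Man Utd vs Arsenal'),
    );
}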

But then I realised that I could turn the comments into code, by setting the array keys!
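
The same provider then becomes something like:

public function providerMatchEventsSucceeds()
{
    return array(
        'Exact match' => array(array('Man Utd', 'Arsenal'), 'Man Utd vs Arsenal'),
        'Alias match' => array(array('Manchester United', 'Arsenal'), 'Man Utd vs Arsenal'),
    );
}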

The biggest benefit of this approach is that instead of getting test failure output like this:

1) EventMatcherTest::testMatchEventsSucceeds with data set #0 (array('Man Utd', 'Arsenal'), 'Man Utd vs Arsenal')

You get output like this:

1) EventMatcherTest::testMatchEventsSucceeds with data set "Exact match" (array('Man Utd', 'Arsenal'), 'Man Utd vs Arsenal')

These strings also appear in the TAP and junit formatter output, and can be used with the --filter switch.

Top tips for distributing work with queues

First off, let's get one thing straight: Message Queues are awesome.

They allow you to decouple the various parts of your application from each other, and communicate asynchronously. They protect you from data loss during restarts, and give you an excellent visualisation of processing bottlenecks in your system.

It all starts by passing a message from A to B, via a queue Q.

You become enamoured with this approach and the flexibility it offers, running multiple instances of A and B and scaling to your heart's content.

A requirement for a new data source pops up, and you think "aha! My queue shall save me - I'll just write a new process C and have it publish messages to B via Q." This works wonderfully, and you sleep soundly at night.

Many months later you've run out of single-letter monikers for your applications, and many thousands of messages flow through your sturdy queues every second of the day. A sends to B via Q which sends to F via R which sends to G via S which sometimes sends back to A via T. You've got passive and active monitoring, graphing and alerting on the size and throughput of those lovely, lovely message queues.

And then one day a process logs an error: "Hey, this message is a bit dodgy - I can't handle this!". It includes enough debug information for you to see that yes: it is certainly a dodgy message, and failing to handle it is the correct course of action. But how on earth did it get there?

If you're any of the way along the journey above, these tips are for you.

Tip #1 - Identification

When a new message is assembled, give it a globally unique identifier.

You can store this in AMQP's message-id field.

Tip #2 - Birthday

When a new message is assembled, give it a timestamp.

You can store this in AMQP's timestamp field. Be as accurate as you can, but know that computers rarely completely agree on time - so don't overly worry about this.

Tip #3 - Source

When a new message is assembled, note which application is assembling it - and ideally why.

You could encode this into the message-id, or some combination of app-id, user-id and type.

Tip #4 - Ancestry

This is the really important one.

When creating a new message as a result of an existing message, include the data from tips 1, 2 and 3 in the new message.

Depending on the properties of the new message, you can use the same fields or use the headers field with names like parent-message-id or source-message-id - or even as part of the message body in some way.

Tip #5 - Tracking

Throughout the lifecycle of messages in your system, include the source-message-id in your log lines. This allows messages to be correlated to the event that spawned them.

Tip #6 - Lifetime

When an action is performed, log the time delta between the source-message-timestamp and the current time.

Network time means this isn't necessarily exact - but it's a useful indication of the health and throughput of the system. We like to refer to this as "message lifetime".

Tip #7 - Factory

Create a small standalone module that exports only factory functions for creating messages in the format that will be used amongst your queues.

Give this module a version number, document it well, and then treat it as a third party dependency of any application that interacts with a message queue.
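
To make that concrete, such a module might look something like this sketch in Node.js - the header names follow tips 1 to 4, but the exact shape is up to you:

// messages.js - versioned, documented, and shared by every app that touches the queues
var crypto = require('crypto');

function newMessage(appId, type, body) {
  return {
    messageId: crypto.randomBytes(16).toString('hex'), // tip #1: identification
    timestamp: Date.now(),                              // tip #2: birthday
    appId: appId,                                       // tip #3: source
    type: type,
    body: body
  };
}

function childMessage(parent, appId, type, body) {
  var message = newMessage(appId, type, body);
  // tip #4: carry the ancestry of the message that caused this one
  message.headers = {
    'parent-message-id': parent.messageId,
    'parent-timestamp': parent.timestamp,
    'parent-app-id': parent.appId
  };
  return message;
}

module.exports = {
  newMessage: newMessage,
  childMessage: childMessage
};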

Why?

I've recently been attempting to debug issues in a system with a myriad of processes that interact via message queues. There are many desirable properties of this system - particularly around fault tolerance and horizontal scaling.

Many of the bugs I see are either

  1. Very hard to find the root cause of
  2. A result of very slightly inconsistent message contents

The tips above are intended to make it far easier to resolve such issues. The lifetime thing is just a nice-to-have.

Shared behaviours in PHPUnit with Traits

One of my favourite features from RSpec is Shared Behaviours - these allow you to include a standard set of tests against a bunch of different Classes.

If you consider an Interface declaration in PHP, it ensures that the implementing class matches the method signatures defined. However, an Interface also comes with some implicitly expected behaviours associated with these methods. We can document what this behaviour is supposed to be - but using a shared-behaviour-like approach we can assert it programmatically in our test suite.

This same logic can also be applied to Abstract classes and Traits. I'm personally not a fan of testing an abstract class by having a Mock object inherit from it, because this isn't how it's actually used in the real system. The fact that a particular class inherits from an Abstract is actually an implementation detail; the only part we care about is that it exhibits the required behaviour.

You can think of a Shared Behaviour as a runtime Interface declaration. An Interface statically states "This class implements this set of functionality", a Shared Behaviour proves "This class implements this set of functionality".

Example

The example we're using is some standard behaviour that we'll use across all of our collection classes. These collections receive their data as an associative array via a JSON API. The API also provides an array of IDs in the order that the collection should be iterated over - this is required because JSON doesn't guarantee the order of elements in its collection type. The collection classes handle converting to appropriate model classes, in addition to iterating in the correct order.

If this was an interface, it would look something like this:
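
(The interface and method names below are illustrative; the real code is in the gist linked at the end of the post.)

interface ApiList extends Iterator, Countable
{
    /**
     * @param array $items associative array of id => raw item data from the API
     * @param array $order array of ids giving the order to iterate over
     */
    public function __construct(array $items, array $order);
}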

The Plan

  • Write tests for a single concrete implementation of this interface
  • Implement a class which implements the interface and passes the tests
  • Move the standard behaviour specified in the test into a shared example group using a trait
  • Implement a second class which implements the interface by creating a default implementation using an abstract base class

Before abstraction...

We wrap up some of the instance creation boilerplate with some protected methods, but otherwise a pretty ordinary unit test.

If you like, you can read the implementation of this class, but there's not much to it.

Nothing particularly unusual so far, now we'll look at how we can reduce code and test duplication when we have more than one collection class.

Traits to the rescue!

To give you a taster, here are the two test classes after we've moved their shared behaviour into the trait. The second list class works a little differently, but still exposes the same interface. It also includes a higher-level iteration method specific to its own use-cases.

And now for the meat, the trait itself that makes this all work.
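
In outline it's something like the sketch below - the hook and test names are placeholders, and the real trait lives in the gist linked at the end:

trait ApiListBehaviour
{
    // implementor-specific bits, filled in by the real test class
    abstract protected function createList(array $items, array $order);
    abstract protected function exampleItems();
    abstract protected function exampleOrder();

    public function test_it_iterates_in_the_order_given_by_the_api()
    {
        $list = $this->createList($this->exampleItems(), $this->exampleOrder());
        $this->assertSame($this->exampleOrder(), array_keys(iterator_to_array($list)));
    }

    public function test_it_counts_its_items()
    {
        $list = $this->createList($this->exampleItems(), $this->exampleOrder());
        $this->assertCount(count($this->exampleItems()), $list);
    }
}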

Note that this is almost identical to the single test version, but we've extracted the implementor-specific bits into methods that the real test class can fill in later.

You can see the full code example, including the abstracted implementations of AppleList and AddressList in this gist.

In Summary

Hopefully the example I've chosen was suitable for conveying the idea. Rather than artificially testing an abstract class, we can structure tests to test the behaviour exhibited. The fact we're using an abstract class to share code becomes an implementation detail, rather than part of the interface contract. In my book, decoupling behaviour from implementation is always a good thing!

This approach should be applicable in any scenario where you have a number of classes exhibiting the same behaviour, regardless of whether they achieve this through inheritance, composition, or even copy-paste!

Thanks to Craig for running into the problem in the first place, as well as doing most of the implementation while I yelled suggestions over his shoulder.

Further Reading



Reload your Terminal with iTerm2 on OS X

After seeing a tweet from @GotNoSugarBaby

I thought "that seems doable", and it turns out it is!

As long as you're using iTerm2, you can bind hotkeys to "Send Text", which allows "\n" for newlines.

Simply open the preferences window (cmd+,) and switch to the "Keys" pane. Add a new global shortcut key as shown in the screenshot above (The original version of this post used "!!\n", but using the escape sequence for up as in the updated version also works in non-bash scenarios).

Now you can "Reload" your last command by bashing "cmd+r" - web-dev browser synergy here we come!