Generating a secure sha512 crypt() / htpasswd / passwd hash

The /etc/passwd hashing system, as well as the htpasswd files used by Apache and nginx, all rely on an underlying C library function called crypt() to generate and verify secure password hashes.

Attempting to generate these hashes programmatically is a bit of a nightmare for some reason - and googling mostly gets you terrible results.

Here is the simplest portable approach I'm aware of to generate hashes.

python -c "from passlib.hash import sha512_crypt; import getpass; print sha512_crypt.encrypt(getpass.getpass('clear-text password: '))"

Depending on where this is being checked, you might need to alter the number of rounds. The default setting is suitable for a unix password, but not great for an HTTP basic auth password as it takes around 500ms to check.

print(sha512_crypt.hash(getpass.getpass('clear-text password: '), rounds=5000))
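
For reference, the string these commands produce follows the modular crypt format: a $6$ identifier, an optional rounds field, a salt, and the digest. Here is a minimal sketch for picking those fields apart in plain Python (no passlib required; the digest strings below are placeholders, not real hash values):

```python
def parse_sha512_crypt(h):
    """Split a sha512-crypt hash ($6$...) into its component fields."""
    parts = h.lstrip("$").split("$")
    if parts[0] != "6":
        raise ValueError("not a sha512-crypt hash")
    if parts[1].startswith("rounds="):
        # Explicit rounds field, e.g. $6$rounds=656000$salt$digest
        rounds = int(parts[1][len("rounds="):])
        salt, digest = parts[2], parts[3]
    else:
        # Rounds field omitted: implementations fall back to the default of 5000
        rounds, salt, digest = 5000, parts[1], parts[2]
    return {"rounds": rounds, "salt": salt, "digest": digest}

print(parse_sha512_crypt("$6$rounds=5000$SALTSALT$notarealdigest"))
```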


Credit goes to Danny for finding this.


Names, Values, Identities, States and Time

“No man can cross the same river twice, because neither the man nor the river are the same.”

Heraclitus

The following post is extracted & paraphrased from Rich Hickey's excellent Are We There Yet? - specifically the section of the talk that focuses on the model for discussing and thinking about the titular concepts. These concepts are in turn taken from the philosophical writings of Alfred North Whitehead (a co-author of Principia Mathematica).

I often find myself wanting to explain this core concept to people who are new to Clojure, and particularly people who I am trying to make into people who are new to Clojure. While I think I have a good handle on the concept in my head, I sometimes struggle to explain it succinctly - hopefully this post achieves that goal.

Definitions

These definitions are not really globally applicable, but they represent the precise meaning I try to capture when discussing values changing over time in the context of software development and programming.

Value

A value is some measurable quantity, magnitude or amount for which its equivalence to some other value can be determined. A value can also be some immutable composite of other values. The number 42 is a value, the string "Hello World" is a value, the list (1, 2, 3) is also a value.

Identity

An identity is some composite psychological concept we apply to a succession of values over time where they are in some way causally related. You are an identity, I am an identity, a river is an identity (see below).

Name

A name is simply a label that we apply to an identity or a value to make it easier to refer to. The same identity can have many names. "Glen", "Glen Mailer", "Mr Mailer", "Glenjamin" and "@glenathan" are all names which could legitimately be considered to refer to the identity that is myself in the appropriate context. Likewise the "Answer to the Ultimate Question of Life, The Universe, and Everything" is a name for the value 42.

State

A state is simply the value of an identity at a particular time. A snapshot, if you will. Under this definition state does not change, and thus there is no such thing as "mutable state".

Time

Purely relative, it has no dimensions - it can only tell us whether something happened before or after some other thing (or at the same time).

The River

Let us consider the title quote in the context of these definitions. To help us examine the proverbial river under this light, we shall give ourselves the same powers as when running a computer program but in the real world - which requires us to sidestep some fairly fundamental physics - hopefully this will not cause any lasting damage.

The third-longest river in Asia runs through China. Depending on context it is known as the "Yellow River", "Huang He", "རྨ་ཆུ།", "the cradle of Chinese civilization" and "China's Sorrow". All of these are names for the same river, which itself is an identity.

If we were to freeze an instant in time into a snapshot of our proverbial river crossing, this state would contain a value composed of a large number of atomic (in the irreducible sense) smaller values. For simplicity, let's assume that water molecules are immutable. In that case, the state of the river we are crossing can be said to be the current arrangement of all these immutable water molecule values.

At some point in the future when returning for our second crossing, we take another snapshot of the river as our new state. The river's value is again the arrangement of all the immutable water molecules - but this time they are all different molecules with different values.

The state of the identity which is the river named "Huang He" at this later point in time is measurably different from the value we took during the first crossing.
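
The same story can be told in a few lines of code. Here is a minimal sketch (in Python, with tuples standing in for immutable arrangements of water molecules, and a plain dict cell standing in for the identity):

```python
# Two immutable values: snapshots of the molecule arrangement.
first_crossing = ("molecule-a", "molecule-b", "molecule-c")
second_crossing = ("molecule-x", "molecule-y", "molecule-z")

# The identity "the river" is a reference that takes on a
# succession of causally related values over time.
river = {"state": first_crossing}

snapshot_1 = river["state"]       # state: the value at this point in time
river["state"] = second_crossing  # time passes; the identity moves on
snapshot_2 = river["state"]

# The earlier state is untouched by the identity moving on.
assert snapshot_1 == ("molecule-a", "molecule-b", "molecule-c")
assert snapshot_1 != snapshot_2
```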

In Clojure

Since immutability is at Clojure's core, we'll start here for some code examples.

The following code should work correctly when pasted into a running Clojure REPL.

In JavaScript

JavaScript doesn't have the same set of immutable primitives, but we can achieve a similar effect with a little sleight of hand.

The following code should work correctly when pasted into the browser console or a Node.js REPL line-by-line.

Summary

The ancient Greeks knew about the perils of mutable state a long time ago - we're only now rediscovering this for ourselves.

In a language like Clojure, that was designed from the ground up with this in mind, it's easy to take back control and tease apart the separate concepts I've described. Even in a language like JavaScript, designed in a week at a time when mutability was commonplace, we can achieve a similar effect with a measure of self-control. There are also libraries like mori and immutable-js which provide much fuller implementations of the data-types required to avoid mutability.

If you remain unconvinced, I recommend watching Are We There Yet?. If you're still not sure after that, you're either a far better programmer than me, or you're yet to experience the pain of trying to reason about a large codebase riddled with unchecked mutability.

Addendum

As well as the above definitions, Are We There Yet? contains this gem, which is Rich visualising the idea of obvious complexity while saying "Grrrrr".

Stuff to Follow Up From EuroClojure 2014

So I've just spent the last two days at EuroClojure, which was excellent. I met plenty of really great, really friendly and really smart people. It gave me plenty to think about, and inspired me to try and write and share more about my own experiences with Clojure.

To kick this off, for my own benefit as well as any readers, here follows a list of everything I made a note of to look into further, read or watch.

For a fuller set of notes from the conference, be sure to check out Phil Potter's notes.

Day One

  • Fergal Byrne: Real Machine Intelligence with Neuroscience
  • Logan Campbell: Using Clojure at a Post Office (Australia Post)
    • Vlad for validations
    • core.typed for optional type checking
    • "Show people code to sell the benefits" - on Clojure vs Scala (or vs Anything else)
    • http-kit's async http requests + core.async for simple lightweight concurrency when calling downstream services
    • On a new project, engage very early with the Systems/Ops team
    • Get something delivered end-to-end as early as possible: pave the way for more
    • App Dynamics
    • metrics-clojure
  • Tommy Hall: Escaping DSL Hell
    • If you're writing a language, be sure to actually design it
    • When you invent a DSL, people will want loops
    • You're better off embedding in a real language than inventing your own
    • Geomlab - DSL for learning, but dead-end as skills not directly transferable
    • CLJSFiddle
    • Incanter - example of a great DSL embedded in clojure
    • Try doing SICP in ClojureScript
    • ClojureScript (and Clojure) need a much better day zero experience
  • Paul Ingles: multi-armed bandit optimisations
    • It's far easier to compare relative values than to evaluate absolute values
    • Multi-armed bandit is about exploration vs exploitation
    • Thompson sampling models results so far into a probability distribution used to select the next value
  • Tommi Reiman: JSON APIs
  • Rich Hickey: Some core.async internals
    • Channels are just a conveyor belt
    • Use put! to throw stuff into a channel from the main thread
    • There are no unbounded buffers allowed. at all. ever.
    • Channels have a hard limit of 1024 queued items without buffers
  • Hallway Track

Day Two

  • David Nolen: Invention & Innovation
  • Phil Potter: test.check
  • Chris Ford: The Hitchhiker's Guide to the Curry-Howard Correspondence
  • Anna Pawlicka: Reactive data visualisations with Om
  • Malcolm Sparks: Assembling secure Clojure applications from re-usable parts
    • Liberator for building web APIs
    • bidi for declarative routing - as opposed to function composition, which cannot be reversed into URLs
    • Modular: an experiment in meta-architecture for more reusable components
    • There seems to be an inherent tension between dynamism and late binding, and composable modularity (which involves some form of encapsulation)
  • Hallway Track


If nothing else, I think I've realised I need to be more consistent with my note taking - apologies to any speakers I didn't make notes about!

Data Provider Docstrings

For a tl;dr, skip to the final gist.

I'm a big believer in BDD-style testing. And by that I mean testing the behaviour of your code, and expressing your tests in those terms.

In PHPUnit, this tends to mean that instead of having a test outline like this:

  • EventMatcherTest
    • testConstructor
    • testSetEvents
    • testGetEvents
    • testMatchEvent
    • providerMatchEvent

You have something that looks more like this:

  • EventMatcherTest
    • test_matchEvent_does_exact_name_match
    • test_matchEvent_does_alias_name_match
    • test_matchEvent_does_shortname_match
    • test_matchEvent_falls_back_to_closest_match
    • test_matchEvent_ignores_different_dates
    • test_matchEvent_doesnt_match_wrong_way_around
    • test_matchEvent_doesnt_match_if_nothing_similar

However, this week I was editing a test someone else had written that used a dataProvider. I tend to stay away from these because it can often be a bit opaque as to what each data set is actually testing, but I don't dislike them enough to merit completely rewriting someone else's reasonably good passing test.

I needed to add an additional case to this data provider, but it wasn't entirely clear which cases had already been covered. I went through adding a comment to each one, so they looked like this:

But then I realised that I could turn the comments into code, by setting the array keys!

The biggest benefit of this approach is that instead of getting test failure output like this:

1) EventMatcherTest::testMatchEventsSucceeds with data set #0 (array('Man Utd', 'Arsenal'), 'Man Utd vs Arsenal')

You get output like this:

1) EventMatcherTest::testMatchEventsSucceeds with data set "Exact match" (array('Man Utd', 'Arsenal'), 'Man Utd vs Arsenal')

These strings also appear in the TAP and junit formatter output, and can be used with the --filter switch.

Top tips for distributing work with queues

First off, lets get one thing straight: Message Queues are awesome.

They allow you to decouple the various parts of your application from each other, and communicate asynchronously. They protect you from data loss during restarts, and give you an excellent visualisation of processing bottlenecks in your system.

It all starts by passing a message from A to B, via a queue Q.

You become enamoured with this approach and the flexibility it offers, running multiple instances of A and B and scaling to your heart's content.

A requirement for a new data-source pops up, and you think "aha! my queue shall save me - I'll just write a new process C and have it publish messages to B via Q". This works wonderfully, and you sleep soundly at night.

Many months later you've run out of single-letter monikers for your applications, and many thousands of messages flow through your sturdy queues every second of the day. A sends to B via Q which sends to F via R which sends to G via S which sometimes sends back to A via T. You've got passive and active monitoring, graphing and alerting on the size and throughput of those lovely, lovely message queues.

And then one day a process logs an error: "Hey, this message is a bit dodgy - I can't handle this!". It includes enough debug information for you to see that yes: it is certainly a dodgy message, and failing to handle it is the correct course of action. But how on earth did it get there?

If you're any of the way along the journey above, these tips are for you.

Tip #1 - Identification

When a new message is assembled, give it a globally unique identifier.

You can store this in AMQP's message-id field.

Tip #2 - Birthday

When a new message is assembled, give it a timestamp.

You can store this in AMQP's timestamp field. Be as accurate as you can, but know that computers rarely completely agree on time - so don't overly worry about this.

Tip #3 - Source

When a new message is assembled, note which application is assembling it - and ideally why.

You could encode this into the message-id, or some combination of app-id, user-id and type.
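
Taken together, tips 1-3 amount to only a few lines when a message is first assembled. A minimal sketch in Python (new_message is a hypothetical helper; the field names follow the AMQP properties mentioned above):

```python
import time
import uuid

def new_message(app_id, body):
    """Assemble a brand-new message with identity, birthday and source."""
    return {
        "message-id": str(uuid.uuid4()),  # Tip 1: globally unique identifier
        "timestamp": time.time(),         # Tip 2: as accurate as we can get
        "app-id": app_id,                 # Tip 3: which application made it
        "body": body,
    }
```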

Tip #4 - Ancestry

This is the really important one.

When creating a new message as a result of an existing message, include the data from tips 1, 2 and 3 in the new message.

Depending on the properties of the new message, you can use the same fields or use the headers field with names like parent-message-id or source-message-id - or even as part of the message body in some way.
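
As a sketch of tip 4, a hypothetical derive_message helper can stamp a child message with its own identity while carrying forward the parent's identifying data (field names here are illustrative):

```python
import time
import uuid

def derive_message(app_id, parent, body):
    """Create a child message that carries its parent's identifying data."""
    return {
        "message-id": str(uuid.uuid4()),            # the child's own identity
        "timestamp": time.time(),                   # and its own birthday
        "app-id": app_id,
        "parent-message-id": parent["message-id"],  # Tip 1 data, inherited
        "parent-timestamp": parent["timestamp"],    # Tip 2 data, inherited
        "parent-app-id": parent["app-id"],          # Tip 3 data, inherited
        "body": body,
    }
```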

Tip #5 - Tracking

Throughout the lifecycle of messages in your system, include the source-message-id in your log lines. This allows messages to be correlated to the event that spawned them.

Tip #6 - Lifetime

When an action is performed, log the time delta between the source-message-timestamp and the current time.

Network time means this isn't necessarily exact - but it's a useful indication of the health and throughput of the system. We like to refer to this as "message lifetime".
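
The calculation itself is trivial - a sketch, assuming the source timestamp from tip 2 travels with the message:

```python
def message_lifetime(source_timestamp, now):
    """Seconds between the source message's birth (Tip 2) and `now` (Tip 6)."""
    return now - source_timestamp

# A message born at t=100.0 and acted upon at t=100.75 has a 750ms lifetime.
assert message_lifetime(100.0, 100.75) == 0.75
```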

Tip #7 - Factory

Create a small standalone module that exports only factory functions for creating messages in the format that will be used amongst your queues.

Give this module a version number, document it well, and then treat it as a third party dependency of any application that interacts with a message queue.

Why?

I've recently been attempting to debug issues in a system with a myriad of processes that interact via message queues. There are many desirable properties of this system - particularly around fault tolerance and horizontal scaling.

Many of the bugs I see are either

  1. Very hard to find the root cause of
  2. A result of very slightly inconsistent message contents

The tips above are intended to make it far easier to resolve such issues. The lifetime thing is just a nice-to-have.

Shared behaviours in PHPUnit with Traits

One of my favourite features from RSpec is Shared Behaviours - these allow you to include a standard set of tests against a bunch of different Classes.

If you consider an Interface declaration in PHP, it ensures that the implementing class matches the method signatures defined. However, an Interface also comes with some implicitly expected behaviours associated with these methods. We can document what this behaviour is supposed to be - but using a shared-behaviour-like approach we can assert it programmatically in our test suite.

This same logic can also be applied to Abstract classes and Traits. I'm personally not a fan of testing an abstract class by having a Mock object inherit from it, because this isn't how it's actually used in the real system. The fact that a particular class inherits from an Abstract is actually an implementation detail, the only part we care about is that it exhibits the behaviour required.

You can think of a Shared Behaviour as a runtime Interface declaration. An Interface statically states "This class implements this set of functionality", a Shared Behaviour proves "This class implements this set of functionality".
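
The idea isn't PHP-specific. As a cross-language sketch, here is the same pattern in Python's unittest, with a mixin class playing the role of the trait (the StackContract example and its names are hypothetical, chosen to keep the sketch short):

```python
import unittest

class StackContract:
    """Shared behaviour: anything claiming to be a stack must pass these."""

    def make_stack(self):
        raise NotImplementedError  # the concrete test class fills this in

    def test_push_then_pop_returns_last_item(self):
        s = self.make_stack()
        s.push(1)
        s.push(2)
        self.assertEqual(s.pop(), 2)

class ListStack:
    """One concrete implementation of the stack behaviour."""
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

class ListStackTest(StackContract, unittest.TestCase):
    def make_stack(self):
        return ListStack()
```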

Example

The example we're using is some standard behaviour that we'll use across all of our collection classes. These collections receive their data as an associative array via a JSON API. The API also provides an array of IDs in the order that the collection should be iterated over - this is required because JSON doesn't guarantee the order of elements in its collection type. The collection classes handle converting to appropriate model classes, in addition to iterating in the correct order.

If this was an interface, it would look something like this:

The Plan

  • Write tests for a single concrete implementation of this interface
  • Implement a class which implements the interface and passes the tests
  • Move the standard behaviour specified in the test into a shared example group using a trait
  • Implement a second class which implements the interface by creating a default implementation using an abstract base class

Before abstraction...

We wrap up some of the instance creation boilerplate with some protected methods, but otherwise a pretty ordinary unit test.

If you like, you can read the implementation of this class, but there's not much to it.

Nothing particularly unusual so far, now we'll look at how we can reduce code and test duplication when we have more than one collection class.

Traits to the rescue!

To give you a taster, here are the two test classes after we've moved their shared behaviour into the trait. The second list class works a little differently, but still exposes the same interface. It also includes a higher-level iteration method specific to its own use-cases.

And now for the meat, the trait itself that makes this all work.

Note that this is almost identical to the single test version, but we've extracted the implementor-specific bits into methods that the real test class can fill in later.

You can see the full code example, including the abstracted implementations of AppleList and AddressList in this gist.

In Summary

Hopefully the example I've chosen was suitable for conveying the idea. Rather than artificially testing an abstract class, we can structure tests to test the behaviour exhibited. The fact we're using an abstract class to share code becomes an implementation detail, rather than part of the interface contract. In my book, decoupling behaviour from implementation is always a good thing!

This approach should be applicable in any scenario where you have a number of classes exhibiting the same behaviour, regardless of whether they achieve this through inheritance, composition, or even copy-paste!

Thanks to Craig for running into the problem in the first place, as well as doing most of the implementation while I yelled suggestions over his shoulder.

Further Reading



Reload your Terminal with iTerm2 on OS X

After seeing a tweet from @GotNoSugarBaby

I thought "that seems doable", and it turns out it is!

As long as you're using iTerm2, you can bind hotkeys to "Send Text", which allows "\n" for newlines.

Simply open the preferences window (cmd+,) and switch to the "Keys" pane. Add a new global shortcut key as shown in the screenshot above (The original version of this post used "!!\n", but using the escape sequence for up as in the updated version also works in non-bash scenarios).

Now you can "Reload" your last command by bashing "cmd+r" - web-dev browser synergy here we come!

Standalone ChatZilla on OS X 10.8

I'm a big fan of ChatZilla, I've used it for years, and for a while I was a fairly active contributor.

These days I spend most of my time on a Mac, where the ChatZilla experience isn't as nice as it could be, due to the cmd+tab behaviour when running as an addon. When ChatZilla is part of the browser, it's a pain to switch between it and other apps you may be using.

The following instructions are very rough, more of a brain-dump while this is fresh in my mind. I'll try and tidy them up at some point.

The solution is to run ChatZilla as its own application. There are some instructions for doing this at http://chatzilla.rdmsoft.com/xulrunner/, but I couldn't get them to work for me. I appeared to be running into this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=699966

Undeterred, I figured there must be another option - since version 3.0, Firefox has provided an "-app" option, which can be pointed at a xulapp application.ini and will launch it standalone. Happily, this works! But we're left with an application called "Firefox" and with the Firefox icon.

The solution now is found on this superuser answer: http://superuser.com/questions/271678/how-do-i-pass-command-line-arguments-to... specifically, option 2.

I made a copy of Firefox.app, renamed it ChatZilla.app, edited the Info.plist, unpacked the .xulapp into the application's /Resources folder and created a little bash script that would execute ChatZilla.app/Contents/MacOS/firefox-bin with the appropriate -app switch. All references to firefox in the Info.plist need to be changed to ChatZilla, to ensure the OS doesn't get confused about which is the actual default browser.

The final step was then to use http://iconverticons.com/online/ to take the ChatZilla .ico and turn it into an icns file for my new application.

I think I now actually understand JavaScript objects

Over the last week or so, I've been writing a lot of NodeJS-based JavaScript, and in doing so I realised I didn't really have a picture of how the object model actually fits together. So I did some reading, and now I'm pretty sure I get it.

The articles I read are linked at the bottom of the post, but often code is clearer to read than paragraphs of explanation. I've thrown together some implementations of the key players in the JavaScript object model that mirror what the actual engine is doing under the hood.

The following examples should explain the prototype chain, how it's assembled and how it's interacted with - I've added some console output to the loops so we can see what's happening as they're called.

The code below assumes that it's running under an ECMAScript 5 environment - that's the latest versions of Safari, Chrome and Firefox, as well as NodeJS. It takes advantage of Object.create and Object.getPrototypeOf, which are the standardised versions of messing around with the __proto__ property.
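
The attribute lookup at the heart of all this is simple enough to sketch outside JavaScript entirely. Here is the walk up the prototype chain in Python, with plain dicts standing in for objects and a "proto" slot standing in for __proto__ (the get_attribute name mirrors the demo output; the Person/Employee data is illustrative):

```python
def get_attribute(obj, name):
    """Walk the prototype chain until `name` is found, like JS property lookup."""
    while obj is not None:
        if name in obj:
            return obj[name]
        obj = obj.get("proto")  # step up the chain (__proto__ in JS)
    return None                 # JavaScript would produce `undefined`

person_prototype = {"greeting": "Hello"}
employee = {"role": "Boss", "proto": person_prototype}

assert get_attribute(employee, "role") == "Boss"       # found on the object itself
assert get_attribute(employee, "greeting") == "Hello"  # found via the chain
assert get_attribute(employee, "salary") is None       # not found anywhere
```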

And to see it in action:

Which produces:

> node demo.js 
john instance_of Employee
Trying to match Employee.prototype
Comparing to Retired.prototype
Comparing to Employee.prototype
true

john instance_of Array
Trying to match Array.prototype
Comparing to Retired.prototype
Comparing to Employee.prototype
Comparing to Person.prototype
Comparing to Object.prototype
false

new_instance(Person, 'john') instanceof Person
true

new_instance(Employee, 'boss', 'john') instanceof Employee
true

get_attribute(john, 'role')
Looking on object directly
Boss

get_attribute(john, 'greeting')
Looking on object directly
Looking on Retired.prototype
Looking on Employee.prototype
Welcome to work

call_method(john, 'greet')
Looking on object directly
Looking on Retired.prototype
Looking on Employee.prototype
Looking on Person.prototype
Hello John

If you'd like to have a play with these functions without firing up NodeJS, here's the obligatory JSFiddle: http://jsfiddle.net/88je8/1/

See also: