Choosing a library

Library selection process by level of importance

Fri Dec 18 17:10:00 2009

Ok, so, you've realised you want to do X. And you don't want to reinvent the wheel, so it's time to try and go find a library.

First stop, of course, is search.cpan.org where you'll hopefully find half a dozen candidates fairly quickly. Second stop for me is to ask a few relevant IRC channels - your preferred option may be Twitter, or mailing lists, or whatever, but you should ask the practitioners whose opinions you value and see if they can recommend something.

Before we go on, it's time for a value judgement - just how important is this library going to be to your codebase? Something that will get used once, in one module? This is reasonably common for special purpose utility modules, and indicates a "so long as it works" level of interest in the library. I'll refer to this as a 'once' library.

The next level up is a library that's going to be used for a purpose you'll regularly need - something that's part of your standard toolbox for this project. I'll refer to this as a 'toolbox' library.

Finally, above that, we have the code that will define a large proportion of your application - this doesn't necessarily mean a framework, but any large library - so if you're writing a database backed application, your ORM would likely be in this category. If you're writing a mail client, your choice from the Mail:: and Email:: namespaces on CPAN would be. I'll refer to this as a 'platform' library.

Terminology defined and candidates selected, what are we going to do?

First stage is to skim-read the main documentation for the library - ask yourself "does this look like it'll do what I need it to do?" If the answer is obviously "yes" and this is a 'once' class library, then at this point I'd cpan it and try it out - if I'm only using the thing for a couple of purposes, all I really care about is whether it works for those purposes. Caring more can come later if I end up with failing tests in context.

However, for a 'toolbox' library we need to be more sure than that. We want to know it not only works for the things we know we need to do but also for the things we'll probably need to do in the same space later in the project. So now it's time to do a bit more thinking.

Of the candidates, some are likely procedural, some OO. Some are more convenient, some are more flexible. We need to think about what level of usage we're going to need - if what we're doing is incidental or straightforward then our ideal library looks a lot different than if we're attempting to do fairly complex things in this domain (though again, if it starts to move towards the core purpose of the application you might want to reclassify it to 'platform').

Once we've narrowed it to a couple of candidates, our next step is going to be to crack open the test suites for the remaining choices. We want to look for examples of the sort of usage we have in mind to get a feel for how the author uses the code themselves. Of course, the test suite is a pretty warped lens through which to learn this, but it's usually the best thing available. If the tests 'feel' right, and the documentation indicates that not only does this piece of software do what we expect of it now but seems to have the right feature set to do what we'll need of it in the near future, then for a 'toolbox' library we can stop here and get prototyping.

For a 'platform' class library, however, I'd claim this is still insufficient. Something that important, that central, to a project, is not only going to need to fit you like a glove as you code, it's going to need to bend when your vision of the right way and the designer's vision differ.

Which is, sadly, why so many people waste immense amounts of time reinventing perfectly good frameworks - because the visions differed and their answer was to write something new that fit their vision. And sometimes, that is, admittedly, the best idea. But almost never.

The better thing to do, generally, is to crack open the source code of this essential thing you're about to depend on and start reading. Get a feel for the shape of it. Get a feel for how it ticks, how it fits together, what the ... taste ... of it is like. You're skimming here. Don't try and understand the detail of the implementation, try and understand the ebb and flow of logic and responsibility on a grand scale.

Then, ask yourself: if I needed to do weird thing X, could I? And then start reading in earnest. You should reach one of three conclusions: (1) "if I pass this combination of options it should ... just work", (2) "if I override this method here it should work", (3) "if I replace this section of the code it will still work, just". If you can't at least get to (3) then you probably don't want to take this library on. (1) is ideal but be aware that you often won't spot that case until you've used the library for a while. DBIx::Class users of six months often go "hm, if I pass X, Y and Z to the resultset ... damn, it worked!" - but somebody with a month's experience would probably flail around and do something inefficient (the trouble with platform level libraries is that it's heinously difficult to make them accessible). I usually expect case (2), and am happy if I find it.
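
To make the three cases concrete, here's a minimal sketch against a made-up class. Hypothetical::Fetcher, its upcase option and its _load/_format methods are all invented for illustration and don't correspond to any real CPAN module; the point is only the shape of each kind of answer.

```perl
use strict;
use warnings;

# A stand-in for the library being audited (entirely hypothetical).
{
    package Hypothetical::Fetcher;

    sub new {
        my ($class, %opts) = @_;
        return bless { upcase => 0, %opts }, $class;
    }

    # The public entry point delegates to two small, overridable steps.
    sub fetch {
        my ($self, $key) = @_;
        return $self->_format($self->_load($key));
    }

    sub _load   { my ($self, $key) = @_; return "value-for-$key" }
    sub _format { my ($self, $raw) = @_; return $self->{upcase} ? uc($raw) : $raw }
}

# Case (1): an existing option already does the weird thing.
my $case1 = Hypothetical::Fetcher->new(upcase => 1)->fetch('x');   # 'VALUE-FOR-X'

# Case (2): subclass and override one method.
{
    package My::BracketFetcher;
    our @ISA = ('Hypothetical::Fetcher');
    sub _format { my ($self, $raw) = @_; return "[$raw]" }
}
my $case2 = My::BracketFetcher->new->fetch('x');                   # '[value-for-x]'

# Case (3): replace a section of the library outright, here by
# temporarily swapping out the sub for the duration of a block.
my $case3;
{
    no warnings 'redefine';
    local *Hypothetical::Fetcher::_load = sub {
        my ($self, $key) = @_;
        return "stub-for-$key";
    };
    $case3 = Hypothetical::Fetcher->new->fetch('x');               # 'stub-for-x'
}
```

What you're really checking when you read the source is whether the library is cut into pieces like _load and _format at all. If fetch were one two-hundred-line sub, only case (3) would be on offer, and an ugly version of it at that.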

Seeing case (3)s in some cases is actually really important as well though - because finding a solution that involves the wholesale replacement of part of the library demonstrates that you can wholesale replace part of it. Catalyst is especially good for this - I was once bored of an evening, ripped out the engine code and substituted something completely different - and ended up with an IRC bot ...

Assuming your thought experiment through a few weird things you might like to do comes up with most of them being possible, it's a reasonably good bet that you can make this work - or at least make it work for long enough that you gain a significant advantage over scratch building from the word go. If you decouple your code reasonably carefully, while a 'platform' level library is much, much harder to swap out than almost anything else in your project, it should still be very much possible.
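
One way to read "decouple reasonably carefully" is to keep a thin layer of your own between the application and the library. The sketch below is only an illustration under that assumption: MyApp::Storage, both backend classes and remember_user are hypothetical names, and the backends are in-memory stand-ins rather than wrappers round any real module.

```perl
use strict;
use warnings;

# The only storage interface the rest of the application ever sees.
{
    package MyApp::Storage;

    sub new {
        my ($class, %args) = @_;
        return bless { backend => $args{backend} }, $class;
    }

    sub save { my ($self, $id, $data) = @_; return $self->{backend}->put($id, $data) }
    sub load { my ($self, $id) = @_; return $self->{backend}->get($id) }
}

# Two interchangeable stand-in backends. In a real project each one
# would wrap a real library; here they just hold data in memory.
{
    package MyApp::Storage::Backend::Hash;

    sub new { my ($class) = @_; return bless { rows => {} }, $class }
    sub put { my ($self, $id, $data) = @_; $self->{rows}{$id} = $data; return 1 }
    sub get { my ($self, $id) = @_; return $self->{rows}{$id} }
}
{
    package MyApp::Storage::Backend::List;

    sub new { my ($class) = @_; return bless { rows => [] }, $class }
    sub put { my ($self, $id, $data) = @_; push @{ $self->{rows} }, [ $id, $data ]; return 1 }
    sub get {
        my ($self, $id) = @_;
        # Most recent write wins, so search from the end.
        my ($hit) = grep { $_->[0] eq $id } reverse @{ $self->{rows} };
        return $hit ? $hit->[1] : undef;
    }
}

# Application code is identical whichever backend is plugged in.
sub remember_user {
    my ($storage) = @_;
    $storage->save(42 => 'mst');
    return $storage->load(42);
}

my $from_hash = remember_user(MyApp::Storage->new(backend => MyApp::Storage::Backend::Hash->new));
my $from_list = remember_user(MyApp::Storage->new(backend => MyApp::Storage::Backend::List->new));
```

A real platform library leaks far more of itself into your code than two methods, which is exactly why swapping one out is so much harder than this makes it look. The wrapper just keeps the damage in one place.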

Two final notes.

First, the same library may be at a different level depending on the project. For example, when I first encountered Moose I audited it as 'platform' level because I already knew that I wanted it for the metaprotocol, and would be fiddling, tweaking, subclassing and re-engineering it all over the place to take maximum advantage of the available metapower. For an average project where Moose is simply going to be "the class builder", you almost certainly only need to audit it at 'toolbox' level - and the documentation's sufficiently good thanks to Dave Rolsky's manual work that you probably don't need the tests to do so, either.

Second, coming to understand that it's better to bend an existing platform level library than to write your own is usually a lesson learned the hard way. I wrote my own templating engine a long time ago, and then discovered Template Toolkit and kicked myself (although for my purposes that was arguably more 'toolbox' than 'platform'). I wrote my own web framework, twice, and then discovered Catalyst, spent a very happy twelve hours reading the source, and happily marked the in-house stuff "maintenance only" and moved on.

I remember thinking to myself at the time that I'd committed two of the three usual cardinal sins of the aspiring Perl programmer ... but while I'd wasted my time creating a templating system and a web framework, at least I'd avoided the third cardinal sin by not writing an ORM ...

-- mst, out.