
Fri May 1 20:40:00 2015

War Stories: A tower of bool

For reasons lost in the mists of time, the MongoDB module's BSON implementation would only serialise boolean.pm objects as booleans - which ... is fine in theory, since that module was designed to be a standard boolean representation after all. Happily, this was fixed as of 1.0, so the following story covers only the historical problem.

Or: MongoDB fixed their shit. This is just a war story.

Unfortunately, that's not what any of the JSON modules produce, which is All Sorts Of Fun (tm) when you're trying to load JSON data into it (MongoDB have, indeed, noticed this, and are apparently going to try untangling it later this year ... but that will be then and this is now).

Meanwhile, the easy-ish way of dealing with this is to write some sort of filter function, roughly -

use boolean;

sub reboolify {
  my ($in) = @_;
  my $ref = ref($in);
  if (!$ref) {
    return $in;
  } elsif ($ref eq 'HASH') {
    return { map +($_ => reboolify($in->{$_})), keys %$in };
  } elsif ($ref eq 'ARRAY') {
    return [ map reboolify($_), @$in ];
  } elsif ($ref =~ /::Boolean$/) {
    return $$in ? true : false;
  }
  return $in;
}

You could use a more accurate test for the boolean class but assuming you're writing basically

my $mongofied = reboolify(decode_json($text));

then it doesn't make a lot of difference. Of course, this still requires traversing the entire data structure, which can make a difference for bulk imports. It also requires going through and modifying all the calls to decode_json, which, if you managed to get a decent distance before noticing this problem, isn't necessarily any fun.
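
For the record, a more accurate test might look something like the sketch below. It leans on JSON::PP::is_bool rather than a regex on the class name, and it takes the replacement values as arguments so that it runs without boolean.pm installed. The name reboolify_with and the 'YES'/'NO' stand-ins are made up for illustration; in real use you'd pass boolean.pm's true and false instead.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical variant of reboolify: the replacement values are passed in,
# so the caller decides what a JSON true/false should turn into.
sub reboolify_with {
  my ($in, $true, $false) = @_;
  return $in unless ref($in);

  # JSON::PP::is_bool recognises the boolean objects JSON::PP produces;
  # other decoders may need their own check here.
  if (JSON::PP::is_bool($in)) {
    return $in ? $true : $false;
  }
  if (ref($in) eq 'HASH') {
    my %out;
    $out{$_} = reboolify_with($in->{$_}, $true, $false) for keys %$in;
    return \%out;
  }
  if (ref($in) eq 'ARRAY') {
    return [ map reboolify_with($_, $true, $false), @$in ];
  }
  return $in;
}

# 'YES' and 'NO' are stand-ins for boolean.pm's true and false.
my $data = reboolify_with(
  JSON::PP::decode_json('{"ok":true,"list":[false,1,"x"]}'),
  'YES', 'NO',
);
```

The traversal cost is exactly the same as before, so this only buys correctness, not speed.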

This is the point at which somebody pointed out to me that

use JSON::PP qw(decode_json);
use boolean;

$JSON::PP::true = true;
$JSON::PP::false = false;

my $mongofied = decode_json($text);

works fine - but takes a ridiculous amount of time because JSON::PP isn't the fastest thing in existence. The thing is, nothing else has a published API for doing the same thing.
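
If you do go this route, it's probably worth scoping the override with local so that other users of JSON::PP in the same process keep getting the stock objects. A minimal sketch, assuming the decoder consults the package variables at decode time (which is what the snippet above relies on); decode_with_bools and the 'T'/'F' stand-ins are hypothetical, and you'd pass boolean.pm's true and false in practice.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: swap the globals only for the duration of one decode.
sub decode_with_bools {
  my ($text, $true, $false) = @_;
  local $JSON::PP::true  = $true;
  local $JSON::PP::false = $false;
  return JSON::PP::decode_json($text);
}

# 'T' and 'F' are stand-ins for boolean.pm's true and false.
my $swapped = decode_with_bools('[true,false]', 'T', 'F');

# Outside the helper the original globals are back in place.
my $stock = JSON::PP::decode_json('[true]');
```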

Which led me to wonder ... does anything else have an unpublished way of doing it anyway?

It turns out that both JSON::Tiny and Mojo::JSON, which are significantly faster than JSON::PP, do provide globals, so

use JSON::Tiny qw(decode_json);
use boolean;

$JSON::Tiny::TRUE = true;
$JSON::Tiny::FALSE = false;

my $mongofied = decode_json($text);

also does the trick.

JSON::XS ... doesn't. It uses constant subroutines set up at load time, so there's very little you can do.

Cpanel::JSON::XS on the other hand, does have $true and $false globals. Sadly, attempting to simply do

$Cpanel::JSON::XS::true = true;

results in a "Modification of a read-only value attempted" error. Which led me to look at the source ... which didn't appear to make the values read-only at all. However, those values are set up before the XS parts of the module are loaded, and if we look there, we find:

INLINE SV *
get_bool (pTHX_ const char *name)
{
  SV *sv = get_sv (name, 1);

  SvREADONLY_on (sv);
  SvREADONLY_on (SvRV(sv));

  return sv;
}

which is called during initialisation, so by the time require() returns, they're already readonly.

Pants.
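
The effect is easy enough to reproduce from pure Perl. The sketch below uses the core Internals::SvREADONLY function as a stand-in for the XS-level SvREADONLY_on; Demo::Boolean and $demo_true are invented names, not anything Cpanel::JSON::XS defines.

```perl
use strict;
use warnings;

# Build a value the same way the module does: a blessed scalar reference.
our $demo_true = do { bless \(my $dummy = 1), 'Demo::Boolean' };

# Mark both the variable and the scalar it points at as read-only,
# mirroring the two SvREADONLY_on calls in get_bool.
Internals::SvREADONLY($demo_true, 1);
Internals::SvREADONLY($$demo_true, 1);

# Assigning to the variable now dies, so trap the error with eval.
my $assigned = eval { $demo_true = 'something else'; 1 };
my $error = $@;
print "assignment failed: $error" unless $assigned;
```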

Of course, where there's a will, there's a way - though nobody said it was necessarily going to be a good way.

Going back to the perl sources, I noticed that the true and false values are set up by the following code -

our ($true, $false);
if ($INC{'JSON/XS.pm'} and $JSON::XS::VERSION ge "3.00") {
  $true  = $Types::Serialiser::true; # readonly if loaded by JSON::XS
  $false = $Types::Serialiser::false;
} else {
  $true  = do { bless \(my $dummy = 1), "JSON::XS::Boolean" };
  $false = do { bless \(my $dummy = 0), "JSON::XS::Boolean" };
}

and Types::Serialiser, generic though its name is, is basically only used by JSON::XS - hence why Cpanel::JSON::XS only loads it when it expects to be interoperating with JSON::XS.

Now, in this case, I already knew that JSON::XS wasn't going to be loaded. But, of course the check will still fire, so you can force the code to go down that path using -

BEGIN {
  local $INC{'JSON/XS.pm'} = 'RAAAAAGE';
  local $JSON::XS::VERSION = '3.00';
  require Cpanel::JSON::XS;
}

at which point all we need to do is actually slip the values we want in behind its back -

BEGIN {
  use boolean;
  local $INC{'JSON/XS.pm'} = 'RAAAAAGE';
  local $JSON::XS::VERSION = '3.00';
  local $Types::Serialiser::true = true;
  local $Types::Serialiser::false = false;
  require Cpanel::JSON::XS;
}

and subsequently

use Cpanel::JSON::XS qw(decode_json);

my $mongofied = decode_json($text);

just works.
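
One thing that makes this slightly less antisocial than it looks: local puts %INC back the way it was once the BEGIN block exits, so the rest of the program doesn't go on believing JSON::XS is loaded. A small sketch of just that behaviour, using a made-up module path rather than the real one:

```perl
use strict;
use warnings;

our ($seen_inside, $seen_after);

BEGIN {
  # Pretend a (hypothetical) module is loaded, for this block only.
  local $INC{'Hypothetical/Module.pm'} = 'RAAAAAGE';
  $seen_inside = exists $INC{'Hypothetical/Module.pm'} ? 1 : 0;
}

# Once the BEGIN block has exited, local has removed the fake entry again.
$seen_after = exists $INC{'Hypothetical/Module.pm'} ? 1 : 0;

print "inside: $seen_inside, after: $seen_after\n";
```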

Runtime with the JSON::PP solution - 29 minutes.

Runtime with the Cpanel::JSON::XS horrific hack - 100 ms.

Recommendation if you run into this in the wild? Use a solution like the reboolify subroutine I started with unless you really really need that last erg of performance.

It's always nice to have options, though.

-- mst, out.