No, not that goto, the other goto.

Sat Dec 26 22:00:00 2009

Another post inspired by a conversation on IRC:

02:28 <@mst> my key objection to 5.8.4 is that if you alter @_ and then goto it segfaults.
02:28 < dkulchenko> mst: you use goto?
02:29 <@apeiron> goto &sub is useful.
02:29 <@mst> dkulchenko: of course.
02:29 <@mst> dkulchenko: how else do you wrap import() routines and maintain the stack?
02:29 < dkulchenko> mst: i've been using perl for 6 years, and i have *no* idea what you just said :)
02:34 <@mst> you've seriously never seen goto &Foo::import; ?
02:34 < dkulchenko> nope
02:43 <@mst> it's not something you use very often
02:43 <@mst> but when you need it, you tend to need it

So it occurs to me that if a 6-year veteran doesn't know this trick, it's probably worth writing up.

First, however, I shall digress into traditional goto and why you didn't want to do that. Consider the following code:

  sub find_in_list {
    my ($pattern, @candidates) = @_;
    my $found;
    foreach my $cand (@candidates) {
      if ($cand =~ $pattern) {
        $found = $cand;
        goto DONE;
      }
    }
    DONE:
    return $found if defined $found;
    return;
  }

That's the traditional label-style goto, occasionally beloved of C programmers, hated by Dijkstra, feared by maintainers everywhere (somewhere or other on the internet is a great rant by Linus on why it isn't always hateful in C, but it's Boxing Day so you can google it yourself).

Of course, in perl we already have the wonderful next, last and redo functions to handle such cases, so instead we can write:

  sub find_in_list {
    my ($pattern, @candidates) = @_;
    my $found;
    foreach my $cand (@candidates) {
      if ($cand =~ $pattern) {
        $found = $cand;
        last;
      }
    }
    return $found if defined $found;
    return;
  }

and when perl hits the 'last' it'll short circuit out of the foreach loop. Of course, while 'goto LABEL' is a bad thing, labels themselves aren't necessarily. If we have nested loops it can rapidly become confusing as to which loop you're going to affect, so we can attach a label to the loop like so:

  sub find_in_list {
    my ($pattern, @candidates) = @_;
    my $found;
    CANDIDATE: foreach my $cand (@candidates) {
      if ($cand =~ $pattern) {
        $found = $cand;
        last CANDIDATE;
      }
    }
    return $found if defined $found;
    return;
  }
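The label earns its keep once the loops are actually nested. Here's a minimal sketch of that case; find_in_grid and its list-of-array-refs input are made up for illustration and aren't from the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the (row, column) of the first cell that
# matches $pattern, or an empty list if nothing matches.
sub find_in_grid {
  my ($pattern, @rows) = @_;
  my @where;
  ROW: foreach my $r (0 .. $#rows) {
    foreach my $c (0 .. $#{ $rows[$r] }) {
      if ($rows[$r][$c] =~ $pattern) {
        @where = ($r, $c);
        last ROW;   # leaves both loops; a bare 'last' would only leave the inner one
      }
    }
  }
  return @where;
}

my @pos = find_in_grid(qr/^b/, [qw(apple avocado)], [qw(cherry banana)]);
print "@pos\n";   # prints "1 1"
```

Without the label you'd need a flag variable to tell the outer loop to stop as well.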

Plus, provided you don't have a philosophical objection to having more than one exit point from a subroutine, you don't need any of this:

  sub find_in_list {
    my ($pattern, @candidates) = @_;
    foreach my $cand (@candidates) {
      return $cand if ($cand =~ $pattern);
    }
    return;
  }

And finally List::Util's first routine means we didn't need to write this subroutine at all:

  use List::Util qw(first);

  my $found = first { /$pattern/ } @candidates;

Ok. Digression over, general lack of the need for 'goto LABEL' established, let's get back to 'goto &sub'. What exactly does it do?

Well ... the difference between 'goto &sub' and 'sub(...)' is much like the difference between 'goto' and 'gosub' in earlier systems - calling a subroutine returns you to where you were afterwards. Making a goto doesn't.

More precisely, 'goto &sub' replaces the current subroutine call in perl space with the subroutine you goto'd to. And its place on the call stack goes away as well. Which means that if you have:

  sub call_me {
    return join(', ', @_);   # stand-in for "do something with @_"
  }

  sub call_it_1 {
    call_me(@_);
  }

  sub call_it_2 {
    goto &call_me;
  }

then the result of invoking call_it_1 and call_it_2 will be identical. Except. Inside the call_me subroutine, with call_it_1, call_me can see that it was called by call_it_1. But if you used call_it_2 then call_me would think it had been called by whatever called call_it_2, not call_it_2 itself.
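If you want to see that for yourself, caller() will show you. This is only a sketch: here call_me is rewritten to report who it thinks called it, and outer_1 and outer_2 are extra hypothetical routines added so that there's a named caller further up the stack.

```perl
use strict;
use warnings;

# Report the name of the subroutine that called us, as caller() sees it.
sub call_me {
  my $caller_sub = (caller(1))[3];
  return defined $caller_sub ? $caller_sub : '(top level)';
}

sub call_it_1 { return call_me(@_) }   # ordinary call: adds a stack frame
sub call_it_2 { goto &call_me }        # goto &sub: replaces our own frame

sub outer_1 { return call_it_1(@_) }
sub outer_2 { return call_it_2(@_) }

print outer_1(), "\n";   # main::call_it_1
print outer_2(), "\n";   # main::outer_2
```

In the second case call_it_2 has vanished from the stack, so call_me sees outer_2 as its caller.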

So, why does this matter? Well ... there are two reasons I've encountered before now. They both boil down to "because we want to pretend we were never there" - the question is whether we're lying to the user or to the code that we goto to.

Why lie to the user? Well, because sometimes the user doesn't want to know our code existed. For example, a lot of perl code uses Carp's carp() and croak() calls in order to show an error from where the mistaken call was - so for example if our code does:

  routine_that_calls_carp('bad','args');

then we'll get an error from that line in our code. But, if we override or wrap that routine somehow - for example for debugging purposes:

  my $orig = Their->can('routine_that_calls_carp');
  local *Their::routine_that_calls_carp = sub {
    warn "Called routine_that_calls_carp with ".join(', ', @_);
    $orig->(@_);
  };

then the error is going to be reported from the last line of our override, which while correct is not particularly useful to the user. However, with the aid of 'goto &sub' this is easily fixed:

  my $orig = Their->can('routine_that_calls_carp');
  local *Their::routine_that_calls_carp = sub {
    warn "Called routine_that_calls_carp with ".join(', ', @_);
    goto &$orig;
  };

With this version, since our shim routine has been removed from the call stack, the routine_that_calls_carp code will think it was still called directly by the original user code, and so the error will be reported correctly.
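Here's a rough single-file sketch of the difference, assuming a stand-in Their package whose routine simply croaks. The blamed_line helper and the two wrapper variables are made up for the demonstration; it pulls the line number out of the croak message and compares it with the line the call was really made from.

```perl
use strict;
use warnings;

{
  package Their;
  use Carp;
  # Stand-in for a library routine that blames its caller on bad input.
  sub routine_that_calls_carp { croak "bad args" }
}

my $orig = Their->can('routine_that_calls_carp');

my $wrapper_line = __LINE__ + 1;
my $call_wrapper = sub { $orig->(@_) };   # adds a frame: croak blames this line
my $goto_wrapper = sub { goto &$orig };   # replaces its frame: croak blames the user

# Call $wrapper from "user code"; return (line of the call, line croak blamed).
sub blamed_line {
  my ($wrapper) = @_;
  my $call_line = __LINE__ + 1;
  eval { $wrapper->('bad', 'args'); 1 };
  my ($blamed) = $@ =~ /.* line (\d+)/s;
  return ($call_line, $blamed);
}

printf "plain call: called at line %d, croak blamed line %d\n",
  blamed_line($call_wrapper);
printf "goto &sub:  called at line %d, croak blamed line %d\n",
  blamed_line($goto_wrapper);
```

With the plain call the two numbers differ, because the blame lands on the wrapper; with the goto version they should match.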

Now, what about lying to other code? Well, again, this is for a case where we need to make it look like the call didn't come from where it actually came from. The usual reason for this is to do a simple wrap of an import() method - because import methods often use the call stack to work out who called them and therefore where they should export to. So:

  package My::Module;

  use Their::Module;

is equivalent to:

  package My::Module;

  BEGIN {
    require Their::Module;
    Their::Module->import;
  }

Now, say Their::Module sets up one(), two() and three() functions in My::Module - but we want to have that, and to export our own four(). We could write:

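  package My::Wrapper::For::Their::Module;

  use Their::Module ();   # load it, but don't import into the wrapper itself

  sub import {
    my $targ = caller;
    { no strict 'refs'; *{"${targ}::four"} = \&four; }
    # import() is a class method, so hand Their::Module its own name in $_[0]
    shift; unshift @_, 'Their::Module';
    goto &Their::Module::import;
  }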

and Their::Module would think it had been called by My::Module directly, and therefore export to the correct place.
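To try this without creating three files, here's a single-file sketch. The Their::Module below is a made-up hand-rolled exporter that uses caller() to pick its target, and My::Module calls import from a BEGIN block in place of a real use line. Since this particular import reads its class name from @_, the wrapper swaps its own name for Their::Module's before the goto.

```perl
use strict;
use warnings;

{
  package Their::Module;
  # Hypothetical exporter: installs into whoever caller() says called us.
  sub one   { 1 }
  sub two   { 2 }
  sub three { 3 }
  sub import {
    my $class = shift;
    my $targ  = caller;
    no strict 'refs';
    *{"${targ}::$_"} = \&{"${class}::$_"} for qw(one two three);
  }
}

{
  package My::Wrapper::For::Their::Module;
  sub four { 4 }
  sub import {
    my $targ = caller;
    { no strict 'refs'; *{"${targ}::four"} = \&four; }
    shift; unshift @_, 'Their::Module';   # hand Their::Module its own class name
    goto &Their::Module::import;
  }
}

{
  package My::Module;
  # What "use My::Wrapper::For::Their::Module;" would do, minus the
  # require, since everything lives in one file here.
  BEGIN { My::Wrapper::For::Their::Module->import }
  sub total { one() + two() + three() + four() }
}

print My::Module::total(), "\n";   # prints 10
```

If you swap the goto for a plain Their::Module->import call, caller() inside Their::Module sees the wrapper package instead, and one(), two() and three() end up installed there rather than in My::Module.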

Of course, there are usually better ways of doing this. For Exporter based modules, we can call export_to_level - for example the Data::Dumper::Concise::Sugar shortcut Devel::Dwarn does:

  sub import {
    Data::Dumper::Concise::Sugar->export_to_level(1, @_);
  }

in order to make the following two equivalent:

  use Data::Dumper::Concise::Sugar;
  use Devel::Dwarn;

and if the module in question is using Sub::Exporter, we can similarly pass 'into' or 'into_level' arguments to the other module's import - i.e.

  Sub::Exporter::Using::Thing->import({ into => $targ }, @_);

  # or

  # into_level counts stack frames, so 1 means "whoever called our import"
  Sub::Exporter::Using::Thing->import({ into_level => 1 }, @_);

would work to wrap such an import routine.
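Going back to the Exporter case for a moment, here's a single-file sketch of the export_to_level approach. Real::Exporter and Short::Name are hypothetical stand-ins for Data::Dumper::Concise::Sugar and Devel::Dwarn, and the BEGIN blocks are only there because there are no separate files to use.

```perl
use strict;
use warnings;

{
  package Real::Exporter;   # hypothetical: the module that owns the exports
  use Exporter ();
  # Set up at compile time so the BEGIN block further down can see them.
  BEGIN { our @ISA = ('Exporter'); our @EXPORT = ('hello'); }
  sub hello { "hello from " . __PACKAGE__ }
}

{
  package Short::Name;      # hypothetical: the shortcut that re-exports
  sub import {
    # Level 1 means "export into whoever called this import". The second
    # argument (our own class name, from @_) is ignored by Exporter.
    Real::Exporter->export_to_level(1, @_);
  }
}

{
  package User::Code;
  BEGIN { Short::Name->import }   # stands in for "use Short::Name;"
  sub greet { hello() }
}

print User::Code::greet(), "\n";   # prints "hello from Real::Exporter"
```

No goto needed here, because Exporter has been told explicitly how far up the stack to look.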

So ... even in this case, we usually don't want or need 'goto &sub' but there are plenty of modules with exporters that don't correctly use either Exporter or Sub::Exporter, and for simple cases where you're doing things on the fly for debugging it's often quicker to use the goto variant.

In summary: 'goto &sub' isn't evil, isn't completely useless, and while definitely not something you'd use on a daily basis, is an important capability of perl when you do need the functionality.

Oh, and Merry Christmas.

-- mst, out.