Debugging through arbitrary code injection

This was written in response to a question I posed on Stack Overflow. About 15 months have gone by, and now I'll attempt to explain it, although to be honest, it won't be easy.

What I wanted was a coroutine-like or yield-like system in JS. What I ended up with was a really compact debugger.

This chunk of code (and a little bit of a wrapper) can do the following:

It can be toggled to have zero performance cost and live inside your app in perpetuity.

Oh right, and minified it would be a handsome candidate for JS1k. (But I only want to see nifty canvas demos there too).

The Miraculous world of JavaScript

Before we move on, here is a hypertext link to the repo, which has a verbose example: https://github.com/kristopolous/_inject

The key to understanding it is to use it first; then we can work outward from there. I put up an example site.

Function aliasing

You may have noticed that the sample page gives almost the same code as the source, except for the fact that there are all these huge numbers in it. Those have replaced the "RAND" signatures.

There's an underutilized feature of JS that is tucked away in the code. Without getting too technical, when you do A + B and A is a string, then B will be converted to a string for us. (For an object such as a function, this ends up calling its toString() method, which can be user-defined.)
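Here is a minimal sketch of that conversion. The names greet and custom are made up for illustration and are not from the library:

```javascript
// A function concatenated onto a string becomes its own source text.
function greet() { return 'hi'; }
var source = '' + greet;            // same text as greet.toString()

// toString can be user-defined, and + will pick it up.
var custom = { toString: function () { return 'custom text'; } };
var joined = 'value: ' + custom;

console.log(source);
console.log(joined);                // value: custom text
```

This is what lets a function literal be spliced straight into a string of code.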

So if we take the block from the source and just look at a smaller portion we'd get this:

return (
   ...
  'self._inject["' + scope + '"] = (' + (function() {
   ...
  )
  .replace(/RAND/g, '__INJECT__' 
  + Math.random().toString().substr(2)
);

Using the global browser-based self, along with our reserved function _inject used as a lookup table (functions are objects, so they can carry properties), we take the scope, defined by how the user calls the function, as the key, and build the whole assignment as one giant string. The Math.random().toString().substr(2) says: take a random number, which will be between 0 and 1, turn it into a string, and remove the first two characters, i.e. the leading zero and the period, since a period isn't allowed in a variable name.

This method gives us a long string of digits, prefixed with __INJECT__ to avoid possible naming collisions.
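A small sketch of the renaming step in isolation. The template string here is a made-up stand-in; only the RAND placeholder convention and the __INJECT__ prefix come from the code above:

```javascript
// Hypothetical template that uses RAND as a placeholder name.
var template = 'var RAND = {}; function RAND_callback() { return RAND; }';

// "0.8437..." becomes "8437..."; slice(2) acts like substr(2) here.
var suffix = Math.random().toString().slice(2);

// Every RAND becomes the same hard-to-collide identifier.
var unique = template.replace(/RAND/g, '__INJECT__' + suffix);

console.log(unique);
```

Because the suffix is computed once per call, all the placeholders in one generated string agree with each other, while two separate calls will almost certainly differ.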

Function arity and arguments

This is more commonly known than it used to be. There's a variable, arguments, that has two interesting properties:

  1. It refers to the arguments passed in at the current scope, and is like an array (but isn't).
  2. It's always locally scoped. (try this in node: (function(){arguments=0;})();arguments)

Still don't believe that it's always locally scoped? Try this: (function(){ arguments=123; (function(){console.log(arguments);})(); })().
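The two properties can also be checked without reassigning anything. This is just a sketch, and outer, inner and isRealArray are illustrative names:

```javascript
function outer() {
  // Each call gets its own arguments object; inner never sees outer's.
  function inner() { return arguments.length; }
  return [arguments.length, inner(), inner('x')];
}

// Array-like (indexes and a length), but not a real array.
function isRealArray() { return Array.isArray(arguments); }

var counts = outer('a', 'b', 'c');
console.log(counts);              // [ 3, 0, 1 ]
console.log(isRealArray(1, 2));   // false
```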

If it doesn't exist (at global scope, for example), we can simply assign to it without a var declaration; it is indeed that special. We can also treat it like an array, more on that later:

return (
  'try { arguments } catch(e) { arguments = [] } ' +
  'self._inject["' + scope + '"] = (' + (function() {
   ...
  )
  .replace(/RAND/g, '__INJECT__' 
  + Math.random().toString().substr(2)
);

Preserving the context of the caller

In any scope you have the explicit variables and declarations at that scope and two implicit things: the this variable and the recently discussed arguments. In order to inject something that has all the features in the outline, we have to preserve both of these things, then somehow repurpose them magically. This leads to the next block.

We can take the arguments and then call Array's slice() function on it. This works because slice is written generically and only requires indexes and a length to exist. Here, try this: a = {0: 0, 1:1, 2:2, length:3}; Array.prototype.slice.call(a);

Because our arguments object has both indexes and a length, we can essentially typecast it to an array using slice. Crafty, crafty...
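A quick sketch of the trick on both an arguments object and a hand-made array-like object (toArray is an illustrative name):

```javascript
function toArray() {
  // slice only needs indexes and a length, so arguments qualifies.
  return Array.prototype.slice.call(arguments);
}

var list = toArray('a', 'b', 'c');
var fromObject = Array.prototype.slice.call({ 0: 'x', 1: 'y', length: 2 });

console.log(Array.isArray(list), list);   // true [ 'a', 'b', 'c' ]
console.log(fromObject);                  // [ 'x', 'y' ]
```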

return (
  'try { arguments } catch(e) { arguments = [] } ' +
  'self._inject["' + scope + '"] = (' + (function() {
    var RAND = { 
      that: arguments[0],
      arg: Array.prototype.slice.call(arguments[1]) 
    };
   ...
  )
  .replace(/RAND/g, '__INJECT__' 
  + Math.random().toString().substr(2)
);

Re-establishing the context

When doing post-mortem analysis, we want the arguments to be the arguments and the this to be the this. This is possible via Function.prototype.apply, as long as we wrap things well enough.

return (
  'try { arguments } catch(e) { arguments = [] } ' +
  'self._inject["' + scope + '"] = (' + (function() {
    var RAND = { 
      that: arguments[0],
      arg: Array.prototype.slice.call(arguments[1]) 
    };

    function RAND_callback() {
      ...
      return (function() {
        ...
      }).apply(RAND.that, RAND.arg);
    }
   ...
  )
  .replace(/RAND/g, '__INJECT__' 
  + Math.random().toString().substr(2)
);
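Stripped of the string building, the capture-then-apply idea might look like the sketch below. The names capture, replay and account are hypothetical; only the that/arg pairing mirrors the excerpt above:

```javascript
function capture() {
  // Save the caller's implicit context: its this and its arguments.
  var saved = { that: this, arg: Array.prototype.slice.call(arguments) };

  // Later, run any function as if it had been called the same way.
  return function replay(fn) {
    return fn.apply(saved.that, saved.arg);
  };
}

var account = { name: 'demo' };
var replay = capture.call(account, 1, 2);
var seen = replay(function (a, b) { return this.name + ':' + (a + b); });

console.log(seen);   // demo:3
```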

Executing Arbitrary Code

JavaScript has an eval, as do most interpreted languages. Doug Crockford sees using eval as always bad practice, but his words aren't biblical. In this case it's a great use of eval, since evaluating arbitrary code is essentially what we are after.

Remember what we said above, that when you do A [String] + B [*], B gets converted to a String? Well, this makes things easy. It means you can pass in a function and, as long as we do a construction like that in our eval, we can execute arbitrary blocks.

If that arbitrary block is itself a function, then we can re-apply our context to that, or just return the results of that eval. This allows for what could be termed "onion-style" functional programming: functions inside of functions inside of functions insi...

return (
  'try { arguments } catch(e) { arguments = [] } ' +
  'self._inject["' + scope + '"] = (' + (function() {
    var RAND = { 
      that: arguments[0],
      arg: Array.prototype.slice.call(arguments[1]) 
    };

    function RAND_callback() {
      RAND.block = arguments[0];

      return (function() {
        RAND.result = eval('(' + RAND.block + ')');

        return 'function' === typeof RAND.result ? 
          RAND.result.apply(RAND.that, RAND.arg) : 
          RAND.result;

      }).apply(RAND.that, RAND.arg);
    }
   ...
  )
  .replace(/RAND/g, '__INJECT__' 
  + Math.random().toString().substr(2)
);
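To see why eval plus string conversion matters, here is a stripped-down sketch. makeProbe, probe and secret are hypothetical names, and this is not the library's code, just the same idea in miniature:

```javascript
function makeProbe(x) {
  var secret = x * 2;   // local; normally unreachable from outside

  return function probe(block) {
    // A function passed in is turned into source text by the +,
    // then re-created by eval inside this scope, so it can see secret.
    var result = eval('(' + block + ')');
    return 'function' === typeof result ? result() : result;
  };
}

var probe = makeProbe(21);
var viaFunction = probe(function () { return secret + 1; });
var viaString = probe('secret');

console.log(viaFunction, viaString);   // 43 42
```

The function handed to probe is never called directly; only its text is used, which is why it can mention a variable that doesn't exist where it was written.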

Code Injection and Function Tracing

Although we aren't finished, we already have the first two implemented. We can inject arbitrary code that will be eval'd at the point we inject it, so our code runs in a context of our choosing, at a time of our choosing.

This allows us to do function accounting or tracing at leisure, since our arbitrary function can simply be that.
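As a rough sketch of what tracing could look like, assume a simplified helper (makeHook, transfer and traceLog are all made-up names). Unlike code that is eval'd at the injection site, this helper preserves only this and the arguments; it does not see the caller's local variables:

```javascript
var traceLog = [];

// Hypothetical helper: remembers the caller's this and arguments.
function makeHook(that, args) {
  var arg = Array.prototype.slice.call(args);
  return function hook(block) {
    var result = eval('(' + block + ')');
    return 'function' === typeof result ? result.apply(that, arg) : result;
  };
}

function transfer(from, to, amount) {
  var hook = makeHook(this, arguments);
  // The injected block just records how transfer was called.
  hook(function () {
    traceLog.push('transfer(' + Array.prototype.slice.call(arguments).join(', ') + ')');
  });
  return amount;
}

transfer('alice', 'bob', 10);
console.log(traceLog);   // [ 'transfer(alice, bob, 10)' ]
```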

Breakpoints and more

Try this: function debugger(){} You will probably see a syntax error. That's because debugger is a reserved word, which points to another underutilized feature: the ability to call a debugger from code. It's similar to how, back in the C days, you could do interrupt 0x03 in order to strike up the debugger. This is a breakpoint; it stops execution in its tracks if you have a debugger attached.

You can use that with the library for conditional breaks, something that, at the time of writing, none of the browser debuggers had gotten around to implementing.
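On its own, a conditional break can be sketched like this. breakWhen is an illustrative name; with no debugger attached, the debugger statement does nothing:

```javascript
function breakWhen(condition) {
  if (condition) {
    debugger;      // pauses here only if a debugger is attached
    return true;
  }
  return false;
}

var hits = 0;
[3, 8, 15].forEach(function (n) {
  // Only break on the interesting case.
  if (breakWhen(n > 10)) { hits += 1; }
});

console.log(hits);   // 1
```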

You can also throw new Error() to get stack traces or use XHR to remotely log issues as they come up. If you have cross-domain issues, there's a good trick of using new Image() and then setting the .src to a URL carrying the data you'd like to send back to the server, such as DebuggerDomain.com/image.php?debug1=key&debug2=value.
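Both tricks can be sketched in a few lines. whereAmI, beaconUrl and report are hypothetical helpers, and the base URL is whatever endpoint you control:

```javascript
// The stack property is non-standard but widely supported.
function whereAmI() {
  return new Error('trace').stack;
}

// Build a query string like ?debug1=key&debug2=value.
function beaconUrl(base, data) {
  var pairs = Object.keys(data).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(data[key]);
  });
  return base + '?' + pairs.join('&');
}

// Fire-and-forget: the request goes out when src is set (browser only).
function report(base, data) {
  var url = beaconUrl(base, data);
  if (typeof Image !== 'undefined') {
    new Image().src = url;
  }
  return url;
}

console.log(beaconUrl('/image.php', { debug1: 'key', debug2: 'a b' }));
// /image.php?debug1=key&debug2=a%20b
```

Encoding each key and value keeps spaces and ampersands in the debug data from breaking the query string.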

On top of that you can run accumulators and create matching/filtering subsets that are inspectable and trackable. It really opens up a huge world of possibilities.