not a beautiful or unique snowflake (nothings) wrote,
not a beautiful or unique snowflake
nothings

Consider the following block of C code:

{
   int acc=0, i;
   for (i=100; i > 0; --i)
      acc += i;
}

On a naive, pure stack machine, this compiles to something like:

   push 0
   pop to acc
   push 100
   pop to i
  loop:
   push i
   push acc
   add
   pop to acc
   push i
   push 1
   sub
   pop to i
   push i
   pop & branch-if->0-to loop

It's not particularly common to try to just leave variables on the stack, so this sort of movement back and forth is necessary. It's possible for an "impure" stack machine to have things like 'add and pop to register' or 'add from register', but at that point you're clearly crossing closer to the pure register machine:

   move r2 <- 100
   move r1 <- 0
  loop:
   add    r1 <- r1, r2
   sub   r2 <- r2, 1
   jumpgz loop

"add" in a stack machine looks like this: tos = tos + *stack++. "add" in the register machine looks like this: reg[ip[1]] = reg[ip[2]] + reg[ip[3]] where 'ip' points to the instruction stream, which holds the register numbers. So clearly the register machine requires a lot more indirections and a lot more memory moves. Fortunately, they're all to cache.

But the cost of those memory moves is dwarfed by the cost of branch-prediction failure on every instruction. Every time we dispatch a new instruction, the real hardware instruction pointer jumps to a new "random" instruction, and a huge penalty is incurred. We reduce that cost by reducing the number of bytecodes executed for every unit of work we want to accomplish. The naive stack machine takes ten instructions; the register machine takes three. The naive stack machine can be reduced by making it less a naive stack machine, but it can't be reduced to register-machine speeds without making it register-like.

My current VM is executing about 40M instructions/sec (on a P3 700), including typechecking that things are ints, and decoding each register for whether it's a register or a constant (something I can avoid by precompiling the bytecode). A stack machine could be faster, but I doubt it could make up for the branch-prediction cost due to the 3:1 ratio of instructons.

If you don't understand the question, move on.</small

Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 4 comments