JavaScript Internals

JavaScript has become a prominent language widely used in web browsers. As such, it has also become important for Android benchmarking, and optimizing its performance remains a challenging problem for anyone involved in building mobile phones.

JavaScript is a dynamically-typed language. This causes a few complications as it has to keep track of the types of all variables as these can be modified at runtime. In addition, the number type in JavaScript is a double, which means that arithmetic operations are in general more expensive than they would be for integers. To get a clearer understanding of all the work required to interprehttp://criticalblue.com/news/wp-admin/post.php?post=257&action=edit&message=6#visibilityt a JavaScript program, let's take a look at what the widely used V8 JavaScript engine has to do to turn your JavaScript code into something that a processor understands.

The V8 Engine

V8 is a JavaScript engine written primarily by Google for their Chrome browser. V8 has no intermediate bytecode to execute; instead, it compiles all the JavaScript source into platform-specific machine code. The first pass is quick but not particularly optimized, but then the engine profiles the code and replaces it with a more optimized version in the most processor-intensive areas. To reduce startup time when a page is loaded, the V8 engine employs lazy compilation at runtime; note that it compiles entire functions, unlike Java that optimizes traces through the code.

A Simple Example

To illustrate the complications inherent in generating optimized code for JavaScript, we have taken a simple example of calculating an MD5 checksum. Part of that algorithm does an unsigned integer addition that is simulated here by doing a number addition and a bitwise add to perform the 32 bit overflow.

add32(a, b)
{
  return (a + b) & 0xFFFFFFFF;
}
JavaScript function

In a statically-typed language you would have to specify the types of the variables and the code would probably boil down to two loads, an addition, and a store. Let's look at what the optimized code generated by the V8 engine looks like.

kind = OPTIMIZED_FUNCTION
name = add32
stack_slots = 0
Instructions (size = 396)
0x33059560     0  e92d4902       stmdb sp!, {r1, r8, fp, lr}
0x33059564     4  e1a0c00c       mov ip, ip
0x33059568     8  e28db008       add fp, sp, #8
                  ;;; <@0,#0> -------------------- B0 --------------------
                  ;;; <@2,#1> context
0x3305956c    12  e1a00008       mov r0, r8
                  ;;; <@12,#9> -------------------- B1 --------------------
                  ;;; <@14,#11> stack-check
0x33059570    16  e59ac06c       ldr ip, [r10, #+108]
0x33059574    20  e15d000c       cmp sp, ip
0x33059578    24  2a000001       bcs +12 -> 36  (0x33059584)
0x3305957c    28  e59fc128       ldr ip, [pc, #+296]         ;; code: BUILTIN
0x33059580    32  e12fff3c       blx ip
                  ;;; <@15,#11> gap
0x33059584    36  e59b000c       ldr r0, [fp, #+12]
                  ;;; <@16,#21> tagged-to-i
0x33059588    40  e1b000c0       movs r0, r0, asr #1
0x3305958c    44  2a000009       bcs +44 -> 88  (0x330595b8)
                  ;;; <@17,#21> gap
0x33059590    48  e59b1008       ldr r1, [fp, #+8]
                  ;;; <@18,#22> tagged-to-i
0x33059594    52  e1b010c1       movs r1, r1, asr #1
0x33059598    56  2a000014       bcs +88 -> 144  (0x330595f0)
                  ;;; <@20,#13> add-i
0x3305959c    60  e0810000       add r0, r1, r0
                  ;;; <@22,#24> number-tag-i
0x330595a0    64  e0900000       adds r0, r0, r0
0x330595a4    68  6a00001f       bvs +132 -> 200  (0x33059628)
                  ;;; <@24,#19> return
0x330595a8    72  e1a0d00b       mov sp, fp
0x330595ac    76  e8bd4800       ldmia sp!, {fp, lr}
0x330595b0    80  e28dd00c       add sp, sp, #12
0x330595b4    84  e12fff1e       bx lr                       ;; debug: position 5285
                  ;;; <@16,#21> -------------------- Deferred tagged-to-i --------------------
0x330595b8    88  e0a01000       adc r1, r0, r0
0x330595bc    92  e5119001       ldr r9, [r1, #-1]
0x330595c0    96  e59ac03c       ldr ip, [r10, #+60]
0x330595c4   100  e159000c       cmp r9, ip
0x330595c8   104  1a000039       bne +236 -> 340  (0x330596b4)
0x330595cc   108  e241c001       sub ip, r1, #1
0x330595d0   112  ed9cbb01       vldr d11, [ip + 4*1]
0x330595d4   116  eebdfbcb       vcvt.s32.f64 s30, d11
0x330595d8   120  ee1f0a10       vmov r0, s30
0x330595dc   124  eeb8fbcf       vcvt.f64.s32 d15, s30
0x330595e0   128  eeb4bb4f       vcmp.f64 d11, d15
0x330595e4   132  eef1fa10       vmrs APSR, FPSCR
0x330595e8   136  1a000031       bne +204 -> 340  (0x330596b4)
0x330595ec   140  eaffffe7       b -92 -> 48  (0x33059590)
                  ;;; <@18,#22> -------------------- Deferred tagged-to-i --------------------
0x330595f0   144  e0a12001       adc r2, r1, r1
0x330595f4   148  e5129001       ldr r9, [r2, #-1]
0x330595f8   152  e59ac03c       ldr ip, [r10, #+60]
0x330595fc   156  e159000c       cmp r9, ip
0x33059600   160  1a00002d       bne +188 -> 348  (0x330596bc)
0x33059604   164  e242c001       sub ip, r2, #1
0x33059608   168  ed9cbb01       vldr d11, [ip + 4*1]
0x3305960c   172  eebdfbcb       vcvt.s32.f64 s30, d11
0x33059610   176  ee1f1a10       vmov r1, s30
0x33059614   180  eeb8fbcf       vcvt.f64.s32 d15, s30
0x33059618   184  eeb4bb4f       vcmp.f64 d11, d15
0x3305961c   188  eef1fa10       vmrs APSR, FPSCR
0x33059620   192  1a000025       bne +156 -> 348  (0x330596bc)
0x33059624   196  eaffffdc       b -136 -> 60  (0x3305959c)
                  ;;; <@22,#24> -------------------- Deferred number-tag-i --------------------
0x33059628   200  e24dd010       sub sp, sp, #16
0x3305962c   204  e92d0fff       stmdb sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, fp}
0x33059630   208  e1a000c0       mov r0, r0, asr #1
0x33059634   212  e2200102       eor r0, r0, #-2147483648
0x33059638   216  ee0f0a10       vmov s30, r0
0x3305963c   220  eeb8fbcf       vcvt.f64.s32 d15, s30
0x33059640   224  e59a903c       ldr r9, [r10, #+60]
0x33059644   228  e300357c       movw r3, #1404
0x33059648   232  e34030e4       movt r3, #228
0x3305964c   236  e8931020       ldmia r3, {r5, ip}
0x33059650   240  e295400c       adds r4, r5, #12
0x33059654   244  2a000005       bcs +28 -> 272  (0x33059670)
0x33059658   248  e154000c       cmp r4, ip
0x3305965c   252  8a000003       bhi +20 -> 272  (0x33059670)
0x33059660   256  e5834000       str r4, [r3, #+0]
0x33059664   260  e5859000       str r9, [r5, #+0]
0x33059668   264  e1a00005       mov r0, r5
0x3305966c   268  ea000007       b +36 -> 304  (0x33059690)
0x33059670   272  e3a0c000       mov ip, #0
0x33059674   276  e58dc000       str ip, [sp, #+0]
0x33059678   280  e3a00000       mov r0, #0
0x3305967c   284  e3051718       movw r1, #22296
0x33059680   288  e3401016       movt r1, #22
0x33059684   292  e59fc024       ldr ip, [pc, #+36]          ;; code: STUB, CEntryStub, minor: 1
0x33059688   296  e12fff3c       blx ip
0x3305968c   300  e2400001       sub r0, r0, #1
0x33059690   304  ed80fb01       vstr d15, [r0 + 4*1]
0x33059694   308  e2800001       add r0, r0, #1
0x33059698   312  e58d0000       str r0, [sp, #+0]
0x3305969c   316  e8bd0fff       ldmia sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, fp}
0x330596a0   320  e28dd010       add sp, sp, #16
0x330596a4   324  eaffffbf       b -252 -> 72  (0x330595a8)
                  [ Constant Pool
0x330596a8   328  e7f000f2       constant pool begin         ;; constant pool
0x330596ac   332  33024d40       constant
0x330596b0   336  3300a800       constant
                  ]
                  ;;; -------------------- Jump table --------------------
                  ;;; jump table entry 0: deoptimization bailout 1.
0x330596b4   340  e1a0e00f       mov lr, pc
0x330596b8   344  e59ff008       ldr pc, [pc, #+8]
                  ;;; jump table entry 1: deoptimization bailout 2.
0x330596bc   348  e1a0e00f       mov lr, pc
0x330596c0   352  e59ff004       ldr pc, [pc, #+4]
                  [ Constant Pool
0x330596c4   356  e7f000f2       constant pool begin         ;; constant pool
0x330596c8   360  2ee0a00c       constant
0x330596cc   364  2ee0a018       constant
                  ]
                  ;;; Safepoint table.

Deoptimization Input Data (deopt points = 3)
 index  ast id    argc     pc     commands
     0       3       0     36  BEGIN {frame count=1, js frame count=1}
                            JS_FRAME {ast_id=3, function=, height=0}
                            STACK_SLOT {input=-3}
                            STACK_SLOT {input=-2}
                            STACK_SLOT {input=-1}
     1       3       0     -1  BEGIN {frame count=1, js frame count=1}
                            JS_FRAME {ast_id=3, function=, height=0}
                            STACK_SLOT {input=-3}
                            STACK_SLOT {input=-2}
                            STACK_SLOT {input=-1}
     2       3       0     -1  BEGIN {frame count=1, js frame count=1}
                            JS_FRAME {ast_id=3, function=, height=0}
                            STACK_SLOT {input=-3}
                            STACK_SLOT {input=-2}
                            STACK_SLOT {input=-1}

Safepoints (size = 28)
0x33059584    36  11111111 (sp -> fp)       0
0x3305968c   300  10000000 | r0 | r8 (sp -> fp)  

RelocInfo (size = 179)
0x3305956c  comment  (;;; <@0,#0> -------------------- B0 --------------------)
0x3305956c  comment  (;;; <@2,#1> context)
0x33059570  comment  (;;; <@12,#9> -------------------- B1 --------------------)
0x33059570  comment  (;;; <@14,#11> stack-check)
0x3305957c  code target (BUILTIN)  (0x33024d40)
0x33059584  comment  (;;; <@15,#11> gap)
0x33059588  comment  (;;; <@16,#21> tagged-to-i)
0x33059590  comment  (;;; <@17,#21> gap)
0x33059594  comment  (;;; <@18,#22> tagged-to-i)
0x3305959c  comment  (;;; <@20,#13> add-i)
0x330595a0  comment  (;;; <@22,#24> number-tag-i)
0x330595a8  comment  (;;; <@24,#19> return)
0x330595b4  position  (5285)
0x330595b8  comment  (;;; <@16,#21> -------------------- Deferred tagged-to-i --------------------)
0x330595f0  comment  (;;; <@18,#22> -------------------- Deferred tagged-to-i --------------------)
0x33059628  comment  (;;; <@22,#24> -------------------- Deferred number-tag-i --------------------)
0x33059684  code target (STUB)  (0x3300a800)
0x330596a8  comment  ([ Constant Pool)
0x330596a8  constant pool
0x330596b4  comment  (])
0x330596b4  comment  (;;; -------------------- Jump table --------------------)
0x330596b4  comment  (;;; jump table entry 0: deoptimization bailout 1.)
0x330596bc  comment  (;;; jump table entry 1: deoptimization bailout 2.)
0x330596c4  comment  ([ Constant Pool)
0x330596c4  constant pool
0x330596d0  comment  (])
0x330596d0  comment  (;;; Safepoint table.)
V8 optimized assembly

That is a lot of instructions, but the most commonly used path is not actually that bad. As discussed above, the language has a number of features which means that it has to handle quite a lot of different exceptional cases, but let's break things down so that we can see what is happening in a bit more detail.

...
                  ;;; <@14,#11> stack-check
0x33059570    16  e59ac06c       ldr ip, [r10, #+108]
0x33059574    20  e15d000c       cmp sp, ip
0x33059578    24  2a000001       bcs +12 -> 36  (0x33059584)
0x3305957c    28  e59fc128       ldr ip, [pc, #+296]         ;; code: BUILTIN
0x33059580    32  e12fff3c       blx ip
...
Checking the stack pointer to control execution

The first real work that the function has to do is to check whether V8 wants to stop the execution. In order to do that it loads the process stack pointer limit to compare with the current stack pointer. V8 modifies this value when it wants to interrupt the program. This check is done whenever a function is called and also during loops. If the stack pointer has been modified, we branch out to handle the case and let V8 do some work, otherwise we skip over those instructions and continue.

...
                  ;;; <@15,#11> gap
0x33059584    36  e59b000c       ldr r0, [fp, #+12]
                  ;;; <@16,#21> tagged-to-i
0x33059588    40  e1b000c0       movs r0, r0, asr #1
0x3305958c    44  2a000009       bcs +44 -> 88  (0x330595b8)
                  ;;; <@17,#21> gap
0x33059590    48  e59b1008       ldr r1, [fp, #+8]
                  ;;; <@18,#22> tagged-to-i
0x33059594    52  e1b010c1       movs r1, r1, asr #1
0x33059598    56  2a000014       bcs +88 -> 144  (0x330595f0)
...
Loading the variables from the stack

The next step is to load our two variables a and b from the stack. At this point, fp contains an offset from the stack pointer, so we are just loading the two function parameters into r0 and r1. Then, something slightly odd happens: for each variable, we shift right by 1 and branch if the carry bit is set; this happens if the least significant bit is 1. This is done because V8 handles integers in a special way. The basic number type in JavaScript is a double, and it is considerably more expensive to do double arithmetic instead of integer arithmetic, so V8 tries to do integer maths if possible. It flags integers as having their least significant bit set to 0, and heap objects, i.e. everything else, have their least significant bit set to 1. In our case we have two integers, so we will skip over these tests and continue. If we don't have an integer, we need to do some further tests, as we will show. The use of the least significant bit for the integer / heap object flag means that the maximum unsigned integer we can handle in this way is 2^31.

...
                  ;;; <@20,#13> add-i
0x3305959c    60  e0810000       add r0, r1, r0
                  ;;; <@22,#24> number-tag-i
0x330595a0    64  e0900000       adds r0, r0, r0
0x330595a4    68  6a00001f       bvs +132 -> 200  (0x33059628)
...
Adding the variables together

The next step is simply to add the two values together. Once we have done that, we have to shift the value left by 1 to set the least significant bit to 0, which we do by adding the value to itself. At this point we need to check for an overflow and handle that case as well. If all is well, we can return from the function. The bulk of the rest of the function handles special cases.

...
                  ;;; <@16,#21> -------------------- Deferred tagged-to-i --------------------
0x330595b8    88  e0a01000       adc r1, r0, r0
0x330595bc    92  e5119001       ldr r9, [r1, #-1]
0x330595c0    96  e59ac03c       ldr ip, [r10, #+60]
0x330595c4   100  e159000c       cmp r9, ip
0x330595c8   104  1a000039       bne +236 -> 340  (0x330596b4)
0x330595cc   108  e241c001       sub ip, r1, #1
0x330595d0   112  ed9cbb01       vldr d11, [ip + 4*1]
0x330595d4   116  eebdfbcb       vcvt.s32.f64 s30, d11
0x330595d8   120  ee1f0a10       vmov r0, s30
0x330595dc   124  eeb8fbcf       vcvt.f64.s32 d15, s30
0x330595e0   128  eeb4bb4f       vcmp.f64 d11, d15
0x330595e4   132  eef1fa10       vmrs APSR, FPSCR
0x330595e8   136  1a000031       bne +204 -> 340  (0x330596b4)
0x330595ec   140  eaffffe7       b -92 -> 48  (0x33059590)
...
Dealing with the least significant bit

For example, this section deals with the case where the least significant bit was set and we have to load up the value from the heap. The first step is to take the value we shifted right, and restore it. We do this by adding it to itself, to shift it left with carry to add back the flag. Then we load up the value of that address into r9. Heap Objects start with a header, so we are not loading up the value of the object. In fact, we are checking if this is a HeapNumber object, and if it is not then we have to bail out to unoptimized code. Assuming that we are dealing with a number, we then have to check if the number is an integer. We do this by converting the float to an integer and back again. If the before-conversion and after-conversion numbers are the same, then we have an integer. If not, we again have to break out into a more general section of code.

As you can see, a large amount of code is generated. If we call the function with integers then almost all the time we will be taking the quickest path through it, but even so, we have to create quite a lot of instructions to handle the flexibility of the language. This makes it difficult to optimize JavaScript code, and explains why a lot of effort goes into trying to make JavaScript engines like v8 smarter. It is also an area rich in possibilities, if all we really want to do is add two integers together: this example has a lot of instructions which are not required to get that job done.

To write fast JavaScript it is desirable to take the path of least resistance through the code. Doing simple things like trying to work with integers where possible and restricting the amount of polymorphism in the code should reduce the amount of time that is spent in the more expensive special cases. If optimization of JavaScript is the goal, then this simple example should demonstrate some of these coding principles and why they are important.