A Dive Into JavaScriptCore

In medias res
Backstory
Diving in
Conclusion

Recently, the compiler team at Igalia was discussing the available resources for the WebKit project, both for the purpose of onboarding new Igalians and for lowering the bar for third-party contributors. As compiler people, we are mainly concerned with JavaScriptCore (JSC), WebKit’s javascript engine implementation. There are many high quality blog posts on the webkit blog that describe various phases in the evolution of JSC, but finding one’s bearings in the actual source can be a daunting task.

The aim of this post is twofold: first, document some aspects of JavaScriptCore at the source level; second, show how one can figure out what a piece of code actually does in a large and complex source base (which JSC’s certainly is).

In medias res

As an exercise, we’re going to arbitrarily use a commit I had open in a web browser tab. Specifically, we will be looking at this snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
	continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

This seems like a good starting point for taking a dive into the low-level details of JSC internals. Virtual registers look like a concept that’s good to know about. And what are those “locals used for callee saves” anyway? How do locals differ from vars? What are “vars with values”? Let’s find out!

Backstory

Recall that JSC is a multi-tiered execution engine. Most Javascript code is only executed once; compiling takes longer than simply interpreting the code, so Javascript code is always interpreted the first time through. If it turns out that a piece of code is executed frequently though¹, compiling it becomes a more attractive proposition.

Initially, the tier up happens to the baseline JIT, a simple and fast non-optimizing compiler that produces native code for a Javascript function. If the code continues to see much use, it will be recompiled with DFG, an optimizing compiler that is geared towards low compilation times and decent performance of the produced native code. Eventually, the code might end up being compiled with the FTL backend too, but the upper tiers won’t be making an appearence in our story here.

What do tier up and tier down mean? In short, tier up is when code execution switches to a more optimized version, whereas tier down is the reverse operation. So the code might tier up from the interpreter to the baseline JIT, but later tier down (under conditions we’ll briefly touch on later) back to the baseline JIT. You can read a more extensive overview here.

Diving in

With this context now in place, we can revisit the snippet above. The code is part of operationOptimize. Just looking at the two sites it’s referenced in, we can see that it’s only ever used if the DFG_JIT option is enabled. This is where the baseline JIT ➞ DFG tier up happens!

The sites that make use of operationOptimize both run during the generation of native code by the baseline JIT. The first one runs in response to the op_enter bytecode opcode, i.e. the opcode that marks entry to the function. The second one runs when encountering an op_loop_hint opcode (an opcode that only appears at the beginning of a basic block marking the entry to a loop). Those are the two kinds of program points at which execution might tier up to the DFG.

Notice that calls to operationOptimize only occur during execution of the native code produced by the baseline JIT. In fact, if you look at the emitted code surrounding the call to operationOptimize for the function entry case, you’ll see that the call is conditional and only happens if the function has been executed enough times that it’s worth making a C++ call to consider it for optimization.

The function accepts two arguments: a vmPointer which is, umm, a pointer to a VM structure (i.e. the “state of the world” as far as this function is concerned) and the bytecodeIndex. Remember that the bytecode is the intermediate representation (IR) that all higher tiers start compiling from. In operationOptimize, the bytecodeIndex is used for

distinguishing between function and loop entry points
the DFG to be able to do program analysis of the values at the respective program point
various diagnostics.

Again, the bytecodeIndex is a parameter that has already been set in stone during generation of the native code by the baseline JIT.

The other parameter, the VM, is used in a number of things. The part that’s relevant to the snippet we started out to understand is that the VM is (sometimes) used to give us access to the current CallFrame. CallFrame inherits from Register, which is a thin wrapper around a (maximally) 64-bit value.

The `CodeBlock`

In this case, the various accessors defined by CallFrame effectively treat the (pointer) value that CallFrame consists of as a pointer to an array of Register values. Specifically, a set of constant expressions

struct CallFrameSlot {
    static constexpr int codeBlock = CallerFrameAndPC::sizeInRegisters;
    static constexpr int callee = codeBlock + 1;
    static constexpr int argumentCount = callee + 1;
    static constexpr int thisArgument = argumentCount + 1;
    static constexpr int firstArgument = thisArgument + 1;
};

give the offset (relative to the callframe) of the pointer to the codeblock, the callee, the argument count and the this pointer. Note that the first CallFrameSlot is the CallerFrameAndPC, i.e. a pointer to the CallFrame of the caller and the returnPC.

The CodeBlock is definitely something we’ll need to understand better, as it appears in our motivational code snippet. However, it’s a large class that is intertwined with a number of other interesting code paths. For the purposes of this discussion, we need to know that it

is associated with a code block (i.e. a function, eval, program or module code block)
holds data relevant to tier up/down decisions and operations for the associated code block

We’ll focus on three of its data members:

int m_numCalleeLocals;
int m_numVars;
int m_numParameters;

So, it seems that a CodeBlock can have at least some parameters (makes sense, right?) but also has both variables and callee locals.

First things first: what’s the difference between callee locals and vars? Well, it turns out that m_numCalleeLocals is only incremented in BytecodeGeneratorBase<Traits>::newRegister whereas m_numVars is only incremented in BytecodeGeneratorBase<Traits>::addVar(). Except, addVar calls into newRegister, so vars are a subset of callee locals (and therefore m_numVars ≤ m_numCalleelocals).

Somewhat surprisingly, newRegister is only called in 3 places:

So there you have it. Callee locals

are allocated by a function called newRegister
are either a var or a temporary.

Let’s start with the second point. What is a var? Well, let’s look at where vars are created (via addVar):

There is definitely a var for every lexical variable (VarKind::Stack), i.e. a non-local variable accessible from the current scope. Vars are also generated (via BytecodeGenerator::createVariable) for

So, intuitively, vars are allocated more or less for “every JS construct that could be called a variable”. Conversely, temporaries are storage locations that have been allocated as part of bytecode generation (i.e. there is no corresponding storage location in the JS source). They can store intermediate calculation results and what not.

Coming back to the first point regarding callee locals, how come they’re allocated by a function called newRegister? Why, because JSC’s bytecode operates on a register VM! The RegisterID returned by newRegister wraps the VirtualRegister that our register VM is all about.

Virtual registers, locals and arguments, oh my!

A virtual register (of type VirtualRegister) consists simply of an int (which is also called its offset). Each virtual register corresponds to one of

local (i.e. variable or temporary)
argument
reference to a constant in the constant pool
reference to a field in the header of a CallFrame (caller frame, return address, argument count, callee, code block)

There is no differentiation between locals and arguments at the type level (everything is a (positive) int); However, virtual registers that map to locals are negative and those that map to arguments are nonnegative. In the context of bytecode generation, the int

for a local indexes into m_calleeLocals
for an argument indexes into m_parameters
for a constant, it indexes into m_constantRegisters

It feels like JSC is underusing C++ here.

In all cases, what we get after indexing with a local, argument or constant is a RegisterID. As explained, the RegisterID wraps a VirtualRegister. Why do we need this indirection?

Well, there are two extra bits of info in the RegisterID. The m_refcount and an m_isTemporary flag. The reference count is always greater than zero for a variable, but the rules under which a RegisterID is ref’d and unref’d are too complicated to go into here.

When you have an argument, you get the VirtualRegister for it by directly adding it to CallFrame::thisArgumentoffset.

When you have a local, you map it to (-1 - local) to get the corresponding Virtualregister. So

local	vreg
0	-1
1	-2
2	-3

(remember, virtual registers that correspond to locals are negative).

For an argument, you map it to (arg + CallFrame::thisArgumentOffset()):

argument	vreg
0	this
1	this + 1
2	this + 2

Which makes all the sense in the world when you remember what the CallFrameSlot looks like. So argument 0 is always the `this` pointer.

If the vreg is greater than some large offset (s_firstConstantRegisterIndex), then it is an index into the CodeBlock's constant pool (after subtracting the offset).

Bytecode operands

If you’ve followed any of the links to the functions doing the actual mapping of locals and arguments to a virtual register, you may have noticed that the functions are called localToOperand and argumentToOperand. Yet they’re only ever used in virtualRegisterForLocal and virtualRegisterForArgument respectively. This raises the obvious question: what are those virtual registers operands of?

Well, of the bytecode instructions in our register VM of course. Instead of recreating the pictures, I’ll simply encourage you to take a look at a recent blog post describing it at a high level.

How do we know that’s what “operand” refers to? Well, let’s look at a use of virtualRegisterForLocal in the bytecode generator. BytecodeGenerator::createVariable will allocate² the next available local index (using the size of m_calleeLocals to keep track of it). This calls into virtualRegisterForLocal, which maps the local to a virtual register by calling localToOperand.

The newly allocated local is inserted into the function symbol table, along with its offset (i.e. the ID of the virtual register).

The SymbolTableEntry is looked up when we generate bytecode for a variable reference. A variable reference is represented by a ResolveNode³.

So looking into ResolveNode::emitBytecode, we dive into BytecodeGenerator::variable and there’s our symbolTable->get() call. And then the symbolTableEntry is passed to BytecodeGenerator::variableForLocalEntry which uses entry.varOffset() to initialize the returned Variable with offset. It also uses registerFor to retrieve the RegisterID from m_calleeLocals.

ResolveNode::emitBytecode will then pass the local RegisterID to move which calls into emitMove, which just calls OpMov::emit (a function generated by the JavaScriptCore/generator code). Note that the compiler implicitly converts the RegisterID arguments to VirtualRegister type at this step. Eventually, we end up in the (generated) function

template<OpcodeSize __size, bool recordOpcode, typename BytecodeGenerator>
static bool emitImpl(BytecodeGenerator* gen, VirtualRegister dst, VirtualRegister src)
{
    if (__size == OpcodeSize::Wide16)
	gen->alignWideOpcode16();
    else if (__size == OpcodeSize::Wide32)
	gen->alignWideOpcode32();
    if (checkImpl<__size>(gen, dst, src)) {
	if (recordOpcode)
	    gen->recordOpcode(opcodeID);
	if (__size == OpcodeSize::Wide16)
	    gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide16));
	else if (__size == OpcodeSize::Wide32)
	    gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide32));
	gen->write(Fits<OpcodeID, __size>::convert(opcodeID));
	gen->write(Fits<VirtualRegister, __size>::convert(dst));
	gen->write(Fits<VirtualRegister, __size>::convert(src));
	return true;
    }
    return false;
}

where Fits::convert(VirtualRegister) will trivially encode the VirtualRegister into the target type. Specifically the mapping is nicely summed up in the following comment

// Narrow:
// -128..-1  local variables
//    0..15  arguments
//   16..127 constants
//
// Wide16:
// -2**15..-1  local variables
//      0..64  arguments
//     64..2**15-1 constants

You may have noticed that the Variable returned by BytecodeGenerator::variableForLocalEntry already has been initialized with the virtual register offset we set when inserting the SymbolTableEntry for the local variable. And yet we use registerFor to look up the RegisterID for the local and then use the offset of the VirtualRegister contained therein. Surely those are the same? Oh well, something for a runtime assert to check.

Variables with values

Whew! Quite the detour there. Time to get back to our original snippet:

Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
	continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}

What are those numVarsWithValues then? Well, the definition is right before our snippet:

unsigned numVarsWithValues;
if (bytecodeIndex)
    numVarsWithValues = codeBlock->numCalleeLocals();
else
    numVarsWithValues = 0;

OK, so this looks straighforward for a change. If the bytecodeIndex is not zero, we’re doing the tier up from JIT to DFG in the body of a function (i.e. at a loop entry). In that case, we consider all our callee locals to have values. Conversely, when we’re running for the function entry (i.e. bytecodeIndex == 0), none of the callee locals are live yet. Do note that the variable is incorrectly named. Vars are not the same as callee locals; we’re dealing with the latter here.

A second gotcha is that, whereas vars are always live, temporaries might not be. The DFG compiler will do liveness analysis at compile time to make sure it’s only looking at live values. That must have been a fun bug to track down!

Values that must be handled

Back to our snippet, numVarsWithValues is used as an argument to the constructor of mustHandleValues which is of type Operands<Optional<JSValue>>. Right, so what are the Operands? They simply hold a number of T objects (here T is Optional<JSValue>) of which the first m_numArguments correspond to, well, arguments whereas the remaining correspond to locals.

What we’re doing here is recording all the live (non-heap, obviously) values when we try to do the tier up. The idea is to be able to mix those values in with the previously observed values that DFG’s Control Flow Analysis will use to emit code which will bail us out of the optimized version (i.e. do a tier down). According to the comments and commit logs, this is in order to increase the chances of a successful OSR entry (tier up), even if the resulting optimized code may be slightly less conservative.

Remember that the optimized code that we tier up to makes assumptions with regard to the types of the incoming values (based on what we’ve observed when executing at lower tiers) and wil bail out if those assumptions are not met. Taking the values of the current execution at the time of the tier up attempt ensures we won’t be doing all this work only to immediately have to tier down again.

Operands provides an operandForIndex method which will directly give you a virtual reg for every kind of element. For example, if you had called Operands<T> opnds(2, 1), then the first iteration of the loop would give you

operandForIndex(0)
-> VirtualRegisterForargument(0).offset()
  -> VirtualRegister(argumentToOperand(0)).offset()
    -> VirtualRegister(CallFrame::thisArgumentOffset).offset()
      -> CallFrame::thisArgumentOffset

The second iteration would similarly give you CallFrame::thisArgumentOffset + 1.

In the third iteration, we’re now dealing with a local, so we’d get

operandForIndex(2)
-> virtualRegisterForLocal(2 - 2).offset()
  -> VirtualRegister(localToOperand(0)).offset()
    -> VirtualRegister(-1).offset()
      -> -1

Callee save space as virtual registers

So, finally, what is our snippet doing here? It’s iterating over the values that are likely to be live at this program point and storing them in mustHandleValues. It will first iterate over the arguments (if any) and then over the locals. However, it will use the “operand” (remember, everything is an int…) to get the index of the respective local and then skip the first locals up to localsUsedForCalleeSaves. So, in fact, even though we allocated space for (arguments + callee locals), we skip some slots and only store (arguments + callee locals - localsUsedForCalleeSaves). This is OK, as the Optional<JSValue> values in the Operands will have been initialized by the default constructor of Optional<> which gives us an object without a value (i.e. an object that will later be ignored).

Here, callee-saved register (csr) refers to a register that is available for use to the LLInt and/or the baseline JIT. This is described a bit in LowLevelInterpreter.asm, but is more apparent when one looks at what csr sets are used on each platform (or, in C++).

platform	`metadataTable`	PC-base (`PB`)	`numberTag`	`notCellMask`
`X86_64`	csr1	csr2	csr3	csr4
`x86_64_win`	csr3	csr4	csr5	csr6
`ARM64~/~ARM64E`	csr6	csr7	csr8	csr9
`C_LOOP` 64b	csr0	csr1	csr2	csr3
`C_LOOP` 32b	csr3	-	-	-
`ARMv7`	csr0	-	-	-
`MIPS`	csr0	-	-	-
`X86`	-	-	-	-

On 64-bit platforms, offlineasm (JSC’s portable assembler) makes a range of callee-saved registers available to .asm files. Those are properly saved and restored. For example, for X86_64 on non-Windows platforms, the returned RegisterSet contains registers r12-r15 (inclusive), i.e. the callee-saved registers as defined in the System V AMD64 ABI. The mapping from symbolic names to architecture registers can be found in GPRInfo.

On 32-bit platforms, the assembler doesn’t make any csr regs available, so there’s nothing to save except if the platform makes special use of some register (like C_LOOP does for the metadataTable ⁴).

What are the numberTag and notCellMask registers? Out of scope, that’s what they are!

Conclusion

Well, that wraps it up. Hopefully now you have a better understanding of what the original snippet does. In the process, we learned about a few concepts by reading through the source and, importantly, we added lots of links to JSC’s source code. This way, not only can you check that the textual explanations are still valid when you read this blog post, you can use the links as spring boards for further source code exploration to your heart’s delight!

Footnotes

¹ Both the interpreter – better known as LLInt – and the baseline JIT keep track of execution statistics, so that JSC can make informed decisions on when to tier up.

² Remarkably, no RegisterID has been allocated at this point – we used the size of m_calleeLocals but never modified it. Instead, later in the function (after adding the new local to the symbol table!) the code will call addVar which will allocate a new “anonymous” local. But then the code asserts that the index of the newly allocated local (i.e. the offset of the virtual register it contains) is the same as the offset we previously used to create the virtual register, so it’s all good.

³ How did we know to look for the ResolveNode? Well, the emitBytecode method needs to be implemented by subclasses of ExpressionNode. If we look at how a simple binary expression is parsed (and given that ASTBuilder defines BinaryOperand as std::pair<ExpressionNode*, BinaryOpInfo>), it’s clear that any variable reference has already been lifted to an ExpressionNode.

So instead, we take the bottom up approach. We find the lexer/parser token definitions, one of which is the IDENT token. Then it’s simply a matter of going over its uses in Parser.cpp, until we find our smoking gun. This gets us into createResolve aaaaand

return new (m_parserArena) ResolveNode(location, ident, start);

That’s the node we’re looking for!

⁴ C_LOOP is a special backend for JSC’s portable assembler. What is special about it is that it generates C++ code, so that it can be used on otherwise unsupported architectures. Remember that the portable assembler (offlineasm) runs at compilation time.