A Dive Into JavaScriptCore
Recently, the compiler team at Igalia was discussing the available resources for the WebKit project, both for the purpose of onboarding new Igalians and for lowering the bar for third-party contributors. As compiler people, we are mainly concerned with JavaScriptCore (JSC), WebKit’s JavaScript engine implementation. There are many high-quality blog posts on the WebKit blog that describe various phases in the evolution of JSC, but finding one’s bearings in the actual source can be a daunting task.
The aim of this post is twofold: first, document some aspects of JavaScriptCore at the source level; second, show how one can figure out what a piece of code actually does in a large and complex source base (which JSC’s certainly is).
In medias res
As an exercise, we’re going to arbitrarily use a commit I had open in a web browser tab. Specifically, we will be looking at this snippet:
```cpp
Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
        continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}
```
This seems like a good starting point for taking a dive into the low-level details of JSC internals. Virtual registers look like a concept that’s good to know about. And what are those “locals used for callee saves” anyway? How do locals differ from vars? What are “vars with values”? Let’s find out!
Backstory
Recall that JSC is a multi-tiered execution engine. Most JavaScript code is only executed once; compiling takes longer than simply interpreting the code, so JavaScript code is always interpreted the first time through. If it turns out that a piece of code is executed frequently though1, compiling it becomes a more attractive proposition.
Initially, the tier up happens to the baseline JIT, a simple and fast non-optimizing compiler that produces native code for a JavaScript function. If the code continues to see much use, it will be recompiled with the DFG, an optimizing compiler that is geared towards low compilation times and decent performance of the produced native code. Eventually, the code might end up being compiled with the FTL backend too, but the upper tiers won’t be making an appearance in our story here.
What do tier up and tier down mean? In short, tier up is when code execution switches to a more optimized version, whereas tier down is the reverse operation. So the code might tier up from the baseline JIT to the DFG, but later tier down (under conditions we’ll briefly touch on later) back to the baseline JIT. You can read a more extensive overview here.
Diving in
With this context now in place, we can revisit the snippet above. The code is part of `operationOptimize`. Just looking at the two sites it’s referenced in, we can see that it’s only ever used if the `DFG_JIT` option is enabled. This is where the baseline JIT ➞ DFG tier up happens!

The sites that make use of `operationOptimize` both run during the generation of native code by the baseline JIT. The first one runs in response to the `op_enter` bytecode opcode, i.e. the opcode that marks entry to the function. The second one runs when encountering an `op_loop_hint` opcode (an opcode that only appears at the beginning of a basic block marking the entry to a loop). Those are the two kinds of program points at which execution might tier up to the DFG.

Notice that calls to `operationOptimize` only occur during execution of the native code produced by the baseline JIT. In fact, if you look at the emitted code surrounding the call to `operationOptimize` for the function entry case, you’ll see that the call is conditional and only happens if the function has been executed enough times that it’s worth making a C++ call to consider it for optimization.
The function accepts two arguments: a `vmPointer`, which is, umm, a pointer to a `VM` structure (i.e. the “state of the world” as far as this function is concerned), and the `bytecodeIndex`. Remember that the bytecode is the intermediate representation (IR) that all higher tiers start compiling from. In `operationOptimize`, the `bytecodeIndex` is used for

- distinguishing between function and loop entry points
- allowing the DFG to do program analysis of the values at the respective program point
- various diagnostics.

Again, the `bytecodeIndex` is a parameter that has already been set in stone during generation of the native code by the baseline JIT.

The other parameter, the `VM`, is used for a number of things. The part that’s relevant to the snippet we started out to understand is that the `VM` is (sometimes) used to give us access to the current `CallFrame`. `CallFrame` inherits from `Register`, which is a thin wrapper around a (maximally) 64-bit value.
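To make that concrete, here is a minimal, hedged sketch of the idea. The union members below are purely illustrative (JSC’s actual `Register` class has its own members and accessors); the point is just that one call-frame slot is one 64-bit cell that may be interpreted in several ways:

```cpp
#include <cstdint>

// Illustrative sketch only: a call-frame slot is a single 64-bit cell that
// may hold an encoded JS value, a raw pointer (e.g. to a CodeBlock), or a
// double. JSC's real Register class is richer than this.
union RegisterSketch {
    int64_t encodedValue;
    void* pointer;
    double number;
};

static_assert(sizeof(RegisterSketch) == 8, "a register slot is 64 bits");
```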
The CodeBlock
In this case, the various accessors defined by `CallFrame` effectively treat the (pointer) value that `CallFrame` consists of as a pointer to an array of `Register` values. Specifically, a set of constant expressions

```cpp
struct CallFrameSlot {
    static constexpr int codeBlock = CallerFrameAndPC::sizeInRegisters;
    static constexpr int callee = codeBlock + 1;
    static constexpr int argumentCount = callee + 1;
    static constexpr int thisArgument = argumentCount + 1;
    static constexpr int firstArgument = thisArgument + 1;
};
```

give the offset (relative to the call frame) of the pointer to the code block, the callee, the argument count and the `this` pointer. Note that the first `CallFrameSlot` is the `CallerFrameAndPC`, i.e. a pointer to the `CallFrame` of the caller and the `returnPC`.
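To see the actual slot numbers, we can restate the constants under one assumption: that `CallerFrameAndPC::sizeInRegisters` is 2 (one slot for the caller’s frame pointer, one for the return PC, as on 64-bit platforms). This is a hedged, self-contained sketch, not the real headers:

```cpp
// Sketch restating the CallFrameSlot constants above, assuming
// CallerFrameAndPC::sizeInRegisters == 2 (caller frame pointer + return PC).
// Slot numbers are offsets (in registers) into the call frame.
namespace sketch {
constexpr int callerFrameAndPCSizeInRegisters = 2;  // assumption

struct CallFrameSlot {
    static constexpr int codeBlock = callerFrameAndPCSizeInRegisters;  // 2
    static constexpr int callee = codeBlock + 1;                       // 3
    static constexpr int argumentCount = callee + 1;                   // 4
    static constexpr int thisArgument = argumentCount + 1;             // 5
    static constexpr int firstArgument = thisArgument + 1;             // 6
};
}
```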
The `CodeBlock` is definitely something we’ll need to understand better, as it appears in our motivational code snippet. However, it’s a large class that is intertwined with a number of other interesting code paths. For the purposes of this discussion, we need to know that it
- is associated with a code block (i.e. a function, eval, program or module code block)
- holds data relevant to tier up/down decisions and operations for the associated code block
We’ll focus on three of its data members:
```cpp
int m_numCalleeLocals;
int m_numVars;
int m_numParameters;
```
So, it seems that a `CodeBlock` can have at least some parameters (makes sense, right?) but also has both variables and callee locals.

First things first: what’s the difference between callee locals and vars? Well, it turns out that `m_numCalleeLocals` is only incremented in `BytecodeGeneratorBase<Traits>::newRegister` whereas `m_numVars` is only incremented in `BytecodeGeneratorBase<Traits>::addVar()`. Except, `addVar` calls into `newRegister`, so vars are a subset of callee locals (and therefore `m_numVars` ≤ `m_numCalleeLocals`).
Somewhat surprisingly, `newRegister` is only called in three places. So there you have it. Callee locals

- are allocated by a function called `newRegister`
- are either a var or a temporary.
Let’s start with the second point. What is a var? Well, let’s look at where vars are created (via `addVar`). There is definitely a var for every lexical variable (`VarKind::Stack`), i.e. a non-local variable accessible from the current scope. Vars are also generated (via `BytecodeGenerator::createVariable`) for

- the `arguments` object, if needed
- function definitions in scope
- declared function variables
- declared module variables
- the module ‘meta’ private variable
So, intuitively, vars are allocated more or less for “every JS construct that could be called a variable”. Conversely, temporaries are storage locations that have been allocated as part of bytecode generation (i.e. there is no corresponding storage location in the JS source). They can store intermediate calculation results and what not.
Coming back to the first point regarding callee locals, how come they’re allocated by a function called `newRegister`? Why, because JSC’s bytecode operates on a register VM! The `RegisterID` returned by `newRegister` wraps the `VirtualRegister` that our register VM is all about.
Virtual registers, locals and arguments, oh my!
A virtual register (of type `VirtualRegister`) consists simply of an `int` (which is also called its offset). Each virtual register corresponds to one of

- a local (i.e. variable or temporary)
- an argument
- a reference to a constant in the constant pool
- a reference to a field in the header of a `CallFrame` (caller frame, return address, argument count, callee, code block)
There is no differentiation between locals and arguments at the type level (everything is an `int`); however, virtual registers that map to locals are negative and those that map to arguments are nonnegative. In the context of bytecode generation, the `int`

- for a local, indexes into `m_calleeLocals`
- for an argument, indexes into `m_parameters`
- for a constant, indexes into `m_constantRegisters`
It feels like JSC is underusing C++ here.
In all cases, what we get after indexing with a local, argument or constant is a `RegisterID`. As explained, the `RegisterID` wraps a `VirtualRegister`. Why do we need this indirection?

Well, there are two extra bits of info in the `RegisterID`: the `m_refcount` and an `m_isTemporary` flag. The reference count is always greater than zero for a variable, but the rules under which a `RegisterID` is ref’d and unref’d are too complicated to go into here.
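As a hedged sketch of what the indirection adds: only the `m_refcount` and `m_isTemporary` members come from the discussion above; every other name and the exact API below are illustrative, not JSC’s actual class.

```cpp
// Illustrative sketch of the RegisterID indirection: the same VirtualRegister,
// plus bookkeeping that only matters during bytecode generation.
struct VirtualRegisterSketch { int offset = 0; };

class RegisterIDSketch {
public:
    RegisterIDSketch(VirtualRegisterSketch reg, bool isTemporary)
        : m_virtualRegister(reg), m_isTemporary(isTemporary) {}

    void ref() { ++m_refCount; }
    void deref() { if (m_refCount > 0) --m_refCount; }
    int refCount() const { return m_refCount; }
    bool isTemporary() const { return m_isTemporary; }
    VirtualRegisterSketch virtualRegister() const { return m_virtualRegister; }

private:
    VirtualRegisterSketch m_virtualRegister;  // the wrapped register
    int m_refCount = 0;                       // m_refcount in JSC
    bool m_isTemporary;                       // m_isTemporary in JSC
};
```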
When you have an argument, you get the `VirtualRegister` for it by directly adding it to `CallFrame::thisArgumentOffset()`.

When you have a local, you map it to `(-1 - local)` to get the corresponding `VirtualRegister`. So
local | vreg |
---|---|
0 | -1 |
1 | -2 |
2 | -3 |
(remember, virtual registers that correspond to locals are negative).
For an argument, you map it to `(arg + CallFrame::thisArgumentOffset())`:
argument | vreg |
---|---|
0 | this |
1 | this + 1 |
2 | this + 2 |
Which makes all the sense in the world when you remember what the `CallFrameSlot` looks like. So argument 0 is always the `this` pointer.
If the vreg is greater than some large offset (`s_firstConstantRegisterIndex`), then it is an index into the `CodeBlock`’s constant pool (after subtracting the offset).
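Putting the three cases together, here is a small sketch of the decoding logic. The local and argument mappings follow the tables above; `kThisArgumentOffset` and `kFirstConstantIndex` are illustrative stand-ins for `CallFrame::thisArgumentOffset()` and `s_firstConstantRegisterIndex`, with assumed values:

```cpp
// Sketch of how a virtual register offset is interpreted. The mappings for
// locals and arguments follow the tables above; the two constants below are
// assumptions standing in for CallFrame::thisArgumentOffset() and
// s_firstConstantRegisterIndex.
constexpr int kThisArgumentOffset = 5;
constexpr int kFirstConstantIndex = 0x40000000;

constexpr int localToOperand(int local)       { return -1 - local; }
constexpr int operandToLocal(int operand)     { return -1 - operand; }
constexpr int argumentToOperand(int arg)      { return kThisArgumentOffset + arg; }

constexpr bool operandIsLocal(int operand)    { return operand < 0; }
constexpr bool operandIsConstant(int operand) { return operand >= kFirstConstantIndex; }
constexpr int constantIndex(int operand)      { return operand - kFirstConstantIndex; }
```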
Bytecode operands
If you’ve followed any of the links to the functions doing the actual mapping of locals and arguments to a virtual register, you may have noticed that the functions are called `localToOperand` and `argumentToOperand`. Yet they’re only ever used in `virtualRegisterForLocal` and `virtualRegisterForArgument` respectively. This raises the obvious question: what are those virtual registers operands of?
Well, of the bytecode instructions in our register VM of course. Instead of recreating the pictures, I’ll simply encourage you to take a look at a recent blog post describing it at a high level.
How do we know that’s what “operand” refers to? Well, let’s look at a use of `virtualRegisterForLocal` in the bytecode generator. `BytecodeGenerator::createVariable` will allocate2 the next available local index (using the size of `m_calleeLocals` to keep track of it). This calls into `virtualRegisterForLocal`, which maps the local to a virtual register by calling `localToOperand`.

The newly allocated local is inserted into the function symbol table, along with its offset (i.e. the ID of the virtual register).

The `SymbolTableEntry` is looked up when we generate bytecode for a variable reference. A variable reference is represented by a `ResolveNode`3.
So looking into `ResolveNode::emitBytecode`, we dive into `BytecodeGenerator::variable` and there’s our `symbolTable->get()` call. And then the `SymbolTableEntry` is passed to `BytecodeGenerator::variableForLocalEntry`, which uses `entry.varOffset()` to initialize the returned `Variable` with `offset`. It also uses `registerFor` to retrieve the `RegisterID` from `m_calleeLocals`.

`ResolveNode::emitBytecode` will then pass the local `RegisterID` to `move`, which calls into `emitMove`, which just calls `OpMov::emit` (a function generated by the `JavaScriptCore/generator` code). Note that the compiler implicitly converts the `RegisterID` arguments to the `VirtualRegister` type at this step. Eventually, we end up in the (generated) function
```cpp
template<OpcodeSize __size, bool recordOpcode, typename BytecodeGenerator>
static bool emitImpl(BytecodeGenerator* gen, VirtualRegister dst, VirtualRegister src)
{
    if (__size == OpcodeSize::Wide16)
        gen->alignWideOpcode16();
    else if (__size == OpcodeSize::Wide32)
        gen->alignWideOpcode32();
    if (checkImpl<__size>(gen, dst, src)) {
        if (recordOpcode)
            gen->recordOpcode(opcodeID);
        if (__size == OpcodeSize::Wide16)
            gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide16));
        else if (__size == OpcodeSize::Wide32)
            gen->write(Fits<OpcodeID, OpcodeSize::Narrow>::convert(op_wide32));
        gen->write(Fits<OpcodeID, __size>::convert(opcodeID));
        gen->write(Fits<VirtualRegister, __size>::convert(dst));
        gen->write(Fits<VirtualRegister, __size>::convert(src));
        return true;
    }
    return false;
}
```
where `Fits::convert(VirtualRegister)` will trivially encode the `VirtualRegister` into the target type. Specifically, the mapping is nicely summed up in the following comment:
```cpp
// Narrow:
// -128..-1  local variables
//    0..15  arguments
//   16..127 constants
//
// Wide16:
// -2**15..-1  local variables
//      0..64  arguments
//     64..2**15-1 constants
```
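As a sketch of what that encoding implies, picking the smallest opcode size whose operand range can hold a given offset might look like this. Only the Narrow bounds (-128..127, i.e. one signed byte) come from the comment above; the function name and the Wide16 bounds are illustrative assumptions:

```cpp
#include <cstdint>

enum class OpcodeSize { Narrow, Wide16, Wide32 };

// Illustrative sketch: choose the smallest encoding in which a signed
// virtual-register offset fits. Narrow operands occupy one signed byte
// (-128..127, per the comment above); Wide16 operands are assumed int16_t.
OpcodeSize smallestSizeFor(int offset)
{
    if (offset >= -128 && offset <= 127)
        return OpcodeSize::Narrow;
    if (offset >= INT16_MIN && offset <= INT16_MAX)
        return OpcodeSize::Wide16;
    return OpcodeSize::Wide32;
}
```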
You may have noticed that the `Variable` returned by `BytecodeGenerator::variableForLocalEntry` has already been initialized with the virtual register `offset` we set when inserting the `SymbolTableEntry` for the local variable. And yet we use `registerFor` to look up the `RegisterID` for the local and then use the offset of the `VirtualRegister` contained therein. Surely those are the same? Oh well, something for a runtime assert to check.
Variables with values
Whew! Quite the detour there. Time to get back to our original snippet:
```cpp
Operands<Optional<JSValue>> mustHandleValues(codeBlock->numParameters(), numVarsWithValues);
int localsUsedForCalleeSaves = static_cast<int>(CodeBlock::llintBaselineCalleeSaveSpaceAsVirtualRegisters());
for (size_t i = 0; i < mustHandleValues.size(); ++i) {
    int operand = mustHandleValues.operandForIndex(i);
    if (operandIsLocal(operand) && VirtualRegister(operand).toLocal() < localsUsedForCalleeSaves)
        continue;
    mustHandleValues[i] = callFrame->uncheckedR(operand).jsValue();
}
```
What are those `numVarsWithValues` then? Well, the definition is right before our snippet:
```cpp
unsigned numVarsWithValues;
if (bytecodeIndex)
    numVarsWithValues = codeBlock->numCalleeLocals();
else
    numVarsWithValues = 0;
```
OK, so this looks straightforward for a change. If the `bytecodeIndex` is not zero, we’re doing the tier up from the baseline JIT to the DFG in the body of a function (i.e. at a loop entry). In that case, we consider all our callee locals to have values. Conversely, when we’re running for the function entry (i.e. `bytecodeIndex == 0`), none of the callee locals are live yet. Do note that the variable is incorrectly named: vars are not the same as callee locals, and we’re dealing with the latter here.
A second gotcha is that, whereas vars are always live, temporaries might not be. The DFG compiler will do liveness analysis at compile time to make sure it’s only looking at live values. That must have been a fun bug to track down!
Values that must be handled
Back to our snippet: `numVarsWithValues` is used as an argument to the constructor of `mustHandleValues`, which is of type `Operands<Optional<JSValue>>`. Right, so what are the `Operands`? They simply hold a number of `T` objects (here `T` is `Optional<JSValue>`), of which the first `m_numArguments` correspond to, well, arguments, whereas the remaining correspond to locals.
What we’re doing here is recording all the live (non-heap, obviously) values when we try to do the tier up. The idea is to be able to mix those values in with the previously observed values that DFG’s Control Flow Analysis will use to emit code which will bail us out of the optimized version (i.e. do a tier down). According to the comments and commit logs, this is in order to increase the chances of a successful OSR entry (tier up), even if the resulting optimized code may be slightly less conservative.
Remember that the optimized code that we tier up to makes assumptions with regard to the types of the incoming values (based on what we’ve observed when executing at lower tiers) and will bail out if those assumptions are not met. Taking the values of the current execution at the time of the tier up attempt ensures we won’t be doing all this work only to immediately have to tier down again.
`Operands` provides an `operandForIndex` method which will directly give you a virtual reg for every kind of element. For example, if you had called `Operands<T> opnds(2, 1)`, then the first iteration of the loop would give you

```
operandForIndex(0)
    -> virtualRegisterForArgument(0).offset()
    -> VirtualRegister(argumentToOperand(0)).offset()
    -> VirtualRegister(CallFrame::thisArgumentOffset).offset()
    -> CallFrame::thisArgumentOffset
```

The second iteration would similarly give you `CallFrame::thisArgumentOffset + 1`.
In the third iteration, we’re now dealing with a local, so we’d get

```
operandForIndex(2)
    -> virtualRegisterForLocal(2 - 2).offset()
    -> VirtualRegister(localToOperand(0)).offset()
    -> VirtualRegister(-1).offset()
    -> -1
```
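The two walk-throughs above can be condensed into a hedged sketch of the indexing scheme. The names mirror `Operands`, but this is illustrative; `kThisArgOffset` stands in for `CallFrame::thisArgumentOffset()` and its value of 5 is an assumption:

```cpp
#include <cstddef>

// Illustrative sketch of Operands<T>::operandForIndex: indices below
// numArguments map to argument registers, the rest map to locals.
constexpr int kThisArgOffset = 5;  // assumption, see CallFrameSlot

struct OperandsSketch {
    std::size_t numArguments;
    std::size_t numLocals;

    std::size_t size() const { return numArguments + numLocals; }

    int operandForIndex(std::size_t i) const {
        if (i < numArguments)
            return kThisArgOffset + static_cast<int>(i);    // argumentToOperand
        return -1 - static_cast<int>(i - numArguments);     // localToOperand
    }
};
```

With `OperandsSketch{2, 1}` this reproduces the three iterations just traced: two argument operands followed by one local operand.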
Callee save space as virtual registers
So, finally, what is our snippet doing here? It’s iterating over the values that are likely to be live at this program point and storing them in `mustHandleValues`. It will first iterate over the arguments (if any) and then over the locals. However, it will use the “operand” (remember, everything is an int…) to get the index of the respective local and then skip the first locals up to `localsUsedForCalleeSaves`. So, in fact, even though we allocated space for (arguments + callee locals), we skip some slots and only store (arguments + callee locals - `localsUsedForCalleeSaves`). This is OK, as the `Optional<JSValue>` values in the `Operands` will have been initialized by the default constructor of `Optional<>`, which gives us an object without a value (i.e. an object that will later be ignored).

Here, callee-saved register (`csr`) refers to a register that is available for use by the LLInt and/or the baseline JIT. This is described a bit in `LowLevelInterpreter.asm`, but is more apparent when one looks at what `csr` sets are used on each platform (or, in C++).
| platform | `metadataTable` | PC-base (`PB`) | `numberTag` | `notCellMask` |
|---|---|---|---|---|
| `X86_64` | csr1 | csr2 | csr3 | csr4 |
| `x86_64_win` | csr3 | csr4 | csr5 | csr6 |
| `ARM64` / `ARM64E` | csr6 | csr7 | csr8 | csr9 |
| `C_LOOP` 64b | csr0 | csr1 | csr2 | csr3 |
| `C_LOOP` 32b | csr3 | - | - | - |
| `ARMv7` | csr0 | - | - | - |
| `MIPS` | csr0 | - | - | - |
| `X86` | - | - | - | - |
On 64-bit platforms, offlineasm (JSC’s portable assembler) makes a range of callee-saved registers available to `.asm` files. Those are properly saved and restored. For example, for `X86_64` on non-Windows platforms, the returned `RegisterSet` contains registers `r12`-`r15` (inclusive), i.e. the callee-saved registers as defined in the System V AMD64 ABI. The mapping from symbolic names to architecture registers can be found in `GPRInfo`.
On 32-bit platforms, the assembler doesn’t make any `csr` regs available, so there’s nothing to save except if the platform makes special use of some register (like `C_LOOP` does for the `metadataTable`4).
What are the `numberTag` and `notCellMask` registers? Out of scope, that’s what they are!
Conclusion
Well, that wraps it up. Hopefully now you have a better understanding of what the original snippet does. In the process, we learned about a few concepts by reading through the source and, importantly, we added lots of links to JSC’s source code. This way, not only can you check that the textual explanations are still valid when you read this blog post, you can use the links as spring boards for further source code exploration to your heart’s delight!
Footnotes
1 Both the interpreter – better known as LLInt – and the baseline JIT keep track of execution statistics, so that JSC can make informed decisions on when to tier up.
2 Remarkably, no `RegisterID` has been allocated at this point – we used the size of `m_calleeLocals` but never modified it. Instead, later in the function (after adding the new local to the symbol table!) the code will call `addVar`, which will allocate a new “anonymous” local. But then the code asserts that the index of the newly allocated local (i.e. the offset of the virtual register it contains) is the same as the offset we previously used to create the virtual register, so it’s all good.
3 How did we know to look for the `ResolveNode`? Well, the `emitBytecode` method needs to be implemented by subclasses of `ExpressionNode`. If we look at how a simple binary expression is parsed (and given that `ASTBuilder` defines `BinaryOperand` as `std::pair<ExpressionNode*, BinaryOpInfo>`), it’s clear that any variable reference has already been lifted to an `ExpressionNode`.
So instead, we take the bottom-up approach. We find the lexer/parser token definitions, one of which is the `IDENT` token. Then it’s simply a matter of going over its uses in `Parser.cpp`, until we find our smoking gun. This gets us into `createResolve` aaaaand

```cpp
return new (m_parserArena) ResolveNode(location, ident, start);
```
That’s the node we’re looking for!
4 `C_LOOP` is a special backend for JSC’s portable assembler. What is special about it is that it generates C++ code, so that it can be used on otherwise unsupported architectures. Remember that the portable assembler (`offlineasm`) runs at compilation time.