05_Instruction Set and Interpreter

2022-02-21 (updated 2023-04-03) · Programming Languages / Java

Instruction Set and Interpreter

As its name suggests, the Java Virtual Machine is a virtual machine, and bytecode is the machine code that runs on it.

1. Bytecode Instructions

1.1 Instruction Structure

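Bytecode stores encoded Java Virtual Machine instructions:

Every instruction begins with a one-byte opcode.
Because the opcode is a single byte, the Java Virtual Machine can support at most 256 instructions.

As of the Java SE 8 edition, the Java Virtual Machine Specification defines 205 instructions, with opcodes 0 (0x00) through 202 (0xCA), plus 254 (0xFE) and 255 (0xFF).

The JVM uses variable-length instructions: an opcode may be followed by zero or more bytes of operands.

For example, in the instruction 0xB20002, B2 is the opcode (getstatic) and 0002 is its operand.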
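A minimal sketch of how the three bytes 0xB2 0x00 0x02 split apart, assuming the method's code is a plain `byte[]`. The class and method names are hypothetical; the point is that the opcode is read as an unsigned byte and getstatic's operand as an unsigned, big-endian 16-bit constant-pool index.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes opcode and u2 operand of one instruction.
class DecodeDemo {
    // Reads the opcode as an unsigned byte (0..255).
    static int opcode(byte[] code, int pc) {
        return code[pc] & 0xFF;
    }

    // Reads the two bytes after the opcode as an unsigned, big-endian
    // 16-bit value (for getstatic, a constant-pool index).
    static int u2Operand(byte[] code, int pc) {
        return ByteBuffer.wrap(code, pc + 1, 2).getShort() & 0xFFFF;
    }
}
```

For `{(byte) 0xB2, 0x00, 0x02}`, `opcode` gives 0xB2 (178) and `u2Operand` gives 2.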

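1.2 Instruction Mnemonics

To make opcodes easier to remember, the Java Virtual Machine Specification assigns a mnemonic to every opcode.

For example, the mnemonic for opcode 0x00 is nop (no operation).

The operand stack and the local variable table store only raw values in slots; they do not record data types.
Each instruction must therefore know what type of data it operates on; in other words, the data type is bound into the instruction.

For example, iadd adds int values; dstore pops a double off the top of the operand stack and stores it into the local variable table; areturn returns a reference from a method.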
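To see why the type has to live in the instruction, here is a minimal sketch assuming a hypothetical slot array of raw 32-bit ints with no type tag. A float is stored as its bit pattern, so only the instruction (fload vs iload) decides how those bits are interpreted.

```java
// Hypothetical model: each slot holds 32 raw bits and no type information.
class UntypedSlots {
    private final int[] slots;

    UntypedSlots(int size) {
        slots = new int[size];
    }

    // istore/iload-like access: the instruction knows the bits are an int.
    void setInt(int index, int value) { slots[index] = value; }
    int getInt(int index) { return slots[index]; }

    // fstore/fload-like access: the float is kept as its raw bit pattern.
    void setFloat(int index, float value) { slots[index] = Float.floatToRawIntBits(value); }
    float getFloat(int index) { return Float.intBitsToFloat(slots[index]); }
}
```

Storing `1.0f` and then reading the same slot as an int yields 0x3F800000, which is why a JVM cannot tell types apart from slot contents alone.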

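1.3 Instruction Categories

The Java Virtual Machine Specification groups the 205 defined instructions into 11 categories by purpose:

Constants
Loads
Stores
Stack
Math
Conversions
Comparisons
Control
References
Extended
Reserved

There are 3 reserved instructions.

One is reserved for debuggers to implement breakpoints: opcode 202 (0xCA), mnemonic breakpoint.
The other 2 are reserved for internal use by JVM implementations: opcodes 254 (0xFE) and 255 (0xFF), mnemonics impdep1 and impdep2.

These 3 reserved instructions must not appear in a class file.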

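2. Running Instructions

2.1 The Instruction Loop

Running the virtual machine amounts to executing instructions in a loop. The pseudocode looks roughly like this: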

do {
    atomically calculate pc and fetch opcode at pc;
    if (operands) fetch operands;
    execute the action for the opcode;
} while (there is more to do);

Each iteration has three parts:

Compute the PC
Decode the instruction
Execute the instruction

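In Java, the pseudocode above might look roughly like this (calculatePC and createInst are placeholders):

while (true) {
    // compute the PC
    int pc = calculatePC();
    // decode the instruction
    int opcode = bytecode[pc] & 0xFF;
    Instruction inst = createInst(opcode);
    inst.fetchOperands(bytecode);
    // execute the instruction
    inst.execute();
}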

2.2 The Instruction Interface

Based on the pseudocode above, define an Instruction interface that every instruction implements:

public interface Instruction {

    void execute(Frame frame);

    void fetchOperands(ByteCodeReader reader);

}

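The interface has two methods: fetchOperands reads the instruction's operands from the bytecode, and execute performs the instruction on the current frame.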
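A toy sketch of the fetch/execute split, assuming simplified stand-ins for the article's classes: `ToyReader`, `ToyInstruction`, `ToyBiPush`, `ToyIAdd` and `ToyLoop` are hypothetical, and the operand stack is just a `Deque<Integer>` instead of a Frame. It supports only bipush (0x10) and iadd (0x60).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical reader over the code array.
class ToyReader {
    final byte[] code;
    int pos;
    ToyReader(byte[] code) { this.code = code; }
    int readU1() { return code[pos++] & 0xFF; }   // unsigned byte
    int readS1() { return code[pos++]; }          // signed byte, sign-extended
}

interface ToyInstruction {
    void fetchOperands(ToyReader reader);         // decode step
    void execute(Deque<Integer> stack);           // execute step
}

class ToyBiPush implements ToyInstruction {
    private int value;
    public void fetchOperands(ToyReader r) { value = r.readS1(); }
    public void execute(Deque<Integer> s) { s.push(value); }
}

class ToyIAdd implements ToyInstruction {
    public void fetchOperands(ToyReader r) { }    // no operands
    public void execute(Deque<Integer> s) {
        int v1 = s.pop();
        int v2 = s.pop();
        s.push(v2 + v1);
    }
}

class ToyLoop {
    // Runs to the end of the code array and returns the top of the stack.
    static int run(byte[] code) {
        ToyReader reader = new ToyReader(code);
        Deque<Integer> stack = new ArrayDeque<>();
        while (reader.pos < code.length) {
            int opcode = reader.readU1();
            ToyInstruction inst;
            switch (opcode) {
                case 0x10: inst = new ToyBiPush(); break;  // bipush
                case 0x60: inst = new ToyIAdd(); break;    // iadd
                default: throw new IllegalStateException("unsupported opcode " + opcode);
            }
            inst.fetchOperands(reader);
            inst.execute(stack);
        }
        return stack.peek();
    }
}
```

Running `{0x10, 7, 0x10, (byte) -3, 0x60}` (bipush 7, bipush -3, iadd) leaves 4 on the stack.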

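2.3 Instruction Decoding

Next, the bytecode must be decoded into Instruction objects.

Because this requires reading raw bytes, define a ByteCodeReader class for reading bytecode: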

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteCodeReader {
    /** raw bytecode */
    private byte[] bytes;
    /** buffer used to read the bytes */
    private ByteBuffer buf;

    public ByteCodeReader(byte[] bytes) {
        reset(bytes, 0);
    }

    public void reset(byte[] bytes, int position) {
        this.bytes = bytes;
        buf = ByteBuffer.wrap(bytes);
        buf.order(ByteOrder.BIG_ENDIAN); // class files are big-endian
        buf.position(position);
    }

    public int getPosition() {
        return buf.position();
    }

    /** Moves the read position, e.g. to the next PC (used by the interpreter loop). */
    public void setPosition(int position) {
        buf.position(position);
    }

    public byte readByte() {
        return buf.get();
    }

    public int readShort() {
        return buf.getShort();
    }

    public int readInt() {
        return buf.getInt();
    }

    public int readInt8() {
        return readByte();
    }

    public Uint8 readUint8() {
        byte b = buf.get();
        int val = 0x0FF & b;
        return new Uint8(val);
    }

    public int readInt16() {
        return readShort();
    }

    public Uint16 readUint16() {
        short s = buf.getShort();
        int val = 0x0FFFF & s;
        return new Uint16(val);
    }

    public Uint32 readUint32() {
        int i = buf.getInt();
        long val = 0x0FFFFFFFFL & i;
        return new Uint32(val);
    }

    public byte[] readBytes(Uint32 length) {
        int len = (int) length.value();
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return bytes;
    }

    /** Reads n consecutive 32-bit signed ints (used by tableswitch and lookupswitch). */
    public int[] readInts(int n) {
        int[] ints = new int[n];
        for (int i = 0; i < n; i++) {
            ints[i] = buf.getInt();
        }
        return ints;
    }

    /** Skips 0-3 padding bytes until the position is a multiple of 4. */
    public void skipPadding() {
        while (getPosition() % 4 != 0) {
            readUint8();
        }
    }
}

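bytes holds the raw bytecode array, and buf is a java.nio.ByteBuffer used to read from it.

Because some values span more than one byte, the byte order must be fixed; class files use big-endian order, so the buffer is set to big-endian.

getPosition() returns the current read position; it is used later to locate the program counter (PC). setPosition() and readInts() are needed by the interpreter loop and by the switch instructions.

The remaining methods read values of different widths and signedness, much like the classfile reader.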
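Why both signed and unsigned readers exist: the same byte 0xFF is -1 for an int8 operand (bipush) but 255 for a uint8 index (iload). A minimal sketch with `java.nio.ByteBuffer`, whose default byte order is big-endian; the class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers showing signed vs unsigned reads at an absolute index.
class SignednessDemo {
    // Signed 8-bit read: 0xFF becomes -1.
    static int int8(byte[] b, int i) { return ByteBuffer.wrap(b).get(i); }

    // Unsigned 8-bit read: 0xFF becomes 255.
    static int uint8(byte[] b, int i) { return ByteBuffer.wrap(b).get(i) & 0xFF; }

    // Signed 16-bit read (big-endian by default): 0xFFFE becomes -2.
    static int int16(byte[] b, int i) { return ByteBuffer.wrap(b).getShort(i); }

    // Unsigned 16-bit read: 0xFFFE becomes 65534.
    static int uint16(byte[] b, int i) { return ByteBuffer.wrap(b).getShort(i) & 0xFFFF; }
}
```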

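3. The Instruction Set

3.1 Abstract Instruction Base Classes

Many instructions share the same operand layout, so define a few base classes to simplify the implementations.

Some instructions take no operands, so define a no-operand base class NoOperandsInstruction: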

public class NoOperandsInstruction implements Instruction {

    @Override
    public void execute(Frame frame) {
        // do nothing
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        // no operands to read
    }

}

Some instructions access the local variable table, whose index is an 8-bit unsigned integer; define a base class Index8Instruction for them:

public class Index8Instruction implements Instruction {

    protected int index;
    protected Uint8 source;

    @Override
    public void execute(Frame frame) {
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        source = reader.readUint8();
        index = source.value();
    }
}

Some instructions access the constant pool, whose index is a 16-bit unsigned integer; define a base class Index16Instruction for them:

public class Index16Instruction implements Instruction {

    protected int index;
    protected Uint16 source;

    @Override
    public void execute(Frame frame) {
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        source = reader.readUint16();
        index = source.value();
    }
}

Finally, branch instructions take a 16-bit signed offset as their operand; define a base class BranchInstruction:

public class BranchInstruction implements Instruction {

    /** 16-bit signed branch offset */
    protected int offset;

    @Override
    public void execute(Frame frame) {
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        // note: this is a 16-bit signed integer
        offset = reader.readShort();
    }

    /**
     * Branches to pc + offset, where pc is the address of this branch instruction
     * @param frame the current frame
     */
    protected void branch(Frame frame) {
        int pc = frame.getThread().getPc();
        frame.setNextPc(pc + offset);
    }

}
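The offset is relative to the address of the branch opcode itself, not the next instruction. A minimal sketch of the arithmetic, assuming a plain `byte[]` code array (the class name is hypothetical): a goto at address 17 with operand bytes 0xFF 0xF3 has offset -13 and lands at address 4.

```java
// Hypothetical helper: computes a branch target from a 16-bit signed offset.
class BranchTargetDemo {
    // Decodes the signed 16-bit offset after the branch opcode at pc
    // and adds it to pc (the address of the branch instruction itself).
    static int target(byte[] code, int pc) {
        short offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
        return pc + offset;
    }
}
```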

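3.2 Constants Instructions

Constants instructions push a constant onto the top of the operand stack.

A constant can come from three places:

Implicit in the opcode
An operand
The runtime constant pool

The implementations for each source follow.

3.2.1 Constants Implicit in the Opcode

"Implicit in the opcode" means the constant is bound into the instruction itself; the mnemonic tells you its value.

For example, iconst_3 is the int constant 3, iconst_m1 is the int constant -1, and dconst_1 is the double constant 1.0.

There are 15 such instructions: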

aconst_null
iconst_m1
iconst_0
iconst_1
iconst_2
iconst_3
iconst_4
iconst_5
lconst_0
lconst_1
fconst_0
fconst_1
fconst_2
dconst_0
dconst_1

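These constants are built into opcodes because they are used so often that spending an extra operand byte on them would be wasteful.

aconst_null pushes a null reference onto the operand stack: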

public class AConstNull extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushRef(null);
    }

}

iconst_m1 pushes the int -1 onto the operand stack:

public class IConstM1 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushInt(-1);
    }

}

dconst_0 pushes the double 0.0 onto the operand stack:

public class DConst0 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushDouble(0.0);
    }

}

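The remaining constants instructions differ only in the value they push, so they are not listed.

3.2.2 Operand Constants

Two instructions take their constant from an operand and push it onto the operand stack:

bipush
sipush

bipush reads a signed byte from its operand, sign-extends it to an int, and pushes it onto the operand stack: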

public class BIPush implements Instruction {

    private int val;

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushInt(val);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        // note: this is an 8-bit signed integer
        val = reader.readInt8();
    }
}

sipush reads a signed short from its operand, sign-extends it to an int, and pushes it onto the operand stack:

public class SIPush implements Instruction {

    private int val;

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushInt(val);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        // note: this is a 16-bit signed integer
        val = reader.readInt16();
    }
}

Note that the operands of both bipush and sipush are signed integers.
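Which push instruction appears for an int literal depends on its range. The sketch below mirrors the choice javac typically makes (iconst for -1..5, bipush for the byte range, sipush for the short range, a constant-pool load otherwise); the class is hypothetical and only illustrates the ranges.

```java
// Hypothetical helper: picks the instruction javac typically emits for an int literal.
class IntPushChooser {
    static String choose(int v) {
        if (v >= -1 && v <= 5) {
            return v == -1 ? "iconst_m1" : "iconst_" + v;   // value implicit in the opcode
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush " + v;                          // 1-byte signed operand
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush " + v;                          // 2-byte signed operand
        }
        return "ldc";                                      // value goes into the constant pool
    }
}
```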

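3.2.3 Constant Pool Constants

Three instructions load a constant from the constant pool and push it onto the operand stack:

ldc
ldc_w
ldc2_w

ldc takes an 8-bit unsigned constant-pool index and loads int or float constants (the specification also allows String and class constants, which are not handled here yet):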

public class LDC extends Index8Instruction {

    @Override
    public void execute(Frame frame) {
        OperandStack stack = frame.getOpStack();
        ConstantPool cp = frame.getMethod().getClazz().getConstantPool();
        Constant constant = cp.getConstant(index);
        if (constant instanceof IntegerConstant) {
            stack.pushInt(((IntegerConstant) constant).value());
        } else if (constant instanceof FloatConstant) {
            stack.pushFloat(((FloatConstant) constant).value());
        } else {
            System.out.println("Unsupported Type: " + constant);
        }
    }
}

ldc_w does the same but takes a 16-bit unsigned index, so it can reach the whole constant pool:

public class LDCW extends Index16Instruction {

    @Override
    public void execute(Frame frame) {
        OperandStack stack = frame.getOpStack();
        ConstantPool cp = frame.getMethod().getClazz().getConstantPool();
        Constant constant = cp.getConstant(index);
        if (constant instanceof IntegerConstant) {
            stack.pushInt(((IntegerConstant) constant).value());
        } else if (constant instanceof FloatConstant) {
            stack.pushFloat(((FloatConstant) constant).value());
        } else {
            System.out.println("Unsupported Type: " + constant);
        }
    }
}

ldc2_w takes a 16-bit unsigned index and loads long or double constants:

public class LDC2W extends Index16Instruction {

    @Override
    public void execute(Frame frame) {
        OperandStack stack = frame.getOpStack();
        ConstantPool cp = frame.getMethod().getClazz().getConstantPool();
        Constant constant = cp.getConstant(index);
        if (constant instanceof LongConstant) {
            stack.pushLong(((LongConstant) constant).value());
        } else if (constant instanceof DoubleConstant) {
            stack.pushDouble(((DoubleConstant) constant).value());
        } else {
            throw new ClassFormatError("Constant: " + constant);
        }
    }
}

ldc and ldc_w do the same thing; they differ only in the range of their index operand.

ldc2_w is dedicated to long and double, the two-slot types.
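A sketch of how an assembler might pick among the three, assuming a hypothetical encoder: long/double always use ldc2_w (0x14) with a 2-byte index; other constants use ldc (0x12) when the index fits in one byte and ldc_w (0x13) otherwise.

```java
// Hypothetical encoder for constant-pool loads.
class LdcEncoder {
    static byte[] encode(int cpIndex, boolean longOrDouble) {
        if (longOrDouble) {
            // ldc2_w: always a 2-byte big-endian index
            return new byte[]{0x14, (byte) (cpIndex >>> 8), (byte) cpIndex};
        }
        if (cpIndex <= 0xFF) {
            // ldc: the index fits in one unsigned byte
            return new byte[]{0x12, (byte) cpIndex};
        }
        // ldc_w: wider 2-byte index
        return new byte[]{0x13, (byte) (cpIndex >>> 8), (byte) cpIndex};
    }
}
```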

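3.3 Loads Instructions

Loads instructions read a variable from the local variable table and push it onto the operand stack.

There are 33 loads instructions.

By operand type they fall into 6 groups:

aload family: reference variables
dload family: double variables
fload family: float variables
iload family: int variables
lload family: long variables
xaload family: array elements (these pop an array reference and an index off the operand stack instead of reading the local variable table)

The groups are nearly identical; they differ only in the type they operate on.

Reading from the local variable table requires an index, which can come from two places:

Implicit in the opcode
An operand

These two sources work just like those of the constants instructions.

Some examples: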

iload_1 pushes the int in local variable slot 1 onto the operand stack:

public class ILoad1 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushInt(frame.getLocalVars().getInt(1));
    }

}

dload_2 pushes the double in local variable slot 2 onto the operand stack:

public class DLoad2 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushDouble(frame.getLocalVars().getDouble(2));
    }

}

lload reads an index from its operand, loads the long at that index in the local variable table, and pushes it onto the operand stack:

public class LLoad extends Index8Instruction {

    @Override
    public void execute(Frame frame) {
        // index is the 8-bit unsigned integer read by Index8Instruction
        frame.getOpStack().pushLong(frame.getLocalVars().getLong(index));
    }

}

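Similar instructions include iload, fload, dload, and aload.

Note that long and double values occupy 2 slots in both the local variable table and the operand stack; the local variable table and operand stack classes implemented earlier already handle this.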
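A sketch of how a 64-bit value can be spread over two 32-bit slots. The layout (low half in slot n, high half in slot n + 1) is an assumption for illustration; the specification leaves the internal layout to the implementation, and the class is hypothetical.

```java
// Hypothetical two-slot layout: low 32 bits in slot n, high 32 bits in slot n + 1.
class TwoSlotLong {
    static void setLong(int[] slots, int index, long value) {
        slots[index] = (int) value;               // low half
        slots[index + 1] = (int) (value >>> 32);  // high half
    }

    static long getLong(int[] slots, int index) {
        long low = slots[index] & 0xFFFFFFFFL;    // mask to avoid sign extension
        long high = slots[index + 1];
        return (high << 32) | low;
    }

    // double reuses the long layout via its raw bit pattern
    static void setDouble(int[] slots, int index, double value) {
        setLong(slots, index, Double.doubleToRawLongBits(value));
    }

    static double getDouble(int[] slots, int index) {
        return Double.longBitsToDouble(getLong(slots, index));
    }
}
```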

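3.4 Stores Instructions

Stores instructions pop a value off the operand stack and store it into the local variable table.

Stores are the inverse of loads, so the instructions mirror each other and the implementations simply run in the opposite direction.

Some examples:

astore_0 pops a reference off the operand stack and stores it in local variable slot 0: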

public class AStore0 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getLocalVars().setRef(0, frame.getOpStack().popRef());
    }

}

fstore_2 pops a float off the operand stack and stores it in local variable slot 2:

public class FStore2 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getLocalVars().setFloat(2, frame.getOpStack().popFloat());
    }

}

dstore reads an index from its operand, pops a double off the operand stack, and stores it at that index in the local variable table:

public class DStore extends Index8Instruction {

    @Override
    public void execute(Frame frame) {
        // index is the 8-bit unsigned integer read by Index8Instruction
        frame.getLocalVars().setDouble(index, frame.getOpStack().popDouble());
    }

}

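The other stores instructions are implemented the same way, mirroring the loads.

3.5 Stack Instructions

Stack instructions manipulate the data on the operand stack directly.

There are 9 of them:

Pop instructions: the pop family discards values from the top of the stack
Dup instructions: the dup family duplicates values at the top of the stack
Swap instruction: swap exchanges the top two values

Stack instructions operate on raw slots, so they do not care what type of data the slots hold.

Because they work only with slots, OperandStack needs two new slot-level methods: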

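public class OperandStack {

    // ... existing fields (the pre-allocated Slot[] slots, int size) and methods omitted ...

    public void pushSlot(Slot slot) {
        // copy the values into the existing Slot object
        slots[size].setSlot(slot);
        size++;
    }

    public Slot popSlot() {
        size--;
        Slot slot = slots[size];
        // return a detached copy, then clear the vacated Slot
        Slot copySlot = new Slot(slot);
        slot.setSlot(null);
        return copySlot;
    }
}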

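pushSlot and popSlot do not replace the Slot objects in the array; they copy values into and out of them.

This keeps every Slot object in place as a placeholder, which avoids NullPointerExceptions. popSlot also returns a copy rather than the stored object, so a later push into the same position cannot change a value that has already been popped.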
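A self-contained sketch of this copy-in/copy-out design. `DemoSlot` and `DemoSlotStack` are hypothetical stand-ins for the article's Slot and OperandStack; in particular, having `setSlot(null)` clear the slot is an assumption, since popSlot above relies on it. The usage in the comment performs a dup and shows that mutating the popped copy leaves both stacked values intact.

```java
// Hypothetical slot: a 32-bit number plus a reference.
class DemoSlot {
    int num;
    Object ref;

    DemoSlot() { }
    DemoSlot(DemoSlot other) { setSlot(other); }

    // Copies values in; null clears the slot (assumed behavior).
    void setSlot(DemoSlot other) {
        if (other == null) { num = 0; ref = null; }
        else { num = other.num; ref = other.ref; }
    }
}

class DemoSlotStack {
    private final DemoSlot[] slots;
    private int size;

    DemoSlotStack(int maxStack) {
        slots = new DemoSlot[maxStack];
        for (int i = 0; i < maxStack; i++) {
            slots[i] = new DemoSlot();              // pre-allocated placeholders
        }
    }

    void pushSlot(DemoSlot s) { slots[size++].setSlot(s); }   // copy values in

    DemoSlot popSlot() {
        DemoSlot top = slots[--size];
        DemoSlot copy = new DemoSlot(top);          // detached copy
        top.setSlot(null);                          // clear the vacated slot
        return copy;
    }

    void pushInt(int v) { slots[size++].num = v; }
    int popInt() { return popSlot().num; }
}

// Usage (dup): s.pushInt(7); DemoSlot t = s.popSlot(); s.pushSlot(t); s.pushSlot(t);
// then t.num = 99 does not affect either value on the stack.
```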

3.5.1 Pop Instructions

The pop instructions are pop and pop2.

pop discards a value that occupies 1 slot, such as an int or float:

public class Pop extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().popSlot();
    }
}

pop2 discards 2 slots: either a single long or double, or two 1-slot values:

public class Pop2 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().popSlot();
        frame.getOpStack().popSlot();
    }
}

3.5.2 Dup Instructions

Dup instructions duplicate values on the operand stack.

dup duplicates the single slot on top of the stack:

public class Dup extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot);
        frame.getOpStack().pushSlot(slot);
    }
}

dup2 duplicates the top 2 slots (two 1-slot values, or one long or double):

public class Dup2 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot1 = frame.getOpStack().popSlot();
        Slot slot2 = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
    }
}

dup_x1 duplicates the top slot and inserts the copy beneath the top two slots (..., v2, v1 → ..., v1, v2, v1):

public class DupX1 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot1 = frame.getOpStack().popSlot();
        Slot slot2 = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot1);
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
    }

}

dup_x2 duplicates the top slot and inserts the copy beneath the top three slots (..., v3, v2, v1 → ..., v1, v3, v2, v1):

public class DupX2 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot1 = frame.getOpStack().popSlot();
        Slot slot2 = frame.getOpStack().popSlot();
        Slot slot3 = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot1);
        frame.getOpStack().pushSlot(slot3);
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
    }
}

dup2_x1 duplicates the top 2 slots and inserts the copies beneath the top three slots (..., v3, v2, v1 → ..., v2, v1, v3, v2, v1):

public class Dup2X1 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot1 = frame.getOpStack().popSlot();
        Slot slot2 = frame.getOpStack().popSlot();
        Slot slot3 = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
        frame.getOpStack().pushSlot(slot3);
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
    }
}

dup2_x2 duplicates the top 2 slots and inserts the copies beneath the top four slots (..., v4, v3, v2, v1 → ..., v2, v1, v4, v3, v2, v1):

public class Dup2X2 extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot1 = frame.getOpStack().popSlot();
        Slot slot2 = frame.getOpStack().popSlot();
        Slot slot3 = frame.getOpStack().popSlot();
        Slot slot4 = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
        frame.getOpStack().pushSlot(slot4);
        frame.getOpStack().pushSlot(slot3);
        frame.getOpStack().pushSlot(slot2);
        frame.getOpStack().pushSlot(slot1);
    }
}

Apart from dup and dup2, these instructions take some care to get the slot order right.
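One way to check the slot orders is to model the stack as a list whose last element is the top and replay the pushes. A minimal sketch for dup_x1 and dup2_x1 (the class is hypothetical; strings stand in for slots):

```java
import java.util.List;

// Models the operand stack as a list whose last element is the top.
class DupFamily {
    // dup_x1: ..., v2, v1 -> ..., v1, v2, v1
    static void dupX1(List<String> s) {
        String v1 = s.remove(s.size() - 1);
        String v2 = s.remove(s.size() - 1);
        s.add(v1); s.add(v2); s.add(v1);
    }

    // dup2_x1: ..., v3, v2, v1 -> ..., v2, v1, v3, v2, v1
    static void dup2X1(List<String> s) {
        String v1 = s.remove(s.size() - 1);
        String v2 = s.remove(s.size() - 1);
        String v3 = s.remove(s.size() - 1);
        s.add(v2); s.add(v1); s.add(v3); s.add(v2); s.add(v1);
    }
}
```

Starting from `[x, b, a]` (a on top), dup_x1 gives `[x, a, b, a]`; starting from `[c, b, a]`, dup2_x1 gives `[b, a, c, b, a]`.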

3.5.3 Swap Instruction

The swap instruction exchanges the top two values on the operand stack.

There is only one, swap:

public class Swap extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        Slot slot1 = frame.getOpStack().popSlot();
        Slot slot2 = frame.getOpStack().popSlot();
        frame.getOpStack().pushSlot(slot1);
        frame.getOpStack().pushSlot(slot2);
    }
}

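3.6 Math Instructions

Math instructions cover arithmetic (add, subtract, multiply, divide, remainder, negate), shifts, bitwise boolean operations, and increment.

There are 37 of them.

Each math instruction pops its operands off the operand stack, computes the result, and pushes it back.

Some examples:

Arithmetic: the int addition instruction iadd: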

public class IAdd extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        int val1 = frame.getOpStack().popInt();
        int val2 = frame.getOpStack().popInt();
        int val = val2 + val1;
        frame.getOpStack().pushInt(val);
    }
}

Arithmetic: the double subtraction instruction dsub:

public class DSub extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        double val1 = frame.getOpStack().popDouble();
        double val2 = frame.getOpStack().popDouble();
        double val = val2 - val1;
        frame.getOpStack().pushDouble(val);
    }
}

Shift: the int left-shift instruction ishl:

public class IShl extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        int val1 = frame.getOpStack().popInt();
        int val2 = frame.getOpStack().popInt();
        int val = val2 << (val1 & 0x1f);
        frame.getOpStack().pushInt(val);
    }
}

Shift: the long logical (unsigned) right-shift instruction lushr:

public class LUShr extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        // note: the shift distance is an int, even for long shifts
        int val1 = frame.getOpStack().popInt();
        long val2 = frame.getOpStack().popLong();
        long val = val2 >>> (val1 & 0x3f);
        frame.getOpStack().pushLong(val);
    }
}

Boolean: the int bitwise OR instruction ior:

public class IOr extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        int val1 = frame.getOpStack().popInt();
        int val2 = frame.getOpStack().popInt();
        int val = val2 | val1;
        frame.getOpStack().pushInt(val);
    }
}

The increment instruction iinc:

public class IInc implements Instruction {

    /**
     * local variable index
     */
    private Uint8 index;
    /**
     * constant to add
     */
    private int value;

    @Override
    public void execute(Frame frame) {
        int val = frame.getLocalVars().getInt(index.value());
        val += value;
        frame.getLocalVars().setInt(index.value(), val);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        index = reader.readUint8();
        // note: this is an 8-bit signed integer
        value = reader.readInt8();
    }
}

Math instructions are fairly simple, but iinc needs attention to its operand types: an unsigned 8-bit index and a signed 8-bit constant.
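Two operand details worth checking in isolation: shifts use only the low 5 bits (int) or 6 bits (long) of the distance, which is why the implementations mask with 0x1f and 0x3f, and iinc's constant is a signed byte. A minimal sketch with hypothetical method names:

```java
// Hypothetical helpers reproducing the operand semantics of ishl, lushr and iinc.
class MathSemanticsDemo {
    // ishl: only the low 5 bits of the distance count, so 33 behaves like 1.
    static int ishl(int value, int distance) {
        return value << (distance & 0x1f);
    }

    // lushr: only the low 6 bits count, so 64 behaves like 0.
    static long lushr(long value, int distance) {
        return value >>> (distance & 0x3f);
    }

    // iinc: the raw constant byte is signed, so 0xFF means -1, not 255.
    static int iinc(int local, byte rawConst) {
        return local + rawConst;
    }
}
```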

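3.7 Conversions Instructions

Conversions instructions perform numeric type casts, such as double to long or float to int.

There are 15 of them; only primitive conversions are implemented here.

Casting a reference type uses the checkcast instruction, which cannot be implemented yet.

Some examples:

The i2x family: i2l converts an int to a long: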

public class I2L extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        long l = frame.getOpStack().popInt();
        frame.getOpStack().pushLong(l);
    }
}

The l2x family: l2d converts a long to a double:

public class L2D extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        long l = frame.getOpStack().popLong();
        double d = (double) l;
        frame.getOpStack().pushDouble(d);
    }
}

The f2x family: f2i converts a float to an int:

public class F2I extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        float f = frame.getOpStack().popFloat();
        int i = (int) f;
        frame.getOpStack().pushInt(i);
    }
}

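The conversions are straightforward to implement: Java's own cast operators compile to these same instructions, so a Java cast gives exactly the JVM semantics (for example, f2i turns NaN into 0 and clamps out-of-range values to the int bounds).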
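A quick way to see those semantics is to exercise Java casts directly. The method names below are hypothetical wrappers, but each body is a plain Java cast that javac compiles to the named instruction.

```java
// Hypothetical wrappers: each cast compiles to the named conversion instruction.
class ConversionDemo {
    static int f2i(float f) { return (int) f; }   // NaN -> 0, clamps at int bounds, truncates toward 0
    static int l2i(long l) { return (int) l; }    // keeps the low 32 bits
    static int i2b(int i) { return (byte) i; }    // truncates to 8 bits, then sign-extends
    static int i2c(int i) { return (char) i; }    // truncates to 16 bits, zero-extends
}
```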

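3.8 Comparisons Instructions

Comparisons instructions compare values and then act on the result.

There are 19 of them.

They fall into 2 groups:

Compare and push the result onto the operand stack
Compare and branch on the result

Comparisons instructions are mainly used to implement if-else, for, while, and similar statements.

3.8.1 Pushing the Result onto the Operand Stack

5 instructions push a comparison result:

lcmp
fcmpg
fcmpl
dcmpg
dcmpl

lcmp compares two longs and pushes the result (an int: -1, 0, or 1) onto the operand stack: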

public class LCmp extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        long v2 = frame.getOpStack().popLong();
        long v1 = frame.getOpStack().popLong();
        int v = CmpUtil.cmpLong(v1, v2);
        frame.getOpStack().pushInt(v);
    }
}

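Floating-point arithmetic can produce NaN (Not a Number), so comparing two floating-point values has a fourth possible outcome besides greater, equal, and less: unordered.

fcmpg and fcmpl both compare floats and otherwise behave the same.

They differ only in how they report the fourth (unordered) outcome.

If at least one of the two float values is NaN, fcmpg pushes 1 and fcmpl pushes -1.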

public class FCmpg extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        float v2 = frame.getOpStack().popFloat();
        float v1 = frame.getOpStack().popFloat();
        int v = CmpUtil.cmpFloat(v1, v2, true);
        frame.getOpStack().pushInt(v);
    }
}

public class FCmpl extends NoOperandsInstruction {

    @Override
    public void execute(Frame frame) {
        float v2 = frame.getOpStack().popFloat();
        float v1 = frame.getOpStack().popFloat();
        int v = CmpUtil.cmpFloat(v1, v2, false);
        frame.getOpStack().pushInt(v);
    }
}

The utility class used by the instructions above:

public final class CmpUtil {

    public static int cmpLong(long v1, long v2) {
        return Long.compare(v1, v2);
    }

    public static int cmpFloat(float v1, float v2, boolean gFlag) {
        if (v1 > v2) {
            return 1;
        } else if (v1 == v2) {
            return 0;
        } else if (v1 < v2) {
            return -1;
        } else if (gFlag) {
            return 1;
        } else {
            return -1;
        }
    }

}

dcmpg and dcmpl compare doubles with the same semantics as fcmpg and fcmpl, so they are not shown.
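Why have two variants? A compiler can pick the one that makes NaN fall through to the "false" branch. The sketch below assumes the pairing javac typically uses (fcmpg + ifge for `a < b`, fcmpl + ifle for `a > b`); `cmp` repeats the logic of CmpUtil.cmpFloat, and the class is hypothetical.

```java
// Hypothetical model of how a float comparison is lowered to fcmpg/fcmpl + branch.
class FloatBranchDemo {
    // Same logic as CmpUtil.cmpFloat.
    static int cmp(float v1, float v2, boolean gFlag) {
        if (v1 > v2) return 1;
        if (v1 == v2) return 0;
        if (v1 < v2) return -1;
        return gFlag ? 1 : -1;       // at least one operand is NaN
    }

    // `a < b` as: fcmpg; ifge <else>. NaN yields 1, so the else branch is taken.
    static boolean lessThan(float a, float b) {
        boolean jumpToElse = cmp(a, b, true) >= 0;
        return !jumpToElse;
    }

    // `a > b` as: fcmpl; ifle <else>. NaN yields -1, so the else branch is taken.
    static boolean greaterThan(float a, float b) {
        boolean jumpToElse = cmp(a, b, false) <= 0;
        return !jumpToElse;
    }
}
```

With the variants swapped, `NaN < 2` would evaluate to true, which is why the choice of g or l matters.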

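3.8.2 Compare and Branch

There are 14 compare-and-branch instructions, in 2 groups:

Single-operand: the if<cond> instructions
Two-operand: the if_icmp<cond> and if_acmp<cond> instructions

The single-operand if<cond> instructions pop an int off the operand stack and compare it with 0: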

ifeq: x == 0
ifne: x != 0
iflt: x < 0
ifle: x <= 0
ifgt: x > 0
ifge: x >= 0

The implementation is simple. ifeq:

public class IfEq extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        int v1 = frame.getOpStack().popInt();
        if (v1 == 0) {
            branch(frame);
        }
    }

}

ifle:

public class IfLe extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        int v1 = frame.getOpStack().popInt();
        if (v1 <= 0) {
            branch(frame);
        }
    }

}

The other instructions are similar and are not shown.

The two-operand if_icmp<cond> instructions pop 2 ints off the operand stack, compare them, and branch:

if_icmpeq: if x1 == x2
if_icmpne: if x1 != x2
if_icmplt: if x1 < x2
if_icmple: if x1 <= x2
if_icmpgt: if x1 > x2
if_icmpge: if x1 >= x2

if_icmpne:

public class IfICmpNe extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        int v2 = frame.getOpStack().popInt();
        int v1 = frame.getOpStack().popInt();
        if (v1 != v2) {
            branch(frame);
        }
    }

}

if_icmpgt:

public class IfICmpGt extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        int v2 = frame.getOpStack().popInt();
        int v1 = frame.getOpStack().popInt();
        if (v1 > v2) {
            branch(frame);
        }
    }

}

The two-operand if_acmp<cond> instructions also pop 2 values, but these are references, and references can only be compared for identity:

if_acmpeq: if x1 == x2
if_acmpne: if x1 != x2

if_acmpeq:

public class IfACmpEq extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        Object v2 = frame.getOpStack().popRef();
        Object v1 = frame.getOpStack().popRef();
        if (v1 == v2) {
            branch(frame);
        }
    }

}

if_acmpne:

public class IfACmpNe extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        Object v2 = frame.getOpStack().popRef();
        Object v1 = frame.getOpStack().popRef();
        if (v1 != v2) {
            branch(frame);
        }
    }

}

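3.9 Control Instructions

Control instructions transfer control directly to another address.

They implement return, switch, and the unconditional jumps behind loops and break/continue.

The control instructions are:

goto
jsr
ret
tableswitch
lookupswitch
ireturn
lreturn
freturn
dreturn
areturn
return

The return family will be implemented later; jsr and ret (legacy subroutine instructions) are not covered here.

goto performs an unconditional jump: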

public class Goto extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        branch(frame);
    }
}

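tableswitch and lookupswitch both implement the switch statement:

tableswitch: used when the case values are dense enough to be encoded as a jump table indexed by value
lookupswitch: used when the case values are sparse; they are stored as sorted key/offset pairs

When can the case values be encoded as a jump table? Compare these two examples: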

switch (i) {
    case 0:  return  0;
    case 1:  return  1;
    case 2:  return  2;
    default: return -1;
}

These case values form a contiguous range (0 to 2), so the value itself can serve as a table index, and javac compiles this switch to tableswitch.

switch (i) {
    case -100: return -1;
    case 0:    return  0;
    case 100:  return  1;
    default:   return -1;
}

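These case values are sparse (a table covering -100 to 100 would need 201 entries for only 3 cases), so javac compiles this switch to lookupswitch. Negative values by themselves are not the problem: tableswitch works with any low bound, including negative ones.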
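The decision is a space/time trade-off. The sketch below is modeled on the cost heuristic in javac's code generator (Gen.visitSwitch); treat the constants as illustrative rather than a guarantee of any particular javac version. The class is hypothetical and expects a non-empty, sorted key array.

```java
// Hypothetical chooser modeled on javac's switch cost heuristic.
class SwitchChooser {
    static String choose(int[] sortedKeys) {
        int n = sortedKeys.length;
        long lo = sortedKeys[0];
        long hi = sortedKeys[n - 1];
        long tableSpace = 4 + (hi - lo + 1);    // header words + one entry per value in range
        long tableTime = 3;                     // constant-time lookup
        long lookupSpace = 3 + 2L * n;          // header words + key/offset pairs
        long lookupTime = n;                    // grows with the number of keys
        return tableSpace + 3 * tableTime <= lookupSpace + 3 * lookupTime
                ? "tableswitch" : "lookupswitch";
    }
}
```

Keys {0, 1, 2} and {-2, -1, 0, 1} choose tableswitch (negative keys are fine), while {-100, 0, 100} chooses lookupswitch.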

tableswitch is implemented like this:

public class TableSwitch extends BranchInstruction {

    private int defaultOffset;
    private int low;
    private int high;
    private int[] jumpOffsets;

    @Override
    public void execute(Frame frame) {
        int index = frame.getOpStack().popInt();
        if (index >= low && index <= high) {
            offset = jumpOffsets[index - low];
        } else {
            offset = defaultOffset;
        }
        branch(frame);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        reader.skipPadding();
        defaultOffset = reader.readInt();
        low = reader.readInt();
        high = reader.readInt();
        int jumpOffsetsCount = high - low + 1;
        jumpOffsets = reader.readInts(jumpOffsetsCount);
    }

}

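defaultOffset is the jump offset for the default branch, and low and high are the bounds of the case range.

jumpOffsets is the jump table: it holds high - low + 1 int values, each being the bytecode offset to jump to for the corresponding case.

At run time, tableswitch pops an int (the switch value) off the operand stack. If it lies within [low, high], the target offset is jumpOffsets[value - low]; otherwise it is defaultOffset.

The tableswitch opcode is followed by 0-3 padding bytes, which align defaultOffset to an address that is a multiple of 4 (relative to the start of the method's code).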
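A sketch of the padding arithmetic and the table lookup, assuming addresses are relative to the start of the method's code (which is also what skipPadding relies on). The class is hypothetical; `selectOffset` repeats the rule used in TableSwitch.execute.

```java
// Hypothetical helpers for tableswitch decoding and dispatch.
class SwitchPadding {
    // Padding bytes after a switch opcode at opcodePc, so that the next
    // operand starts at a multiple of 4.
    static int padding(int opcodePc) {
        return (4 - ((opcodePc + 1) % 4)) % 4;
    }

    // Same selection rule as TableSwitch.execute().
    static int selectOffset(int index, int low, int high, int[] jumpOffsets, int defaultOffset) {
        return (index >= low && index <= high) ? jumpOffsets[index - low] : defaultOffset;
    }
}
```

An opcode at address 0 is followed by 3 padding bytes; one at address 3 needs none.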

lookupswitch is implemented as follows:

public class LookupSwitch extends BranchInstruction {

    private int defaultOffset;
    private int npairs;
    private int[] matchOffsets;

    @Override
    public void execute(Frame frame) {
        int key = frame.getOpStack().popInt();
        offset = defaultOffset;
        for (int i = 0; i < npairs * 2; i += 2) {
            if (matchOffsets[i] == key) {
                offset = matchOffsets[i + 1];
                break;
            }
        }
        branch(frame);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        reader.skipPadding();
        defaultOffset = reader.readInt();
        npairs = reader.readInt();
        matchOffsets = reader.readInts(npairs * 2);
    }
}

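Here defaultOffset is again the default jump offset, and npairs is the number of case entries.

Each entry has 2 parts: the case key (such as 100 in the earlier example) and the offset of that case's code, i.e., where to jump.

At run time, lookupswitch pops an int off the operand stack, looks it up among the keys, and jumps to the matching case's offset, or to defaultOffset if there is no match.

lookupswitch uses the same 0-3 byte alignment padding as tableswitch.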
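The implementation above scans the pairs linearly, which is correct. Because the specification requires the key/offset pairs to be sorted by key, a binary search also works; this is a swapped-in alternative, sketched here with a hypothetical helper over the same flat `{key0, offset0, key1, offset1, ...}` layout as matchOffsets.

```java
// Hypothetical binary-search lookup over sorted key/offset pairs.
class LookupSwitchSearch {
    static int find(int[] pairs, int key, int defaultOffset) {
        int lo = 0;
        int hi = pairs.length / 2 - 1;           // indexes of pairs, not array slots
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int match = pairs[2 * mid];
            if (match < key) {
                lo = mid + 1;
            } else if (match > key) {
                hi = mid - 1;
            } else {
                return pairs[2 * mid + 1];       // offset for the matching key
            }
        }
        return defaultOffset;
    }
}
```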

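3.10 References Instructions

References instructions deal with field access, method invocation, and other operations on objects.

They are not implemented yet.

3.11 Extended Instructions

Extended instructions mostly widen the operands of instructions whose operands are too small, or add closely related variants.

The extended instructions covered here fall into 3 groups:

wide
ifnull and ifnonnull
goto_w

(The extended category in the specification also contains multianewarray and jsr_w, which are not covered here.)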

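3.11.1 wide

wide widens the index operand of another instruction.

For example, loads, stores, and other instructions that access the local variable table use a uint8 index.

A uint8 index is enough for most methods, but some methods have more than 256 local variable slots, so wide is used to extend the index for them.

The instructions wide can modify are: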

0x15: iload
0x16: lload
0x17: fload
0x18: dload
0x19: aload
0x36: istore
0x37: lstore
0x38: fstore
0x39: dstore
0x3a: astore
0x84: iinc
0xa9: ret

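wide only widens the index (and, for iinc, the constant); it does not change what the modified instruction does.

For example, iload normally takes a uint8 index; under wide it takes a 2-byte uint16 index: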

public class WILoad extends Index16Instruction {

    @Override
    public void execute(Frame frame) {
        frame.getOpStack().pushInt(frame.getLocalVars().getInt(index));
    }

}

Note that the base class is now Index16Instruction, which reads a 2-byte index.

Likewise, dstore switches to a 2-byte index:

public class WDStore extends Index16Instruction {

    @Override
    public void execute(Frame frame) {
        frame.getLocalVars().setDouble(index, frame.getOpStack().popDouble());
    }

}

The increment instruction iinc works the same way, widening both its index and its constant:

public class WIInc implements Instruction {

    /**
     * local variable index
     */
    private Uint16 index;
    /**
     * constant to add
     */
    private int value;

    @Override
    public void execute(Frame frame) {
        int val = frame.getLocalVars().getInt(index.value());
        val += value;
        frame.getLocalVars().setInt(index.value(), val);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        index = reader.readUint16();
        // note: this is a 16-bit signed integer
        value = reader.readInt16();
    }
}

The other modified instructions are similar and are not shown.
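On the wire, a wide instruction is the wide opcode (0xC4), then the modified opcode, then a 2-byte index, plus a 2-byte signed constant for iinc. A hypothetical decoder sketch that renders the result as text; only iload, dstore and iinc are named, to keep it short.

```java
// Hypothetical decoder for wide-prefixed instructions.
class WideDecoder {
    static String decode(byte[] code, int pc) {
        if ((code[pc] & 0xFF) != 0xC4) {
            throw new IllegalArgumentException("not a wide instruction");
        }
        int opcode = code[pc + 1] & 0xFF;
        int index = ((code[pc + 2] & 0xFF) << 8) | (code[pc + 3] & 0xFF);   // u2 index
        if (opcode == 0x84) {                                               // iinc
            short delta = (short) (((code[pc + 4] & 0xFF) << 8) | (code[pc + 5] & 0xFF)); // s2
            return "iinc " + index + " " + delta;
        }
        switch (opcode) {
            case 0x15: return "iload " + index;
            case 0x39: return "dstore " + index;
            default:   return "opcode 0x" + Integer.toHexString(opcode) + " " + index;
        }
    }
}
```

`{0xC4, 0x15, 0x01, 0x00}` decodes to `iload 256`, an index a plain iload could not express.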

3.11.2 ifnull and ifnonnull

Like the comparison instructions above, ifnull and ifnonnull pop a reference, test it against null, and branch:

public class IfNull extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        Object v1 = frame.getOpStack().popRef();
        if (v1 == null) {
            branch(frame);
        }
    }

}

public class IfNonNull extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        Object v1 = frame.getOpStack().popRef();
        if (v1 != null) {
            branch(frame);
        }
    }

}

3.11.3 goto_w

goto takes a 16-bit signed offset; goto_w widens it to a 32-bit signed offset:

public class WGoto extends BranchInstruction {

    @Override
    public void execute(Frame frame) {
        branch(frame);
    }

    @Override
    public void fetchOperands(ByteCodeReader reader) {
        // note: this is a 32-bit signed integer
        offset = reader.readInt();
    }
}

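3.12 Reserved Instructions

Reserved instructions are for debuggers and the JVM implementation itself, so they are not implemented here.

4. The Interpreter

With the instructions implemented, a simple interpreter can now execute them.

Every method ends by executing a return instruction, and return (like method invocation) is not implemented yet, so for now the interpreter can execute only a single method's code.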

public class Interpreter {

    public void interpret(MethodInfo methodInfo) {
        // get the method's Code attribute
        CodeAttributeInfo codeAttr = methodInfo.getCodeAttributeInfo();
        Uint16 maxLocals = codeAttr.getMaxLocals();
        Uint16 maxStack = codeAttr.getMaxStack();
        byte[] codes = codeAttr.getCodes();

        // create a frame for testing (Thread is this project's JVM thread class, not java.lang.Thread)
        Thread thread = new Thread();
        Frame frame = thread.newFrame(maxLocals.value(), maxStack.value());
        thread.pushFrame(frame);

        // interpret the code
        loop(thread, codes);
    }

    private void loop(Thread thread, byte[] codes) {
        Frame frame = thread.popFrame();
        ByteCodeReader reader = new ByteCodeReader(codes);
        try {
            while (true) {
                // program counter: address of the instruction to execute
                int pc = frame.getNextPc();
                thread.setPc(pc);
                reader.setPosition(pc);

                // decode the instruction
                int opcode = reader.readUint8().value();
                Instruction instruction = InstructionFactory.newInstance(opcode);
                if (instruction == null) {
                    // unknown or unimplemented opcode: stop
                    break;
                }

                // fetch the operands; by default execution continues at the next instruction
                instruction.fetchOperands(reader);
                frame.setNextPc(reader.getPosition());

                // execute the instruction (branch instructions overwrite nextPc)
                instruction.execute(frame);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            System.out.println("Frame: " + frame);
        }

    }

}

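The interpreter's logic is simple:

Take the Code attribute from the given method
Decode the instructions in the code using the instruction set
Execute the decoded instructions

Because return is not implemented, the result is hard to observe directly, and execution is bound to fail or stop once it reaches an unsupported instruction.

To inspect the result anyway, the loop catches the error and prints the frame.

getCodeAttributeInfo(), getMaxLocals(), and similar methods are simple getters newly added to MethodInfo and CodeAttributeInfo, so they are not covered here.

InstructionFactory.newInstance(opcode) creates the instruction object for an opcode. Its implementation looks roughly like this: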

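public class InstructionFactory {

    public static Instruction newInstance(int opcode) {
        switch (opcode) {
            case 0x00:
                return new Nop();
            case 0x01:
                return new AConstNull();
            case 0x02:
                return new IConstM1();
            // case 0x03 ... one case per implemented opcode (omitted)
            default:
                // unimplemented opcode: the interpreter loop stops when it gets null
                return null;
        }
    }
}

The full list is too long to include here.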

The Jvm class also needs some changes so that it can run a given method; here it runs main:

public class Jvm {

    public static void main(String[] args) throws IOException {
        Cmd cmd = new Cmd();
        cmd.printHelp();

        String[] testArgs = new String[]{ "com.wjd.cmd.Cmd", "-classpath",
                "D:\\Projects\\IdeaProjects\\self-jvm\\target\\test-classes;D:\\Projects\\IdeaProjects\\self-jvm\\target\\classes" };
        cmd.parse(testArgs);

        String userClassName = "com\\wjd\\instructions\\InstructionsTest";
        Classpath classpath = new Classpath(cmd.getJreOption(), cmd.getCpOption());

        ClassFile classFile = loadClass(userClassName, classpath);
        MethodInfo mainMethod = getMainMethod(classFile);
        if (mainMethod != null) {
            // interpret the main method
            new Interpreter().interpret(mainMethod);
        } else {
            System.out.println("Not found main method!");
        }
    }

    /**
     * Loads and parses a class from the classpath
     */
    public static ClassFile loadClass(String className, Classpath classpath) throws IOException {
        byte[] userClassBytes = classpath.readClass(className);
        ClassReader reader = new ClassReader(userClassBytes);
        return ClassFile.parse(reader);
    }

    /**
     * Finds the main method in the class file
     */
    public static MethodInfo getMainMethod(ClassFile classFile) {
        for (MethodInfo m : classFile.getMethods()) {
            if ("main".equals(m.name()) && "([Ljava/lang/String;)V".equals(m.descriptor())) {
                return m;
            }
        }
        return null;
    }

}

The change finds the test class, takes its main method, and hands it to the Interpreter for execution.

5. Unit Test

The test class; the goal is to execute its main method:

public class InstructionsTest {

    public static void main(String[] args) {
        int sum = 0;
        for (int i = 0; i <= 100; i++) {
            sum += i;
        }
        System.out.println(sum);
    }

}

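The run is expected to stop before println completes, because it needs getstatic (a references instruction that is not implemented yet). The frame printed in the finally block should then show the result, 5050, in its local variable table.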
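To see where 5050 comes from, here is a self-contained mini-interpreter for just the loop in main. The bytecode is hand-assembled in the shape javac typically produces for this loop (the exact offsets of a real compile may differ), with sum in slot 1 and i in slot 2. Like the article's loop, it stops at the first opcode it does not know, here getstatic (0xB2), and returns the local variable table. The class is hypothetical and supports only the opcodes this loop uses.

```java
// Hypothetical mini-interpreter for the sum loop in InstructionsTest.main.
class SumLoopInterpreter {
    static final byte[] CODE = {
        0x03,                                   //  0: iconst_0
        0x3c,                                   //  1: istore_1        sum = 0
        0x03,                                   //  2: iconst_0
        0x3d,                                   //  3: istore_2        i = 0
        0x1c,                                   //  4: iload_2
        0x10, 100,                              //  5: bipush 100
        (byte) 0xa3, 0x00, 0x0d,                //  7: if_icmpgt +13   -> 20
        0x1b,                                   // 10: iload_1
        0x1c,                                   // 11: iload_2
        0x60,                                   // 12: iadd
        0x3c,                                   // 13: istore_1
        (byte) 0x84, 0x02, 0x01,                // 14: iinc 2, 1
        (byte) 0xa7, (byte) 0xff, (byte) 0xf3,  // 17: goto -13       -> 4
        (byte) 0xb2, 0x00, 0x02                 // 20: getstatic (unsupported here)
    };

    // Runs until an unsupported opcode and returns the local variable table.
    static int[] run(byte[] code) {
        int[] locals = new int[3];
        int[] stack = new int[2];
        int sp = 0;
        int pc = 0;
        while (true) {
            int opcode = code[pc] & 0xFF;
            int next = pc + 1;                                          // default: next instruction
            switch (opcode) {
                case 0x03: stack[sp++] = 0; break;                          // iconst_0
                case 0x10: stack[sp++] = code[pc + 1]; next = pc + 2; break; // bipush (signed)
                case 0x1b: stack[sp++] = locals[1]; break;                  // iload_1
                case 0x1c: stack[sp++] = locals[2]; break;                  // iload_2
                case 0x3c: locals[1] = stack[--sp]; break;                  // istore_1
                case 0x3d: locals[2] = stack[--sp]; break;                  // istore_2
                case 0x60: {                                                // iadd
                    int v1 = stack[--sp];
                    int v2 = stack[--sp];
                    stack[sp++] = v2 + v1;
                    break;
                }
                case 0x84: locals[code[pc + 1] & 0xFF] += code[pc + 2]; next = pc + 3; break; // iinc
                case 0xa3: {                                                // if_icmpgt
                    int v2 = stack[--sp];
                    int v1 = stack[--sp];
                    next = v1 > v2 ? pc + offset16(code, pc) : pc + 3;
                    break;
                }
                case 0xa7: next = pc + offset16(code, pc); break;           // goto
                default: return locals;                                     // stop, like the article's loop
            }
            pc = next;
        }
    }

    // Signed 16-bit branch offset following the opcode at pc.
    static int offset16(byte[] code, int pc) {
        return (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
    }
}
```

When it stops, slot 1 (sum) holds 5050 and slot 2 (i) holds 101.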

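Summary

Instruction structure

Bytecode stores encoded JVM instructions; every instruction begins with a one-byte opcode
Because the opcode is one byte, the JVM supports at most 256 instructions
The JVM uses variable-length instructions: an opcode may be followed by zero or more bytes of operands
For example, in 0xB20002, B2 is the opcode and 0002 is the operand

Instruction mnemonics

To make opcodes easier to remember, the JVM specification assigns each opcode a mnemonic
For example, opcode 0x00 has the mnemonic nop (no operation)
The operand stack and the local variable table store only raw values in slots, without type information
Each instruction must know the type of data it operates on; the data type is bound into the instruction
For example, iadd adds int values

Instruction categories

The JVM specification groups its 205 defined instructions into 11 categories:

Constants
Loads
Stores
Stack
Math
Conversions
Comparisons
Control
References
Extended
Reserved

Reserved instructions:

1 is reserved for debuggers to implement breakpoints: opcode 202 (0xCA), mnemonic breakpoint
The other 2 are reserved for internal use by JVM implementations: opcodes 254 (0xFE) and 255 (0xFF), mnemonics impdep1 and impdep2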

Article URL: http://example.com/lang/java/jvm/selfjvm/05_instruction/

Author: jiaduo

Published: 2022-02-21

Updated: 2023-04-03

Tags: java, JVM
