Assignment 4: Simplesim (20 points)

This assignment is the first in a sequence. You must complete each part of the sequence in order to proceed to the next!

We start by "peeling open" a computer, looking at its internal structure, and introducing machine language (assembler-level) programming. Your assignment is to write a program that simulates a computer capable of executing machine language programs.

Description of the Simplesim Computer

In this assignment you will write a program to simulate a fictional computer that we will call the Simplesim. As its name implies, it is a simple machine. All information in the Simplesim is handled in terms of words. A word is a signed four-digit decimal (base 10) number such as +3364, -1293, +0007, -0001, +0000, etc., so every word lies between -9999 and +9999. The Simplesim is equipped with a 100-word memory and five registers.

The Simplesim Machine Language (SML)

Each instruction written in the Simplesim Machine Language (SML) occupies one word of the Simplesim's memory (and hence instructions are signed four-digit decimal numbers). The two leftmost digits of each SML instruction are the operation code (opcode), which specifies the operation to be performed. The two rightmost digits of an SML instruction are the operand, which is the memory location containing the word to which the operation applies. The complete set of SML instructions is described in the table that follows.

Operation   Code  Meaning
Input / Output Operations:
READ        01    Read a word into a specific memory location.
WRITE       02    Print a word from a specific memory location.
Store / Load Operations:
STORE       11    Store the word in the accumulator into a specific memory location.
LOAD        12    Load a word from a specific memory location into the accumulator.
Arithmetic Operations:
ADD         21    Add a word in a specific memory location to the word in the accumulator (leave result in accumulator).
SUBTRACT    22    Subtract a word in a specific memory location from the word in the accumulator (leave result in accumulator).
MULTIPLY    23    Multiply the word in the accumulator by a word in a specific memory location (leave result in accumulator).
DIVIDE      24    Divide the word in the accumulator by a word in a specific memory location (leave result in accumulator).
Transfer of Control Operations:
BRANCH      31    Branch to a specific memory location.
BRANCHZERO  32    Branch to a specific memory location if the accumulator is zero.
BRANCHNEG   33    Branch to a specific memory location if the accumulator is negative.
HALT        34    Halt, i.e., the program has completed its task (the operand is ignored).
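The table above translates directly into a C++ enumeration. The sketch below pairs it with the digit-splitting rule described at the start of this section (two leftmost digits are the opcode, two rightmost are the operand). The helper names `is_valid_opcode` and `decode` are hypothetical, chosen for illustration only; they are not part of the required interface.

```cpp
// Operation codes, copied from the SML instruction table.
enum insn { READ = 1, WRITE = 2,
            STORE = 11, LOAD = 12,
            ADD = 21, SUBTRACT = 22, MULTIPLY = 23, DIVIDE = 24,
            BRANCH = 31, BRANCHZERO = 32, BRANCHNEG = 33, HALT = 34 };

// Hypothetical helper: true if code is one of the twelve SML opcodes.
bool is_valid_opcode(int code)
{
    switch (code) {
    case READ: case WRITE: case STORE: case LOAD:
    case ADD: case SUBTRACT: case MULTIPLY: case DIVIDE:
    case BRANCH: case BRANCHZERO: case BRANCHNEG: case HALT:
        return true;
    default:
        return false;
    }
}

// Hypothetical helper: split a word into its opcode (two leftmost digits)
// and its operand (two rightmost digits), e.g. 107 -> opcode 1, operand 7.
void decode(int word, int& opcode, int& operand)
{
    opcode = word / 100;
    operand = word % 100;
}
```

Because 44 does not appear in the table, `is_valid_opcode(4444 / 100)` returns false; this is what makes an accidental jump into uninitialized memory (which holds 4444) abend with an invalid opcode.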

We illustrate how the Simplesim executes SML programs (using the instructions from the table above) with the use of two example SML programs. Consider the following SML program which reads two numbers and computes and prints their sum.

Memory Location   Word    Instruction
00                +0107   (Read A)
01                +0108   (Read B)
02                +1207   (Load A)
03                +2108   (Add B)
04                +1109   (Store C)
05                +0209   (Write C)
06                +3400   (Halt)
07                +0000   (Variable A)
08                +0000   (Variable B)
09                +0000   (Result C)

Execution always begins at memory location 00. The word at memory location 00 (+0107) is fetched and interpreted as an instruction. The leftmost two digits of the word (01) are the operation code and the rightmost two digits (07) are the instruction's operand, so the first instruction is a READ operation. It reads a single word from the input file (explained in the Input section below) and stores it in the memory location given by the operand, in this case memory location 07. (READ and WRITE instructions always operate on memory locations.) This completes the execution of the first instruction, and processing continues with the next instruction, found at memory location 01.

The next instruction (+0108) reads a second word from the input file and stores it in memory location 08. The next instruction (+1207) is a LOAD operation with operand 07. It takes the word found at memory location 07 (the operand) and places it into the accumulator (the accumulator is one of the Simplesim's five registers). All LOAD and STORE operations move data in and out of the accumulator.

The next instruction (+2108) is an ADD instruction with operand 08. All SML arithmetic instructions are performed using the word in the accumulator and the word identified by the operand and the result is always left in the accumulator. This instruction takes the word stored in memory location 08 (the operand), adds it to the value in the accumulator, and leaves the sum in the accumulator.

The next instruction (+1109) is a STORE instruction which, like all STORE instructions, takes the word in the accumulator (the sum of the two input values) and stores it in the memory location identified by the instruction's operand, in this case memory location 09. Then +0209, a WRITE instruction, prints (output is explained in the Output section below) the word found in memory location 09, which, again, is the sum of the two input values. Finally, instruction +3400, the HALT instruction, is executed, which simply terminates the SML program (its operand, 00, is ignored).

Note that a single word in memory can hold either a single instruction to be executed or a single variable (which, generally, should never be interpreted as an instruction). None of the words after the HALT instruction (memory locations 07-09) were executed; however, they were essential to the computation, because they held the program's variables and its result.

All SML programs "partition" the Simplesim's memory in this way. The first words of memory (always starting at memory location 00) hold the "instructions" of the program; following them, after the HALT instruction, comes the "data" part of the program. The intention, of course, is that only the "instructions" of the program are executed, i.e., that only those words are interpreted as SML instructions.

Now consider this second SML program that reads two numbers and prints the larger of the two.

Memory Location   Word    Instruction
00                +0109   (Read A)
01                +0110   (Read B)
02                +1209   (Load A)
03                +2210   (Subtract B)
04                +3307   (Branch negative to 07)
05                +0209   (Write A)
06                +3400   (Halt)
07                +0210   (Write B)
08                +3400   (Halt)
09                +0000   (Variable A)
10                +0000   (Variable B)

The first two instructions (+0109 and +0110) read two values and store them in memory locations 09 and 10, respectively. +1209 places the word at memory location 09 (the first input value) into the accumulator. +2210, a SUBTRACT instruction, takes the word at memory location 10 (the second input value), subtracts it from the accumulator, and leaves the result in the accumulator.

+3307 (BRANCHNEG) is a conditional branch instruction, much like an "if" statement in C++. All conditional branch instructions test the value in the accumulator. BRANCH, which acts like a "goto", is the only branch instruction that ignores the accumulator; it is an unconditional branch.

If the value in the accumulator is negative, which in this case means the second input value was the larger, then the next instruction executed is the one at memory location 07 (the operand). If the value in the accumulator is 0 or greater, meaning the first input value was greater than or equal to the second, then execution continues with the next instruction (at memory location 05), i.e., no branching. If the branch was taken, the value at memory location 10 (the second input value) is printed and the program terminates. Otherwise, the value at memory location 09 (the first input value) is printed and the program terminates.

Note how the SML program is written. It "partitions" the Simplesim's memory into two distinct parts: the "program" (locations 00-08) and the "data" (locations 09-10). Unlike the first program, this one has two HALT instructions. This is okay; only one of them will be executed. The point is that HALT instructions prevent execution from wandering into the "data" portion of the program.

Input

Your program will take as input an SML program followed by any input for that SML program.

The input file will start with the SML program, one instruction per line. Following the last line of the SML program will be the number -99999, which is not part of the SML program. If the SML program expects any input (i.e., if it has any READ instructions) then input for the SML program, one input value per line, immediately follows the -99999 line. For example, below is the input file for the first program from the previous section. It adds -5 and 15.

0107
0108
1207
2108
1109
0209
3400
0000
0000
0000
-99999
-5
15

Note that each line of the input file, other than -99999 (which marks the end of the program and is not placed into the Simplesim's memory), fits into a single word. Every input file has -99999 after the last SML instruction. SML programs that do not require input (those with no READ instructions) have no data after the -99999 line, so for them -99999 is simply the last line of the input file.

Output

Each time a READ instruction is executed, your program must print the value that was read. For example, the two values read by the program in the previous section are -5 and 15. As each value is read, your program must print output that looks exactly like this:

READ: -0005
READ: +0015

For each WRITE instruction, your program must print the value of the word in the memory location given by the operand. For example, the program in the previous section prints the sum 10 exactly like this:

+0010
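Every signed word in the output (READ echoes, WRITE output, and the dump) uses the same five-character layout: a sign followed by four zero-padded digits. One way to produce it with the standard `<iomanip>` manipulators is sketched below; `format_word` is a hypothetical helper name, not something the assignment requires.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render a word as a sign followed by four digits,
// e.g. -5 -> "-0005" and 15 -> "+0015".
std::string format_word(int word)
{
    std::ostringstream out;
    // showpos forces a '+' on non-negative values (including 0), internal
    // places the fill characters between the sign and the digits, and
    // setw(5) counts the sign as one of the five characters.
    out << std::showpos << std::internal << std::setfill('0')
        << std::setw(5) << word;
    return out.str();
}
```

For example, `std::cout << "READ: " << format_word(-5) << '\n';` prints `READ: -0005`. Building the string in a local `std::ostringstream` keeps `showpos`, `internal`, and the `'0'` fill character from staying set on `std::cout`, where they would affect later output.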

When the HALT instruction is executed, your program must print the following line:

*** Simplesim execution terminated ***

At the end of any execution your program must dump the entire contents of the Simplesim. This means dumping the contents of all five registers and all 100 words of the Simplesim's memory.

Assuming that the name of your program is simplesim and the name of the SML program file above is sum.sml, then the output of your program must look exactly like this:

z123456@turing:~/csci501/Assign4$ ./simplesim < sum.sml
READ: -0005
READ: +0015
+0010
*** Simplesim execution terminated ***
REGISTERS:
accumulator:            +0010
instruction_counter:    06
instruction_register:   +3400
operation_code:         34
operand:                00

MEMORY:
       0     1     2     3     4     5     6     7     8     9
00 +0107 +0108 +1207 +2108 +1109 +0209 +3400 -0005 +0015 +0010
10 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
20 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
30 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
40 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
50 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
60 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
70 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
80 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
90 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
z123456@turing:~/csci501/Assign4$

One of the first things that your program will do is read the SML program into the Simplesim's memory. This is called loading the program. A couple of things could go wrong when loading the program: the program may be too large for the Simplesim's 100-word memory, or a line of the input file may not fit into a word (i.e., it may be greater than 9999 or less than -9999). In these situations your program must print an error message, dump the contents of the machine, and terminate. It must not start to run the SML program.

If there was a successful SML program load, your program must start to execute the SML program. SML programs, like any other programs, may perform an illegal operation and terminate abnormally (abend). There are a number of conditions that may cause an SML program to abend, in which case processing stops immediately. An example of this is an attempt to divide by 0. In that case, the Simplesim must print an appropriate abend message, stop execution, and dump the contents of the machine. Every execution of your program (normal termination of the SML program or SML program abend) ends with a dump of the Simplesim.

A summary of the possible abend conditions (program load and execution errors) and their error messages appears in the following table. All error messages must appear exactly as they appear in the table.

Condition        Error Message                           Description
Program Load Errors:
Program too big  *** ABEND: pgm load: pgm too large ***  The program is too big (more than 100 words) to fit into memory.
Invalid word     *** ABEND: pgm load: invalid word ***   During program load, one of the words in the input file was less than -9999 or greater than 9999.
Execution Errors:
Invalid opcode   *** ABEND: invalid opcode ***           An attempt was made to execute an unrecognizable instruction, i.e., the leftmost two digits of the word were not a valid operation code.
Addressability   *** ABEND: addressability error ***     An attempt was made to fetch an instruction from an invalid memory location.
Division by 0    *** ABEND: attempted division by 0 ***  Attempt to divide by 0.
Underflow        *** ABEND: underflow ***                The result of an arithmetic operation is less than -9999 and therefore would not fit into the accumulator.
Overflow         *** ABEND: overflow ***                 The result of an arithmetic operation is greater than 9999 and therefore would not fit into the accumulator.
Illegal input    *** ABEND: illegal input ***            During a READ instruction, an attempt was made to read a value that was either less than -9999 or greater than 9999.
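Several rows of this table share the same bounds, -9999 to 9999. A minimal sketch of helpers for those checks follows; the names `range_error` and `is_valid_word` are hypothetical, and the messages are copied from the table. Taking `long long` parameters is a safety-margin assumption: the largest product of two words, 9999 * 9999 = 99,980,001, already fits in a 32-bit `int`.

```cpp
#include <string>

// Hypothetical helper: the abend message for an arithmetic result that does
// not fit in a word, or an empty string if the result fits.
std::string range_error(long long result)
{
    if (result < -9999) return "*** ABEND: underflow ***";
    if (result > 9999)  return "*** ABEND: overflow ***";
    return "";
}

// Hypothetical helper: true if a value fits in a word. The same test applies
// to words during program load and to values read by READ.
bool is_valid_word(long long value)
{
    return value >= -9999 && value <= 9999;
}
```

An arithmetic case would compute its result, call `range_error`, and either abend with the returned message or store the result in the accumulator. Division by 0 has to be checked separately, before dividing.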

Files You Must Write

You will need to write three files for this assignment:

Simulating the Simplesim

Among other things, your simplesim class will provide the definition of an enum that you will use to define the instruction operation codes (opcodes). It may be a public member of the class, defined like this:

enum insn { READ=1, WRITE=2, STORE=11, ... };

Once so defined, you can use insn as a new data type, as shown below for operation_code.

The simplesim class will need private data members that represent the memory and registers of the machine. You must use an array of integers (length 100) to simulate the memory and five separate integer variables to simulate each of the five Simplesim registers. You might find it easiest to name the memory and register variables as follows:

int memory[100];
int accumulator;
int instruction_counter;
int instruction_register;
int operand;
insn operation_code;
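Taken together, the enum and these data members suggest a class layout like the sketch below. It is shown as a single block for brevity (the constructor definition would normally live in the implementation file). The constructor performs the initialization step described below (registers zero, memory 4444). `peek` and `peek_accumulator` are hypothetical accessors added only so the initial state can be inspected, and the signatures of `execute_program` and `dump` are assumptions; `load_program` returns `bool` as the load step requires.

```cpp
// Sketch of the simplesim class layout; member names follow the text.
class simplesim
{
public:
    // Operation codes from the SML instruction table.
    enum insn { READ = 1, WRITE = 2, STORE = 11, LOAD = 12,
                ADD = 21, SUBTRACT = 22, MULTIPLY = 23, DIVIDE = 24,
                BRANCH = 31, BRANCHZERO = 32, BRANCHNEG = 33, HALT = 34 };

    simplesim();                  // initialization: "turn the Simplesim on"
    bool load_program();          // declared only in this sketch
    void execute_program();       // declared only; return type assumed
    void dump() const;            // declared only; const-ness assumed

    // Hypothetical accessors, added only so the initial state can be checked.
    int peek(int address) const { return memory[address]; }
    int peek_accumulator() const { return accumulator; }

private:
    int memory[100];
    int accumulator;
    int instruction_counter;
    int instruction_register;
    int operand;
    insn operation_code;
};

// Registers start at zero; every memory word starts at 4444, whose
// leftmost two digits (44) are not a valid opcode.
simplesim::simplesim()
    : accumulator(0), instruction_counter(0), instruction_register(0),
      operand(0), operation_code(static_cast<insn>(0))
{
    for (int& word : memory)
        word = 4444;
}
```

The cast in the initializer list is needed for the same reason as the cast in the instruction fetch: C++ will not implicitly convert an `int` to an enum.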

Organize your program as a sequence of four steps: initialization, load, execute, and dump. Each step is described in detail below.

  1. Initialization

    The default constructor for the simplesim class simulates "turning the Simplesim on". It is a simple but necessary step in which all five registers are initialized to zero and all 100 words of memory are initialized to 4444. The value 4444 was chosen, in part, because its leftmost two digits (44) are not a valid instruction, so if an errant program branches to an uninitialized memory location, the program will abend.

  2. Load SML Program

    In the load_program member function, you load the SML program into the Simplesim's memory. This requires you to read the program's words, one at a time, from stdin and stop when you encounter -99999. (All input starts with the SML program, with one instruction (word) per line, and marks the end of the SML program with -99999.) Load each word of the program into memory, starting at memory location 00 and proceeding contiguously (not skipping any memory locations) until the entire program has been loaded.

    As you read each line from the input file, and before placing it into the Simplesim's memory, you must verify that it is a "valid" word, i.e., one that fits in a Simplesim memory cell (a value between -9999 and 9999, inclusive). If you encounter an invalid word (other than -99999, which denotes the end of the program) during the program load, your program must stop loading immediately, print the appropriate error message from the table above, and return false.

    Also, if in the course of loading the program you run out of Simplesim memory, i.e., the SML program is more than 100 words long, your program must stop loading immediately, print the appropriate error message from the table above, and return false.

    If the program is successfully loaded with no errors, return true.

  3. Execute SML Program

    Assuming a successful SML program load, the execute_program() member function will be called to execute the SML program. The code for this member function is essentially a loop that executes one instruction at a time. Executing an instruction is a two-step process: instruction fetch and instruction execute. The body of the member function will have a structure similar to this:

    bool done = false;
    while (!done)
    {
        // instruction fetch
        . . .

        // instruction execute
        . . .

        switch (operation_code)
        {
            case READ:
                . . .
                break;

            case WRITE:
                . . .
                break;

            // More cases

            default:
                . . .
        }

        if (operation_code is not branching AND !done)
            instruction_counter++;
    }

    Instruction fetch starts by testing the value in the instruction_counter register. If it contains a valid memory location (00-99), then load the instruction_register with that word from memory and split the instruction_register by placing its leftmost two digits into the operation_code register and its rightmost two digits into the operand register:

    operation_code = static_cast<insn>(instruction_register / 100);
    operand = instruction_register % 100;
    

    Note that the cast is necessary when assigning a numeric value to the operation_code variable because it is an enum; C++ will not implicitly convert an int to an enum type.

    If the instruction_counter does not contain a valid memory location, then print the appropriate error message from the table above and return without setting/changing the instruction_register, operation_code, and operand registers.

    Assuming that you have successfully fetched an instruction, the next part of the loop executes the instruction. Recall that the instruction is now sitting in the operation_code register and its operand is in the operand register.

    After executing most instructions, you must increment the instruction_counter register so that it points to the next instruction in memory for the next fetch cycle. Increment the instruction_counter after executing an instruction only when both of the following conditions hold: the instruction just executed did not transfer control (a BRANCHZERO or BRANCHNEG whose condition is false does not branch, so execution continues with the next instruction), and executing it did not terminate the SML program, i.e., it was not a HALT and it did not cause an abend.

    Executing an instruction must be implemented using a switch statement, switching on the value in the operation_code register. The following is a description of how each case must be processed. Some of these will be very simple (single lines in C++) and others will require more code. Although they may be listed as a single item in the list below, each SML instruction must be a single case statement within your switch.

    If the value in the operation_code register is not one of those instructions (i.e., the default case of your switch statement), then your program must print the appropriate error message and return.

  4. Dump Simplesim

    At the end of every execution of your program (normal SML termination, SML program load error, or SML execution error) you must call dump() to dump the contents of the Simplesim. This means printing the contents of all five registers and printing the contents of all 100 words of memory. It must be printed exactly in the following format:

    REGISTERS:
    accumulator:            +0000
    instruction_counter:    00
    instruction_register:   +0000
    operation_code:         00
    operand:                00
    
    MEMORY:
           0     1     2     3     4     5     6     7     8     9
    00 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    10 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    20 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    30 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    40 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    50 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    60 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    70 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    80 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    90 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444
    

    It is important that you use spaces precisely as described when writing your program (explained below).

    Formatting the output correctly will require extensive use of the manipulators in <iostream> and <iomanip>. Make sure that you are familiar with the following manipulators described in the notes on Output Formatting: setw, right, internal, showpos / noshowpos, and setfill.
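The fetch/execute steps described above can be combined into one self-contained sketch of the execution loop. To keep it short, it uses a hypothetical `machine` struct with public fields and a plain `int` operation_code instead of the `simplesim` class with private members and the `insn` type, and it takes the READ input stream as a parameter. It returns from the function on HALT or an abend rather than setting a `done` flag, and it uses a local `branched` flag so that a BRANCHZERO or BRANCHNEG whose condition is false still advances the instruction_counter.

```cpp
#include <iomanip>
#include <iostream>

// Operation codes from the SML instruction table.
enum insn { READ = 1, WRITE = 2, STORE = 11, LOAD = 12,
            ADD = 21, SUBTRACT = 22, MULTIPLY = 23, DIVIDE = 24,
            BRANCH = 31, BRANCHZERO = 32, BRANCHNEG = 33, HALT = 34 };

// Hypothetical stand-in for the simplesim class (public fields, int opcode).
struct machine {
    int memory[100];
    int accumulator = 0, instruction_counter = 0, instruction_register = 0;
    int operation_code = 0, operand = 0;
    machine() { for (int& w : memory) w = 4444; }
};

// Print a word as a sign plus four digits, then restore the stream settings.
void print_word(int w) {
    std::cout << std::showpos << std::internal << std::setfill('0') << std::setw(5)
              << w << std::noshowpos << std::right << std::setfill(' ') << '\n';
}

// Store an arithmetic result, or print the abend message if it does not fit.
bool set_accumulator(machine& m, int result) {
    if (result < -9999) { std::cout << "*** ABEND: underflow ***\n"; return false; }
    if (result > 9999)  { std::cout << "*** ABEND: overflow ***\n";  return false; }
    m.accumulator = result;
    return true;
}

// Runs until HALT or an abend; READ takes its values from `in`.
void execute_program(machine& m, std::istream& in) {
    for (;;) {
        // Instruction fetch (addressability check first).
        if (m.instruction_counter < 0 || m.instruction_counter > 99) {
            std::cout << "*** ABEND: addressability error ***\n";
            return;
        }
        m.instruction_register = m.memory[m.instruction_counter];
        m.operation_code = m.instruction_register / 100;
        m.operand = m.instruction_register % 100;

        // Instruction execute. For every valid opcode the operand is 00-99.
        bool branched = false;   // true only if control actually transferred
        switch (m.operation_code) {
        case READ: {
            long long value = 0;
            in >> value;
            if (value < -9999 || value > 9999) {
                std::cout << "*** ABEND: illegal input ***\n";
                return;
            }
            m.memory[m.operand] = static_cast<int>(value);
            std::cout << "READ: ";
            print_word(m.memory[m.operand]);
            break;
        }
        case WRITE: print_word(m.memory[m.operand]); break;
        case STORE: m.memory[m.operand] = m.accumulator; break;
        case LOAD:  m.accumulator = m.memory[m.operand]; break;
        case ADD:
            if (!set_accumulator(m, m.accumulator + m.memory[m.operand])) return;
            break;
        case SUBTRACT:
            if (!set_accumulator(m, m.accumulator - m.memory[m.operand])) return;
            break;
        case MULTIPLY:   // 9999 * 9999 fits in an int, assuming a 32-bit int
            if (!set_accumulator(m, m.accumulator * m.memory[m.operand])) return;
            break;
        case DIVIDE:
            if (m.memory[m.operand] == 0) {
                std::cout << "*** ABEND: attempted division by 0 ***\n";
                return;
            }
            m.accumulator /= m.memory[m.operand];   // a quotient always fits
            break;
        case BRANCH:
            m.instruction_counter = m.operand;
            branched = true;
            break;
        case BRANCHZERO:
            if (m.accumulator == 0) { m.instruction_counter = m.operand; branched = true; }
            break;
        case BRANCHNEG:
            if (m.accumulator < 0) { m.instruction_counter = m.operand; branched = true; }
            break;
        case HALT:
            std::cout << "*** Simplesim execution terminated ***\n";
            return;
        default:
            std::cout << "*** ABEND: invalid opcode ***\n";
            return;
        }
        if (!branched)           // advance to the next instruction
            ++m.instruction_counter;
    }
}
```

Running it on the sum program with inputs -5 and 15 leaves +0010 in the accumulator and 06 in the instruction_counter, matching the sample dump in the Output section; on div0.sml it stops with the instruction_counter still at 00.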
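The dump format can be produced with the manipulators listed above. The sketch below is a hypothetical free function that receives the register values as parameters; in the assignment, dump() is a member function that reads the private data members. Register labels are left-justified in a 24-character field, signed words use `showpos` with `internal` zero fill, and the two-digit registers use zero fill without a sign. The sketch prints no trailing space after the last word of each memory row, as in the dump template above.

```cpp
#include <iomanip>
#include <iostream>

// Hypothetical free-function version of dump().
void dump(std::ostream& out, const int (&memory)[100], int accumulator,
          int instruction_counter, int instruction_register,
          int operation_code, int operand)
{
    // Register label, left-justified so every value starts in column 24.
    auto label = [&out](const char* text) {
        out << std::left << std::setfill(' ') << std::setw(24) << text;
    };
    // Signed word: sign plus four zero-padded digits, e.g. +0010.
    auto word = [&out](int w) {
        out << std::showpos << std::internal << std::setfill('0')
            << std::setw(5) << w << std::noshowpos;
    };
    // Unsigned two-digit field, e.g. 06.
    auto two = [&out](int v) {
        out << std::right << std::setfill('0') << std::setw(2) << v;
    };

    out << "REGISTERS:\n";
    label("accumulator:");          word(accumulator);          out << '\n';
    label("instruction_counter:");  two(instruction_counter);   out << '\n';
    label("instruction_register:"); word(instruction_register); out << '\n';
    label("operation_code:");       two(operation_code);        out << '\n';
    label("operand:");              two(operand);               out << '\n';

    // Column header: each digit is right-aligned over a 6-character " +dddd" cell.
    out << "\nMEMORY:\n  " << std::right << std::setfill(' ');
    for (int col = 0; col < 10; ++col)
        out << std::setw(6) << col;
    out << '\n';

    // Ten rows of ten words, each row labeled with its starting address.
    for (int row = 0; row < 100; row += 10) {
        two(row);
        for (int col = 0; col < 10; ++col) {
            out << ' ';
            word(memory[row + col]);
        }
        out << '\n';
    }
    out << std::right << std::setfill(' ');   // leave the stream in a default-like state
}
```

Passing `std::cout` as the first argument prints the dump; passing a `std::ostringstream` makes it easy to compare the output against a saved expected dump while testing.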

Files I Give You

In addition to the example test program shown in the input section above, here are two example programs and their correct output:

max.sml

A program that will read two numbers and print the maximum of the two:

0109
0110
1209
2210
3307
0209
3400
0210
3400
0000
0000
-99999
-11
-2

Output:

READ: -0011
READ: -0002
-0002
*** Simplesim execution terminated ***
REGISTERS:
accumulator:            -0009
instruction_counter:    08
instruction_register:   +3400
operation_code:         34
operand:                00

MEMORY:
       0     1     2     3     4     5     6     7     8     9
00 +0109 +0110 +1209 +2210 +3307 +0209 +3400 +0210 +3400 -0011 
10 -0002 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
20 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
30 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
40 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
50 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
60 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
70 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
80 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
90 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 

div0.sml

A program that divides the accumulator by zero:

2401
0000
-99999

Output:

*** ABEND: attempted division by 0 ***
REGISTERS:
accumulator:            +0000
instruction_counter:    00
instruction_register:   +2401
operation_code:         24
operand:                01

MEMORY:
       0     1     2     3     4     5     6     7     8     9
00 +2401 +0000 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
10 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
20 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
30 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
40 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
50 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
60 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
70 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
80 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 
90 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 +4444 

You must test your simulator by writing test programs that, between them, cause every instruction to be executed, both with and without each possible abend condition, to verify that your program works correctly.

Rest assured, we will run many such programs when grading your simulator.

Hints

This is a large assignment so you might want to break it down into parts. Write part of the program, test that part thoroughly convincing yourself that it works, and then move on adding to what you've done. You might want to try the following sequence.

  1. Write the initialization and dump sections first. These are the easiest parts of the assignment.

    Declaring and initializing all the variables needed to simulate the Simplesim (memory and all five registers) should not take long at all. Then you should write the dump section (skipping the program load and execution parts). Since every execution of your program ends in a dump, and dumping the contents of the Simplesim will be a valuable debugging tool as you write this assignment, it's a good idea to get this right early. Once you are done with these two tasks (initialization and dump), run your program (without redirecting stdin). The output must be identical to the dump presented above, that is, all five registers initialized to 0 and all 100 words of memory initialized to 4444.

  2. Write the program load section.

    Write this code and insert it immediately after your initialization code. Start loading the program at memory location 00 and be sure that you stop loading the program when you encounter -99999. Also, be careful to check that each word of the program you are loading is between -9999 and 9999, inclusive; if you encounter a word outside that range, print the appropriate error message, stop loading, and proceed to the dump. Likewise, check that the program you are loading fits into memory (at most 100 words); if it does not, stop immediately, print the appropriate error message, and proceed to the dump.

    Remember that the program load phase is only interested in loading valid words. It does not check whether the program being loaded makes any sense, will work, or even whether the first word loaded is a valid SML instruction. It only cares that each word is a valid word (between -9999 and 9999) and that there is enough Simplesim memory to hold the entire program.

    After writing this section, check your work with test programs that generate program load errors (too many instructions, or invalid positive and negative values) and verify that they produce the appropriate error messages before you proceed to the next part. For any valid program (like those provided above), all the registers should be zero, the program should appear in memory (without the -99999 and any input that follows it), and the rest of memory should still be 4444.

  3. Write the program execution section.

    Insert this code immediately after the program load section. It must be executed only if the program load succeeded. This is the largest portion of the whole assignment; however, it is not conceptually difficult. This portion of the code is essentially a while loop with two parts inside: instruction fetch followed by instruction execution. The instruction execution is essentially a large switch statement in which each case implements a single SML instruction and the default case is an *** ABEND: invalid opcode ***. Even this part is best written in stages.

A common "trap" that many students fall into when writing assignments like this is to worry about what the SML program is doing, e.g., is the SML program going to execute an invalid instruction, is it going to enter an infinite loop, is it going to branch to the "data" portion of the program and start executing there, etc. My best advice to you is: do not care what the SML program is doing.

Your job in executing the SML program is to simulate each instruction, one at a time. Taking each instruction one at a time is really simple. Do not care what instructions were executed before or what instructions are to come afterwards. Do not care whether the SML program has accidentally branched into data, started an infinite loop, or accidentally written data over instructions that it will eventually execute (self-modifying code), etc. Do not care about any of this stuff. REAL computers don't care about YOUR programs, so why should your computer care about SML programs? It shouldn't.

When executing the SML program, take each instruction one at a time, execute it to the best of your ability, and move on. If you encounter something that causes your Simplesim to terminate abnormally, then print the appropriate error message and dump. If you encounter a 3400 (HALT), then print the normal termination message and dump. That's it.

Last modified: 2020-10-07 12:39:59 CDT