"""Assembler for the Hack computer

Usage:

python assembler.py file

Loads an assembly file and translates it into machine language for the Hack
computer, as specified in project 6 of the nand2tetris course.

# Assembler implementation details
This assembler works in 3 steps:
1. Load and clean the assembly file
2. Construct a symbol table referencing the user-defined labels and variables
3. Translate the asm file to binary code

The file follows this pattern:
File parsing
Symbol table
Assembling
??

# Assembly language specifications

## Note on registers
The Hack computer has three 16-bit registers: the D-, A- and M-registers.
The D-register is used to store "data" that can be used as input for the ALU.
The A-register can be used in the same way, but it also has a second role:
the RAM uses it as its address input. Any read or write instruction to the
RAM therefore operates on the RAM register whose address is the current value
of the A-register.
The M-register represents the RAM register that the A-register 'points' to.
Reading or writing M actually reads or writes the RAM.
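
The register roles above can be illustrated with a tiny model. This is a sketch for illustration only. The `HackState` class, its `ram` list and the RAM size of 32768 words are assumptions, not part of this assembler. It shows that M is not stored separately: reading or writing it goes through `ram[a]`.

```python
class HackState:
    # Hypothetical model of the D, A and M registers (illustration only).

    def __init__(self, ram_size=32768):
        self.a = 0                  # A-register: data or RAM address
        self.d = 0                  # D-register: data only
        self.ram = [0] * ram_size   # RAM, addressed through the A-register

    @property
    def m(self):
        # Reading M reads the RAM register whose address is held in A.
        return self.ram[self.a]

    @m.setter
    def m(self, value):
        # Writing M writes that same RAM register, kept to 16 bits.
        self.ram[self.a] = value & 0xFFFF


state = HackState()
state.a = 16        # like "@16"
state.m = 7         # writes RAM[16]
state.d = state.m   # like "D=M": reads RAM[16]
```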

## Assembly instructions
There are three types of assembly instructions: A-instructions, C-instructions
and labels. Indentation and blanks are ignored. Comments are single-line: they
start with "//", run to the end of the line and are ignored.
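
Once comments and blanks are dropped, the instruction type can be read from the first character of the line. This is a minimal sketch. The helper names `clean` and `instruction_type` and the 'A'/'C'/'L' return codes are hypothetical.

```python
def clean(line):
    # Drop everything from "//" on, then remove all blanks and indentation.
    return ''.join(line.split('//', 1)[0].split())


def instruction_type(line):
    # Classify an already cleaned line.
    if line.startswith('@'):
        return 'A'      # A-instruction, e.g. @42, @MAIN, @i
    if line.startswith('(') and line.endswith(')'):
        return 'L'      # label declaration, e.g. (MAIN)
    return 'C'          # anything else is treated as a C-instruction


for raw in ['   @42 // load 42', '(MAIN)', 'D = D - A ; JEQ', '// comment only']:
    line = clean(raw)
    if line:            # comment-only and blank lines become empty
        print(line, instruction_type(line))
```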

## A-instructions
- `"@" integer` where integer is a number in the range 0 to 32767 (15 bits).
  Sets the A-register to contain the specified integer. Ex: @42
- `"@" label` where label is a user-defined label. Sets the A-register to
  contain the code address corresponding to the label.
  Labels are upper-cased by convention, with "_" as word separator. Ex: @MAIN
- `"@" variable` where variable is a user-defined variable. Sets the A-register
  to contain the RAM address corresponding to the variable. When a variable is
  encountered for the first time, it is automatically assigned an address.
  Address assignment starts at RAM address 16 and increments by one for each
  new variable.
  Variables are lower-cased by convention, with "_" as word separator. Ex: @i
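
The translation of the three A-instruction forms can be sketched as follows. This is an illustration under assumptions: `encode_a` is a hypothetical helper, and `symbols` uses "@"-prefixed keys like `create_symbol_table` in this file. The result is a leading 0 followed by the value on 15 bits. An unknown symbol is treated as a new variable and allocated from `next_variable`, which starts at 16.

```python
def encode_a(token, symbols, next_variable):
    # Hypothetical helper: translate "@..." into a 16-bit A-instruction.
    # `symbols` is updated in place when a new variable is met.
    # Returns (binary string, next free variable address).
    operand = token[1:]                 # text after the "@"
    if operand.isdigit():
        value = int(operand)
        if value > 32767:
            raise ValueError(f'constant does not fit in 15 bits: {operand}')
    else:
        if token not in symbols:
            # First occurrence of a variable: give it the next free address.
            symbols[token] = next_variable
            next_variable += 1
        value = symbols[token]
    # Leading 0 marks an A-instruction; the remaining 15 bits hold the value.
    return '0' + format(value, '015b'), next_variable


symbols = {'@MAIN': 2}      # e.g. labels collected beforehand
next_var = 16
for token in ['@42', '@MAIN', '@i', '@j', '@i']:
    code, next_var = encode_a(token, symbols, next_var)
    print(token, code)
```

This only works if all labels are already in `symbols`. Otherwise a label used before its `(LABEL)` declaration would wrongly be allocated as a variable.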

## C-instructions
`(dest-code "=")? op-code (";" jump-code)?`
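
The grammar above splits a C-instruction into three fields, two of them optional. This is a minimal sketch; `split_c` is a hypothetical name. It uses `str.rpartition` for the "=" so that a missing dest-code yields an empty string rather than swallowing the whole instruction.

```python
def split_c(instruction):
    # Split "dest=comp;jump" into (dest, comp, jump).
    # Missing optional fields come back as empty strings.
    dest, _, rest = instruction.rpartition('=')  # no "=" -> dest is ''
    comp, _, jump = rest.partition(';')          # no ";" -> jump is ''
    return dest, comp, jump


print(split_c('D=D-A;JEQ'))
print(split_c('0;JMP'))
print(split_c('M=A'))
```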

- op-code:
  Only the op-code is mandatory. It represents an instruction to be performed
  by the ALU. Available codes and their associated outputs are:
  - 0 -> the constant 0
  - 1 -> the constant 1
  - -1 -> the constant -1
  - D -> the value contained in the D-register
  - A -> the value contained in the A-register
  - M -> the value contained in the M-register
  - !D -> bit-wise negation of the D-register
  - !A -> bit-wise negation of the A-register
  - !M -> bit-wise negation of the M-register
  - -D -> numerical negation of the D-register using 2's complement
  - -A -> numerical negation of the A-register using 2's complement
  - -M -> numerical negation of the M-register using 2's complement
  - D+1 -> value of the D-register + 1
  - A+1 -> value of the A-register + 1
  - M+1 -> value of the M-register + 1
  - D-1 -> value of the D-register - 1
  - A-1 -> value of the A-register - 1
  - M-1 -> value of the M-register - 1
  - D+A -> value of the D-register + value of the A-register
  - D+M -> value of the D-register + value of the M-register
  - D-A -> value of the D-register - value of the A-register
  - D-M -> value of the D-register - value of the M-register
  - A-D -> value of the A-register - value of the D-register
  - M-D -> value of the M-register - value of the D-register
  - D&A -> bit-wise AND of the values of the D- and A-registers
  - D&M -> bit-wise AND of the values of the D- and M-registers
  - D|A -> bit-wise OR of the values of the D- and A-registers
  - D|M -> bit-wise OR of the values of the D- and M-registers
- dest-code:
  If specified, it must be followed by a "=" character. Available codes are:
  - D -> write the ALU instruction's output to the D-register
  - A -> write the ALU instruction's output to the A-register
  - M -> write the ALU instruction's output to the M-register
  - AD -> write the ALU instruction's output to the A- and D-registers
  - AM -> write the ALU instruction's output to the A- and M-registers
  - MD -> write the ALU instruction's output to the D- and M-registers
  - ADM -> write the ALU instruction's output to the A-, D- and M-registers
- jump-code:
  If specified, it must be preceded by a ";" character. The computer is fed
  with a program containing one binary instruction per line. Each of those
  instructions has an address, starting at 0 and increasing by one. The
  jump-code makes the computer jump to the instruction whose address is
  contained in the A-register if the output of the current operation
  satisfies a certain condition. Available codes and corresponding conditions
  are:
  - JEQ -> jump if the output is equal to 0
  - JLT -> jump if the output is less than 0
  - JLE -> jump if the output is less than or equal to 0
  - JGT -> jump if the output is greater than 0
  - JGE -> jump if the output is greater than or equal to 0
  - JNE -> jump if the output is not 0
  - JMP -> jump unconditionally, whatever the output
- Examples:
  @3          // Set A to 3
  0;JMP       // Unconditional jump to code line 3
  @42         // Set A to 42
  D=D-A;JEQ   // Set D to D-A. If D-A == 0, jump to code line 42
  @i          // Point to variable i; the actual RAM address is handled by the assembler
  M=A         // Set the corresponding value to its own address
  A=A+1       // Point to the RAM address just after i
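
Given the three fields, a C-instruction can be encoded as `111 a c1..c6 d1 d2 d3 j1 j2 j3`. The sketch below uses the bit patterns of the Hack machine-language specification from nand2tetris as I understand them; please double-check them against the course's official tables. `COMP_BITS`, `JUMP_BITS` and `encode_c` are hypothetical names. The M-variants reuse the A-variant bits with the `a` bit set. The dest bits are derived from which letters appear, so both "ADM" and "AMD" are accepted.

```python
# ALU control bits c1..c6 for each op-code written with A; the M-variants
# use the same bits with the separate "a" bit set to 1.
COMP_BITS = {
    '0': '101010', '1': '111111', '-1': '111010',
    'D': '001100', 'A': '110000', '!D': '001101', '!A': '110001',
    '-D': '001111', '-A': '110011', 'D+1': '011111', 'A+1': '110111',
    'D-1': '001110', 'A-1': '110010', 'D+A': '000010', 'D-A': '010011',
    'A-D': '000111', 'D&A': '000000', 'D|A': '010101',
}

JUMP_BITS = {
    '': '000', 'JGT': '001', 'JEQ': '010', 'JGE': '011',
    'JLT': '100', 'JNE': '101', 'JLE': '110', 'JMP': '111',
}


def encode_c(dest, comp, jump):
    # a-bit is 1 when the computation reads M instead of A.
    a_bit = '1' if 'M' in comp else '0'
    comp_bits = COMP_BITS[comp.replace('M', 'A')]
    # d1 d2 d3 = write A, write D, write M (letter order in dest ignored).
    dest_bits = ''.join('1' if reg in dest else '0' for reg in 'ADM')
    return '111' + a_bit + comp_bits + dest_bits + JUMP_BITS[jump]


for fields in [('', '0', 'JMP'), ('D', 'D-A', 'JEQ'), ('M', 'A', '')]:
    print(encode_c(*fields))
```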

## Labels
`"(" LABEL_NAME ")"`

When performing a jump, the address of the target line of code must be put in
the A-register. Setting the line number directly with an `@integer`
instruction is delicate, since one has to work out the line number while
ignoring comments, blank lines, label declarations, etc. Moreover, all the
addresses have to be updated whenever the beginning of the assembly code is
edited afterward.

So the assembly language lets you mark lines with a label using the `(LABEL)`
syntax. At assembly time, the assembler then automatically replaces any
`@LABEL` instruction with the address of the instruction that follows the
`(LABEL)` declaration.
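
Resolving labels is typically done in a first pass over the cleaned lines, before any translation. In that pass, every `(LABEL)` gets the address of the next real instruction and does not occupy an address itself. This is a minimal sketch; `collect_labels` is a hypothetical name. Its keys carry the leading "@" to match `create_symbol_table` in this file. Run on the example program that follows, it should give MAIN=2, DECREMENT=6 and END=11, matching the addresses in that example's comments.

```python
def collect_labels(lines, symbols):
    # First pass over cleaned lines: map each "(LABEL)" to the address of
    # the next instruction. Label declarations themselves take no address.
    address = 0
    for line in lines:
        if line.startswith('(') and line.endswith(')'):
            symbols['@' + line[1:-1]] = address
        else:
            address += 1
    return symbols


program = ['@MAIN', '0;JMP', '(MAIN)', '@42', 'D=A', '@DECREMENT', '0;JMP',
           '(DECREMENT)', 'D=D-1', '@END', 'D;JEQ', '@DECREMENT', '0;JMP',
           '(END)', '@END', '0;JMP']
print(collect_labels(program, {}))
```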

Example:
// This code runs a loop 42 times and then stops in an infinite empty loop
00 @MAIN      // @2
01 0;JMP
(MAIN)
02 @42        // Set D to 42
03 D=A
04 @DECREMENT // @6
05 0;JMP
(DECREMENT)
06 D=D-1      // Decrement D
07 @END       // @11
08 D;JEQ      // Go there if D==0
09 @DECREMENT // Or continue the loop
10 0;JMP
(END)
11 @END       // Infinite loop to end the program
12 0;JMP
"""
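

import sys


def create_symbol_table():
    """Return the predefined symbols of the Hack assembly language.

    Keys include the leading "@" of the A-instruction that uses them.
    """
    symbols = {
        '@SP': 0,
        '@LCL': 1,
        '@ARG': 2,
        '@THIS': 3,
        '@THAT': 4,
        '@SCREEN': 0x4000,  # 16384: base address of the screen memory map
        '@KBD': 0x6000,     # 24576: keyboard memory map
    }
    # Virtual registers R0..R15 are aliases for RAM addresses 0..15.
    symbols.update({f'@R{i}': i for i in range(16)})
    return symbols


def asm(file):
    """Load and clean an assembly file (step 1) and return its instructions.

    Steps 2 and 3 (symbol table construction and translation) are not
    implemented yet.
    """
    symbol_table = create_symbol_table()  # will be used by step 2

    cleaned = []
    with open(file, 'r') as asmfile:
        for line in asmfile:
            # Drop the comment part ("//" to end of line), then remove all
            # blanks, since indents and spaces are ignored by the language.
            line = ''.join(line.split('//', 1)[0].split())
            if line:  # skip lines that were blank or comment-only
                cleaned.append(line)
    return cleaned


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit('Usage: python assembler.py file')
    for instruction in asm(sys.argv[1]):
        print(instruction)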