Sven Riwoldt
2024-04-01 20:30:24 +02:00
parent fd333f3514
commit c7bc862c6f
6804 changed files with 1065135 additions and 0 deletions

asm/asm01.py Normal file
"""Assembler for the Hack computer
Usage:
python assembler.py file
Loads an assembly file and translates it into machine language for the Hack
computer as specified in project 6 of the nand2tetris course.
# Assembler implementation details
This assembler works in 3 steps:
1. Load and clean the assembly file
2. Construct a symbol table referencing the user-defined labels and variables
3. Translate the asm file to binary code
The file follows this pattern:
- File parsing
- Symbol table
- Assembling
# Assembly language specifications:
## Note on registers
The Hack computer has three 16-bit registers: the D-, A- and M-registers.
The D-register is used to store "data" that can be used as input for the ALU.
The A-register can be used in the same way but it also has a second role:
the RAM uses it as its address input, so any read or write to the RAM
targets the RAM register whose address is the value of the A-register.
The M-register denotes the RAM register that the A-register 'points' to:
reading from or writing to M actually reads from or writes to the RAM.
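To make the role of M concrete, here is a toy model of the registers. `HackState` is a hypothetical class used only for illustration (it is not part of this assembler), and the 32768-word RAM is an assumption of the sketch. M has no storage of its own: it is simply RAM[A].

```python
class HackState:
    # Hypothetical model of the Hack registers, for illustration only
    def __init__(self, ram_size=32768):
        self.A = 0
        self.D = 0
        self.ram = [0] * ram_size

    @property
    def M(self):
        # Reading M reads the RAM register addressed by A
        return self.ram[self.A]

    @M.setter
    def M(self, value):
        # Writing M writes the RAM register addressed by A,
        # stored as an unsigned 16-bit word
        self.ram[self.A] = value & 0xFFFF


state = HackState()
state.A = 100      # like @100
state.M = 7        # like M=...: writes RAM[100]
state.D = state.M  # like D=M: reads RAM[100]
```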
## Assembly instructions
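There are three types of assembly instructions: A-instructions, C-instructions
and labels. Indents and blanks are ignored. Comments start with "//" and run to
the end of the line, either on a line of their own or after an instruction;
they are ignored. There are no multi-line comments.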
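Once comments and blanks are removed, the three instruction types can be told apart by their first character. A minimal sketch, assuming a hypothetical `instruction_type` helper (not part of this file) that returns 'A', 'C', 'L' (label) or None for lines that contain no code:

```python
def instruction_type(line):
    # Hypothetical helper: classify one line of Hack assembly
    code = line.split('//', 1)[0]     # drop the comment, if any
    code = ''.join(code.split())      # drop indents and blanks
    if not code:
        return None                   # blank or comment-only line
    if code.startswith('@'):
        return 'A'
    if code.startswith('(') and code.endswith(')'):
        return 'L'                    # label declaration
    return 'C'


kind = instruction_type('   @i   // variable i')   # 'A'
```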
## A-instructions
- `"@" integer` where integer is a number in the range 0-32767 (15 bits). Sets
the A-register to contain the specified integer. Ex: @42
- `"@" label` where label is a user-defined label. Sets the A register to
contain the code address corresponding to the label.
Labels are upper-cased by convention, with "_" as word separator. Ex: @MAIN
- `"@" variable` where variable is a user-defined variable. Sets the A register
to contain the RAM address corresponding to the variable. If a variable is
encountered for the first time, it is automatically assigned an address.
The address assignment starts at RAM address 16 and increments.
Variables are lowercased by convention, with "_" as word separator. Ex: @i
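In machine code, an A-instruction is a leading 0 bit followed by a 15-bit value, so it can hold values from 0 to 32767. The sketch below shows both the encoding and the allocation of variables from address 16. `encode_a` is a hypothetical helper, and its `symbols` dict is assumed to use the same '@NAME' keys as `create_symbol_table()` further down in this file.

```python
def encode_a(token, symbols, next_var):
    # Hypothetical helper: translate an A-instruction such as '@42', '@LOOP'
    # or '@i' into its 16-bit binary string. 'symbols' uses '@NAME' keys and
    # is updated in place when a new variable is met.
    # Returns (binary string, next free variable address).
    name = token[1:]
    if name.isdigit():
        value = int(name)                 # @integer
    elif token in symbols:
        value = symbols[token]            # known label, variable or predefined
    else:
        value = next_var                  # first use of a variable
        symbols[token] = value
        next_var += 1
    if value > 32767:
        raise ValueError(f'A-instruction value out of range: {token}')
    # A leading 0 marks an A-instruction; the other 15 bits hold the value
    return '0' + format(value, '015b'), next_var


symbols = {'@LOOP': 4}   # e.g. a label found in an earlier pass
code, next_var = encode_a('@42', symbols, 16)        # '0000000000101010'
code, next_var = encode_a('@i', symbols, next_var)   # @i gets RAM address 16
```

A variable keeps its address on later uses, since the next lookup finds it in `symbols`.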
## C-instructions
`(dest-code "=")? op-code (";" jump-code)?`
- op-code:
Only the op-code is mandatory. It represents an instruction to be performed
by the ALU. Available codes and their associated outputs are:
- 0 -> the constant 0
- 1 -> the constant 1
- -1 -> the constant -1
- D -> the value contained in the D-register
- A -> the value contained in the A-register
- M -> the value contained in the M-register
- !D -> bit-wise negation of the D-register
- !A -> bit-wise negation of the A-register
- !M -> bit-wise negation of the M-register
- -D -> numerical negation of the D-register using 2's complement
- -A -> numerical negation of the A-register using 2's complement
- -M -> numerical negation of the M-register using 2's complement
- D+1 -> 1 + value of the D-register
- A+1 -> 1 + value of the A-register
- M+1 -> 1 + value of the M-register
- D-1 -> -1 + value of the D-register
- A-1 -> -1 + value of the A-register
- M-1 -> -1 + value of the M-register
- D+A -> value of the D-register + value of the A-register
- D+M -> value of the D-register + value of the M-register
- D-A -> value of the D-register - value of the A-register
- D-M -> value of the D-register - value of the M-register
- A-D -> value of the A-register - value of the D-register
- M-D -> value of the M-register - value of the D-register
- D&A -> bit-wise AND of the values of the D and A registers
- D&M -> bit-wise AND of the values of the D and M registers
- D|A -> bit-wise OR of the values of the D and A registers
- D|M -> bit-wise OR of the values of the D and M registers
- dest-code:
If specified, should be followed by a "=" character. Available codes are:
- D -> write the ALU instruction's output to the D-register
- A -> write the ALU instruction's output to the A-register
- M -> write the ALU instruction's output to the M-register
- AD -> write the ALU instruction's output to the A- and D-registers
- AM -> write the ALU instruction's output to the A- and M-registers
- MD -> write the ALU instruction's output to the D- and M-registers
- ADM -> write the ALU instruction's output to the A-, D- and M-registers
- jump-code:
If specified, should be preceded by a ";" character. The computer is fed
with a program containing one binary instruction per line. Each of those
instructions has an address, starting at 0 and increasing by one. The
jump-code makes the computer jump to the instruction whose address is held
in the A-register if the output of the current operation satisfies a certain
condition. Available codes and corresponding conditions are:
- JEQ -> jump if the output is equal to 0
- JLT -> jump if the output is less than 0
- JLE -> jump if the output is less than or equal to 0
- JGT -> jump if the output is greater than 0
- JGE -> jump if the output is greater than or equal to 0
- JNE -> jump if the output is not 0
- JMP -> jump unconditionally, whatever the output
- Examples:
@3 // Set A to 3
0;JMP // Unconditional jump to instruction 3
@42 // Set A to 42
D=D-A;JEQ // Set D to D-A; if D-A == 0, jump to instruction 42
@i // Point to variable i; its actual RAM address is chosen by the assembler
M=A // Store i's own address in i
A=A+1 // Point to the RAM address just after i
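In machine code, a C-instruction has the form `111a cccc ccdd djjj`: the `a` bit selects A (0) or M (1) as ALU input, the six `c` bits encode the op-code, the three `d` bits select the A, D and M destinations, and the three `j` bits encode the jump condition. A sketch of an encoder following that layout; `COMP`, `JUMP` and `encode_c` are hypothetical names, the input is assumed to be already cleaned of blanks and comments, and computing the dest bits from the letters present (so MD/DM or ADM/AMD give the same bits) is a choice of this sketch.

```python
# Op-codes with a=0; the M-variants (a=1) are derived below
COMP_A = {
    '0': '101010', '1': '111111', '-1': '111010',
    'D': '001100', 'A': '110000', '!D': '001101', '!A': '110001',
    '-D': '001111', '-A': '110011', 'D+1': '011111', 'A+1': '110111',
    'D-1': '001110', 'A-1': '110010', 'D+A': '000010', 'D-A': '010011',
    'A-D': '000111', 'D&A': '000000', 'D|A': '010101',
}
COMP = {op: '0' + bits for op, bits in COMP_A.items()}
COMP.update({op.replace('A', 'M'): '1' + bits
             for op, bits in COMP_A.items() if 'A' in op})

JUMP = {'': '000', 'JGT': '001', 'JEQ': '010', 'JGE': '011',
        'JLT': '100', 'JNE': '101', 'JLE': '110', 'JMP': '111'}


def encode_c(instruction):
    # Split 'dest=comp;jump'; dest and jump may be empty
    dest, _, rest = instruction.rpartition('=')
    comp, _, jump = rest.partition(';')
    # Dest bits are (A, D, M): 1 if that letter appears in dest
    dest_bits = ''.join('1' if r in dest else '0' for r in 'ADM')
    return '111' + COMP[comp] + dest_bits + JUMP[jump]


binary = encode_c('D=D-A;JEQ')   # '1110010011010010'
```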
## Labels
`"(" LABEL_NAME ")"`
When performing a jump, the address of the target instruction must first be
put in the A-register. Setting the line number directly with an `@integer`
instruction is error-prone: one has to count instructions while ignoring
comments, blank lines and labels, and every address has to be updated whenever
the code before it is edited.
The assembly language therefore lets you mark lines with a label using the
`(LABEL)` syntax. At assembly time, the assembler replaces every `@LABEL`
instruction with the address of the instruction that follows the label.
Example:
// This code runs a loop 42 times and then stops in an infinite empty loop
00 @MAIN // @2
01 0;JMP
(MAIN)
02 @42 // Set D to 42
03 D=A
04 @DECREMENT // @6
05 0;JMP
(DECREMENT)
06 D=D-1 // Decrement D
07 @END // @11
08 D;JEQ // Go there if D==0
09 @DECREMENT // Or continue the loop
10 0;JMP
(END)
11 @END // Infinite loop to end the program
12 0;JMP
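Because a `(LABEL)` declaration produces no machine code, label addresses can be computed in a first pass that only counts the real instructions. A minimal sketch, assuming the lines have already been cleaned of comments and blanks; `resolve_labels` is a hypothetical helper and, like `create_symbol_table()`, it keys symbols with a leading '@'. On the example above it yields the addresses given in the comments.

```python
def resolve_labels(lines):
    # Hypothetical first pass: map each '(LABEL)' to the address of the
    # next real instruction; label lines do not take an address themselves
    labels = {}
    address = 0
    for line in lines:
        if line.startswith('(') and line.endswith(')'):
            labels['@' + line[1:-1]] = address
        else:
            address += 1
    return labels


program = ['@MAIN', '0;JMP', '(MAIN)', '@42', 'D=A', '@DECREMENT', '0;JMP',
           '(DECREMENT)', 'D=D-1', '@END', 'D;JEQ', '@DECREMENT', '0;JMP',
           '(END)', '@END', '0;JMP']
labels = resolve_labels(program)   # {'@MAIN': 2, '@DECREMENT': 6, '@END': 11}
```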
"""
import sys
import re
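def create_symbol_table():
    # Build a dict of the predefined Hack symbols (keys include the leading '@')
    symbol_table = {
        '@SP': 0,
        '@LCL': 1,
        '@ARG': 2,
        '@THIS': 3,
        '@THAT': 4,
        '@SCREEN': 0x4000,  # 16384: base address of the screen memory map
        '@KBD': 0x6000,     # 24576: keyboard memory map
    }
    # R0..R15 are aliases for RAM addresses 0..15
    symbol_table.update({f'@R{i}': i for i in range(16)})
    return symbol_table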
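def asm(file):
    # Step 1: load the assembly file and return its cleaned lines.
    # Comments ("//" to the end of the line), indents and blanks are
    # removed, and lines left empty are dropped.
    cleaned = []
    with open(file, 'r') as asmfile:
        for line in asmfile:
            code = re.sub(r'//.*', '', line)  # strip the comment, if any
            code = re.sub(r'\s+', '', code)   # strip indents and blanks
            if code:
                cleaned.append(code)
    return cleaned


if __name__ == '__main__':
    symbol_table = create_symbol_table()
    print(symbol_table)
    asm_file = asm(sys.argv[1])
    print(asm_file)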