"""Assembler for the hack computer Usage: python assembler.py file Loads an assembly file and translate it into machine language for the hack computer as specified in project 6 of the nand2tetris course. # Assembler implementation details This assembler works in 3 steps: 1. Load and clean the assembly file 2. Construct a symbol table referencing the user-defined labels and variables 3. Translate the asm file to binary code The file follows this pattern: File parsing (line 147) Symbol table (line 207) Assembling (line 322) ?? # Assembly language specifications: ## Note on registers The Hack computer has three 16-bit registers: the D-, A- and M-registers. The D-register is used to store "data" that can be used as input for the ALU. The A-register can be used in the same way but it also has a second role: The RAM use it as address input. So any read/write instruction to the RAM is done on the register which has the value of the A-register as address. The M-register represents this register in the RAM that is 'pointed' onto by the A-register. Reading/writing to it consists actually in reading/writing to the RAM. ## Assembly instructions There are three types of assembly instructions: A-instructions, C-instructions and labels. Indents and blanks are ignored. Comments can only be in-line, start with "//" and are ignored. ## A-instructions - `"@" integer` where integer is a number in the range 0->32768. Sets the A register to contain the specified integer. Ex: @42 - `"@" label` where label is a user-defined label. Sets the A register to contain the code address corresponding to the label. Labels are upper-cased by convention, with "_" as word separator. Ex: @MAIN - `"@" variable` where variable is a user-defined variable. Sets the A register to contain the RAM adress corresponding to the variable. If a variable is encountered for the first time, it is automatically assigned an address. The address assignment starts at RAM address 16 and increments. 
  Variables are lowercased by convention, with "_" as word separator.
  Ex: @i

## C-instructions

`(dest-code "=")? op-code (";" jump-code)?`

- op-code: Only the op-code is mandatory. It represents an instruction to be
  performed by the ALU. Available codes and their associated outputs are:
    - 0   -> the constant 0
    - 1   -> the constant 1
    - -1  -> the constant -1
    - D   -> the value contained in the D-register
    - A   -> the value contained in the A-register
    - M   -> the value contained in the M-register
    - !D  -> bit-wise negation of the D-register
    - !A  -> bit-wise negation of the A-register
    - !M  -> bit-wise negation of the M-register
    - -D  -> numerical negation of the D-register using 2's complement
    - -A  -> numerical negation of the A-register using 2's complement
    - -M  -> numerical negation of the M-register using 2's complement
    - D+1 -> 1 + value of the D-register
    - A+1 -> 1 + value of the A-register
    - M+1 -> 1 + value of the M-register
    - D-1 -> -1 + value of the D-register
    - A-1 -> -1 + value of the A-register
    - M-1 -> -1 + value of the M-register
    - D+A -> value of the D-register + value of the A-register
    - D+M -> value of the D-register + value of the M-register
    - D-A -> value of the D-register - value of the A-register
    - D-M -> value of the D-register - value of the M-register
    - A-D -> value of the A-register - value of the D-register
    - M-D -> value of the M-register - value of the D-register
    - D&A -> bit-wise AND of the values of the D and A registers
    - D&M -> bit-wise AND of the values of the D and M registers
    - D|A -> bit-wise OR of the values of the D and A registers
    - D|M -> bit-wise OR of the values of the D and M registers
- dest-code: If specified, should be followed with a "=" character.
  Available codes are:
    - D   -> write the ALU instruction's output to the D-register
    - A   -> write the ALU instruction's output to the A-register
    - M   -> write the ALU instruction's output to the M-register
    - AD  -> write the ALU instruction's output to the A- and D-registers
    - AM  -> write the ALU instruction's output to the A- and M-registers
    - MD  -> write the ALU instruction's output to the D- and M-registers
    - ADM -> write the ALU instruction's output to the A-, D- and M-registers
- jump-code: If specified, should be preceded by a ";" character.
  The computer is fed with a program containing one binary instruction per
  line. Each of those instructions should be seen as having a number, starting
  at 0 and increasing by one. The jump-code lets the computer jump to the
  instruction whose address is contained in the A-register if the result of
  the current operation satisfies a certain condition.
  Available codes and corresponding conditions are:
    - JEQ -> jump if the output is equal to 0
    - JLT -> jump if the output is less than 0
    - JLE -> jump if the output is less than or equal to 0
    - JGT -> jump if the output is greater than 0
    - JGE -> jump if the output is greater than or equal to 0
    - JNE -> jump if the output is not 0
    - JMP -> jump whatever the output
- Examples:

      @3          // Set A to 3
      0;JMP       // Unconditional jump to code line 3

      @42         // Set A to 42
      D=D-A;JEQ   // Set D to D-A. If D-A == 0, jump to code line 42

      @i          // Point to var i, the real RAM address is handled by the assembler
      M=A         // Set the corresponding value to its own address
      A=A+1       // Point to the RAM address just after i

## Labels

`"(" LABEL_NAME ")"`

When performing a jump, the address of the target instruction must first be
loaded into the A-register. Setting the line number directly with a `@integer`
instruction is delicate, since one has to work out the line number while
ignoring comments, blank lines, etc. All the addresses also have to be updated
if the beginning of the assembly code is edited afterward.
So the assembly language offers a way to mark lines with a label, using the
`(LABEL)` syntax. The assembler then automatically adjusts every `@LABEL`
instruction to match the desired code line at assembly time.

Example:

    // This code runs a loop 42 times and then stops in an infinite empty loop
    00  @MAIN       // @2
    01  0;JMP
    (MAIN)
    02  @42         // Set D to 42
    03  D=A
    04  @DECREMENT  // @6
    05  0;JMP
    (DECREMENT)
    06  D=D-1       // Decrement D
    07  @END        // @11
    08  D;JEQ       // Go there if D==0
    09  @DECREMENT  // Or continue the loop
    10  0;JMP
    (END)
    11  @END        // Infinite loop to end the program
    12  0;JMP
"""

import sys


def create_symbol_table():
    # Create a dict of the predefined symbols, keyed by their "@" form.
    return {
        '@SP': 0, '@LCL': 1, '@ARG': 2, '@THIS': 3, '@THAT': 4,
        '@SCREEN': 0x4000, '@KBD': 0x6000,
        **{f'@R{i}': i for i in range(16)},
    }


def asm(file):
    symbol_table = create_symbol_table()
    print(symbol_table)
    # Read the whole file; the with-block closes it even if reading fails.
    with open(file, 'r') as asmfile:
        asmlines = asmfile.readlines()
    cleaned = []
    for line in asmlines:
        # Drop any "//" comment, then every blank (indent, spaces, newline).
        code = ''.join(line.split('//', 1)[0].split())
        if code:
            cleaned.append(code)
    print(cleaned)
    return cleaned


if __name__ == '__main__':
    print(sys.argv)
    asm_file = asm(sys.argv[1])
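

# ---------------------------------------------------------------------------
# Sketch of steps 2 and 3 from the docstring (symbol table and translation).
# This is a minimal illustration, not the author's implementation: the names
# COMP, JUMP, build_symbols and translate are hypothetical. It assumes the
# input lines are already cleaned (no comments, no blanks) and valid. The bit
# patterns are those of the Hack machine language used in nand2tetris
# project 6: an A-instruction is "0" + 15-bit value, a C-instruction is
# "111" + a + cccccc + ddd + jjj.
# ---------------------------------------------------------------------------

COMP = {  # op-code -> six "c" bits, listed for the A-register variants
    '0': '101010', '1': '111111', '-1': '111010', 'D': '001100',
    'A': '110000', '!D': '001101', '!A': '110001', '-D': '001111',
    '-A': '110011', 'D+1': '011111', 'A+1': '110111', 'D-1': '001110',
    'A-1': '110010', 'D+A': '000010', 'D-A': '010011', 'A-D': '000111',
    'D&A': '000000', 'D|A': '010101',
}
JUMP = {'': '000', 'JGT': '001', 'JEQ': '010', 'JGE': '011',
        'JLT': '100', 'JNE': '101', 'JLE': '110', 'JMP': '111'}


def build_symbols(lines):
    """Pass 1: map each (LABEL) to the address of the next real instruction."""
    symbols = {'@SP': 0, '@LCL': 1, '@ARG': 2, '@THIS': 3, '@THAT': 4,
               '@SCREEN': 0x4000, '@KBD': 0x6000,
               **{f'@R{i}': i for i in range(16)}}
    code = []
    for line in lines:
        if line.startswith('('):
            # A label emits no code; it names the instruction that follows.
            symbols['@' + line[1:-1]] = len(code)
        else:
            code.append(line)
    return symbols, code


def translate(lines):
    """Pass 2: turn cleaned assembly lines into 16-bit binary strings."""
    symbols, code = build_symbols(lines)
    next_var = 16  # first RAM address handed out to variables
    out = []
    for line in code:
        if line.startswith('@'):
            if line[1:].isdigit():
                value = int(line[1:])
            else:
                if line not in symbols:  # first use of a variable
                    symbols[line] = next_var
                    next_var += 1
                value = symbols[line]
            out.append(format(value, '016b'))
            continue
        # C-instruction: (dest "=")? comp (";" jump)?
        dest, _, rest = line.rpartition('=')
        comp, _, jump = rest.partition(';')
        a_bit = '1' if 'M' in comp else '0'       # M variants set the "a" bit
        c_bits = COMP[comp.replace('M', 'A')]     # and share the A bit pattern
        d_bits = sum({'A': 4, 'D': 2, 'M': 1}[r] for r in dest)
        out.append('111' + a_bit + c_bits + format(d_bits, '03b') + JUMP[jump])
    return out


# Usage, under the assumptions above:
#   translate(['@2', 'D=A']) -> ['0000000000000010', '1110110000010000']
# Summing one bit per destination letter accepts the letters in any order,
# so both "MD" and "ADM" from the docstring are handled without a lookup table.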