The BMDS assembler currently supports the 6502 and 8080 processors, and should be able to support CHIP-8 in due course. It assembles code for the CPU of the currently selected machine.
Basic Operation
Opening a new file in the Assembler places a basic template, appropriate to the current machine's CPU, in the editor. The file then needs to be saved and assigned a filename with the .asm extension.
When the Assemble option is clicked, the source file currently selected in the editor is opened in the assembler unit. The main source file may open as many include files as required, one at a time.
Each source file is interpreted from top to bottom on a line by line basis. Include files follow exactly the same rules as the main source file.
When an include file statement is encountered, the current file position is saved and the include file is opened and interpreted. When its end-of-file is reached, the saved file position is restored and assembly continues at the next line. If include files are nested, each current file position is still saved, and positions are restored on a last-in-first-out basis until all include files and the main source file are complete.
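The save-and-restore behaviour described above amounts to a stack of file positions. The following Python sketch models it; the `sources` dictionary standing in for files on disk, the unquoted `#INCLUDE filename` operand and the `assemble_lines` generator are all assumptions for illustration, not the BMDS implementation.

```python
def assemble_lines(main, sources):
    """Yield (filename, line_number, text) in the order the assembler reads them.

    Hypothetical sketch: `sources` maps each filename to its list of lines and
    stands in for the real file system.
    """
    stack = [(main, 0)]                    # saved (file, next line index) pairs
    while stack:
        name, pos = stack.pop()            # resume the most recent position (LIFO)
        lines = sources[name]
        while pos < len(lines):
            text = lines[pos]
            pos += 1
            if text.upper().startswith("#INCLUDE"):
                target = text.split(None, 1)[1].strip().strip('"')
                stack.append((name, pos))  # remember where to continue
                stack.append((target, 0))  # the include file is processed next
                break
            yield name, pos, text


sources = {
    "main.asm": ["START LDA #1", "#INCLUDE util.asm", "END"],
    "util.asm": ["#INCLUDE consts.asm", "UTIL RTS"],
    "consts.asm": ["ONE EQU 1"],
}
for name, number, text in assemble_lines("main.asm", sources):
    print(f"{name}:{number}: {text}")
```

Each include pushes the including file's next line before the include file itself, so popping the stack always resumes the most recently interrupted file first.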
All source is treated as case-insensitive, so code may be written in uppercase, lowercase or mixed case.
Syntax
Source Lines
A source file consists of a number of program source lines, each comprising up to four fields, in one of the following forms:
[Label_field[:]] Instruction_field [Operand_field] [Comment_field]
[Comment_field]
Items in brackets are optional. Labels and comments are always optional, and many instructions / opcodes do not require operands.
Blank lines are ignored. All fields must be separated by at least one space or TAB character.
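As an illustration of the field layout, here is a minimal Python sketch that splits one line into its four fields. It assumes the column-1 label rule and the `;` and `//` comment markers described in the Labels and Comments sections; `split_fields` is a hypothetical helper, and it does not handle comment markers that appear inside FCC/TEXT strings.

```python
def split_fields(line):
    """Return (label, instruction, operand, comment); absent fields are None."""
    # The comment field starts at the first ';' or '//' on the line.
    markers = [i for i in (line.find(";"), line.find("//")) if i >= 0]
    comment = None
    if markers:
        comment = line[min(markers):]
        line = line[:min(markers)]

    # Anything starting in column 1 is a label; a trailing colon is dropped.
    label = None
    if line and not line[0].isspace():
        parts = line.split(None, 1)
        label = parts[0].rstrip(":")
        line = parts[1] if len(parts) > 1 else ""

    # The next word is the instruction; the rest of the line is the operand.
    parts = line.split(None, 1)
    instruction = parts[0] if parts else None
    operand = parts[1].strip() if len(parts) > 1 else None
    return label, instruction, operand, comment


print(split_fields("LOOP: LDA #$10 ; load counter"))
print(split_fields("\tDB 1, 2, 3 // data table"))
```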
Labels
If a label is present, it must begin in column 1 and be unique. Labels can be optionally followed by a colon when defined, but not when referenced in an expression.
Labels cannot be the same as any machine defined instruction, register name, operator or macro.
In this two-pass assembler, labels can be referenced before they are defined (forward references), but all label values should be resolved by the end of the first pass.
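To show why forward references work, the sketch below runs two passes over a tiny, hypothetical subset of 6502 instructions (immediate LDA, absolute JMP and JSR, NOP, RTS). Pass 1 only totals instruction sizes to give each label an address; pass 2 emits bytes once every label is known. The tuple input format and the `assemble` function are assumptions made for the example, not BMDS internals.

```python
# Tiny illustrative subset of 6502 instructions.
OPCODES = {            # mnemonic -> (opcode byte, operand size in bytes)
    "NOP": (0xEA, 0),
    "RTS": (0x60, 0),
    "LDA": (0xA9, 1),  # LDA #imm (immediate mode only)
    "JMP": (0x4C, 2),  # JMP abs
    "JSR": (0x20, 2),  # JSR abs
}


def assemble(lines, origin=0x0000):
    """lines: (label, mnemonic, operand) tuples; operand is an int or a label."""
    # Pass 1: give every label the address of its line; no bytes are emitted.
    symbols, pc = {}, origin
    for label, mnemonic, _ in lines:
        if label:
            symbols[label.upper()] = pc          # labels are case-insensitive
        pc += 1 + OPCODES[mnemonic.upper()][1]

    # Pass 2: emit code; forward references are now in the symbol table.
    code = bytearray()
    for _, mnemonic, operand in lines:
        opcode, size = OPCODES[mnemonic.upper()]
        code.append(opcode)
        if size:
            if isinstance(operand, str):
                key = operand.upper()
                if key not in symbols:
                    raise ValueError(f"undefined label: {operand}")
                operand = symbols[key]
            # The 6502 stores 16-bit addresses low byte first.
            code += operand.to_bytes(size, "little")
    return symbols, bytes(code)


program = [
    ("start", "JSR", "sub"),   # forward reference: SUB is defined later
    (None, "JMP", "start"),
    ("sub", "LDA", 1),
    (None, "RTS", None),
]
symbols, code = assemble(program, origin=0x0200)
print({name: hex(addr) for name, addr in symbols.items()}, code.hex())
```

Because `JSR sub` is three bytes long whatever the target address, pass 1 can size it without knowing where SUB is; this fixed size is what lets two passes suffice in the sketch.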
Labels may be up to 32 characters in length, and may include any of the following characters:
A..Z a..z 0..9 _ (underscore)
The first character of a symbol must be a letter or an underscore.
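The label rules above translate directly into a regular expression plus a reserved-word check. In this Python sketch, `is_valid_label` is a hypothetical helper and the `RESERVED` set is only a small illustrative sample; a real check would need every instruction, register name, operator and macro for the selected CPU.

```python
import re

# Letter or underscore first, then up to 31 letters, digits or underscores.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")

# Illustrative sample only, not the assembler's real reserved-word table.
RESERVED = {"LDA", "STA", "JMP", "RTS", "A", "X", "Y", "ORG", "EQU", "DB"}


def is_valid_label(name, reserved=RESERVED):
    # Reserved words are compared case-insensitively, like all source text.
    return LABEL_RE.fullmatch(name) is not None and name.upper() not in reserved


for candidate in ("loop_1", "_tmp", "1loop", "my-label", "lda"):
    print(candidate, is_valid_label(candidate))
```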
Comments
If a source line begins with a comment, the whole line is ignored by the assembler, although it remains in the source file. Comments can also be placed after the instruction field, or after the operand field if there is one. Comments are preceded by a semicolon or a double slash, as shown:
; //
Numeric Constants
The assembler supports several number formats:
- Decimal numbers, e.g. 1234 -10 +178
- Hexadecimal numbers, e.g. $FE37
- Binary numbers, e.g. %1001
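A minimal Python sketch of recognising the three constant formats, assuming each constant arrives as a separate token. `parse_number` is a hypothetical name, and treating a sign as part of a decimal constant (rather than as a unary operator) is an assumption based on the examples above.

```python
import re

# (pattern, base, prefix length) for each supported format.
NUMBER_FORMATS = [
    (re.compile(r"[+-]?[0-9]+"), 10, 0),      # decimal: 1234  -10  +178
    (re.compile(r"\$[0-9A-Fa-f]+"), 16, 1),   # hexadecimal: $FE37
    (re.compile(r"%[01]+"), 2, 1),            # binary: %1001
]


def parse_number(text):
    for pattern, base, prefix in NUMBER_FORMATS:
        if pattern.fullmatch(text):
            return int(text[prefix:], base)
    raise ValueError(f"not a numeric constant: {text!r}")


print(parse_number("$FE37"), parse_number("%1001"), parse_number("-10"))
```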
Location Counter Reference
The character $ or * may be used as an element in expressions to represent the current value of the location counter. When used in this way, the $ or * must be followed by a whitespace character.
Note: if the assembler is expecting a value, it interprets * as the value of the location counter; otherwise it treats * as the multiplication operator. If the $ symbol is immediately followed by one or more hexadecimal digits, it is interpreted as a number; otherwise it is interpreted as the value of the location counter.
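The $ and * rules depend on context, so a tokenizer has to track whether it is expecting a value. This Python sketch assumes a value is expected at the start of an expression, after an operator and after "("; it returns the marker string "LC" for the location counter. The `tokenize` function and the "LC" marker are illustrative inventions, and %-binary constants are omitted for brevity.

```python
import string


def tokenize(expr):
    """Split an expression into ints, label names, "LC" (location counter)
    and single-character operators."""
    tokens, i, expect_value = [], 0, True
    while i < len(expr):
        ch = expr[i]
        if ch.isspace():
            i += 1
        elif ch == "$":
            j = i + 1
            while j < len(expr) and expr[j] in string.hexdigits:
                j += 1
            # "$" + hex digits is a number; a bare "$" is the location counter.
            tokens.append(int(expr[i + 1:j], 16) if j > i + 1 else "LC")
            i, expect_value = j, False
        elif ch == "*":
            # "*" where a value is expected is the location counter,
            # otherwise it is the multiplication operator.
            tokens.append("LC" if expect_value else "*")
            i, expect_value = i + 1, not expect_value
        elif ch.isdigit():
            j = i
            while j < len(expr) and expr[j].isdigit():
                j += 1
            tokens.append(int(expr[i:j]))
            i, expect_value = j, False
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(expr) and (expr[j].isalnum() or expr[j] == "_"):
                j += 1
            tokens.append(expr[i:j].upper())   # labels are case-insensitive
            i, expect_value = j, False
        else:
            tokens.append(ch)                  # operator or parenthesis
            i, expect_value = i + 1, ch != ")"
    return tokens


print(tokenize("* * 2"), tokenize("$ - start"), tokenize("$FE37"))
```

Note that `$ADD` tokenizes as the number $0ADD rather than as the location counter followed by a label ADD, since A and D are hexadecimal digits.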
Expressions
Expressions may be composed of operators, parentheses and symbols.
- Unary operators precede terms and include + (plus), - (minus), > (high byte) and < (low byte)
- Arithmetic operators include + (plus), - (minus), * (multiply) and / (divide)
- Left and right parentheses ( and ) can be used to ensure the correct order of evaluation
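The text above does not state operator precedence, so the following recursive-descent sketch in Python assumes the common convention: unary operators bind tightest, then * and /, then binary + and -. It works on a pre-tokenized list (integers, label names, the marker "LC" for the location counter, and operator strings); `evaluate`, the token format and the truncating division are assumptions, not documented BMDS behaviour.

```python
def evaluate(tokens, symbols=None, lc=0):
    """Evaluate a token list of ints, label names, "LC" and operator strings."""
    symbols = symbols or {}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def term():
        # Unary operators apply to the term that follows them.
        tok = take()
        if tok == "-":
            return -term()
        if tok == "+":
            return term()
        if tok == "<":
            return term() & 0xFF            # low byte
        if tok == ">":
            return (term() >> 8) & 0xFF     # high byte
        if tok == "(":
            value = expression()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return value
        if tok == "LC":
            return lc                       # location counter
        if isinstance(tok, int):
            return tok
        return symbols[tok]                 # label; KeyError if undefined

    def product():
        value = term()
        while peek() in ("*", "/"):
            op, rhs = take(), term()
            if op == "*":
                value *= rhs
            else:
                # Integer division truncating toward zero (an assumption).
                q = abs(value) // abs(rhs)
                value = q if (value < 0) == (rhs < 0) else -q
        return value

    def expression():
        value = product()
        while peek() in ("+", "-"):
            op, rhs = take(), product()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate(["(", 2, "+", 3, ")", "*", 4]), evaluate([">", 0x1234]))
```

With this precedence, `< START + 1` means the low byte of START, plus one; other assemblers may apply < and > to the whole expression instead, so the choice matters for operands like these.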
Pseudo Operations
Pseudo-operations appear in the instruction field, but they are directions to the assembler rather than instructions for the target processor. They may be followed by one or more operands.
All pseudo-ops can be preceded with a “.” (example: “.ORG” works the same as “ORG”).
To maximise compatibility with the directives used by different assembler programs and microprocessor manufacturers, several pseudo-ops are synonyms: they have the same meaning and result in the same action.
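Because several spellings share one meaning, an assembler can normalise the pseudo-op name before dispatching on it. This Python sketch is a hypothetical illustration: `canonical_pseudo_op` strips an optional leading "." and maps each synonym listed in this section to one canonical name (the canonical choice is arbitrary).

```python
# Synonym -> canonical pseudo-op name (hypothetical normalisation table).
ALIASES = {
    "DB": "DB", "BYTE": "DB", "FCB": "DB", "DEFB": "DB",
    "DW": "DW", "WORD": "DW", "FDB": "DW", "DEFW": "DW",
    "FCC": "FCC", "TEXT": "FCC",
    "DS": "DS", "RMB": "DS", "DEFS": "DS",
    "ORG": "ORG", "END": "END", "EQU": "EQU", "=": "EQU",
}


def canonical_pseudo_op(word):
    """Return the canonical pseudo-op name, or None if `word` is not one."""
    name = word.upper()
    if name.startswith("."):     # ".ORG" works the same as "ORG"
        name = name[1:]
    return ALIASES.get(name)


print(canonical_pseudo_op(".defb"), canonical_pseudo_op("FDB"), canonical_pseudo_op("LDA"))
```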
- Storage Definition
- DB / BYTE / FCB / DEFB (Define Byte / Byte / Form Constant Byte) store one or more byte values sequentially in the code. Multiple expressions must be separated by commas
- DW / WORD / FDB / DEFW (Define Word / Word / Form Double Byte) store one or more word (two byte) values sequentially in the code. Multiple expressions must be separated by commas
- FCC / TEXT (Form Constant Character / Text) store ASCII strings into consecutive bytes of memory. Any label is assigned to the first byte in the string. Any of the printable ASCII characters can be contained in the string and the string is specified between two identical delimiters
- DS / RMB / DEFS (Define Space / Reserve Memory Byte) skips a number of bytes, optionally initialised. The expression is evaluated to give the number of bytes to be reserved, and the program location counter is incremented by this value. If an optional second expression is included, it is evaluated and its lowest 8 bits are used to initialise each of the reserved bytes
- ORG (Origin) – sets the starting address of a program, a part of the program, or a data block. All following bytes are stored in consecutive addresses, starting at the address specified by the expression following the instruction. This statement should precede the first code generating statement in the source file. If no ORG is present in the source file the starting address will default to 0000
- END – marks the end of the main source file code. After the END statement the assembler stops processing input; if END is encountered in pass 1, the assembler immediately begins pass 2, ignoring anything that follows
- Symbol Definition
- EQU / = assigns a constant value to a label; no code is generated. Normally a new label takes the value of the current memory location, but a label assigned with the EQU directive takes the value of its operand and can be considered a constant, because its value cannot be changed later in the assembly run. The = symbol is equivalent to the EQU directive
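The following Python sketch models what the storage and symbol pseudo-ops do to the location counter, memory and symbol table. The `Image` class and its method names are hypothetical; storing DW values low byte first is an assumption based on the native byte order of both the 6502 and the 8080, and operands are assumed to be already evaluated.

```python
class Image:
    """Hypothetical model of the location counter, memory and symbol table."""

    def __init__(self):
        self.lc = 0x0000        # starting address if no ORG is given
        self.memory = {}        # address -> byte value
        self.symbols = {}

    def org(self, address):     # ORG: move the location counter
        self.lc = address

    def label(self, name):      # ordinary label: current location counter
        self.symbols[name.upper()] = self.lc

    def equ(self, name, value): # EQU / =: constant value, no code generated
        self.symbols[name.upper()] = value

    def emit(self, data):
        for byte in data:
            self.memory[self.lc] = byte
            self.lc += 1

    def db(self, *values):      # DB / BYTE / FCB / DEFB
        # Masking to 8 bits is an assumption; an assembler may report an error.
        self.emit(v & 0xFF for v in values)

    def dw(self, *values):      # DW / WORD / FDB / DEFW, low byte first
        for v in values:
            self.emit((v & 0xFFFF).to_bytes(2, "little"))

    def fcc(self, operand):     # FCC / TEXT: text between identical delimiters
        delimiter = operand[0]
        self.emit(operand[1:operand.index(delimiter, 1)].encode("ascii"))

    def ds(self, count, fill=None):            # DS / RMB / DEFS
        if fill is None:
            self.lc += count                   # skip: memory left untouched
        else:
            self.emit([fill & 0xFF] * count)   # lowest 8 bits of the fill value


img = Image()
img.org(0x0400)
img.label("msg")    # a label on an FCC line gets the address of the first byte
img.fcc("/HI/")
img.db(13, 0)
img.dw(0x1234)
img.ds(2)
img.ds(3, 0x1FF)
img.equ("count", 5)
```

In this model an uninitialised DS only advances the location counter, while DS with a fill value writes bytes.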
Assembler Directives
Assembler directives are instructions that control the flow of the assembler, and may be followed by one or more operands.
The # of an assembler directive must appear in the first column of the source line.
- Conditional Assembly
- #IF #ELSE #ENDIF are the three directives used for conditional assembly, and operate in the same way as in most programming languages. If the expression evaluates to true (a non-zero value), all following source lines are assembled normally until an #ELSE or #ENDIF statement is encountered; otherwise the following source lines are ignored until an #ELSE or #ENDIF statement is encountered, so lines between #ELSE and #ENDIF are assembled only when the expression is false. The expression must be capable of evaluation in pass 1, i.e. it must contain no forward references. The #ENDIF statement terminates a conditional block irrespective of whether the #IF expression evaluated to true or false, and normal assembly resumes.
- External Source Files
- The #INCLUDE directive is useful for splitting a large program into smaller, meaningful files, each dedicated to a specific task and easier to maintain. These files are loaded as they are encountered to form the complete source of the assembly program. The filename to be loaded is the operand of the #INCLUDE directive, and each include file is read once on each pass. Include files can be nested
- Macros
- #DEFM and #ENDM are used to define macros. The macro name follows #DEFM, and the macro code is placed on the following lines up to #ENDM, after which normal assembly resumes. Subsequently using the macro name in the instruction field causes the defined macro instructions to be inserted in place of the macro name [TODO: parameters still to be coded]
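The conditional and macro directives can be handled in one sweep over the source lines. In this Python sketch, `preprocess` and its `evaluate_expr` callback are hypothetical; support for nested #IF blocks (via a stack), keeping a call line's label on a line of its own, and ignoring #INCLUDE are assumptions of the sketch. Macro parameters are not handled, matching the TODO above.

```python
def preprocess(lines, evaluate_expr):
    """Apply #IF/#ELSE/#ENDIF and #DEFM/#ENDM to a list of source lines.

    evaluate_expr is a stand-in for the assembler's expression evaluator.
    """
    out = []
    active = [True]          # one entry per open #IF; outermost level first
    macros = {}
    recording = None         # name of the macro currently being defined
    for line in lines:
        parts = line.split(None, 1)
        # Directives are only recognised when the # is in column 1.
        word = parts[0].upper() if line.startswith("#") else ""
        rest = parts[1].strip() if len(parts) > 1 else ""
        if recording is not None:
            if word == "#ENDM":
                recording = None
            else:
                macros[recording].append(line)
        elif word == "#IF":
            # Only evaluated when the enclosing block is being assembled.
            active.append(active[-1] and evaluate_expr(rest) != 0)
        elif word == "#ELSE":
            active[-1] = active[-2] and not active[-1]
        elif word == "#ENDIF":
            active.pop()
        elif not active[-1]:
            pass                             # inside a false branch: ignore
        elif word == "#DEFM":
            recording = rest.upper()
            macros[recording] = []
        else:
            fields = line.split()
            # Instruction field: first word, or the second if a label is present.
            index = 0 if line[:1].isspace() else 1
            name = fields[index].upper() if len(fields) > index else ""
            if name in macros:
                if index == 1:
                    out.append(fields[0])    # keep the label on its own line
                out.extend(macros[name])
            else:
                out.append(line)
    return out


flags = {"DEBUG": 1, "TRACE": 0}
src = [
    "#DEFM SAVEA", "    PHA", "#ENDM",
    "#IF DEBUG", "    SAVEA",
    "#IF TRACE", "    NOP", "#ELSE", "    BRK", "#ENDIF",
    "#ELSE", "    RTS", "#ENDIF",
    "DONE LDA #0",
]
print(preprocess(src, lambda text: flags[text.upper()]))
```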