617 lines
18 KiB
Markdown
617 lines
18 KiB
Markdown
# LMIA advanced development kit
|
|
|
|
This project was developed as a simple development kit for programming and
|
|
microprogramming the LMIA system.
|
|
|
|
|
|
## μASM instruction set
|
|
|
|
### NOP
|
|
No-operation. This wastes one clock cycle.
|
|
|
|
|
|
### MOV [regA] \[regB\]
|
|
Move value in *regA* to *regB*
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If regB is [**LC**](#lc), no other [**LC**](#lc) operation can be specified in the
|
|
same cycle*
|
|
|
|
|
|
### MVN {[reg] | [const]}
|
|
Move the inverse of a value (from register or constant) into register
|
|
[**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
|
|
### MVZ
|
|
Set [**AR**](#ar) to zero.
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n)*
|
|
|
|
|
|
### ADD {[reg] | [const]}
|
|
Add value (from register or constant) to register [**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**O**](#o), [**C**](#c)*
|
|
|
|
|
|
### SUB {[reg] | [const]}
|
|
Subtract value (from register or constant) from register [**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**O**](#o), [**C**](#c)*
|
|
|
|
|
|
### AND {[reg] | [const]}
|
|
Perform bitwise AND with given value (from register or constant) and register
|
|
**AR**. *reg* is ignored and only prevalent due to a microcompiler quirk;
|
|
expect it to be removed in future releases.
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n)*
|
|
|
|
|
|
### ORR {[reg] | [const]}
|
|
Perform bitwise Or with given value (from register or constant) and register
|
|
**AR**.
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n)*
|
|
|
|
|
|
|
|
### ADN {[reg] | [const]}
|
|
Add value (from register or constant) to register [**AR**](#ar) without
|
|
updating flags.
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
|
|
### LSL
|
|
Performs a logical shift left of [**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### ISL
|
|
Performs a logical shift left of [**AR**](#ar) and [**HR**](#hr) as if they
|
|
were one 32-bit register where [**AR**](#ar) corresponds to the
|
|
most-significant bits.
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### ASR
|
|
Performs an arithmetic shift right on [**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### ISR
|
|
Performs an arithmetic shift right of [**AR**](#ar) and [**HR**](#hr) as if
|
|
they were one 32-bit register where [**AR**](#ar) corresponds to the
|
|
most-significant bits.
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### LSR
|
|
Performs a logical shift right of [**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### ROL
|
|
Performs an arithmetic rotate right on [**AR**](#ar).
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### IRL
|
|
Performs an arithmetic shift left of [**AR**](#ar) and [**HR**](#hr) as if they
|
|
were one 32-bit register where [**AR**](#ar) corresponds to the
|
|
most-significant bits.
|
|
|
|
*This operation uses the bus*
|
|
|
|
*If a constant is passed, this operation cannot be parallellized*
|
|
|
|
*Sets flags: [**Z**](#z), [**N**](#n), [**C**](#c)*
|
|
|
|
|
|
### LCSET [const]
|
|
Set [**LC**](#lc) to value of constant.
|
|
|
|
|
|
### CONST [const]
|
|
Set [**AR**](#ar) to value of cosntant.
|
|
|
|
*This operation cannot be parallelized*
|
|
|
|
|
|
### INCPC
|
|
Increment value in [**PC**](#pc) by one.
|
|
|
|
|
|
### DECLC
|
|
Decrement value in [**LC**](#lc) by one.
|
|
|
|
|
|
### CALL [label]
|
|
Move value in [**uPC**](#upc) to [**uSP**](#usp) and set value in [**uPC**](#upc) to point
|
|
to the address of the given label.
|
|
|
|
|
|
### RET
|
|
Move value in [**uSP**](#usp) into [**uPC**](#upc).
|
|
|
|
|
|
### HALT
|
|
Stop execution and set value in [**uPC**](#upc) to **0**.
|
|
|
|
|
|
### BRA [label]
|
|
Perform an unconditional branch to the address of the given label.
|
|
|
|
|
|
### BNZ [label]
|
|
Branch to address of label if [**Z-flag**](#z) is **0**.
|
|
|
|
|
|
### BRZ [label]
|
|
Branch to address of label if [**Z-flag**](#z) is **1**.
|
|
|
|
|
|
### BRN [label]
|
|
Branch to address of label if [**N-flag**](#n) is **1**.
|
|
|
|
|
|
### BRC [label]
|
|
Branch to address of label if [**C-flag**](#c) is **1**.
|
|
|
|
|
|
### BRO [label]
|
|
Branch to address of label if [**O-flag**](#o) is **1**.
|
|
|
|
|
|
### BLS [label]
|
|
Branch to address of label if [**L-flag**](#l) is **1**.
|
|
|
|
|
|
### BNC [label]
|
|
Branch to address of label if [**C-flag**](#c) is **0**.
|
|
|
|
|
|
### BNO [label]
|
|
Branch to address of label if [**O-flag**](#o) is **0**.
|
|
|
|
|
|
### BOP
|
|
Branch to address specified by entry in optable pointed to by highest [**OP**](#op)
|
|
in [**IR**](#ir).
|
|
|
|
|
|
### BAM
|
|
Branch to address specified by entry in addressing mode pointed to by [**M**](#m)
|
|
in [**IR**](#ir).
|
|
|
|
|
|
### BST
|
|
Branch to start. This sets value in [**uPC**](#upc) to **0**.
|
|
|
|
|
|
### RESET [reg]
|
|
Sets all bits in the specified register to **1**.
|
|
|
|
*This operation uses the bus*
|
|
|
|
|
|
## Compiler/weaver directives
|
|
|
|
### \#define \[name] [const]
|
|
Define a compile-time constant. This will replace all (valid) constant
|
|
declarations with the given name with the value supplied here.
|
|
|
|
**NOTE**: Constant names are case-insensitive; i.e. *FOO* and *foo* are
|
|
functionally indistinguishable to the compiler.
|
|
|
|
|
|
### \#data \[address] [const]
|
|
Define an initial value in the program memory at the given address.
|
|
|
|
|
|
### $[label]
|
|
Define a compile-time label at the given position in the microprogram. Labels
|
|
can be referenced using an '@' symbol.
|
|
|
|
For example: `$BAR` would declare a label *BAR* which can be referenced with
|
|
`@BAR`.
|
|
|
|
**NOTE**: Label names are case-insensitive; i.e. `@FOO` and `@foo` are
|
|
functionally indistinguishable to the compiler.
|
|
|
|
|
|
### \#optable \[index] {[label] | [const]}
|
|
Declare an entry in the opcode jump table (**K1**).
|
|
|
|
**NOTE**: The given index must be at most 15 and at least 0 and in the case of
|
|
a constant being supplied as the value, this value my not be negative nor be
|
|
greater than 127.
|
|
|
|
|
|
### \#amode \[index] {[label] | [const]}
|
|
Declare an entry in the addressing mode jump table (**K2**).
|
|
|
|
**NOTE**: The given index must be at most 3 and at least 0 and in the case of
|
|
a constant being supplied as the value, this value my not be negative nor be
|
|
greater than 127.
|
|
|
|
|
|
### \#pmgen [python]
|
|
Define a script to be run for each address in program memory. This can be used
|
|
to generate program memory value mappings more efficiently than manually
|
|
defining values. The script is run in a for-loop with the index variable
|
|
`address`. I.e. the variable `address` can be used in the specified script to
|
|
refer to the current program memory address. To emit a value, simply print a
|
|
string in the format "\[address] [value]" where the value can be in the range
|
|
-32768 to 65535 (inclusive).
|
|
|
|
|
|
### \#emit
|
|
Define a script to emit uASM code at compile-time. Each following line preceded
|
|
by a `>` will be included in the script; the first line not preceded by `>`
|
|
(or EOF) will define the end of the script. To emit code, simply print the code
|
|
that should be emitted.
|
|
|
|
Example:
|
|
|
|
```
|
|
mov ar asr
|
|
#emit
|
|
>for i in range(4):
|
|
> print("lsl; mov ar ir")
|
|
> print("add pm")
|
|
sub gr
|
|
```
|
|
|
|
the above code will be converted into the the following code by the preprocessor:
|
|
|
|
```
|
|
mov ar asr
|
|
lsl; mov ar ir
|
|
add pm
|
|
lsl; mov ar ir
|
|
add pm
|
|
lsl; mov ar ir
|
|
add pm
|
|
lsl; mov ar ir
|
|
add pm
|
|
sub gr
|
|
```
|
|
|
|
|
|
## Flags
|
|
Flags - *aside from [**L**](#l)* - are set based on ALU operations, so they depend
|
|
on [**AR**](#ar) and the **BUS**. Henceforth, unless otherwise implied or
|
|
stated, [**AR**](#ar) will refer to the state/value of [**AR**](#ar) *after* an
|
|
ALU operation. Flags retain their state until an instruction which is declared
|
|
as modifying said flag is executed.
|
|
|
|
### Z
|
|
Set if [**AR**](#ar) == **0**.
|
|
|
|
### N
|
|
Set if sign bit in [**AR**](#ar) is set.
|
|
|
|
### O
|
|
Set if sign of [**AR**](#ar) differs from signs of both [**AR**](#ar) *before*
|
|
the arithmetic operation and of the **BUS**.
|
|
|
|
### C
|
|
Set if [**AR**](#ar) is less than or equal to [**AR**](#ar) *before* the
|
|
arithmetic operation.
|
|
|
|
### L
|
|
Set if [**LC**](#lc) == **0**.
|
|
|
|
|
|
## Registers
|
|
Available registers for read/write operations are documented below. Unless
|
|
otherwise specified, the registers are directly accessible via the bus for read
|
|
and write operations.
|
|
|
|
### AR
|
|
The accumulator register. This register can only be written to as the result of
|
|
an ALU operation. This is to say, that **AR** is indirectly writable via the ALU,
|
|
but is nonetheless directly readable via the bus.
|
|
|
|
### PM
|
|
The program-memory pseudo-register allows you to read/write values from the
|
|
currently accessed program memory address (see [**ASR**](#asr-1)).
|
|
|
|
### ASR
|
|
The address register; this register is used to specify which address of
|
|
program-memory to be accessible via [**PM**](#pm).
|
|
|
|
**NOTE**: This register cannot be read.
|
|
|
|
### HR
|
|
The help register. This is a general-purpose register which is useful for
|
|
storing ephemeral or intermediate values during a computation.
|
|
|
|
### IR
|
|
The instruction register. This register offers extra functionality such as
|
|
**K1**- and **K2**-table addressing via the [**OP**](#op) and [**M**](#m)
|
|
bits respectively. The [**GRx**](#grx) and [**M**](#m) bits can also be used
|
|
to address a specific general register via the [**GR**](#gr) multiplexer.
|
|
|
|
Bit-level layout of IR (from MSB to LSB, left-to-right):
|
|
|
|
| **OP** | **GRx** | **M** | **ADR** |
|
|
|:------:|:-------:|:------:|:-------:|
|
|
| 4 bits | 2 bits | 2 bits | 8 bits |
|
|
|
|
|
|
##### OP
|
|
Machine instruction. Indexes **K1**-table.
|
|
|
|
##### GRx
|
|
[**GR**](#gr) multiplexer selector.
|
|
|
|
##### M
|
|
Addressing mode. Indexes **K2**-table.
|
|
|
|
##### ADR
|
|
[**ASR**](#asr-1) address argument.
|
|
|
|
|
|
|
|
### GR
|
|
This is a shorthand for accessing the general register currently made available
|
|
by the GR multiplexer when said MUX is controlled by the [**GRx**](#grx) bits in
|
|
[**IR**](#ir).
|
|
|
|
**NOTE**: Only one GR can be accessed per cycle. Which register this is (of
|
|
the four available registers) is determined by the value in [**IR**](#ir).
|
|
|
|
### GRM
|
|
This is a shorthand for accessing the general register currently made available
|
|
by the GR multiplexer when said MUX is controlled by the [**M**](#m) bits in
|
|
[**IR**](#ir).
|
|
|
|
**NOTE**: Only one GR can be accessed per cycle. Which register this is (of
|
|
the four available registers) is determined by the value in [**IR**](#ir).
|
|
|
|
### LC
|
|
The loopcounter register is a special register that can only be modified in
|
|
three ways: all three ways are dictated by the L-field in the microprogram.
|
|
Additionally, the counter can remain unchanged by simply not specifying an
|
|
action to take for the register. The three ways of modifying it are as follows:
|
|
|
|
* Decrement counter by one. The corresponding uASM instruction for this is
|
|
`declc`
|
|
* Load value from bus. The uASM instruction being `mov [reg] LC`
|
|
* Load a 7-bit constant from micromemory. The uASM instruction for this is
|
|
`lcset [const]`
|
|
|
|
**NOTE**: LC cannot be read and can only be written to as detailed above. The
|
|
value in LC can, though, to some degree be inferred from the [**L-flag**](#l)
|
|
(see [*Flags*](#flags)).
|
|
|
|
### PC
|
|
The program counter. This register is only 8 bits wide. Writing a value larger
|
|
than 8 bits to this register will simply truncate the binary string to 8 bits.
|
|
Attemping to read a value from PC to a register of a larger width will result
|
|
in the value in PC populating the lowesr 8 bits and the rest being filled with
|
|
0's.
|
|
|
|
### uPC
|
|
The microprogram counter; sometimes also referred to as **MyPC**. This register
|
|
cannot directly be read. It can, though, be written to in a variety of ways.
|
|
These have been specified as *branch* instructions (such as [`BRA`](#bra-label)), as
|
|
well as subroutine instructions (such as [`ret`](#ret)).
|
|
|
|
**NOTE**: This register is not connected to the bus.
|
|
|
|
### uSP
|
|
The microstack pointer; sometimes also referred to as **MySPC**. This register
|
|
cannot be directly read or written to. To write to it, a call to a subroutine
|
|
must be issued (using [`call [label]`](#call-label)). The only way to "read" this
|
|
register is to issue a subroutine-return instruction ([`ret`](#ret)) which
|
|
copies the value in this register to [**uPC**](#upc).
|
|
|
|
**NOTE**: This register is not connected to the bus. The value in this register
|
|
cannot be directly acted upon insofar as it cannot be directly read or copied
|
|
to any other register than [**uPC**](#upc), meaning that the actual value in
|
|
this register (and consequently in [**uPC**](#upc) aswell) cannot be directly
|
|
accessed or transformed. I.e. the value in this register cannot be copied to,
|
|
for example, a general register and thus, its value can only be inferred, not
|
|
directly measured.
|
|
|
|
|
|
## Sorting algorithms
|
|
For the sorting competition, we have chosen to focus on implementing bucketsort
|
|
with an inline insertionsort when inserting values into corresponding buckets.
|
|
|
|
### bucksort.uc
|
|
|
|
This was the first attempt at a sorting implementation in uASM. It doesn't do
|
|
more than a simple hash and possibly updating some bucket-specific values.
|
|
Nonetheless, it formed a clear basis for future implementations.
|
|
|
|
Pros:
|
|
* N/A
|
|
|
|
Cons:
|
|
* N/A
|
|
|
|
**Average cycle count: N/A**
|
|
|
|
|
|
### bsrt.uc
|
|
|
|
The first successful uASM bucketsort implementation. It uses the aforementioned
|
|
inline insertionsort. Additionally, it employs a lookup table for pointing to
|
|
buckets; this had the benefit of significantly decreasing the cycles required
|
|
to hash values, as well as allowing for buckets of sizes other than even
|
|
exponents of 2 (as opposed to other implementations). On top of this, it used a
|
|
parallel-hashing system which allowed the rotation step of the hash algorithm
|
|
to be applied to two elements of the list (to be sorted) at once, further
|
|
decreasing the amount of cycles required to perform a hash (per element).
|
|
|
|
Pros:
|
|
* Variable bucket size
|
|
* Arbitrary bucket arrangement
|
|
* Parallel-hash implementation
|
|
* Uses the most recent merge algorithm (rated around 400 cycles)
|
|
|
|
Cons:
|
|
* Heavy bookkeeping due to lookup table paired with parallel-hash
|
|
* Inefficiency in merge due to lookup table (average loss of 80 cycles)
|
|
|
|
**Average cycle count: 1250**
|
|
|
|
|
|
### sort2.uc
|
|
|
|
A second iteration of the *bsrt* implementation, this one refines the dual-hash
|
|
by omitting the lookup table. This means that bookkeeping can be minimized, as
|
|
it was only necessary due to a lack of available registers. This, of course,
|
|
comes at the cost of not being able to have variable-sized buckets; i.e. their
|
|
sizes must an exponent of 2. A minor optimization that was created for this
|
|
algorithm was the hash-based bookkeeping, wherein a small optimization to
|
|
bookkeeping could be done during the hashing of the list elements. Any
|
|
bookkeeping which does not require the ALU can, in fact, be performed during the
|
|
hashing, since the rotation steps only require AR and HR to be untouched. All
|
|
other operations are permitted.
|
|
|
|
Pros:
|
|
* Merge: (320-330 cycles)
|
|
* Direct addressing fits optimally due to register limitations
|
|
* Optimized regster allocations
|
|
* Minimal bookkeeping since it is mostly performed during the hashing operation
|
|
|
|
Cons:
|
|
* Bucket sizes must be exponents of 2
|
|
* Fixed bucket indices based on hash
|
|
* Two merge operations required (negative + positive)
|
|
* 95 unused program-memory addresses
|
|
|
|
**Average cycle count: 1100**
|
|
|
|
|
|
### sort3.uc
|
|
|
|
A third iteration of the *bsrt* implementation: this time entirely scrapping
|
|
the bucket header, opting to delegate this behaviour to the lookup-table itself.
|
|
I.e. the lookup table no long points to the start of a bucket (as this can be
|
|
inferred later), but rather points to the last element of the bucket.
|
|
Additionally, this implementation scraps the insertion-sort performed in the
|
|
bucket sort, delegating this task to the merge stage of the sorting algorithm.
|
|
|
|
Pros:
|
|
* Constant-time bucket sort (406 cycles)
|
|
* More optimally used LUT
|
|
* No bookkeeping during bucket-sort
|
|
|
|
Cons:
|
|
* Non-constant-time merge
|
|
* Optimized LUT requires marginally more arithmetic operations over other variants
|
|
|
|
**Average cycle count: N/A (not fully implemented)**
|
|
|
|
|
|
### bsrt2.uc
|
|
|
|
A fourth iteration developed in parallel with *sort3* as a proof-of-concept of
|
|
the recently designed lookup-table replacement for hashing. This version is an
|
|
almost direct copy of *bsrt*, except without a LUT, a K1 jump-table and absolute
|
|
sizes in the bucket headers (as opposed to a pointer to the last element of a
|
|
bucket).
|
|
|
|
Pros:
|
|
* Highly optimized K1 jump-table
|
|
* Subroutined jump-table for possible reuse elsewhere
|
|
* 13 elements per bucket (+1 over *bsrt*)
|
|
* Low instruction count
|
|
|
|
Cons:
|
|
* Only sorts one element per iteration
|
|
|
|
**Average cycle count: 1050**
|
|
|
|
|
|
### sort4.uc
|
|
|
|
A fifth iteration of the common bucketsort algorithm. This one is, as its name
|
|
implies, based on the *sort2* algorithm. It improves upon it by making heavy
|
|
use of the `call` and `ret` instructions. I.e. by moving the entire bucketsort
|
|
implementation to a subroutine, it effectively allows 16 calls to sort values
|
|
per iteration of the outermost loop. This conversion to a subroutine is done at
|
|
zero cost to performance, as it exploits the fact that a call to a jumptable-
|
|
based hashing algorithm is made and rather than returning to the algorithm
|
|
after the hashing has taken place, the hash-table simply jumps to the
|
|
insertionsort immediately, after which a `ret` is used to return back to normal
|
|
execution. This has the effect of reducing a bucketsort of a single value be
|
|
two instructions to dereference the value to sort, during which a call to the
|
|
subroutine is made and after sorting, it continues execution at the instruction
|
|
immediately after the dereference.
|
|
|
|
Pros:
|
|
* Optimized K1 jump-table with inline bucketsort
|
|
* Highly un-rollable bucketsort implementation
|
|
* Highly efficient bus use (almost always saturated)
|
|
* Dynamic bucket placement, allowing for very fast merge operations
|
|
* Efficient use of general registers to reduce arithmetic operations requiring
|
|
constant values
|
|
|
|
Cons:
|
|
* Maximum of 6 elements per bucket
|
|
* 96 unused program-memory addresses (+1 per bucket)
|
|
|
|
**Average cycle count: 758**
|