Tuesday, January 22, 2013

Accumulator Machine & Self-modification

Multiprocessor system based on register-less architecture can be illustrated in the image below as if tasks run themselves while going round and chasing each other through processor elements.

smp_image.png

Inner state information of each task should be as simple and light as possible to move between processors, requiring employment of architecture minimizing register.  The most typical of such architecture is accumulator machine.  For example, accumulator machine executes such a form of instruction as shown below.

ADD memory_address

The actual processing this instruction does is to add the value of memory at location memory_address to the value of accumulator Acc, the only general-purpose register of the processor.

Acc <= Acc + (memory value at memory_address)

While being quite useful for downsizing circuit, accumulator machine is not suitable for speeding up because of increasing memory access.  To do “ADD memory_address,” for example, accumulator machine requires at least twice access to memory: once to read the ADD instruction and memory_address and the second time to read the value of memory at the memory_address.

Accumulator machine has been almost abandoned as being out of date against current development in processors to minimize memory access.  However, except this weak point, accumulator machine has preferable properties to be an element of small-scale multiprocessor.  Therefore, I am introducing a technique of equipping small-scale multiprocessor with improved accumulator machine.

If processor has register other than accumulator, reading register can replace once of memory access by specifying register number instead of memory address:

ADD register_number  {Acc <= Acc + (register value)}

Even so, reading register value is possible only after having register number fixed and therefore impossible at the same time of reading instruction.  Actually, an instruction is executed through at least four stages:

  1. Read instruction and register number
  2. Read register
  3. Do addition
  4. Write result

Most of current processors achieve high performance by simultaneously running these stages through pipelining.  In reality, however, this technique requires various kinds of efforts to perform well, causing current processors to be complicated.

After all, this kind of problem is inevitable, even by using register, unless parameters necessary to run instruction (in case of addition, both left-hand side and right-hand side) are fixed at the same time that the instruction is retrieved.  The only solution is to limit instruction parameters to fixed values (immediate/literal).  In ADD instruction, it goes like this:

ADD immediate  {Acc <= Acc + (immediate value)}

Processor needs to access memory only once, if it can read instruction and parameters at the same time.  Then it goes to addition instantly, and basically has no need for complicated pipeline processing.  Of course, programs only with fixed parameters make processor nothing but useless, however, this is solvable by self-modifying code.  That is, to be more specific, PUT instruction to replace parameter of instruction at the address indicated by parameter of the PUT instruction with current value of accumulator:

PUT instruction_address  {[instruction_address] <= Acc}

As the parameter of PUT instruction itself is rewritable by another PUT instruction, array and other indirect addressing are also available.  PUT instruction makes parameters of every instruction available as general-purpose registers in fact.

Self-modification of codes is a rarely-used technique in current processors except special cases.  Not only such issues like security and memory protection, but the biggest negative reason seems to be that code’s self-modifying process is quite inefficient in cache memory and instruction pipeline.  Register-less architecture spotlights and employs aggressively this out-of-the-mainstream technique of self-modification.

That is, register-less architecture is what revives non-mainstream techniques of accumulator machine and self-modifying as elements of small-scale multiprocessor.

To be continued…