Thursday, May 23, 2013

Instruction Set

The processor to be presented on this site accesses its internal memory in 36-bit words, each consisting of a 4-bit opcode and 32 bits of data.

When an instruction is fetched, the 32-bit data is retrieved together with the 4-bit opcode and is treated as operand Y during execution of the instruction.


 opcode (4-bit)  operand Y (32-bit) 
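As a minimal sketch of this word layout in Python: the helper names pack_word and unpack_word are hypothetical, and placing the opcode in the top 4 bits of the 36-bit word is an assumption read off the diagram above, not a confirmed detail of the hardware.

```python
OPCODE_BITS = 4
OPERAND_BITS = 32
OPERAND_MASK = (1 << OPERAND_BITS) - 1  # 0xFFFFFFFF


def pack_word(opcode, y):
    """Combine a 4-bit opcode and a 32-bit operand Y into one 36-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits")
    # Assumption: the opcode occupies the bits above the 32-bit operand.
    return (opcode << OPERAND_BITS) | (y & OPERAND_MASK)


def unpack_word(word):
    """Split a 36-bit word back into (opcode, operand Y)."""
    return word >> OPERAND_BITS, word & OPERAND_MASK


word = pack_word(0b1000, 1000)   # LOAD 1000 (LOAD is opcode 1000 in the instruction table)
print(f"{word:09x}")             # 36 bits = 9 hex digits: 8000003e8
print(unpack_word(word))         # (8, 1000)
```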


The registers are the program counter PC and a 33-bit accumulator Acc, which consists of the 32-bit register X and the 1-bit carry flag C.


 program counter PC


Accumulator Acc (33-bit):

 carry flag C  register X (32-bit)
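The relation between Acc, C and X can be sketched as follows. This is only an illustration: the function names are hypothetical, and treating C as bit 32 above X, and the operands as unsigned 32-bit values, are assumptions based on the diagram above.

```python
MASK32 = 0xFFFFFFFF
MASK33 = 0x1FFFFFFFF


def split_acc(acc):
    """Return (C, X) from a 33-bit accumulator value (assumes C is bit 32)."""
    return (acc >> 32) & 1, acc & MASK32


def add(y, x):
    """ADD as listed in the instruction table: Acc := Y + X, kept to 33 bits."""
    return ((y & MASK32) + (x & MASK32)) & MASK33


c, x = split_acc(add(0xFFFFFFFF, 1))
print(c, hex(x))   # 1 0x0 : the carry out of bit 31 lands in C
```

Under this reading, instructions that assign to Acc (such as ADD) update C as well, while instructions that assign only to X leave C untouched.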


The FPGA (EP4CE22) on the DE0-Nano board can provide an internal memory of 16k words x 36 bits, which allows a 14-bit program counter PC (2^14 = 16k).

In addition, each processor element holds a locking flag (lock) for mutual exclusion as part of its own internal state.  The locking flag is set by the LOCK (or TRYLOCK) instruction and reset by the UNLOCK instruction.

The locking flag can be set only by the processor that executes the instruction, whereas it can be reset from any processor by specifying the target processors in the operand.

The table below shows the instruction codes and their functions, expressed as changes in internal state.  Please note that the contents are provisional and may change as the actual implementation is adjusted.


opcode | Y[31..28] | instruction | PC change                                | Acc change                           | other action
0000   |           | GET, DATA   | PC := X                                  | X := Y                               |
0001   | 0000      | JUMP        | PC := Y                                  |                                      |
0001   | 0001      | JUMPNZ      | if(X!=0) PC := Y else PC := PC + 1       |                                      |
0001   | 0010      | JUMPNP      | if(X<=0) PC := Y else PC := PC + 1       |                                      |
0001   | 0011      | JUMPM       | if(X<0) PC := Y else PC := PC + 1        |                                      |
0001   | 0100      | JUMPNM      | if(X>=0) PC := Y else PC := PC + 1       |                                      |
0001   | 0101      | JUMPP       | if(X>0) PC := Y else PC := PC + 1        |                                      |
0001   | 0110      | JUMPZ       | if(X==0) PC := Y else PC := PC + 1       |                                      |
0001   | 0111      | UNLOCK      | PC := PC + 1                             |                                      | unlock(Y)
0001   | 100-      | JUMPNC      | if(C==0) PC := Y else PC := PC + 1       |                                      |
0001   | 101-      | JUMPC       | if(C==1) PC := Y else PC := PC + 1       |                                      |
0001   | 110-      | TRYLOCK     | if(!lock) PC := Y else PC := PC + 1      |                                      | lock := 1
0001   | 111-      | LOCK        | if(lock) PC := Y else PC := PC + 1       |                                      | lock := 1
0010   | code      | PUT         | if(collision) PC := PC else PC := PC + 1 |                                      | if(!collision) [Y] := code:X
0011   |           | HALT        | PC := PC                                 |                                      |
0100   |           | MUL         | PC := PC + 1                             | X := Y * X                           |
0101   |           | MULH        | PC := PC + 1                             | X := (Y * X) >> 32                   |
0110   |           | SHIFT       | PC := PC + 1                             | if(Y<0) Acc := X>>Y else Acc := X<<Y |
0111   |           | IO          | PC := PC + 1                             | Acc := io_port(Y, Acc)               |
1000   |           | LOAD        | PC := PC + 1                             | X := Y                               |
1001   |           | OR          | PC := PC + 1                             | X := Y or X                          |
1010   |           | AND         | PC := PC + 1                             | X := Y and X                         |
1011   |           | XOR         | PC := PC + 1                             | X := Y xor X                         |
1100   |           | ADD         | PC := PC + 1                             | Acc := Y + X                         |
1101   |           | SUB         | PC := PC + 1                             | Acc := Y - X                         |
1110   |           | ADC         | PC := PC + 1                             | Acc := Y + X + C                     |
1111   |           | SBB         | PC := PC + 1                             | Acc := Y - X - C                     |


Basically, the function of an instruction is determined by the 4-bit opcode, while some instructions, such as the conditional jumps, use the upper 4 bits of operand Y to extend the opcode.
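The two-level decoding can be sketched in Python as below. The mnemonics and bit patterns are taken from the instruction table; the function name decode and the string-based pattern matching are only illustrative choices, not the actual circuit.

```python
# Mnemonics selected by the 4-bit opcode alone.
PRIMARY = {
    0b0000: "GET/DATA", 0b0010: "PUT", 0b0011: "HALT", 0b0100: "MUL",
    0b0101: "MULH", 0b0110: "SHIFT", 0b0111: "IO", 0b1000: "LOAD",
    0b1001: "OR", 0b1010: "AND", 0b1011: "XOR", 0b1100: "ADD",
    0b1101: "SUB", 0b1110: "ADC", 0b1111: "SBB",
}

# Opcode 0001 is extended by Y[31..28]; "-" marks a don't-care bit.
EXTENDED = [
    ("0000", "JUMP"), ("0001", "JUMPNZ"), ("0010", "JUMPNP"), ("0011", "JUMPM"),
    ("0100", "JUMPNM"), ("0101", "JUMPP"), ("0110", "JUMPZ"), ("0111", "UNLOCK"),
    ("100-", "JUMPNC"), ("101-", "JUMPC"), ("110-", "TRYLOCK"), ("111-", "LOCK"),
]


def decode(opcode, y):
    """Return the mnemonic for a 4-bit opcode and a 32-bit operand Y."""
    if opcode != 0b0001:
        return PRIMARY[opcode]
    sub = f"{(y >> 28) & 0xF:04b}"          # Y[31..28] as a bit string
    for pattern, name in EXTENDED:
        if all(p in ("-", s) for p, s in zip(pattern, sub)):
            return name
    raise ValueError("unreachable: the patterns cover all 16 values")


print(decode(0b0001, 0x10000066))   # JUMPNZ
print(decode(0b0001, 0x90000000))   # JUMPNC (1001 matches 100-)
print(decode(0b1100, 5))            # ADD
```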

A JUMP instruction changes the value of PC according to its condition.  After that, until the lower bits of PC match the number of the current memory bank, the processor internally executes HALT instructions in order to wait for rotation.
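A rough model of this waiting, under assumptions that are not stated in the text: the number of banks (NUM_BANKS = 8 here) is hypothetical, the bank of an address is taken to be its low bits, and the thread is assumed to move to the next bank every cycle.

```python
NUM_BANKS = 8  # hypothetical; a power of two so that the low bits of PC select the bank


def bank_of(address):
    """Bank that holds an address (assumption: the low bits of the address)."""
    return address & (NUM_BANKS - 1)


def halt_cycles(current_bank, target_pc):
    """Count the internal HALT cycles until the thread faces the bank of target_pc."""
    cycles = 0
    while current_bank != bank_of(target_pc):
        current_bank = (current_bank + 1) % NUM_BANKS   # rotation: one bank per cycle
        cycles += 1
    return cycles


# A thread that is at bank 5 on the cycle after a jump:
print(halt_cycles(5, 101))    # 0 : address 101 is in bank 5, no wait
print(halt_cycles(5, 1000))   # 3 : address 1000 is in bank 0
print(halt_cycles(5, 100))    # 7 : bank 4 has just passed, almost a full rotation
```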

Writing back to memory by a PUT instruction is actually delayed until the thread reaches the target memory bank.  If another write to a bank is issued while a prior write to that bank has not yet completed (collision = 1), the instruction is retried once with PC := PC: all delayed writes complete during the resulting wait of one rotation, and the instruction is then executed again.  This second execution is certain to succeed because all other writes have already been completed.
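The collision rule can be illustrated with a toy model. Everything here is a sketch under assumptions: the class DelayedWriter, the bank count NUM_BANKS = 8 and the rule "bank = address mod NUM_BANKS" are hypothetical, and the real hardware may keep its pending writes differently.

```python
NUM_BANKS = 8  # hypothetical bank count


class DelayedWriter:
    """Toy model of the delayed write-back of PUT for one thread."""

    def __init__(self):
        self.memory = {}
        self.pending = {}   # bank -> (address, word) not yet written back

    def visit(self, bank):
        """The thread reaches `bank`: a write waiting for that bank completes."""
        if bank in self.pending:
            address, word = self.pending.pop(bank)
            self.memory[address] = word

    def put(self, address, word):
        """Return True if the write is accepted, False on collision (PC := PC)."""
        bank = address % NUM_BANKS
        if bank in self.pending:
            return False
        self.pending[bank] = (address, word)
        return True


w = DelayedWriter()
print(w.put(16, 111))           # True : bank 0 is free
print(w.put(24, 222))           # False: bank 0 still has a pending write
for bank in range(NUM_BANKS):   # one full rotation completes every pending write
    w.visit(bank)
print(w.put(24, 222))           # True : the retried instruction succeeds
```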

When a PUT instruction rewrites an operand, the opcode at that address is also rewritten, with the value of the upper 4 bits of Y.  The reasons for this include the efficiency of keeping opcode and operand in the same memory word by using the parity bits, and the convenience of atomic rewriting, particularly for exclusive operations.  On the other hand, it is a nuisance in hand-coding to have to specify the opcode even when only the operand needs to be rewritten.  The instruction set presented here is designed on the assumption that such tasks will be automated by a compiler or other tools.
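The word written by PUT ([Y] := code:X in the table) could be formed as in the sketch below. It assumes that the target address is the low 14 bits of Y (matching the 14-bit PC) and that no collision occurs; the function name put and the dictionary used as memory are hypothetical.

```python
MASK32 = 0xFFFFFFFF
ADDRESS_MASK = (1 << 14) - 1   # assumption: the address is the low 14 bits of Y


def put(memory, y, x):
    """PUT without collision: [Y] := code:X, where code = Y[31..28]."""
    code = (y >> 28) & 0xF                  # becomes the opcode of the written word
    address = y & ADDRESS_MASK
    memory[address] = (code << 32) | (x & MASK32)


memory = {}
JUMP_OPCODE = 0b0001
# Rewrite the word at address 200 into "JUMP 300":
# the new opcode 0001 is given in Y[31..28], the new operand 300 comes from X.
put(memory, (JUMP_OPCODE << 28) | 200, 300)
print(f"{memory[200]:09x}")   # 10000012c : opcode 1, operand 0x12c = 300
```

This also shows the hand-coding burden mentioned above: the opcode has to be spelled out in Y even though only the operand 300 is new.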

The biggest thorn in this design is random-access readout.  The processor provides the DATA instruction as the minimum needed to support features such as arrays and tables.  Like LOAD, the DATA instruction assigns the value of operand Y to register X, and at the same time it jumps to the address that register X held before the assignment.  The basic usage is to put a return address in register X and jump to a target DATA instruction; random access is achieved by changing the destination of that jump.  However, rewriting the operand of a jump instruction for every access is quite inefficient.

Curiously enough, however, this thorny problem can be solved simply by using the same instruction once more.

For example, suppose a DATA instruction with operand value 123 is placed as data at address 1000.  The code starting at address 100 assigns 1000 to register X with LOAD 1000 and then, at the next address 101, executes a DATA instruction whose operand is 102, the address of the following instruction.  As a result of executing the DATA instruction at address 101, register X becomes 102 and execution jumps to address 1000.  The DATA instruction at address 1000 is then executed: register X becomes 123 and execution jumps to address 102.  In effect, the DATA instruction at address 101 appears to have fetched the data at the address pointed to by register X (X := [X]) and then jumped to the address given by its operand (PC := Y).  Because the DATA instructions at addresses 101 and 1000 perform the same action but play completely different roles in the program as a whole, I would like to give the opcode at address 101 a different name, GET.


PC    instruction  Y     X
100   LOAD         1000  1000
101   DATA(GET)    102   102
1000  DATA         123   123
102   next …
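The sequence in this example can be traced with a very small emulator. It is a sketch only: it models just LOAD and DATA as described above, ignores rotation waits, and the names program and run are hypothetical.

```python
LOAD, DATA = "LOAD", "DATA"

program = {
    100: (LOAD, 1000),
    101: (DATA, 102),     # used as GET: Y holds the return address
    1000: (DATA, 123),    # used as data: Y holds the stored value
}


def run(program, pc, stop):
    """Execute from pc until PC reaches stop; return the final X and a trace."""
    x = 0
    trace = []
    while pc != stop:
        op, y = program[pc]
        trace.append((pc, op, y))
        if op == LOAD:        # X := Y, PC := PC + 1
            x, pc = y, pc + 1
        elif op == DATA:      # PC := X (value before the assignment), X := Y
            x, pc = y, x
        else:
            raise ValueError(f"unsupported instruction {op}")
    return x, trace


x, trace = run(program, 100, 102)
print(trace)   # [(100, 'LOAD', 1000), (101, 'DATA', 102), (1000, 'DATA', 123)]
print(x)       # 123 : the value stored at address 1000, available at address 102
```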


Random-access readout is very cheap in terms of hardware because it requires only one simple instruction.  On the other hand, it is costly at execution time because of the waiting time for rotation, as two jumps have to be executed.  In particular, in the case above, where the return address of the GET instruction is the address right after it, a wait of one full rotation is unavoidable.  If the GET instruction, the DATA instruction and the return address can be placed in adjoining memory banks, the waiting time for rotation is reduced to as little as two instruction periods.  In practice it is not that simple, because several GET instructions may share access to the same DATA instruction; still, a well-designed arrangement of instructions and data could improve the efficiency of instruction execution.  This kind of optimization could become a very interesting subject of study.
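The cost argument can be checked with a back-of-the-envelope model. The assumptions are mine, not the author's: NUM_BANKS = 8 is a hypothetical bank count, an address lives in bank (address mod NUM_BANKS), the thread advances one bank per cycle, and each executed instruction takes one cycle.

```python
NUM_BANKS = 8  # hypothetical bank count


def wait_after_jump(source, target, num_banks=NUM_BANKS):
    """Idle cycles between executing the jump at `source` and reaching `target`."""
    return (target - (source + 1)) % num_banks


def get_cost(get_addr, data_addr, return_addr, num_banks=NUM_BANKS):
    """Cycles from executing GET until the instruction at return_addr can run."""
    to_data = wait_after_jump(get_addr, data_addr, num_banks)
    to_return = wait_after_jump(data_addr, return_addr, num_banks)
    return 2 + to_data + to_return   # GET and DATA themselves plus the idle cycles


print(get_cost(101, 1000, 102))   # 9 : layout of the example, NUM_BANKS + 1
print(get_cost(101, 102, 103))    # 2 : GET, DATA and return address in adjoining banks
```

Under this model, whenever the return address is the one right after GET, the two waits always add up to NUM_BANKS - 1 cycles wherever the DATA instruction is placed, which agrees with the full rotation described above.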

Wednesday, May 22, 2013

To Introduce JRuby

In parallel with designing the processor for the DE0-Nano, I am developing a JRuby-based environment.  JRuby is a Ruby implementation that runs on the Java virtual machine and can be installed easily as a single Jar file.  The Ruby language is highly flexible and makes it possible to build another language (a domain-specific language) on top of it.  Designing processor circuits requires hardware description languages such as VHDL and Verilog HDL for logic synthesis, but writing out a large-scale multiprocessor circuit directly, without CAD, is hard work.  Therefore, I generate the VHDL code automatically from higher-level descriptions written in Ruby.  I am also planning to build compilers based on Ruby.

Because JRuby has a facility for calling DLLs in the Windows environment, it can access JTAG by itself, without JNA.  So here I am offering a JTAG demo program (jtag_demo.jar) based on JRuby.  Since the JRuby Jar file (jruby-complete.jar) is rather large, the demo program does not include it; instead, it automatically downloads the latest version over the internet.


If you put the downloaded file (jruby-complete.jar) in the same folder as the demo program (jtag_demo.jar) or in the Java extensions folder (jre/lib/ext), the program will run without downloading from the next time on.  This makes it possible to distribute circuit data and configuration programs together in a single, very compact file.  To use a similar mechanism, you can refer to the Java and Ruby source code (to be precise, executable Ruby scripts) included in the Jar file.