relm.info - Register-less Multiprocessor Information

Thursday, May 23, 2013

Instruction Set

The processor to be published on this site accesses its internal memory in 36-bit words, each consisting of a 4-bit opcode and 32 bits of data.

When an instruction is fetched, the 32-bit data is read together with the 4-bit opcode and is treated as operand Y during execution.

opcode (4-bit) operand Y (32-bit)
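As a rough illustration of the word layout above, the following Python sketch packs and unpacks a 36-bit word. It assumes the opcode occupies the four bits above the operand, as the layout line suggests; the function names `pack_word` and `unpack_word` are hypothetical and not part of the actual design.

```python
# Sketch only: assumes the 4-bit opcode sits directly above the 32-bit operand.
OPERAND_BITS = 32
OPERAND_MASK = (1 << OPERAND_BITS) - 1
OPCODE_MASK = 0xF


def pack_word(opcode, operand):
    """Combine a 4-bit opcode and a 32-bit operand into one 36-bit word."""
    if not 0 <= opcode <= OPCODE_MASK:
        raise ValueError("opcode must fit in 4 bits")
    return (opcode << OPERAND_BITS) | (operand & OPERAND_MASK)


def unpack_word(word):
    """Split a 36-bit word into (opcode, operand Y)."""
    return (word >> OPERAND_BITS) & OPCODE_MASK, word & OPERAND_MASK


# Opcode 1000 is LOAD in the instruction table; 1000 is an arbitrary operand.
word = pack_word(0b1000, 1000)
opcode, y = unpack_word(word)
```

Negative operands are stored in two's complement form by the mask in `pack_word`.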

The registers are the program counter PC and a 33-bit accumulator Acc, which consists of the 32-bit register X and the 1-bit carry flag C.

program counter PC

Accumulator Acc (33-bit):

carry flag C register X (32-bit)
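To make the 33-bit accumulator concrete, here is a minimal Python sketch of how an addition could fill both C and X. It assumes C is the top bit (bit 32) of Acc, as the layout above suggests, and that ADD and ADC behave as listed in the instruction table (Acc := Y + X and Acc := Y + X + C); the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_acc(acc):
    """Split a 33-bit accumulator value into (C, X), assuming C is bit 32."""
    acc &= (1 << 33) - 1
    return acc >> 32, acc & MASK32


def add(y, x, carry_in=0):
    """ADD (carry_in=0) or ADC (carry_in=C): Acc := Y + X + carry_in."""
    return split_acc((y & MASK32) + (x & MASK32) + carry_in)


# The carry out of bit 31 lands in C, and X wraps around to 0.
c, x = add(0xFFFFFFFF, 1)
```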

The FPGA (EP4CE22) on the DE0-Nano board can provide 36 bits x 16k words of internal memory, which allows a 14-bit program counter PC.

In addition, each processor element holds a locking flag (lock) for mutual exclusion as part of its own internal state. The locking flag is set by the LOCK (or TRYLOCK) instruction and reset by the UNLOCK instruction.

Only the executing processor can set its own locking flag, whereas any processor can reset it by specifying the target processors in the operand.

The table below shows the instruction codes and describes each function as a change in internal state. Please note that the contents are provisional and may change as the actual implementation is adjusted.

opcode	Y[31..28]	instruction	PC change	Acc change	other action
0000		GET, DATA	PC := X	X := Y
0001	0000	JUMP	PC := Y
0001	0001	JUMPNZ	if(X!=0) PC := Y else PC := PC + 1
0001	0010	JUMPNP	if(X<=0) PC := Y else PC := PC + 1
0001	0011	JUMPM	if(X<0) PC := Y else PC := PC + 1
0001	0100	JUMPNM	if(X>=0) PC := Y else PC := PC + 1
0001	0101	JUMPP	if(X>0) PC := Y else PC := PC + 1
0001	0110	JUMPZ	if(X==0) PC := Y else PC := PC + 1
0001	0111	UNLOCK	PC := PC + 1		unlock(Y)
0001	100-	JUMPNC	if(C==0) PC := Y else PC := PC + 1
0001	101-	JUMPC	if(C==1) PC := Y else PC := PC + 1
0001	110-	TRYLOCK	if(!lock) PC := Y else PC := PC + 1		lock := 1
0001	111-	LOCK	if(lock) PC := Y else PC := PC + 1		lock := 1
0010	code	PUT	if(collision) PC := PC else PC := PC + 1		if(!collision) [Y] := code:X
0011		HALT	PC := PC
0100		MUL	PC := PC + 1	X := Y * X
0101		MULH		X := (Y * X) >> 32
0110		SHIFT		if(Y<0) Acc := X>>Y else Acc := X<<Y
0111		IO		Acc := io_port(Y, Acc)
1000		LOAD		X := Y
1001		OR		X := Y or X
1010		AND		X := Y and X
1011		XOR		X := Y xor X
1100		ADD		Acc := Y + X
1101		SUB		Acc := Y - X
1110		ADC		Acc := Y + X + C
1111		SBB		Acc := Y - X - C

In general, the function of an instruction is determined by the 4-bit opcode, but some instructions, such as the conditional jumps, use the upper 4 bits of operand Y to extend the opcode.
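The extended decoding for opcode 0001 can be sketched in Python as follows. The mnemonics and conditions are taken from the table; the names `decode_sub` and `next_pc`, the signed interpretation of X, and the 14-bit masking of the jump target are assumptions made for illustration. Rotation waits and the lock instructions are not modelled.

```python
PC_MASK = 0x3FFF  # assumption: 14-bit PC, as in the DE0-Nano configuration

# Y[31..28] values that are fully specified in the table.
NARROW = {0b0000: "JUMP", 0b0001: "JUMPNZ", 0b0010: "JUMPNP", 0b0011: "JUMPM",
          0b0100: "JUMPNM", 0b0101: "JUMPP", 0b0110: "JUMPZ", 0b0111: "UNLOCK"}
# Entries written as 100-, 101-, 110-, 111-: the lowest bit is "don't care".
WIDE = {0b100: "JUMPNC", 0b101: "JUMPC", 0b110: "TRYLOCK", 0b111: "LOCK"}

CONDITIONS = {
    "JUMP": lambda x, c: True,
    "JUMPNZ": lambda x, c: x != 0,
    "JUMPNP": lambda x, c: x <= 0,
    "JUMPM": lambda x, c: x < 0,
    "JUMPNM": lambda x, c: x >= 0,
    "JUMPP": lambda x, c: x > 0,
    "JUMPZ": lambda x, c: x == 0,
    "JUMPNC": lambda x, c: c == 0,
    "JUMPC": lambda x, c: c == 1,
}


def to_signed32(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def decode_sub(y):
    """Return the mnemonic selected by Y[31..28] for opcode 0001."""
    sub = (y >> 28) & 0xF
    return WIDE[sub >> 1] if sub & 0b1000 else NARROW[sub]


def next_pc(pc, y, x, c):
    """Compute the next PC for a plain (conditional) jump."""
    name = decode_sub(y)
    if name not in CONDITIONS:
        raise ValueError(name + " is not a plain jump")
    taken = CONDITIONS[name](to_signed32(x), c)
    return (y & PC_MASK) if taken else pc + 1
```

For example, `next_pc(10, (0b0011 << 28) | 200, 0xFFFFFFFF, 0)` treats X as -1, so JUMPM is taken and the result is 200.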

A JUMP instruction changes the value of PC according to its condition and then internally executes HALT until the lower bits of PC match the number of the current memory bank, in order to wait for the rotation.

Writing back to memory by the PUT instruction is actually delayed until the thread reaches the target memory bank. If another write to a bank is issued while an earlier write to that bank is not yet complete (collision=1), the instruction is retried with PC := PC: it waits for one full rotation, during which all delayed writes complete, and is then executed again. This second execution is certain to succeed because all other pending writes have already finished.

When PUT rewrites an operand, the opcode stored with it is also rewritten, using the value of the upper 4 bits of Y. The reasons are that keeping opcode and operand in the same memory word by using the parity bits is more efficient, and that rewriting both atomically is more convenient, particularly for exclusive operations. On the other hand, it is a nuisance in hand-coding to have to specify the opcode even when only the operand needs to change. The instruction set published here is designed on the assumption that such tasks are automated by a compiler or other tools.

The thorniest design problem here is random-access reads. The processor provides the DATA instruction as the minimum needed to support arrays and tables. DATA assigns the value of operand Y to register X, just as LOAD does, and simultaneously jumps to the address that register X held before the assignment. The basic usage is to load a return address into register X and jump to a target DATA instruction; random access is achieved by changing the destination of that jump. However, rewriting the operand of a jump instruction for every access would be very inefficient.

Curiously enough, using the same instruction once more solves this thorny problem.
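The effect of PUT on a memory word ([Y] := code:X) might be sketched in Python like this. It is a simplified model: the delayed write and the collision retry are ignored, the target address is assumed to be the lower 14 bits of Y, and the name `put` and the dictionary used as memory are purely illustrative.

```python
ADDR_MASK = 0x3FFF  # assumption: 14-bit addresses (16k words)


def put(memory, y, x):
    """Write code:X to address Y, where code is Y[31..28]."""
    code = (y >> 28) & 0xF          # new opcode for the target word
    addr = y & ADDR_MASK            # target address (assumed low bits of Y)
    memory[addr] = (code << 32) | (x & 0xFFFFFFFF)
    return addr


# Store "LOAD 123" (opcode 1000) at address 1000 in a single atomic write.
memory = {}
put(memory, (0b1000 << 28) | 1000, 123)
```

Because opcode and operand travel in one word, a reader of address 1000 never sees a new operand paired with an old opcode.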

For example, place a DATA instruction with operand value 123 at address 1000 as data. In the code starting at address 100, LOAD 1000 assigns 1000 to register X, and the next address, 101, holds a DATA instruction whose operand is 102, the address of the following instruction. Executing the DATA instruction at address 101 sets register X to 102 and jumps to address 1000. The DATA instruction at address 1000 is then executed, which sets register X to 123 and jumps to address 102. The result looks as if the DATA instruction at address 101 fetched the data at the address held in register X (X := [X]) and also jumped to the address given by its operand (PC := Y). Because the DATA instructions at addresses 101 and 1000 perform the same action but play completely different roles in the program, I give the opcode at address 101 a separate name, GET.

PC	instruction	Y	X
100	LOAD	1000	1000
101	DATA(GET)	102	102
1000	DATA	123	123
102	next …
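The trace above can be reproduced with a tiny Python interpreter that knows only LOAD and DATA. This is a sketch for following the data flow, not a model of the hardware: rotation waits are ignored, and the HALT placed at address 102 is a stand-in for the unspecified next instruction.

```python
def run(program, pc, max_steps=10):
    """Execute LOAD/DATA instructions and record (PC, instruction, Y, X)."""
    x = 0
    trace = []
    for _ in range(max_steps):
        op, y = program[pc]
        if op == "HALT":
            break
        if op == "LOAD":
            next_pc, x = pc + 1, y      # X := Y
        elif op == "DATA":
            next_pc, x = x, y           # PC := old X, X := Y
        else:
            raise ValueError("unsupported instruction: " + op)
        trace.append((pc, op, y, x))
        pc = next_pc
    return trace, x, pc


program = {
    100: ("LOAD", 1000),
    101: ("DATA", 102),    # used as GET: return address 102
    1000: ("DATA", 123),   # the stored data
    102: ("HALT", 0),      # placeholder for the next instruction
}
trace, x, pc = run(program, 100)
```

After the run, X holds 123 and PC is 102, which is the X := [X] behaviour described above.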

Random-access reads are very cheap in hardware because they require only one simple instruction. They are costly at run time, however, because two jumps must be executed, each incurring a wait for rotation. In the example above, where the return address of the GET instruction is the address immediately after it, the access inevitably costs a wait of one full rotation. If the GET instruction, the DATA instruction and the return address can be placed in adjoining memory banks, the waiting time for rotation is reduced to as little as two instruction periods. In practice it is not that simple, because several GET instructions may share access to the same DATA instruction, but a well-designed arrangement of instructions and data could still improve execution efficiency. This kind of optimization could become a very interesting subject for future study.
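The cost difference can be estimated with a simplified timing model in Python. Every part of the model is an assumption: the bank of an address is taken to be its value modulo the number of banks, a thread is assumed to advance by one bank per instruction period, and the bank count of 8 is an arbitrary illustrative value that the text does not specify.

```python
def rotation_wait(pc, target, num_banks):
    """Periods spent waiting after the instruction at pc jumps to target.

    Assumes the thread would naturally be at bank (pc + 1) next and must
    idle until the rotation brings it to the bank of the target address.
    """
    return (target - (pc + 1)) % num_banks


def get_cost(get_addr, data_addr, return_addr, num_banks):
    """Periods from executing GET until the return address is reached."""
    wait_to_data = rotation_wait(get_addr, data_addr, num_banks)
    wait_to_return = rotation_wait(data_addr, return_addr, num_banks)
    return 2 + wait_to_data + wait_to_return  # 2 = GET and DATA themselves


NUM_BANKS = 8  # hypothetical value
example_cost = get_cost(101, 1000, 102, NUM_BANKS)   # layout from the example
adjacent_cost = get_cost(101, 102, 103, NUM_BANKS)   # adjoining banks
```

Under these assumptions the example layout costs 9 periods (one full rotation of 8 plus one), while the adjoining layout costs only the 2 periods of the GET and DATA instructions themselves.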

Posted by tanuma at 14:40:00 in Register-less architecture

Wednesday, May 22, 2013

To Introduce JRuby

In parallel with designing the processor for the DE0-Nano, I am developing a JRuby-based environment. JRuby is a Ruby implementation that runs on the Java virtual machine and can be deployed simply as a single Jar file. Ruby is a highly flexible language and makes it possible to build another language (a domain-specific language) on top of it. Designing processor circuits requires a hardware description language such as VHDL or Verilog HDL for logic synthesis, but writing out a large-scale multiprocessor circuit directly, without CAD, is hard work. I therefore generate the VHDL code automatically from higher-level descriptions written in Ruby. I am also planning to build compilers based on Ruby.

Because JRuby can call DLLs in the Windows environment, it can access JTAG by itself, without JNA. I am therefore offering a JTAG demo program (jtag_demo.jar) based on JRuby. Since the JRuby Jar file (jruby-complete.jar) is quite large, the demo program does not include it; instead, it automatically downloads the latest version over the internet.

Put the downloaded file (jruby-complete.jar) in the same folder as the demo program (jtag_demo.jar) or in the Java extensions folder (jre/lib/ext), and from then on the demo will run without downloading. This approach makes it possible to distribute circuit data and configuration programs together in a single, very compact file. To use a similar mechanism, you can refer to the Java and Ruby source code (to be precise, executable Ruby scripts) included in the Jar file.

Posted by tanuma at 18:15:22 in Related researches
