Brno University of Technology

Architecture and Programming of x86 Processors

Microprocessor Techniques and Embedded Systems Lecture 12

Dr. Tomas Fryza December 2012

Contents

A little bit of one-core Intel processors history

IA-32 processor registers

IA-32 processor programming in assembly language

History of Intel x86 processors

4-bit processor 4004; 8-bit processors 8080, 8085.

16-bit processors: The first processor in the IA (Intel Architecture) family was the 8086, introduced in 1978, with a 20-bit address bus (AB) and a 16-bit data bus (DB). It comes in a 40-pin dual in-line package (DIP) in three versions: 8086 (5 MHz), 8086-2 (8 MHz), and 8086-1 (10 MHz). It consists of 29,000 transistors.

The 8088 is identical to the 8086, except that it has an 8-bit data bus. Intel introduced segmentation with the 8086 and 8088 (real-mode segmentation). At any one time, these processors can address up to four segments of 64 KB each.

x86 is the generic name for Intel processors released after the original 8086 processor.

Since Intel’s x86 processors are backwards compatible, newer x86 processors can run all the programs that older processors could run. However, older processors may not be able to run software that has been optimized for newer x86 processors.

The 80186 is a faster version of the 8086, also with a 20-bit address bus (AB) and a 16-bit data bus (DB). It was never widely used in computer systems.

The 80286 was introduced in 1982. It has a 24-bit AB, which implies 16 MB of memory address space (2^24 = 16,777,216). The DB is still 16 bits wide. It introduced protected mode (with some memory protection capabilities).
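The link between address-bus width and address space is plain powers of two. A minimal sketch of that arithmetic follows; the function names `address_space_bytes` and `human_readable` are hypothetical and chosen only for this illustration.

```python
def address_space_bytes(address_bus_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bus_bits


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


if __name__ == "__main__":
    # Bus widths taken from the slides: 8086 (20 bits), 80286 (24 bits).
    for name, bits in [("8086", 20), ("80286", 24)]:
        size = address_space_bytes(bits)
        print(f"{name}: {bits}-bit AB -> {size:,} bytes = {human_readable(size)}")
```

The same function applies to any other bus width quoted in this lecture.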

8086 memory segmentation

Segment registers: additional registers, called segment registers, generate a memory address when combined with other registers in the microprocessor. In the 8086 microprocessor, memory is divided into four segments as follows:

Code Segment (CS): The CS register is used for addressing memory locations in the code segment, where the executable program is stored.

Data Segment (DS): The DS register points to the segment that contains most of the data used by the program. Data are accessed in the data segment by an offset address or by the content of another register that holds the offset address.

Stack Segment (SS): The SS register defines the area of memory used for the stack.

Extra Segment (ES): The ES register points to an additional data segment that is used by some string instructions to hold the destination data.

Figure: Memory segments of 8086.
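How a segment register and an offset combine into a 20-bit physical address can be sketched in a few lines. This is a minimal model of 8086 real-mode address formation (segment shifted left by four bits, plus offset); the function name `physical_address` is hypothetical.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: physical address = segment * 16 + offset (20 bits)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The 8086 has only 20 address lines, so the sum wraps around at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # Example values are arbitrary, e.g. CS = 1234h and IP = 0010h.
    print(f"{physical_address(0x1234, 0x0010):05X}h")
    # The 16-bit offset limits every segment to 64 KB.
    first = physical_address(0x2000, 0x0000)
    last = physical_address(0x2000, 0xFFFF)
    print(f"segment spans {last - first + 1} bytes")
```

Because the offset is 16 bits wide, each segment covers exactly 64 KB, which matches the segment size quoted for the 8086 and 8088.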

Intel 80186 architecture

Figure: Architecture of Intel 80186 processor.

Intel’s 32-bit processors in computer systems

Intel introduced its first 32-bit processor, the 80386, in 1985. It has a 32-bit AB and a 32-bit DB and follows Intel’s 32-bit architecture, known as IA-32. The memory address space grew from 16 MB to 4 GB (2^32 = 4,294,967,296). With the 80386, Intel introduced paging into the IA architecture. It also allowed segments as large as 4 GB to be defined, which effectively allows a flat model (i.e. effectively turning off segmentation).

The Intel 80486 processor was introduced in 1989. It is an improved version of the 80386 (same AB and DB) that integrates the coprocessor functions for floating-point arithmetic. The 80486 added more parallel execution capability to the instruction decode and execution units to achieve a scalar execution rate of one instruction per clock. It has an 8 KB on-chip L1 cache and supports an L2 cache and multiprocessing.

The Pentium (the name 80586 was not used because a number cannot be trademarked) was introduced on 22 March 1993 (its 20th anniversary falls in March 2013). It is similar to the 80486 but uses a 64-bit wide DB. Internally, it has 128- and 256-bit wide datapaths to speed up internal data transfers. It added a second execution pipeline to achieve superscalar performance, with the capability to execute two instructions per clock (the first superscalar x86 processor). The on-chip L1 cache was doubled: 8 KB for data and 8 KB for instructions. Branch prediction was added. Produced using a 0.8 micron (800 nm) process, the first Pentium chips were built from 3.1 million transistors (compared with Core i7 chips, which have 1.4 billion transistors and are fabricated using a 22 nm process). The first Pentium chips were introduced in 60 MHz and 66 MHz versions. iComp benchmark scores rate the 66 MHz Pentium at 565, compared with 297 for the 66 MHz 486DX2, which was the fastest chip available before the Pentium launch.

Intel Pentium (80586) architecture

Figure: Architecture of Intel Pentium processor.

Intel Pentium Pro

The Pentium Pro was introduced in November 1995 as Intel’s 6th-generation x86 design, code-named P6. The P6 has a three-way superscalar architecture (three instructions per clock cycle). The AB was extended to 36 bits (address space 2^36 = 68,719,476,736 bytes, i.e. 64 GB). In addition to the L1 caches provided by the Pentium, the Pentium Pro has a 256 KB L2 cache in the same package as the CPU.

Powerful, but expensive.

Figure: Intel Pentium Pro: (a) package, (b) CPU and L2 cache die.

Other Pentiums . . .

The Pentium II processor was introduced in May 1997 and added multimedia (MMX) instructions to the Pentium Pro architecture. The L1 data and instruction caches were extended to 16 KB each. It also added more comprehensive power management features, including Sleep and Deep Sleep modes, to conserve power during idle times. The Pentium II abandoned the socket approach to microprocessors and introduced the slot concept. It contains 7.5 million transistors (the first P6-generation core of the Pentium Pro contained 5.5 million transistors). However, its L2 cache subsystem was a downgrade compared with the Pentium Pro’s.

Figure: (a) Intel Pentium II Deschutes; CPU Core in the middle, cache on the right, (b) mobile version of Pentium II Tonga.

The Pentium III processor (Feb 1999) introduced streaming SIMD extensions (SSE), cache prefetch instructions, and memory fences. SSE brought a single-instruction multiple-data (SIMD) architecture for concurrent execution of multiple floating-point operations. The Pentium 4 enhanced these features further.

Code names:
Katmai, 250 nm, May 1999
Coppermine, 180 nm, Mar 2000 (Note: the Pentium III Coppermine was the first commercial x86 processor from Intel to attain a clock speed of 1 GHz)
Coppermine T, 180 nm, Aug 2000
Tualatin, 130 nm, Apr 2001

Figure: Intel Pentium III: (a) standard logo, (b) code name Coppermine.

A 64-bit processor was born

Intel’s 64-bit Itanium processor (released in 2001; the architecture was formerly called IA-64) is targeted at server applications and high-performance computing systems. The Itanium uses a 64-bit AB to provide a substantially larger address space. Its DB is 128 bits wide. In a major departure, Intel moved from the CISC designs used in its 32-bit processors to a RISC orientation for the 64-bit Itanium processors.

Each 128-bit instruction word contains three instructions, and the fetch mechanism can read up to two instruction words per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle.

The processor has thirty functional execution units in eleven groups: 6 general-purpose ALUs, 2 integer units, 1 shift unit, 4 data cache units, 6 multimedia units, 2 parallel shift units, 1 parallel multiply unit, 1 population count unit, 2 82-bit floating-point multiply-accumulate units, 2 SIMD floating-point multiply-accumulate units (two 32-bit operations each), and 3 branch units. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units.

Figure: Intel Itanium: (a) modified logo from 2009, (b) Itanium 2 McKinley.
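The slides state only that a 128-bit Itanium instruction word holds three instructions. As a rough sketch, assuming the bundle layout documented for the Itanium architecture (a 5-bit template in the lowest bits followed by three 41-bit instruction slots; these field widths are not given in this lecture), a word could be taken apart as shown below. The helper names `make_bundle` and `split_bundle` are hypothetical.

```python
TEMPLATE_BITS = 5    # assumption: template field in bits 0..4
SLOT_BITS = 41       # assumption: three 41-bit instruction slots
SLOT_MASK = (1 << SLOT_BITS) - 1


def make_bundle(template: int, slots: tuple) -> int:
    """Pack a template and three slot values into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    bundle = template & ((1 << TEMPLATE_BITS) - 1)
    for index, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def split_bundle(bundle: int):
    """Return (template, (slot0, slot1, slot2)) for a 128-bit bundle."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    )
    return template, slots


if __name__ == "__main__":
    # 5 + 3 * 41 = 128 bits; two bundles per clock give six instructions.
    print(TEMPLATE_BITS + 3 * SLOT_BITS, 2 * 3)
    print(split_bundle(make_bundle(0x10, (1, 2, 3))))
```

The slot values here are arbitrary placeholders, not real Itanium encodings.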

Intel Itanium architecture

Figure: Architecture of Intel Itanium processor.

Processor registers

The IA-32 architecture provides ten 32-bit and six 16-bit registers. These registers are grouped into general, control, and segment registers.

The general registers are further divided into data, pointer, and index registers.

Figure: IA-32 data registers.

There are four 32-bit data registers that can be used for arithmetic, logical, and other operations. They can be accessed as:

Four 32-bit registers (EAX: accumulator, EBX: base, ECX: counter, EDX: data); or
Four 16-bit registers (AX, BX, CX, DX); or
Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL).

Data, pointer, and index registers

Figure: IA-32 general registers: (a) data registers, (b) pointer and index registers.
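The 32-, 16-, and 8-bit names are different views of the same storage, so writing AL or AH changes part of EAX. A minimal model of that aliasing is sketched below; the class `DataRegister` and its attribute names (`dword`, `word`, `high`, `low`) are hypothetical and stand for EAX, AX, AH, and AL respectively.

```python
class DataRegister:
    """One IA-32 data register (e.g. EAX) with its 16- and 8-bit views."""

    def __init__(self, value: int = 0):
        self.dword = value & 0xFFFFFFFF          # e.g. EAX

    @property
    def word(self) -> int:                       # e.g. AX = bits 15..0
        return self.dword & 0xFFFF

    @word.setter
    def word(self, value: int) -> None:
        self.dword = (self.dword & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def high(self) -> int:                       # e.g. AH = bits 15..8
        return (self.dword >> 8) & 0xFF

    @high.setter
    def high(self, value: int) -> None:
        self.dword = (self.dword & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def low(self) -> int:                        # e.g. AL = bits 7..0
        return self.dword & 0xFF

    @low.setter
    def low(self, value: int) -> None:
        self.dword = (self.dword & 0xFFFFFF00) | (value & 0xFF)


if __name__ == "__main__":
    eax = DataRegister(0x12345678)
    print(hex(eax.word), hex(eax.high), hex(eax.low))
    eax.low = 0xFF        # like MOV AL, 0FFh: only bits 7..0 change
    print(hex(eax.dword))
```

Note that the upper 16 bits of a 32-bit register have no name of their own; they can only be reached through the full 32-bit register.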

Some registers have special functions when executing specific instructions. For example, when performing a multiplication operation, one of the two operands must be in the EAX, AX, or AL register, depending on the operand size. Similarly, the ECX or CX register is assumed to contain the loop count value for iterative instructions.
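The implicit accumulator operand of unsigned multiplication can be illustrated with a short model. This sketch assumes the usual behaviour of the IA-32 `MUL` instruction, which leaves a double-width product in AX (for AL), DX:AX (for AX), or EDX:EAX (for EAX); the function name `mul` and its return convention are hypothetical.

```python
def mul(accumulator: int, operand: int, size_bits: int):
    """Unsigned multiply with an implicit accumulator operand.

    Returns (high_half, low_half) of the double-width product:
      8-bit:  AL  * operand -> AH:AL (i.e. AX)
      16-bit: AX  * operand -> DX:AX
      32-bit: EAX * operand -> EDX:EAX
    """
    if size_bits not in (8, 16, 32):
        raise ValueError("operand size must be 8, 16, or 32 bits")
    mask = (1 << size_bits) - 1
    product = (accumulator & mask) * (operand & mask)
    return (product >> size_bits) & mask, product & mask


if __name__ == "__main__":
    ah, al = mul(0xFF, 0xFF, 8)            # AL = FFh times FFh
    print(f"AH={ah:02X}h AL={al:02X}h")
    edx, eax = mul(0xFFFFFFFF, 2, 32)      # EAX = FFFFFFFFh times 2
    print(f"EDX={edx:08X}h EAX={eax:08X}h")
```

The operand size selects which accumulator is used, which is why only one operand appears explicitly in the instruction.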

The two index registers (ESI, EDI) play a special role in the string processing instructions, but can be used as general-purpose data registers as well.

The pointer registers (ESP, EBP) are used almost exclusively to maintain the stack, even though they can also serve as general-purpose data registers.

Move operation examples

Table: MOV and its operands.

Machine instruction   Destination operand   Source operand   Operand notes
MOV                   EAX,                  42h              Source is immediate data
MOV                   EBX,                  EDI              Both are 32-bit register data
MOV                   BX,                   CX               Both are 16-bit register data
MOV                   DL,                   BH               Both are 8-bit register data
MOV                   [EBP],                EDI              Destination is 32-bit memory data at the address stored in EBP
MOV                   EDX,                  [ESI]            Source is 32-bit memory data at the address stored in ESI
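The two memory-operand rows of the table can be modelled with a byte array standing in for memory. This is a minimal sketch, assuming little-endian storage as on IA-32; the memory size, the register contents, and the helper names `store32`, `load32`, and `demo` are hypothetical.

```python
import struct


def store32(memory: bytearray, address: int, value: int) -> None:
    """Write a 32-bit value at the given address, least significant byte first."""
    struct.pack_into("<I", memory, address, value & 0xFFFFFFFF)


def load32(memory: bytearray, address: int) -> int:
    """Read a 32-bit little-endian value from the given address."""
    return struct.unpack_from("<I", memory, address)[0]


def demo():
    memory = bytearray(64)                       # tiny hypothetical memory
    regs = {"EDI": 0xDEADBEEF, "EBP": 16, "ESI": 16, "EDX": 0}

    # MOV [EBP], EDI : the destination is memory at the address held in EBP
    store32(memory, regs["EBP"], regs["EDI"])

    # MOV EDX, [ESI] : the source is memory at the address held in ESI
    regs["EDX"] = load32(memory, regs["ESI"])
    return regs, memory


if __name__ == "__main__":
    regs, memory = demo()
    print(f"EDX = {regs['EDX']:08X}h")
    print("bytes at [EBP]:", memory[16:20].hex(" "))
```

The square brackets mean "the memory location whose address is in this register", so the register itself is not modified by the store.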

Control registers

There are two 32-bit control registers: the instruction pointer register (EIP, or IP) and the flags register (EFLAGS, or FLAGS).

The processor uses the instruction pointer register to keep track of the location of the next instruction to be executed (it is sometimes called the program counter register). The IP register is used for 16-bit addresses and the EIP register for 32-bit addresses.

When an instruction is fetched from memory, the instruction pointer is updated to point to the next instruction. This register is also modified during the execution of an instruction that transfers control to another location in the program (such as a jump, procedure call, or interrupt).

The FLAGS register is useful in executing 8086 processor code. The EFLAGS register consists of 6 status flags, 1 control flag, and 10 system flags.

Figure: Flags control register EFLAGS.
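The six status flags are CF, PF, AF, ZF, SF, and OF. As an illustration of how they reflect the result of an arithmetic instruction, the sketch below computes them for an 8-bit addition; the function name `add8_flags` is hypothetical, and the model covers only this one operation.

```python
def add8_flags(a: int, b: int):
    """Return (result, flags) for an 8-bit ADD, with the six status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                          # carry out of bit 7
        "PF": bin(result).count("1") % 2 == 0,       # even number of 1 bits
        "AF": (a & 0x0F) + (b & 0x0F) > 0x0F,        # carry out of bit 3
        "ZF": result == 0,                           # result is zero
        "SF": bool(result & 0x80),                   # sign bit of the result
        # Signed overflow: operands share a sign that the result does not.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags


if __name__ == "__main__":
    for a, b in [(0x7F, 0x01), (0xFF, 0x01)]:
        result, flags = add8_flags(a, b)
        set_flags = [name for name, value in flags.items() if value]
        print(f"{a:02X}h + {b:02X}h = {result:02X}h, flags set: {set_flags}")
```

Conditional jump instructions test these flags, which is how a comparison or arithmetic result steers program flow.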

Segment registers

There are six 16-bit segment registers:

CS   Code segment
DS   Data segment
SS   Stack segment
ES   Extra segment
FS   Extra segment
GS   Extra segment

Figure: The six segment registers support the segmented memory architecture.

In segmented memory organization, memory is partitioned into segments, where each segment is a small part of memory. At any time, the processor can access at most six segments of main memory. The six segment registers point to where these segments are located in memory.

A program is logically divided into two parts: a code part that contains only the instructions, and a data part that keeps only the data. The code segment (CS) register points to where the program’s instructions are stored in the main memory, and the data segment (DS) register points to the data part of the program. The stack segment (SS) register points to the program’s stack segment.

The last three segment registers–ES, FS, and GS–are additional segment registers that can be used in a similar way as the other segment registers.

Segmentation models

The segments can span the entire memory address space. As a result, we can effectively make segmentation invisible by mapping all segment base addresses to zero and setting the size to 4 GB. Such a model is called a flat model and is used in programming environments such as UNIX and Linux.

Another model that uses the capabilities of segmentation to the full extent is the multisegment model.

Figure: Segments in a multisegment model.

Flat segmentation models

Figure: Flat segmentation models: (a) basic, (b) protected.
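The difference between the flat and multisegment models comes down to the base and size given to each segment. A simplified sketch follows, assuming a linear address is formed as base plus offset with a bounds check; the `Segment` class and the example base and size values are hypothetical, and real descriptors carry further attributes that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int   # linear address where the segment starts
    size: int   # segment size in bytes

    def linear_address(self, offset: int) -> int:
        """Translate an offset within this segment to a linear address."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside the segment")
        return (self.base + offset) & 0xFFFFFFFF


# Flat model: base 0 and size 4 GB, so offset and linear address coincide.
FLAT = Segment(base=0, size=2 ** 32)

# Multisegment model: each segment has its own base and size (made-up values).
DATA = Segment(base=0x0040_0000, size=0x1_0000)

if __name__ == "__main__":
    print(hex(FLAT.linear_address(0x1234)))
    print(hex(DATA.linear_address(0x0010)))
    try:
        DATA.linear_address(0x1_0000)
    except ValueError as error:
        print("rejected:", error)
```

In the flat model every offset is valid, which is why segmentation becomes invisible to the program; in the multisegment model an out-of-range offset is rejected.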

Assembly language programming

Several assemblers: NASM, YASM, MASM, . . .

    ; FILENAME: sandbox.asm

    section .data

    section .text

    global _start

    _start:
        nop
        ; Put your experiments between the two nops...
        mov edx, 'WXYZ'     ; 32-bit move, EDX = 5A595857h
        mov ax, 067FEh      ; 16-bit move
        mov bx, ax          ; 16-bit move
        mov cl, bh          ; 8-bit move
        mov ch, bl          ; 8-bit move
        xchg cl, ch         ; exchange values cl <-> ch
        ; Put your experiments between the two nops...
        nop

    section .bss

Makefile example:

    sandbox: sandbox.o
    	ld -o sandbox sandbox.o
    sandbox.o: sandbox.asm
    	nasm -f elf -g -F stabs sandbox.asm -l sandbox.lst

nasm       invokes the assembler
-f elf     specifies that the .o file will be generated in the elf format
-g         specifies that debug information is to be included in the .o file
-F stabs   specifies that debug information is to be generated in the stabs format
-l         specifies that a listing file will be generated

Debugging tools: KDbg, gdb, . . .

Figure: Debugging example application in KDbg: (a) main window, (b) register values.
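The register values that a debugger should display after single-stepping through sandbox.asm can be predicted by hand. The sketch below traces the six instructions between the two nops; it assumes NASM's convention of storing the first character of a string constant in the lowest byte, and the function name `trace_sandbox` is hypothetical. Only the registers written by the program are modelled.

```python
def trace_sandbox() -> dict:
    """Follow the instructions of sandbox.asm and return the final values."""
    # mov edx, 'WXYZ' : 'W' (57h) ends up in the lowest byte of EDX
    edx = int.from_bytes(b"WXYZ", "little")

    ax = 0x67FE                      # mov ax, 067FEh
    bx = ax                          # mov bx, ax
    bh, bl = bx >> 8, bx & 0xFF      # BH and BL are the two halves of BX

    cl = bh                          # mov cl, bh
    ch = bl                          # mov ch, bl
    cl, ch = ch, cl                  # xchg cl, ch

    cx = (ch << 8) | cl              # CX is CH:CL
    return {"EDX": edx, "AX": ax, "BX": bx, "CH": ch, "CL": cl, "CX": cx}


if __name__ == "__main__":
    for name, value in trace_sandbox().items():
        print(f"{name} = {value:X}h")
```

Swapping the two halves twice (once by the crossed moves, once by `xchg`) leaves CX equal to BX, which is a handy check when comparing against the debugger's register window.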
