Real-time dispatcher

The dispatcher is the lowest level of the scheduler. Its task is to save the complete state of the current task to memory and to fetch and restore the state of the next task to run from memory. A well-designed dispatcher should have its own idle administration, because introducing an idle task (as some existing systems do) can introduce additional delays.
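As a rough illustration of a dispatcher with its own idle administration, here is a minimal sketch in C. The names `task_t`, `dispatcher_t`, `ready_append`, `dispatch_next` and the `idle_passes` counter are assumptions made for this example. The sketch only selects the next task; saving and restoring processor state is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task control block; the fields are illustrative only. */
typedef struct task {
    struct task *next;          /* next task in the ready list */
    int id;
} task_t;

typedef struct {
    task_t *head, *tail;        /* ready list, head is dispatched first */
    unsigned long idle_passes;  /* idle administration kept by the dispatcher */
} dispatcher_t;

/* Append a task to the end of the ready list. */
void ready_append(dispatcher_t *d, task_t *t) {
    assert(d != NULL && t != NULL);
    t->next = NULL;
    if (d->tail) d->tail->next = t; else d->head = t;
    d->tail = t;
}

/* Return the next ready task, or NULL when nothing is runnable.
   In the NULL case the dispatcher itself accounts for the idle pass,
   so no separate idle task has to be switched in and out. */
task_t *dispatch_next(dispatcher_t *d) {
    assert(d != NULL);
    task_t *t = d->head;
    if (t == NULL) {
        d->idle_passes++;
        return NULL;
    }
    d->head = t->next;
    if (d->head == NULL) d->tail = NULL;
    t->next = NULL;
    return t;
}
```

A caller would loop on `dispatch_next` and, on `NULL`, halt or poll until an interrupt makes a task ready again.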

Modern processors have extensive support for task switching implemented in hardware. The x86 family, from the 386 onward, uses Task State Segments (TSS) to support multitasking. The TSS descriptor points to a buffer that is at least 104 bytes long, and the whole context of the current task is saved to this buffer. This is done in very few CPU cycles. A task switch occurs on a jump or a call to a TSS descriptor: the processor loads the values in the TSS into the respective registers and executes the task. TSSs are not reentrant; the busy flag in the TSS descriptor indicates the busy state. (For a detailed discussion of x86 hardware task switching, see Appendix B.)

Use of these task management facilities for handling multitasking applications is of course optional. Multitasking can be handled in software, with each software-defined task executed in the context of a single processor-architecture task. But it should be evident that using the hardware facilities is much faster and easier. (By the way, please don't read this as a criticism of Linux or of how Linux uses the TSS; I make no comment on that. This is a fundamental description, and since many people click on it, there must be a need for it.)


Scheduler

The scheduler handles all the high-level aspects of the task-switching process. It decides the order (priority) in which the low-level dispatcher should select processes, and it handles all the counting and administration.

Example:

Often we have to manage non-shareable resources: pieces of code that cannot be written reentrantly, or hardware devices that are by nature non-shareable, e.g. linked-list management, hard disks, etc. Say one task is using the hard disk and another wants to read from it. We have to deactivate the second task until the first has finished. We do this with a semaphore.



Interprocess Communication (IPC)

 

Using semaphores to ensure Mutual Exclusion (mutex)

Mutual exclusion is used to prevent simultaneous access to non-shareable resources (critical sections):

Program for Producer

Repeat indefinitely
    produce item;
    wait(space available);
    wait(Buff. Manip.);
    deposit item in buffer;
    signal(Buff. Manip.);
    signal(item available);
end

Program for Consumer

Repeat indefinitely
    wait(item available);
    wait(Buff. Manip.);
    extract item from buffer;
    signal(Buff. Manip.);
    signal(space available);
    consume item;
end
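The two programs above can be traced in a small single-threaded C sketch. It is a simulation under stated assumptions, not a kernel implementation: `sim_wait` returns `false` at the point where a real Wait would block the task, and all names (`sim_sem_t`, `bounded_buffer_t`, `producer_step`, `consumer_step`, `BUF_SIZE`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_SIZE 4  /* hypothetical buffer capacity */

/* Simulated counting semaphore: only the counter, no wait queue. */
typedef struct { int count; } sim_sem_t;

/* Returns false where a real Wait would block the calling task. */
static bool sim_wait(sim_sem_t *s) {
    if (s->count == 0) return false;
    s->count--;
    return true;
}

static void sim_signal(sim_sem_t *s) { s->count++; }

typedef struct {
    int items[BUF_SIZE];
    int head, tail;             /* ring-buffer read and write positions */
    sim_sem_t space_available;  /* counts free slots */
    sim_sem_t item_available;   /* counts filled slots */
    sim_sem_t buf_manip;        /* binary semaphore guarding the buffer */
} bounded_buffer_t;

void buffer_init(bounded_buffer_t *b) {
    b->head = b->tail = 0;
    b->space_available.count = BUF_SIZE;
    b->item_available.count = 0;
    b->buf_manip.count = 1;
}

/* One pass of the producer loop; false means the producer would block. */
bool producer_step(bounded_buffer_t *b, int item) {
    if (!sim_wait(&b->space_available)) return false;
    bool entered = sim_wait(&b->buf_manip);
    assert(entered);            /* always free in a single-threaded run */
    (void)entered;
    b->items[b->tail] = item;   /* deposit item in buffer */
    b->tail = (b->tail + 1) % BUF_SIZE;
    sim_signal(&b->buf_manip);
    sim_signal(&b->item_available);
    return true;
}

/* One pass of the consumer loop; false means the consumer would block. */
bool consumer_step(bounded_buffer_t *b, int *item) {
    if (!sim_wait(&b->item_available)) return false;
    bool entered = sim_wait(&b->buf_manip);
    assert(entered);
    (void)entered;
    *item = b->items[b->head];  /* extract item from buffer */
    b->head = (b->head + 1) % BUF_SIZE;
    sim_signal(&b->buf_manip);
    sim_signal(&b->space_available);
    return true;
}
```

Note the order of the two waits: each side first waits on its counting semaphore and only then on the buffer semaphore, so a task never holds the buffer while waiting for space or for an item.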

Implementing Wait and Signal

Wait and Signal are integral parts of the dispatcher and are therefore implemented in the (Micro-)Kernel (together with the dispatcher, delay management, linked list management and message system). They must be available to all processes.

Every semaphore has its own linked list, in which the tasks waiting for this event are queued.

Wait may make the current process inactive (unrunnable). In this case the dispatcher must select another task, which is not necessarily the next one.

Interrupts can make a process runnable by signalling a semaphore.
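A minimal sketch of Wait and Signal with a per-semaphore wait queue might look as follows in C. The names `kernel_wait`, `kernel_signal`, `semaphore_t` and `task_t` are assumptions for illustration. The sketch does not protect the queue against interrupts or task switches, and the caller is expected to hand control to the dispatcher when `kernel_wait` returns 0.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TASK_READY, TASK_WAITING } task_state_t;

/* Hypothetical task control block. */
typedef struct task {
    struct task *next;   /* link in a semaphore's wait queue */
    task_state_t state;
} task_t;

typedef struct {
    int count;                      /* free "tickets" of the semaphore */
    task_t *wait_head, *wait_tail;  /* FIFO of tasks waiting for a signal */
} semaphore_t;

/* Returns 1 if the caller may continue, 0 if it was queued and the
   dispatcher has to select another task. */
int kernel_wait(semaphore_t *s, task_t *current) {
    assert(s != NULL && current != NULL);
    if (s->count > 0) {
        s->count--;
        return 1;
    }
    current->state = TASK_WAITING;
    current->next = NULL;
    if (s->wait_tail) s->wait_tail->next = current; else s->wait_head = current;
    s->wait_tail = current;
    return 0;
}

/* Returns the task that became runnable, or NULL if nobody was waiting.
   May be called from a task or, in a real kernel, from an interrupt handler. */
task_t *kernel_signal(semaphore_t *s) {
    assert(s != NULL);
    task_t *t = s->wait_head;
    if (t == NULL) {
        s->count++;
        return NULL;
    }
    s->wait_head = t->next;
    if (s->wait_head == NULL) s->wait_tail = NULL;
    t->next = NULL;
    t->state = TASK_READY;  /* caller re-inserts it into the active list */
    return t;
}
```

With the semaphore initialised to 1, the first task passes `kernel_wait` immediately, a second task is queued, and the first task's `kernel_signal` hands the resource directly to the waiting task without incrementing the counter.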

Doubly linked lists should be preferred to singly linked lists. Fast link-first and link-last routines can thus be implemented.
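To illustrate why the doubly linked form helps, here is a small C sketch in which linking at either end and unlinking an arbitrary node all take constant time. The names `node_t`, `list_t`, `link_first`, `link_last` and `unlink_node` are chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *prev, *next;
} node_t;

typedef struct {
    node_t *first, *last;
} list_t;

/* Insert n at the front of the list. */
void link_first(list_t *l, node_t *n) {
    assert(l != NULL && n != NULL);
    n->prev = NULL;
    n->next = l->first;
    if (l->first) l->first->prev = n; else l->last = n;
    l->first = n;
}

/* Append n at the end of the list. */
void link_last(list_t *l, node_t *n) {
    assert(l != NULL && n != NULL);
    n->next = NULL;
    n->prev = l->last;
    if (l->last) l->last->next = n; else l->first = n;
    l->last = n;
}

/* Remove n from the list without searching for its predecessor. */
void unlink_node(list_t *l, node_t *n) {
    assert(l != NULL && n != NULL);
    if (n->prev) n->prev->next = n->next; else l->first = n->next;
    if (n->next) n->next->prev = n->prev; else l->last = n->prev;
    n->prev = n->next = NULL;
}
```

In a singly linked list, `unlink_node` would have to walk the list to find the predecessor, which is exactly the cost a dispatcher wants to avoid when it deactivates a task.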

Since the whole dispatcher with its associated routines is a very short piece of code that is easier to formulate and to read, and thus to maintain, in assembler than in any high-level language, it seems advisable to write it in assembler from the very beginning (as the sole exception to the rule).

 

Concrete implementation of the problem described above:

1. On entering the hard-disk driver, a Wait is executed first. If no task is currently using the hard-disk driver, the task passes the Wait without delay. The semaphore, which was initialised to 1, is decremented by this Wait, so the next task that wants to use the routine is taken out of the linked list of currently active tasks and deactivated by the dispatcher. On finishing the hard-disk routine, the first task signals the semaphore, which sets the second (waiting) task to the active state again; that is, it is simply re-inserted into the linked list of active tasks.

2. It is crucial that the linked lists are never left in an inconsistent state. To be on the safe side, we can disable interrupts while rearranging the list. If we don't want to disable interrupts altogether (which should always be the very last resort!), we use a critical section (with topmost priority). Since the linked-list management cannot be written reentrantly, and a task switch while the list is being rearranged would leave us with a corrupt list, we have to encapsulate the linked-list management in a critical section (depending on the implementation of the interrupt system, some other precautions must be taken).
(It should be noted that this only works in systems in which the critical-section handler does not itself use the linked-list management, which is seldom the case. Most real-time systems have special provisions for this case, such as a dispatcher lock.)


Message System

Our message system comprises only the two functions SendMessage and WaitMessage. SendMessage simply signals the message semaphore and inserts the message into the linked list of messages, while WaitMessage does a Wait on the message semaphore and fetches the next message from the linked list of messages.
Of course this is only one possible implementation of a message system. There are numerous others. The counting semaphore could be replaced by a simple message counter, or several different send and receive routines could be implemented, e.g. PostMessage in addition to SendMessage, GetMessage in addition to WaitMessage, etc.
The form and content of the messages themselves can be chosen according to your needs. The most widely known is the Windows message format.
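The two functions can be sketched in C as follows. This uses the "simple message counter" variant mentioned above in place of a real semaphore, so `wait_message` returns `NULL` where a real WaitMessage would block. The message fields `id` and `param` and the type names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct message {
    struct message *next;
    int id;      /* hypothetical message identifier */
    long param;  /* hypothetical payload */
} message_t;

typedef struct {
    message_t *head, *tail;  /* linked list of queued messages */
    int count;               /* stands in for the counting semaphore */
} mailbox_t;

/* Insert the message at the end of the list and "signal" the counter. */
void send_message(mailbox_t *mb, message_t *m) {
    assert(mb != NULL && m != NULL);
    m->next = NULL;
    if (mb->tail) mb->tail->next = m; else mb->head = m;
    mb->tail = m;
    mb->count++;
}

/* "Wait" on the counter and fetch the oldest message.
   Returns NULL where a real WaitMessage would block the task. */
message_t *wait_message(mailbox_t *mb) {
    assert(mb != NULL);
    if (mb->count == 0) return NULL;
    mb->count--;
    message_t *m = mb->head;
    mb->head = m->next;
    if (mb->head == NULL) mb->tail = NULL;
    m->next = NULL;
    return m;
}
```

The sketch links the message before incrementing the counter, so a receiver that sees a non-zero count always finds a message in the list.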


Alternative approaches, misconceptions, things to pay attention to

 



Historical note:

Semaphores have long been known in railway engineering. A semaphore keeps a train from entering a block (which is simply the section of track between two semaphores) that is currently occupied by another train. When that train has left the block, the waiting train is signalled that it can now enter. In railway engineering this serves to prevent collisions between two trains travelling at different speeds.



Appendix A:

Interrupts
Interrupt and exception handler routines can also be executed in a separate task. Here, an interrupt or exception causes a task switch to a handler task. The handler task is given its own address space and (optionally) can execute at a higher protection level than application programs or tasks. The switch to the handler task is accomplished with an implicit task call that references a task gate descriptor. The task gate provides access to the address space for the handler task. As part of the task switch, the processor saves complete state information for the interrupted program or task. Upon returning from the handler task, the state of the interrupted program or task is restored and execution continues.


Appendix B:


X86 TASK MANAGEMENT

6.1.2. Task State

The following items define the state of the currently executing task:

The task’s current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).

The state of the general-purpose registers.

The state of the EFLAGS register.

The state of the EIP register.

The state of control register CR3.

The state of the task register.

The state of the LDTR register.

The I/O map base address and I/O map (contained in the TSS).

Stack pointers to the privilege 0, 1, and 2 stacks (contained in the TSS).

Link to previously executed task (contained in the TSS).

Prior to dispatching a task, all of these items are contained in the task’s TSS, except the state of the task register. Also, the complete contents of the LDTR register are not contained in the TSS, only the segment selector for the LDT.

 

6.1.3. Executing a Task

Software or the processor can dispatch a task for execution in one of the following ways:

An explicit call to a task with the CALL instruction.

An explicit jump to a task with the JMP instruction.

An implicit call (by the processor) to an interrupt-handler task.

An implicit call to an exception-handler task.

A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.

All of these methods of dispatching a task identify the task to be dispatched with a segment selector that points either to a task gate or the TSS for the task. When dispatching a task with a CALL or JMP instruction, the selector in the instruction may select either the TSS directly or a task gate that holds the selector for the TSS. When dispatching a task to handle an interrupt or exception, the IDT entry for the interrupt or exception must contain a task gate that holds the selector for the interrupt- or exception-handler TSS.

When a task is dispatched for execution, a task switch automatically occurs between the currently running task and the dispatched task. During a task switch, the execution environment of the currently executing task (called the task’s state or context) is saved in its TSS and execution of the task is suspended. The context for the dispatched task is then loaded into the processor and execution of that task begins with the instruction pointed to by the newly loaded EIP register. If the task has not been run since the system was last initialized, the EIP will point to the first instruction of the task’s code; otherwise, it will point to the next instruction after the last instruction that the task executed when it was last active.

If the currently executing task (the calling task) called the task being dispatched (the called task), the TSS segment selector for the calling task is stored in the TSS of the called task to provide a link back to the calling task. For all Intel Architecture processors, tasks are not recursive. A task cannot call or jump to itself.

Interrupts and exceptions can be handled with a task switch to a handler task. Here, the processor not only can perform a task switch to handle the interrupt or exception, but it can automatically switch back to the interrupted task upon returning from the interrupt- or exception-handler task. This mechanism can handle interrupts that occur during interrupt tasks.

As part of a task switch, the processor can also switch to another LDT, allowing each task to have a different logical-to-physical address mapping for LDT-based segments. The page-directory base register (CR3) also is reloaded on a task switch, allowing each task to have its own set of page tables. These protection facilities help isolate tasks and prevent them from interfering with one another. If one or both of these protection mechanisms are not used, the processor provides no protection between tasks. This is true even with operating systems that use multiple privilege levels for protection. Here, a task running at privilege level 3 that uses the same LDT and page tables as other privilege-level-3 tasks can access code and corrupt data and the stack of other tasks.

Use of task management facilities for handling multitasking applications is optional. Multi-tasking can be handled in software, with each software defined task executed in the context of a single Intel Architecture task.

6.2. TASK MANAGEMENT DATA STRUCTURES

The processor defines five data structures for handling task-related activities:

Task-state segment (TSS).

Task-gate descriptor.

TSS descriptor.

Task register.

NT flag in the EFLAGS register.

When operating in protected mode, a TSS and TSS descriptor must be created for at least one task, and the segment selector for the TSS must be loaded into the task register (using the LTR instruction).

6.2.1. Task-State Segment (TSS)

The processor state information needed to restore a task is saved in a system segment called the task-state segment (TSS). (Compatibility with 16-bit Intel 286 processor tasks is provided by a different kind of TSS.) The fields of a TSS are divided into two main categories: dynamic fields and static fields.

The processor updates the dynamic fields when a task is suspended during a task switch. The following are dynamic fields:

General-purpose register fields

State of the EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI registers prior to the task switch.

Segment selector fields

Segment selectors stored in the ES, CS, SS, DS, FS, and GS registers prior to the task switch.

EFLAGS register field

State of the EFLAGS register prior to the task switch.

EIP (instruction pointer) field

State of the EIP register prior to the task switch.

Previous task link field

Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task to be initiated with an IRET instruction.

The processor reads the static fields, but does not normally change them. These fields are set up when a task is created. The following are static fields:

CR3 control register field

Contains the base physical address of the page directory to be used by the task.

Control register CR3 is also known as the page-directory base register (PDBR).

Privilege level-0, -1, and -2 stack pointer fields

These stack pointers consist of a logical address made up of the segment selector for the stack segment (SS0, SS1, and SS2) and an offset into the stack (ESP0, ESP1, and ESP2). Note that the values in these fields are static for a particular task; whereas, the SS and ESP values will change if stack switching occurs within the task.

T (debug trap) flag (byte 100, bit 0)

When set, the T flag causes the processor to raise a debug exception when a task switch to this task occurs (see Section 14.3.1.5., "Task-Switch Exception Condition").

I/O map base address field

Contains a 16-bit offset from the base of the TSS to the I/O permission bit map and interrupt redirection bitmap. When present, these maps are stored in the TSS at higher addresses. The I/O map base address points to the beginning of the I/O permission bit map and the end of the interrupt redirection bit map. See Chapter 9, Input/Output, in the Intel Architecture Software Developer’s Manual, Volume 1, for more information about the I/O permission bit map. See Section 15.3., "Interrupt and Exception Handling in Virtual-8086 Mode", for a detailed description of the interrupt redirection bit map.

If paging is used, care should be taken to avoid placing a page boundary within the part of the TSS that the processor reads during a task switch (the first 104 bytes). If a page boundary is placed within this part of the TSS, the pages on either side of the boundary must be present at the same time and contiguous in physical memory. The reason for this restriction is that when accessing a TSS during a task switch, the processor reads and writes into the first 104 bytes of each TSS from contiguous physical addresses beginning with the physical address of the first byte of the TSS. It may not perform address translations at a page boundary if one occurs within this area. So, after the TSS access begins, if a part of the 104 bytes is not both present and physically contiguous, the processor will access incorrect TSS information, without generating a page-fault exception. The reading of this incorrect information will generally lead to an unrecoverable exception later in the task switch process.

Also, if paging is used, the pages corresponding to the previous task’s TSS, the current task’s TSS, and the descriptor table entries for each should be marked as read/write. The task switch will be carried out faster if the pages containing these structures are also present in memory before the task switch is initiated.
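For orientation, the 104-byte part of a 32-bit TSS described in this section can be written down as a C structure. The field order below follows the standard IA-32 TSS layout, which the text describes but does not show as a figure. The type name `tss32_t`, the field names and the `reserved` fields are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 32-bit TSS as read and written by the processor. */
typedef struct {
    uint16_t prev_task_link, reserved0;  /* dynamic: back link */
    uint32_t esp0;                       /* static: privilege-level-0 stack */
    uint16_t ss0, reserved1;
    uint32_t esp1;                       /* static: privilege-level-1 stack */
    uint16_t ss1, reserved2;
    uint32_t esp2;                       /* static: privilege-level-2 stack */
    uint16_t ss2, reserved3;
    uint32_t cr3;                        /* static: page-directory base */
    uint32_t eip;                        /* dynamic */
    uint32_t eflags;                     /* dynamic */
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;  /* dynamic */
    uint16_t es, reserved4;              /* dynamic segment selectors */
    uint16_t cs, reserved5;
    uint16_t ss, reserved6;
    uint16_t ds, reserved7;
    uint16_t fs, reserved8;
    uint16_t gs, reserved9;
    uint16_t ldt_selector, reserved10;   /* static: selector for the task's LDT */
    uint16_t debug_trap;                 /* byte 100, bit 0 is the T flag */
    uint16_t io_map_base;                /* offset of the I/O permission bit map */
} tss32_t;

/* The processor accesses exactly these first 104 bytes on a task switch. */
static_assert(sizeof(tss32_t) == 104, "32-bit TSS must be 104 bytes");
static_assert(offsetof(tss32_t, debug_trap) == 100, "T flag lives at byte 100");
```

An operating system that stores extra data or an I/O permission bit map in the TSS would append it after this structure and set the descriptor limit accordingly.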

6.2.2. TSS Descriptor

The TSS, like all other segments, is defined by a segment descriptor. Figure 6-3 shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT; they cannot be placed in an LDT or the IDT. An attempt to access a TSS using a segment selector with its TI flag set (which indicates the current LDT) causes a general-protection exception (#GP) to be generated. A general-protection exception is also generated if an attempt is made to load a segment selector for a TSS into a segment register.

The busy flag (B) in the type field indicates whether the task is busy. A busy task is currently running or is suspended. A type field with a value of 1001B indicates an inactive task; a value of 1011B indicates a busy task. Tasks are not recursive. The processor uses the busy flag to detect an attempt to call a task whose execution has been interrupted. To ensure that only one busy flag is associated with a task, each TSS should have only one TSS descriptor that points to it.

The base, limit, and DPL fields and the granularity and present flags have functions similar to their use in data-segment descriptors (see Section 3.4.3., "Segment Descriptors"). The limit field must have a value equal to or greater than 67H (for a 32-bit TSS), one byte less than the minimum size of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception (#TS). A larger limit is required if an I/O permission bit map is included in the TSS. An even larger limit would be required if the operating system stores additional data in the TSS. The processor does not check for a limit greater than 67H on a task switch; however, it does when accessing the I/O permission bit map or interrupt redirection bit map.
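As a hedged illustration of the type, DPL, present, base and limit fields, the following C sketch packs a 32-bit TSS descriptor into a 64-bit value. The bit positions follow the standard IA-32 segment-descriptor layout (the figure referenced above is not reproduced here). The function and macro names are invented for this example, and the granularity and AVL bits are simply left at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSS_TYPE_AVAILABLE 0x9u   /* 1001B: inactive (available) task */
#define TSS_TYPE_BUSY      0xBu   /* 1011B: busy task */
#define TSS32_MIN_LIMIT    0x67u  /* 104 bytes minus one */

/* Pack a byte-granular 32-bit TSS descriptor. */
uint64_t make_tss_descriptor(uint32_t base, uint32_t limit,
                             unsigned dpl, bool busy) {
    assert(limit <= 0xFFFFFu && dpl <= 3);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);                  /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;           /* base 23:0   */
    d |= (uint64_t)(busy ? TSS_TYPE_BUSY : TSS_TYPE_AVAILABLE) << 40;
    d |= (uint64_t)dpl << 45;                          /* DPL         */
    d |= (uint64_t)1 << 47;                            /* present     */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;       /* limit 19:16 */
    d |= (uint64_t)(base >> 24) << 56;                 /* base 31:24  */
    return d;
}

/* True if the type field marks the task as busy. */
bool tss_descriptor_busy(uint64_t d) {
    return ((d >> 40) & 0xFu) == TSS_TYPE_BUSY;
}

/* A limit below 67H would raise #TS on a task switch. */
bool tss_limit_valid(uint32_t limit) {
    return limit >= TSS32_MIN_LIMIT;
}
```

The only difference between an available and a busy descriptor is bit 1 of the type field, which is the bit the processor sets and clears during task switches.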

Any program or procedure with access to a TSS descriptor (that is, whose CPL is numerically equal to or less than the DPL of the TSS descriptor) can dispatch the task with a call or a jump. In most systems, the DPLs of TSS descriptors should be set to values less than 3, so that only privileged software can perform task switching. However, in multitasking applications, DPLs for some TSS descriptors can be set to 3 to allow task switching at the application (or user) privilege level.

6.2.3. Task Register

The task register holds the 16-bit segment selector and the entire segment descriptor (32-bit base address, 16-bit segment limit, and descriptor attributes) for the TSS of the current task. This information is copied from the TSS descriptor in the GDT for the current task.

The task register has both a visible part (that can be read and changed by software) and an invisible part (that is maintained by the processor and is inaccessible by software). The segment selector in the visible portion points to a TSS descriptor in the GDT. The processor uses the invisible portion of the task register to cache the segment descriptor for the TSS. Caching these values in a register makes execution of the task more efficient, because the processor does not need to fetch these values from memory to reference the TSS of the current task.

The LTR (load task register) and STR (store task register) instructions load and read the visible portion of the task register. The LTR instruction loads a segment selector (source operand) into the task register that points to a TSS descriptor in the GDT, and then loads the invisible portion of the task register with information from the TSS descriptor. This instruction is a privileged instruction that may be executed only when the CPL is 0. The LTR instruction generally is used during system initialization to put an initial value in the task register. Afterwards, the contents of the task register are changed implicitly when a task switch occurs. The STR (store task register) instruction stores the visible portion of the task register in a general-purpose register or memory. This instruction can be executed by code running at any privilege level, to identify the currently running task; however, it is normally used only by operating system software.

On power up or reset of the processor, the segment selector and base address are set to the default value of 0 and the limit is set to FFFFH.

6.2.4. Task-Gate Descriptor

A task-gate descriptor provides an indirect, protected reference to a task. A task-gate descriptor can be placed in the GDT, an LDT, or the IDT.

The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL in this segment selector is not used. The DPL of a task-gate descriptor controls access to the TSS descriptor during a task switch. When a program or procedure makes a call or jump to a task through a task gate, the CPL and the RPL field of the gate selector pointing to the task gate must be less than or equal to the DPL of the task-gate descriptor. (Note that when a task gate is used, the DPL of the destination TSS descriptor is not used.)
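The privilege rule stated above can be expressed in a few lines of C. The selector layout used here (RPL in bits 0 to 1, TI flag in bit 2, index in bits 3 to 15) is the standard IA-32 segment-selector format; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Requested privilege level: bits 0..1 of the selector. */
unsigned selector_rpl(uint16_t sel) { return sel & 3u; }

/* Table indicator: bit 2 set means the selector refers to the current LDT. */
bool selector_in_ldt(uint16_t sel) { return ((sel >> 2) & 1u) != 0; }

/* Index of the descriptor within its table. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* A call or jump through a gate (or directly to a TSS descriptor) is
   allowed only if both the CPL and the selector's RPL are numerically
   less than or equal to the DPL of the descriptor being referenced. */
bool may_dispatch(unsigned cpl, uint16_t sel, unsigned dpl) {
    assert(cpl <= 3 && dpl <= 3);
    return cpl <= dpl && selector_rpl(sel) <= dpl;
}
```

For example, a user-level program (CPL 3) can switch tasks through a gate with DPL 3 but not through one with DPL 0, regardless of the DPL of the destination TSS descriptor.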

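A task can be accessed either through a task-gate descriptor or a TSS descriptor. Both of these structures are provided to satisfy the following needs:

The need for a task to have only one busy flag. Because the busy flag is stored in the TSS descriptor, each task should have only one TSS descriptor, but several task gates may reference it.

The need to provide selective access to tasks. Task gates can reside in an LDT and can have a DPL that differs from the DPL of the TSS descriptor.

The need for an interrupt or exception to be handled by an independent task. Task gates may also reside in the IDT.

A task gate in an LDT, a task gate in the GDT, and a task gate in the IDT can all point to the same task.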

6.3. TASK SWITCHING

The processor transfers execution to another task in any of four cases:

The current program, task, or procedure executes a JMP or CALL instruction to a TSS descriptor in the GDT.

The current program, task, or procedure executes a JMP or CALL instruction to a task-gate descriptor in the GDT or the current LDT.

An interrupt or exception vector points to a task-gate descriptor in the IDT.

The current task executes an IRET when the NT flag in the EFLAGS register is set.

The JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all generalized mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate (when calling or jumping to a task) or the state of the NT flag (when executing an IRET instruction) determines whether a task switch occurs.

The processor performs the following operations when switching to a new task:

1. Obtains the TSS segment selector for the new task as the operand of the JMP or CALL instruction, from a task gate, or from the previous task link field (for a task switch initiated with an IRET instruction).

2. Checks that the current (old) task is allowed to switch to the new task. Data-access privilege rules apply to JMP and CALL instructions. The CPL of the current (old) task and the RPL of the segment selector for the new task must be less than or equal to the DPL of the TSS descriptor or task gate being referenced. Exceptions, interrupts (except for interrupts generated by the INT n instruction), and the IRET instruction are permitted to switch tasks regardless of the DPL of the destination task-gate or TSS descriptor. For interrupts generated by the INT n instruction, the DPL is checked.

3. Checks that the TSS descriptor of the new task is marked present and has a valid limit (greater than or equal to 67H).

4. Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return).

5. Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory.

6. If the task switch was initiated with a JMP or IRET instruction, the processor clears the busy (B) flag in the current (old) task’s TSS descriptor; if initiated with a CALL instruction, an exception, or an interrupt, the busy (B) flag is left set. (See Table 6-2.)

7. If the task switch was initiated with an IRET instruction, the processor clears the NT flag in a temporarily saved image of the EFLAGS register; if initiated with a CALL or JMP instruction, an exception, or an interrupt, the NT flag is left unchanged in the saved EFLAGS image.

8. Saves the state of the current (old) task in the current task’s TSS. The processor finds the base address of the current TSS in the task register and then copies the states of the following registers into the current TSS: all the general-purpose registers, segment selectors from the segment registers, the temporarily saved image of the EFLAGS register, and the instruction pointer register (EIP).

NOTE

At this point, if all checks and saves have been carried out successfully, the processor commits to the task switch. If an unrecoverable error occurs in steps 1 through 8, the processor does not complete the task switch and ensures that the processor is returned to its state prior to the execution of the instruction that initiated the task switch. If an unrecoverable error occurs after the commit point (in steps 9 through 14), the processor completes the task switch (without performing additional access and segment availability checks) and generates the appropriate exception prior to beginning execution of the new task. If exceptions occur after the commit point, the exception handler must finish the task switch itself before allowing the processor to begin executing the task. See Chapter 5, "Interrupt 10 - Invalid TSS Exception (#TS)", for more information about the effect of exceptions on a task when they occur after the commit point of a task switch.

9. If the task switch was initiated with a CALL instruction, an exception, or an interrupt, the processor sets the NT flag in the EFLAGS image stored in the new task’s TSS; if initiated with an IRET instruction, the processor restores the NT flag from the EFLAGS image stored on the stack. If initiated with a JMP instruction, the NT flag is left unchanged.

10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task’s TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.

11. Sets the TS flag in the control register CR0 image stored in the new task’s TSS.

12. Loads the task register with the segment selector and descriptor for the new task's TSS.

13. Loads the new task's state from its TSS into processor. Any errors associated with the loading and qualification of segment descriptors in this step occur in the context of the new task. The task state information that is loaded here includes the LDTR register, the PDBR (control register CR3), the EFLAGS register, the EIP register, the general-purpose registers, and the segment descriptor parts of the segment registers.

14. Begins executing the new task. (To an exception handler, the first instruction of the new task appears not to have been executed.)

The state of the currently executing task is always saved when a successful task switch occurs. If the task is resumed, execution starts with the instruction pointed to by the saved EIP value, and the registers are restored to the values they held when the task was suspended.
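The bookkeeping in steps 4, 6, 7, 9 and 10 can be followed in a small C simulation. It models only the busy flag, the NT flag and the previous task link; register state, privilege checks and exceptions are left out. `SW_CALL` also stands for a switch caused by an interrupt or exception. All names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SW_JMP, SW_CALL, SW_IRET } switch_kind_t;

typedef struct {
    bool busy;     /* busy (B) flag of the task's TSS descriptor */
    bool nt;       /* NT flag in the task's EFLAGS image */
    int backlink;  /* previous task link: index of the caller, -1 if none */
} sim_task_t;

/* Returns 0 on success, -1 if the switch is refused.
   For SW_IRET the target is taken from the current task's backlink. */
int task_switch(sim_task_t *tasks, int *current, int target,
                switch_kind_t kind) {
    sim_task_t *old = &tasks[*current];
    if (kind == SW_IRET) {
        if (!old->nt) return -1;  /* plain IRET, not a task switch */
        target = old->backlink;
        assert(target >= 0);
    }
    sim_task_t *next = &tasks[target];

    /* Step 4: new task must be available, or busy for an IRET return. */
    if (kind == SW_IRET ? !next->busy : next->busy) return -1;

    /* Step 6: JMP and IRET clear the old task's busy flag. */
    if (kind == SW_JMP || kind == SW_IRET) old->busy = false;

    /* Step 7: IRET clears NT in the saved EFLAGS image of the old task. */
    if (kind == SW_IRET) old->nt = false;

    /* Step 9 and task linking: CALL nests the new task under the old one. */
    if (kind == SW_CALL) {
        next->nt = true;
        next->backlink = *current;
    }

    /* Step 10: set busy for CALL and JMP; after IRET it is already set. */
    if (kind != SW_IRET) next->busy = true;

    *current = target;
    return 0;
}
```

Because a calling task stays busy while the called task runs, an attempt to call back into it, or to jump to the current task itself, is refused in step 4. That is how the busy flag enforces that tasks are not recursive.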

When switching tasks, the new task does not inherit its privilege level from the suspended task. The new task begins executing at the privilege level specified in the CPL field of the CS register, which is loaded from the TSS. Because tasks are isolated by their separate address spaces and TSSs and because privilege rules control access to a TSS, software does not need to perform explicit privilege checks on a task switch.

The exception conditions that the processor checks for when switching tasks are listed in a table in the Intel manual. That table also shows the exception that is generated for each check if an error is detected and the segment that the error code references. (The order of the checks in the table is the order used in the P6 family processors. The exact order is model specific and may be different for other Intel Architecture processors.) Exception handlers designed to handle these exceptions may be subject to recursive calls if they attempt to reload the segment selector that generated the exception. The cause of the exception (or the first of multiple causes) should be fixed before reloading the selector.

NOTES:

1. #NP is segment-not-present exception, #GP is general-protection exception, #TS is invalid-TSS exception, and #SF is stack-fault exception.

2. The error code contains an index to the segment descriptor referenced in this column.

3. A segment selector is valid if it is in a compatible type of table (GDT or LDT), occupies an address within the table's segment limit, and refers to a compatible type of descriptor (for example, a segment selector in the CS register only is valid when it points to a code-segment descriptor).

The TS (task switched) flag in the control register CR0 is set every time a task switch occurs. System software uses the TS flag to coordinate the actions of the floating-point unit when generating floating-point exceptions with the rest of the processor. The TS flag indicates that the context of the floating-point unit may be different from that of the current task. See Section 2.5., "Control Registers", for a detailed description of the function and use of the TS flag.

6.4. TASK LINKING

The previous task link field of the TSS (sometimes called the "backlink") and the NT flag in the EFLAGS register are used to return execution to the previous task. The NT flag indicates whether the currently executing task is nested within the execution of another task, and the previous task link field of the current task's TSS holds the TSS selector for the higher-level task in the nesting hierarchy, if there is one.

When a CALL instruction, an interrupt, or an exception causes a task switch, the processor copies the segment selector for the current TSS into the previous task link field of the TSS for the new task, and then sets the NT flag in the EFLAGS register. The NT flag indicates that the previous task link field of the TSS has been loaded with a saved TSS segment selector. If software uses an IRET instruction to suspend the new task, the processor uses the value in the previous task link field and the NT flag to return to the previous task; that is, if the NT flag is set, the processor performs a task switch to the task specified in the previous task link field.


Appendix C:


7.1. LOCKED ATOMIC OPERATIONS

The 32-bit Intel Architecture processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:

Guaranteed atomic operations.

Bus locking, using the LOCK# signal and the LOCK instruction prefix.

Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock). This mechanism is present in the P6 family processors.

These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor’s L1 or L2 caches, atomic operations can often be carried out inside a processor’s caches without asserting the bus lock. Here the processor’s cache coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.

Note that the mechanisms for handling locked atomic operations have evolved as the complexity of Intel Architecture processors has evolved. As such, more recent Intel Architecture processors (such as the P6 family processors) provide a more refined locking mechanism than earlier Intel Architecture processors, as is described in the following sections.

7.1.1. Guaranteed Atomic Operations

The Intel386, Intel486, Pentium, and P6 family processors guarantee that the following basic memory operations will always be carried out atomically:

• Reading or writing a byte.
• Reading or writing a word aligned on a 16-bit boundary.
• Reading or writing a doubleword aligned on a 32-bit boundary.

The P6 family processors guarantee that the following additional memory operations will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary. (This operation is also guaranteed on the Pentium® processor.)
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus.
• 16-, 32-, and 64-bit accesses to cached memory that fit within a 32-byte cache line.

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel486, Pentium, or P6 family processors. The P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided where possible.
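
Whether a given access falls under these guarantees can be checked with simple address arithmetic. The following is a minimal sketch, assuming the 32-byte cache line quoted above for the P6 family; the helper names `is_naturally_aligned` and `fits_in_cache_line` are made up for this illustration and are not part of any Intel or compiler interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* P6 cache-line size quoted in the text */

/* True if an access of `size` bytes at `addr` sits on its natural boundary.
   `size` must be a power of two (1, 2, 4 or 8 for the cases listed above). */
bool is_naturally_aligned(uintptr_t addr, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* True if the first and last byte of the access lie in the same cache line,
   that is, the access is not split across a cache-line boundary. */
bool fits_in_cache_line(uintptr_t addr, size_t size) {
    assert(size != 0);
    return addr / CACHE_LINE_BYTES == (addr + size - 1) / CACHE_LINE_BYTES;
}
```

For example, a doubleword at address 0x101C is aligned and stays inside one line, while a doubleword at 0x101E is neither aligned nor contained in a single 32-byte line, so it would be a split access of the kind the text warns about.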

7.1.2. Bus Locking

Intel Architecture processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. Software can specify other occasions when the LOCK semantics are to be followed by prepending the LOCK prefix to an instruction.

In the case of the Intel386, Intel486, and Pentium processors, explicitly locked instructions will result in the assertion of the LOCK# signal. It is the responsibility of the hardware designer to make the LOCK# signal available in system hardware to control memory accesses among processors.

For the P6 family processors, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor’s caches (see Section 7.1.4., "Effects of a LOCK Operation on Internal Processor Caches").

7.1.2.1. AUTOMATIC LOCKING

The operations on which the processor automatically follows the LOCK semantics are as follows:

• When executing an XCHG instruction that references memory.

• When setting the B (busy) flag of a TSS descriptor. The processor tests and sets the busy flag in the type field of the TSS descriptor when switching to a task. To ensure that two processors do not switch to the same task simultaneously, the processor follows the LOCK semantics while testing and setting this flag.

• When updating segment descriptors. When loading a segment descriptor, the processor will set the accessed flag in the segment descriptor if the flag is clear. During this operation, the processor follows the LOCK semantics so that the descriptor will not be modified by another processor while it is being updated. For this action to be effective, operating-system procedures that update descriptors should use the following steps:

  — Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is not-present, and specify a value for the type field that indicates that the descriptor is being updated.

  — Update the fields of the segment descriptor. (This operation may require several memory accesses; therefore, locked operations cannot be used.)

  — Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is valid and present.

  Note that the Intel386™ processor always updates the accessed flag in the segment descriptor, whether it is clear or not. The P6 family, Pentium®, and Intel486™ processors only update this flag if it is not already set.

• When updating page-directory and page-table entries. When updating page-directory and page-table entries, the processor uses locked cycles to set the accessed and dirty flags in the page-directory and page-table entries.

• Acknowledging interrupts. After an interrupt request, an interrupt controller may use the data bus to send the interrupt vector for the interrupt to the processor. The processor follows the LOCK semantics during this time to ensure that no other data appears on the data bus when the interrupt vector is being transmitted.
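
The three steps recommended above for operating-system procedures that update a segment descriptor can be sketched in C11. This is an illustration only, not how any particular kernel does it: `desc_model` is a hypothetical, simplified descriptor (a real one packs base and limit into 8 bytes), `DESC_UPDATING` is an assumed OS-chosen marker value, and the C11 atomic calls stand in for the "locked operation" in the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_PRESENT  0x80u   /* P flag: bit 7 of the access-rights byte */
#define DESC_UPDATING 0x0Fu   /* hypothetical "being updated" marker, P clear */

/* Hypothetical, unpacked model of a segment descriptor. */
struct desc_model {
    uint16_t limit;
    uint32_t base;
    _Atomic uint8_t access;   /* access-rights byte (P, DPL, S, type) */
};

void desc_update(struct desc_model *d, uint32_t new_base,
                 uint16_t new_limit, uint8_t new_access) {
    assert(d != NULL);

    /* Step 1: one locked operation marks the descriptor not-present
       and flags it as being updated. */
    atomic_exchange(&d->access, (uint8_t)DESC_UPDATING);

    /* Step 2: several ordinary stores; these cannot be one locked access. */
    d->base = new_base;
    d->limit = new_limit;

    /* Step 3: one locked operation makes the descriptor valid and present. */
    atomic_exchange(&d->access, (uint8_t)(new_access | DESC_PRESENT));
}
```

The point of the ordering is that any processor that loads the descriptor between steps 1 and 3 sees it as not-present and faults, instead of loading a half-written base or limit.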

7.1.2.2. SOFTWARE CONTROLLED BUS LOCKING

To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register).

• The bit test and modify instructions (BTS, BTR, and BTC).
• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
• The LOCK prefix is automatically assumed for the XCHG instruction.
• The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and NEG.
• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB, AND, OR, and XOR.

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may be interpreted by the system as a lock for a larger memory area.
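
C has no syntax for the LOCK prefix, but the C11 `<stdatomic.h>` operations express the same read-modify-write semantics. On x86, compilers typically lower them to the instructions listed above (for example LOCK XADD, LOCK CMPXCHG, or XCHG), although the exact instruction chosen is up to the compiler. The sketch below is a hedged illustration; `counter`, `flags` and the four function names are invented for it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int counter;    /* hypothetical shared counter */
static atomic_uint flags;     /* hypothetical shared bit field */

/* Atomic increment; returns the new value. Typically LOCK XADD on x86. */
int counter_increment(void) {
    return atomic_fetch_add(&counter, 1) + 1;
}

/* Atomically sets one bit and reports whether it was already set.
   A compiler may emit LOCK BTS, or LOCK OR / a LOCK CMPXCHG loop. */
bool flag_test_and_set(unsigned bit) {
    assert(bit < 32);
    unsigned mask = 1u << bit;
    return (atomic_fetch_or(&flags, mask) & mask) != 0;
}

/* Stores `desired` only if the counter still equals `expected`.
   Typically LOCK CMPXCHG on x86. */
bool counter_replace_if(int expected, int desired) {
    return atomic_compare_exchange_strong(&counter, &expected, desired);
}

/* Unconditional swap; returns the old value. Typically XCHG,
   for which the LOCK prefix is implied. */
int counter_swap(int value) {
    return atomic_exchange(&counter, value);
}
```

Each function performs its read and its write as one indivisible step, so two processors calling `counter_increment` at the same time can never both observe the same old value.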

Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access.

The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:

• Any boundary for an 8-bit access (locked or otherwise).
• 16-bit boundary for locked word accesses.
• 32-bit boundary for locked doubleword access.
• 64-bit boundary for locked quadword access.
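
In C11 the natural-boundary recommendation above can be stated directly in the declaration of shared data. The following is a minimal sketch; `shared_stats` and its fields are hypothetical, and most x86 compilers would pick these alignments on their own, so the `alignas` specifiers mainly document and enforce the intent.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block of counters shared between processors. */
struct shared_stats {
    alignas(8) _Atomic uint64_t events;   /* 64-bit boundary for quadword access */
    alignas(4) _Atomic uint32_t errors;   /* 32-bit boundary for doubleword access */
};

/* Compile-time check that the whole structure keeps the quadword aligned. */
static_assert(alignof(struct shared_stats) >= 8,
              "shared_stats must be aligned on a 64-bit boundary");

/* Run-time check of the actual addresses, e.g. for dynamically placed data. */
bool stats_is_aligned(const struct shared_stats *s) {
    return ((uintptr_t)&s->events % 8 == 0) && ((uintptr_t)&s->errors % 4 == 0);
}

/* Locked quadword increment on an aligned operand; returns the new count. */
uint64_t stats_record_event(struct shared_stats *s) {
    return atomic_fetch_add(&s->events, 1) + 1;
}
```

Keeping each locked operand on its natural boundary means the update completes in a single bus or cache transaction instead of the split access that the text describes as costly.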

Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor.

For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete).

Locked instructions should not be used to ensure that data written can be fetched as instructions.

NOTE

The locked instructions for the current versions of the Intel486, Pentium, and P6 family processors will allow data written to be fetched as instructions. However, Intel recommends that developers who require the use of self-modifying code use a different synchronizing mechanism, described in the following sections.


All other citations are taken from the Intel processor handbooks (IA-32 / x86).

All trademarks are the property of their respective owners.
