























- Compulsory (cold start or process migration, first reference): first access to a block
  - "Cold" fact of life: not a whole lot you can do about it
  - Note: if you are going to run "billions" of instructions, compulsory misses are insignificant
- Capacity:
  - Cache cannot contain all blocks accessed by the program
  - Solution: increase cache size
- Conflict (collision):
  - Multiple memory locations mapped to the same cache location
  - Solution 1: increase cache size
  - Solution 2: increase associativity
- Coherence (invalidation): other process (e.g., I/O) updates memory
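
One common way to separate the first three categories is to run the reference stream through two models at once: the real cache and a fully associative LRU cache of the same total size. The sketch below assumes that convention (it is not spelled out in the slides), treats the trace as a list of block numbers, and uses hypothetical names such as `classify_misses`. Coherence misses are not modeled.

```python
from collections import OrderedDict

def classify_misses(trace, num_sets, ways):
    """Count compulsory, capacity, and conflict misses for a set-associative LRU cache.

    trace: list of block numbers (address // block size).
    Assumed convention: a miss is 'capacity' if a fully associative LRU cache
    of the same total size would also miss, and 'conflict' otherwise.
    """
    total_blocks = num_sets * ways
    seen = set()                                   # blocks referenced at least once
    fa = OrderedDict()                             # fully associative reference cache
    sa = [OrderedDict() for _ in range(num_sets)]  # the cache being studied
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Update the fully associative reference cache first.
        fa_hit = block in fa
        if fa_hit:
            fa.move_to_end(block)
        else:
            if len(fa) >= total_blocks:
                fa.popitem(last=False)             # evict least recently used
            fa[block] = True

        # Now the real cache: the index is the block number modulo the set count.
        s = sa[block % num_sets]
        if block in s:
            s.move_to_end(block)
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            if len(s) >= ways:
                s.popitem(last=False)
            s[block] = True
        seen.add(block)
    return counts
```

For example, with a 2-block direct-mapped cache, the trace `[0, 2, 0]` produces one conflict miss (blocks 0 and 2 collide in set 0 even though both would fit), while `[0, 1, 2, 0]` produces one capacity miss.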

# Q2: How is a block found if it is in the upper level?

- Block is the minimum quantum of caching
  - Data select field is used to select data within the block
- Index is used to look up candidates in the cache
  - Index identifies the set
- How do we know which particular block is stored in a cache location?
  - Store block address as well as the data
  - Actually, only need the high-order bits
  - Called the tag
  - If no candidates match, then declare cache miss
- What if there is no data in a location?
  - Valid bit: 1 = present, 0 = not present
  - Initially 0
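
The lookup described above can be sketched in a few lines. This is a minimal illustration, not a hardware description: the names `Line`, `split_address`, and `lookup` are hypothetical, and the block size and number of sets are assumed to be powers of two.

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = False   # valid bit: initially 0 (nothing cached here yet)
    tag: int = 0          # high-order address bits of the cached block

def split_address(addr, block_bytes, num_sets):
    """Split an address into (tag, index, data-select offset)."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)               # selects data within the block
    index = (addr >> offset_bits) & (num_sets - 1)  # identifies the set
    tag = addr >> (offset_bits + index_bits)        # remaining high-order bits
    return tag, index, offset

def lookup(sets, addr, block_bytes):
    """Return True on a hit: a valid line in the indexed set has a matching tag."""
    tag, index, _ = split_address(addr, block_bytes, len(sets))
    return any(line.valid and line.tag == tag for line in sets[index])
```

With 32-byte blocks and 128 sets, address `0x12345678` splits into tag `0x12345`, index 51, and offset 24. If no valid line in set 51 carries that tag, the access is a miss.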







# **Review: Fully Associative Cache**

- Fully Associative: every cache entry can hold any block/line
  - Address does not include a cache index
  - Compare the cache tags of all cache entries in parallel
- Example: block size = 32 B
  - We need N 27-bit comparators, one per entry (the 27 bits follow from a 32-bit address minus 5 byte-select bits)
  - Still have byte select to choose from within the block


# Q3: Which block should be replaced on a miss?

- Easy for direct mapped: each block maps to exactly one location, so there is no choice to make
- Set associative or fully associative:
  - LRU (Least Recently Used): appealing, but hard to implement for high associativity
  - Random: easy, but how well does it work?
    - Miss rates:

| Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|--------|-----------|--------------|-----------|--------------|-----------|--------------|
| 16 KB  | 5.2%      | 5.7%         | 4.7%      | 5.3%         | 4.4%      | 5.0%         |
| 64 KB  | 1.9%      | 2.0%         | 1.5%      | 1.7%         | 1.4%      | 1.5%         |
| 256 KB | 1.15%     | 1.17%        | 1.13%     | 1.13%        | 1.12%     | 1.12%        |
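
To experiment with the two policies, a small simulator is enough. The sketch below is an illustration only: `miss_rate` is a hypothetical helper, the trace is a non-empty list of block numbers, and it makes no attempt to reproduce the measured figures in the table, which come from real workloads.

```python
import random
from collections import OrderedDict

def miss_rate(trace, num_sets, ways, policy="lru", seed=0):
    """Fraction of references in a non-empty trace that miss in a set-associative cache.

    policy: "lru" evicts the least recently used block in the set;
            any other value evicts a randomly chosen block.
    """
    rng = random.Random(seed)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)          # record the block as most recently used
            continue
        misses += 1
        if len(s) >= ways:                # set is full: pick a victim
            if policy == "lru":
                s.popitem(last=False)
            else:
                del s[rng.choice(list(s))]
        s[block] = True
    return misses / len(trace)
```

For the trace `[0, 1, 0, 2, 0, 1]` on a single 2-way set, LRU misses on 4 of the 6 references. Random replacement depends on the seed, but can never do better than the 3 compulsory misses.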


# Q4: What happens on a write?

|                                            | Write-Through                                                         | Write-Back                                                                                        |
|--------------------------------------------|-----------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
| Policy                                     | Data written to the cache block is also written to lower-level memory | Write data only to the cache block; update the lower level when the block falls out of the cache |
| Debug                                      | Easy                                                                  | Hard                                                                                              |
| Do read misses produce writes?             | No                                                                    | Yes                                                                                               |
| Do repeated writes make it to lower level? | Yes                                                                   | No                                                                                                |
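
The last two rows of the table can be reproduced with a toy direct-mapped cache that counts writes reaching the lower level. This is a sketch under stated assumptions: `lower_level_writes` is a hypothetical helper, both policies allocate a line on a write miss, and each write-back of a dirty block counts as one write.

```python
def lower_level_writes(ops, num_lines, policy):
    """Count writes sent to the lower level.

    ops: iterable of ("r" | "w", block_number).
    policy: "write-through" or "write-back".
    Assumption: write-allocate on a write miss for both policies.
    """
    lines = [None] * num_lines            # each entry is [block, dirty] or None
    writes = 0
    for kind, block in ops:
        i = block % num_lines
        line = lines[i]
        if line is None or line[0] != block:          # miss: replace the resident block
            if policy == "write-back" and line is not None and line[1]:
                writes += 1                           # dirty victim is written back
            line = lines[i] = [block, False]
        if kind == "w":
            if policy == "write-through":
                writes += 1                           # every write goes to the lower level
            else:
                line[1] = True                        # just mark the block dirty
    return writes
```

With `ops = [("w", 0), ("w", 0), ("w", 0), ("r", 4)]` and 4 lines, write-through sends all 3 writes down, while write-back sends only 1, and that one is triggered by the read miss on block 4 evicting the dirty block 0.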















# Memory Mapping or Address Translation: Virtual Address to Physical Address

In virtual memory, blocks of memory (called pages) are mapped from one set of addresses (called virtual addresses) to another set (called physical addresses).

- The processor generates virtual addresses while the memory is accessed using physical addresses. Both the virtual memory and the physical memory are broken into pages, so that a virtual page is mapped to a physical page.
- It is possible for a virtual page to be absent from main memory and not be mapped to a physical address; in that case, the page resides on disk.
- Physical pages can be shared by having two virtual addresses point to the same physical address. This capability is used to allow two different programs to share data or code.
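
The mapping described above can be sketched with a dictionary standing in for the page table. The 4 KB page size, the `PageFault` exception, and the `translate` function are assumptions made for illustration. Real page tables are multi-level structures walked by hardware or the OS.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class PageFault(Exception):
    """Raised when the virtual page is not resident in physical memory."""

def translate(page_table, vaddr, page_size=PAGE_SIZE):
    """Map a virtual address to a physical address.

    page_table: dict from virtual page number to physical page number;
                a missing entry or None means the page is not in main memory.
    """
    vpn, offset = divmod(vaddr, page_size)   # the page offset is not translated
    ppn = page_table.get(vpn)
    if ppn is None:
        raise PageFault(f"virtual page {vpn} is not in main memory")
    return ppn * page_size + offset

# Virtual pages 0 and 2 share physical page 5; virtual page 1 resides on disk.
page_table = {0: 5, 1: None, 2: 5}
```

Here `translate(page_table, 0x10)` and `translate(page_table, 2 * PAGE_SIZE + 0x10)` return the same physical address, which is how two mappings share one physical page. Any access to virtual page 1 raises `PageFault`.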



















# **Three Advantages of Virtual Memory**

## 1. Translation:

- Program can be given a consistent view of memory, even though physical memory is scrambled
- Makes multithreading reasonable
- Only the most important part of the program (the "working set") must be in physical memory
- Contiguous structures (like stacks) use only as much physical memory as necessary, yet can still grow later

## 2. Protection:

- Different threads (or processes) are protected from each other
- Different pages can be given special behavior
  - (Read only, invisible to user programs, etc.)
- Kernel data is protected from user programs
- Very important for protection from malicious programs

## 3. Sharing:

- Can map the same physical page to multiple users (i.e., processes or programs) ("shared memory")





# Summary #2/3: Caches

- The Principle of Locality:
  - Programs access a relatively small portion of the address space at any instant of time
  - Temporal locality: locality in time
  - Spatial locality: locality in space
- Three major categories of cache misses:
  - Compulsory misses: sad facts of life. Example: cold start misses
  - Capacity misses: increase cache size
  - Conflict misses: increase cache size and/or associativity
- Write policy: write-through vs. write-back
- Today CPU time is a function of (ops, cache misses) vs. just of (ops): affects compilers, data structures, and algorithms

# Summary #3/3: Virtual Memory (VM)

- Page tables map virtual addresses to physical addresses
- TLBs are important for fast translation
  - TLB misses are significant in processor performance
- Caches, TLBs, and virtual memory are all understood by examining how they deal with 4 questions:
  - Where can a block be placed?
  - How is a block found?
  - What block is replaced on a miss?
  - How are writes handled?
- Today VM allows many processes to share a single memory without having to swap all processes to disk; today VM protection is paramount!