# Section 8: Demand Paging

# CS 162

# October 23, 2020

# Contents

| 1 Vocabulary |                                                                                                                        |                                      |  |  |  |  |  |  |  |  |  |
|--------------|------------------------------------------------------------------------------------------------------------------------|--------------------------------------|--|--|--|--|--|--|--|--|--|
| 2            | Paging   2.1 Demand Paging   2.2 Cached Paging                                                                         | <b>4</b><br>4<br>5                   |  |  |  |  |  |  |  |  |  |
| 3            | Page Replacement Algorithms                                                                                            | 6                                    |  |  |  |  |  |  |  |  |  |
|              | 3.1 FIFO   3.2 LRU   3.3 MIN   3.4 FIFO vs. LRU   3.5 Improving Cache Performance   3.5.1 FIFO   3.5.2 LRU   3.5.3 MIN | 6<br>6<br>7<br>7<br>7<br>7<br>7<br>7 |  |  |  |  |  |  |  |  |  |
| 4            | Clock Algorithm   4.1 Clock Page Table Entry   4.2 Clock Algorithm Step-through                                        | <b>8</b><br>8<br>8                   |  |  |  |  |  |  |  |  |  |

#### 1 Vocabulary

- **Demand Paging** The process where the operating system only stores pages that are "in demand" in the main memory and stores the rest in persistent storage (disk). Accesses to pages not currently in memory page fault and the page fault handler will retrieve the request page from disk (paged in). When main memory is full, then as new pages are paged in old pages must be paged out through a process called eviction. Many cache eviction algorithms like least recently used can be applied to demand paging, the main memory is acting as the cache for pages which all start on disk.
- Working Set The subset of the address space that a process uses as it executes. Generally we can say that as the cache hit rate increases, more of the working set is being added to the cache.
- **Resident Set Size** The portion of memory occupied by a process that is held in main memory (RAM). The rest has been paged out onto disk through demand paging.
- **Thrashing** Phenomenon that occurs when a computer's virtual memory subsystem is constantly paging (exchanging data in memory for data on disk). This can lead to significant application slowdown.
- Inverted Page Table The inverted page table scheme uses a page table that contains an entry for each physical frame, not for each logical page. This ensures that the table occupies a fixed fraction of memory. The size is proportional to physical memory, not the virtual address space. The inverted page table is a global structure there is only one in the entire system. It stores reverse mappings for all processes. Each entry in the inverted table contains has a tag containing the task id and the virtual address for each page. These mappings are usually stored in associative memory (remember fully associative caches from 61C?). Associatively addressed memory compares input search data (tag) against a table of stored data, and returns the address of matching data. They can also use actual hash maps.
- **Policy Misses** The miss that occurs when pages were previously in memory but were selected to be paged out because of the replacement policy.
- **Random** Random: Pick a random page for every replacement. Unpredictable and hard to make any guarantees. TLBs are typically implemented with this policy.
- **FIFO** First In, First Out: Selects the oldest page to be replaced. It is fair, but suboptimal because it throws out heavily used pages instead of infrequently used pages.
- MIN Minimum: Replace the page that won't be used for the longest time. Provably optimal. To approximate MIN, take advantage of the fact that the past is a good predictor of the future (see LRU).
- LRU Least Recently Used: Replace the page which hasn't been used for the longest time. An approximation of MIN. Not actually implemented in reality because it's expensive; see Clock
- Belady's Anomaly The phenomenon in which increasing the number of page frames results in an increase in the number of page faults for a given memory access pattern. This is common for FIFO, Second Chance, and the random page replacement algorithm. For more information, check out http://nob.cs.ucdavis.edu/classes/ecs150-2008-02/handouts/memory/mm-belady.pdf
- **Clock** Clock Algorithm: An approximation of LRU. Main idea: replace *an* old page, not the *oldest* page. On a page fault, check the page currently pointed to by the 'clock hand. Checks a use bit which indicates whether a page has been used recently; clears it if it is set and advances the clock hand. Otherwise, if the use bit is 0, selects this candidate for replacement.

Other bits used for Clock: "modified"/"dirty" indicates whether page must be written back to disk upon pageout; "valid" indicates whether the program is allowed to reference this page; "read-only"/"writable" indicates whether the program is allowed to modify this page.

- Nth Chance Nth Chance Algorithm: An approximation of LRU. A version of Clock Algorithm where each page gets N chances before being selected for replacement. The clock hand must sweep by N times without the page being used before the page is replaced. For a large N, this is a very good approximation of LRU.
- Second Chance List Second-Chance List Algorithm: An approximation of LRU. Divides pages into two an active list and a second-chance list. The active list uses a replacement policy of FIFO, while the second-chance list uses a replacement policy of LRU. Not required reading, but if you're interested in the details, this algorithm is covered in detail in this paper: https://users.soe.ucsc.edu/~sbrandt/221/Papers/Memory/levy-computer82.pdf. The version presented in lecture and for the purposes of this course includes some significant simplifications.

## 2 Paging

#### 2.1 Demand Paging

An up-and-coming big data startup has just hired you do help design their new memory system for a byte-addressable system. Suppose the virtual and physical memory address space is 32 bits with a 4KB page size.

Suppose you know that there will only be 4 processes running at the same time, each with a Resident Set Size (RSS) of 512MB and a working set size of 256KB. What is the minimum amount of TLB entries that your system would need to support to be able to map/cache the working set size for one process? What happens if you have more entries? What about less?

Suppose you run some benchmarks on the system and you see that the system is utilizing over 99% of its paging disk IO capacity, but only 10% of its CPU. What is a combination of the of disk space and memory size that can cause this to occur? Assume you have TLB entries equal to the answer from the previous part.

Out of increasing the size of the TLB, adding more disk space, and adding more memory, which one would lead to the largest performance increase and why?

#### 2.2 Cached Paging

Consider a machine with a page size of 1024 bytes. There are 8KB of physical memory and 8KB of virtual memory. The TLB is a fully associative cache with space for 4 entries that is currently empty. Assume that the physical page number is always one more than the virtual page number. This is a sequence of memory address accesses for a program we are writing: 0x294, 0xA76, 0x5A4, 0x923, 0xCFF, 0xA12, 0xF9F, 0x392, 0x341.

Here is the current state of the page table.

| Valid Bit | Physical Page Number |
|-----------|----------------------|
| 0         | NULL                 |
| 1         | 2                    |
| 0         | NULL                 |
| 0         | 4                    |
| 0         | 5                    |
| 1         | 6                    |
| 1         | 7                    |
| 0         | NULL                 |

Explain what happens on a memory access.

How many TLB hits and page faults are there? What are the contents of the cache at the end of the sequence?

# 3 Page Replacement Algorithms

We will use this access pattern for the following section.



#### 3.1 FIFO

How many misses will you get with FIFO?

| Page | А | В | С | D | А | В | D | С | В | А |
|------|---|---|---|---|---|---|---|---|---|---|
| 1    |   |   |   |   |   |   |   |   |   |   |
| 2    |   |   |   |   |   |   |   |   |   |   |
| 3    |   |   |   |   |   |   |   |   |   |   |

#### 3.2 LRU

How many misses will you get with LRU?

| Page | А | В | С | D | А | В | D | С | В | А |
|------|---|---|---|---|---|---|---|---|---|---|
| 1    |   |   |   |   |   |   |   |   |   |   |
| 2    |   |   |   |   |   |   |   |   |   |   |
| 3    |   |   |   |   |   |   |   |   |   |   |

#### 3.3 MIN

How many misses will you get with MIN?

| Page | А | В | С | D | А | В | D | С | В | А |
|------|---|---|---|---|---|---|---|---|---|---|
| 1    |   |   |   |   |   |   |   |   |   |   |
| 2    |   |   |   |   |   |   |   |   |   |   |
| 3    |   |   |   |   |   |   |   |   |   |   |

## 3.4 FIFO vs. LRU

LRU is an approximation of MIN, which is provably optimal. Why does FIFO still do better in this case?

### 3.5 Improving Cache Performance

If we increase the cache size, are we always guaranteed to get better cache performance?

#### 3.5.1 FIFO

#### 3.5.2 LRU

#### 3.5.3 MIN

### 4 Clock Algorithm

#### 4.1 Clock Page Table Entry

Suppose that we have a 32-bit virtual address split as follows:

| 10 Bits  | 10 Bits | 12 Bits |
|----------|---------|---------|
| Table ID | Page ID | Offset  |

Assume that the physical address is 32-bit as well. Show the format of a page table entry (PTE) complete with bits required to support the clock algorithm.

#### 4.2 Clock Algorithm Step-through

For this problem, assume that physical memory can hold at most four pages. What pages remain in memory at the end of the following sequence of page table operations and what are the use bits set to for each of these pages?

| Page | Α | В | С | Α | С | D | В | D | Α | Е | F |  |
|------|---|---|---|---|---|---|---|---|---|---|---|--|
|------|---|---|---|---|---|---|---|---|---|---|---|--|

