kernel: add and enable MGLRU for Linux 5.15
openwrt/staging/jow.git: target/linux/generic/pending-5.15/020-09-mm-multigenerational-lru-documentation.patch
From f59c618ed70a1e48accc4cad91a200966f2569c9 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Tue, 2 Feb 2021 01:27:45 -0700
Subject: [PATCH 10/10] mm: multigenerational lru: documentation

Add Documentation/vm/multigen_lru.rst.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Change-Id: I1902178bcbb5adfa0a748c4d284a6456059bdd7e
---
 Documentation/vm/index.rst        |   1 +
 Documentation/vm/multigen_lru.rst | 132 ++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+)
 create mode 100644 Documentation/vm/multigen_lru.rst

--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -17,6 +17,7 @@ various features of the Linux memory man
 
    swap_numa
    zswap
+   multigen_lru
 
 Kernel developers MM documentation
 ==================================
--- /dev/null
+++ b/Documentation/vm/multigen_lru.rst
@@ -0,0 +1,132 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Multigenerational LRU
+=====================
+
+Quick Start
+===========
+Build Configurations
+--------------------
+:Required: Set ``CONFIG_LRU_GEN=y``.
+
+:Optional: Set ``CONFIG_LRU_GEN_ENABLED=y`` to turn the feature on by
+ default.
+
+Runtime Configurations
+----------------------
+:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
+ feature was not turned on by default.
+
+:Optional: Write ``N`` to ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to
+ protect the working set of ``N`` milliseconds. The OOM killer is
+ invoked if this working set cannot be kept in memory.
+
+:Optional: Read ``/sys/kernel/debug/lru_gen`` to confirm the feature
+ is turned on. This file has the following output:
+
+::
+
+  memcg  memcg_id  memcg_path
+    node  node_id
+      min_gen  birth_time  anon_size  file_size
+      ...
+      max_gen  birth_time  anon_size  file_size
+
+``min_gen`` is the oldest generation number and ``max_gen`` is the
+youngest generation number. ``birth_time`` is in milliseconds.
+``anon_size`` and ``file_size`` are in pages.
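
The output layout described above can be read from user space with a small parser. The following is a minimal sketch, assuming that every ``memcg`` and ``node`` line starts with that literal keyword and that each generation line carries four whitespace-separated integers. The ``Generation`` class, the ``parse_lru_gen`` function, and the sample text are hypothetical and are not part of the patch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Generation:
    gen: int            # generation number (min_gen .. max_gen)
    birth_time_ms: int  # birth_time, in milliseconds
    anon_pages: int     # anon_size, in pages
    file_pages: int     # file_size, in pages


def parse_lru_gen(text: str) -> Dict[Tuple[int, int], List[Generation]]:
    """Map (memcg_id, node_id) to its generations, in file order."""
    result: Dict[Tuple[int, int], List[Generation]] = {}
    memcg_id = None
    key = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "memcg":
            memcg_id = int(parts[1])
            key = None
        elif parts[0] == "node" and memcg_id is not None:
            key = (memcg_id, int(parts[1]))
            result[key] = []
        elif key is not None and len(parts) >= 4:
            try:
                gen, birth, anon, file_ = (int(p) for p in parts[:4])
            except ValueError:
                continue  # not a generation line; ignored in this sketch
            result[key].append(Generation(gen, birth, anon, file_))
    return result


# Hypothetical sample text, not captured from a real system.
SAMPLE = """\
memcg 0 /
 node 0
 7 120000 512 2048
 8 60000 256 1024
 9 20000 128 4096
 10 500 64 32
"""

if __name__ == "__main__":
    for (memcg, node), gens in parse_lru_gen(SAMPLE).items():
        print(memcg, node, [g.gen for g in gens])
```

On a real system the text would come from reading ``/sys/kernel/debug/lru_gen``, which typically requires root and a mounted debugfs.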
+
+Phones/Laptops/Workstations
+---------------------------
+No additional configuration is required.
+
+Servers/Data Centers
+--------------------
+:To support more generations: Change ``CONFIG_NR_LRU_GENS`` to a
+ larger number.
+
+:To support more tiers: Change ``CONFIG_TIERS_PER_GEN`` to a larger
+ number.
+
+:To support full stats: Set ``CONFIG_LRU_GEN_STATS=y``.
+
+:Working set estimation: Write ``+ memcg_id node_id max_gen
+ [swappiness] [use_bloom_filter]`` to ``/sys/kernel/debug/lru_gen`` to
+ invoke the aging, which scans PTEs for accessed pages and then
+ creates the next generation ``max_gen+1``. A swap file and a non-zero
+ ``swappiness``, which overrides ``vm.swappiness``, are required to
+ scan PTEs mapping anon pages. Set ``use_bloom_filter`` to 0 to
+ override the default behavior, which only scans PTE tables found
+ populated.
+
+:Proactive reclaim: Write ``- memcg_id node_id min_gen [swappiness]
+ [nr_to_reclaim]`` to ``/sys/kernel/debug/lru_gen`` to invoke the
+ eviction, which evicts generations less than or equal to ``min_gen``.
+ ``min_gen`` should be less than ``max_gen-1`` as ``max_gen`` and
+ ``max_gen-1`` are not fully aged and therefore cannot be evicted.
+ Use ``nr_to_reclaim`` to limit the number of pages to evict. Multiple
+ command lines are supported, as is concatenation with delimiters
+ ``,`` and ``;``.
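
The two command formats above can be composed programmatically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the bracketed fields are positional, so that a later optional field can only be given when the earlier one is present. ``write_commands`` is shown for completeness and is not exercised here, since it needs root and a kernel built with ``CONFIG_LRU_GEN=y``.

```python
from typing import Iterable, Optional

LRU_GEN_DEBUGFS = "/sys/kernel/debug/lru_gen"  # path taken from the text above


def _command(op: str, memcg_id: int, node_id: int, gen: int,
             *optional: Optional[int]) -> str:
    fields = [op, str(memcg_id), str(node_id), str(gen)]
    skipped = False
    for value in optional:
        if value is None:
            skipped = True
        elif skipped:
            # Assumption: optional fields are positional and cannot be skipped.
            raise ValueError("an earlier optional field is missing")
        else:
            fields.append(str(value))
    return " ".join(fields)


def aging_command(memcg_id: int, node_id: int, max_gen: int,
                  swappiness: Optional[int] = None,
                  use_bloom_filter: Optional[int] = None) -> str:
    """Build '+ memcg_id node_id max_gen [swappiness] [use_bloom_filter]'."""
    return _command("+", memcg_id, node_id, max_gen,
                    swappiness, use_bloom_filter)


def eviction_command(memcg_id: int, node_id: int, min_gen: int,
                     swappiness: Optional[int] = None,
                     nr_to_reclaim: Optional[int] = None) -> str:
    """Build '- memcg_id node_id min_gen [swappiness] [nr_to_reclaim]'."""
    return _command("-", memcg_id, node_id, min_gen,
                    swappiness, nr_to_reclaim)


def can_evict(min_gen: int, max_gen: int) -> bool:
    """min_gen should be less than max_gen-1, per the text above."""
    return min_gen < max_gen - 1


def join_commands(commands: Iterable[str], delimiter: str = ";") -> str:
    """Concatenate commands with one of the documented delimiters."""
    if delimiter not in (",", ";"):
        raise ValueError("delimiter must be ',' or ';'")
    return delimiter.join(commands)


def write_commands(commands: Iterable[str],
                   path: str = LRU_GEN_DEBUGFS) -> None:
    """Write the joined commands to debugfs (requires root)."""
    with open(path, "w") as f:
        f.write(join_commands(commands))
```

For example, ``join_commands([aging_command(0, 0, 10), eviction_command(0, 0, 7)])`` produces a single string that ages and then evicts in one write.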
+
+Framework
+=========
+For each ``lruvec``, evictable pages are divided into multiple
+generations. The youngest generation number is stored in
+``lrugen->max_seq`` for both anon and file types as they are aged on
+an equal footing. The oldest generation numbers are stored in
+``lrugen->min_seq[]`` separately for anon and file types as clean
+file pages can be evicted regardless of swap and writeback
+constraints. These three variables are monotonically increasing.
+Generation numbers are truncated into
+``order_base_2(CONFIG_NR_LRU_GENS+1)`` bits in order to fit into
+``page->flags``. The sliding window technique is used to prevent
+truncated generation numbers from overlapping. Each truncated
+generation number is an index into an array of per-type and per-zone
+lists ``lrugen->lists``.
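
The bit-width formula and the sliding window can be checked with a few lines of arithmetic. This is a user-space model, not kernel code. It assumes that truncation is the modulo by ``CONFIG_NR_LRU_GENS`` mentioned in the Aging section, and it interprets the sliding window as the requirement that at most ``CONFIG_NR_LRU_GENS`` consecutive sequence numbers are live at once. The value 4 used below is an arbitrary illustration, not a statement about the Kconfig default.

```python
def order_base_2(n: int) -> int:
    """Smallest k such that 2**k >= n, mirroring the kernel macro for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1).bit_length()


def gen_bits(nr_gens: int) -> int:
    """Bits taken in page->flags: order_base_2(CONFIG_NR_LRU_GENS + 1)."""
    return order_base_2(nr_gens + 1)


def gen_index(seq: int, nr_gens: int) -> int:
    """Truncate a monotonically increasing sequence number to a list index."""
    return seq % nr_gens


def window_is_valid(min_seq: int, max_seq: int, nr_gens: int) -> bool:
    """Assumed invariant: the live range [min_seq, max_seq] spans at most
    nr_gens values, so no two live generations share a truncated index."""
    return min_seq <= max_seq and max_seq - min_seq + 1 <= nr_gens


if __name__ == "__main__":
    nr_gens = 4  # illustrative value only
    print("bits:", gen_bits(nr_gens))
    print("indices:", [gen_index(s, nr_gens) for s in range(7, 11)])
```

When the window grows past ``nr_gens`` values, two live sequence numbers map to the same index, which is the overlap the sliding window is meant to rule out.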
+
+Each generation is divided into multiple tiers. Tiers represent
+different ranges of access counts through file descriptors only.
+Pages accessed ``N`` times via file descriptors belong to tier
+``order_base_2(N)``. Each generation contains at most
+``CONFIG_TIERS_PER_GEN`` tiers, and they require an additional
+``CONFIG_TIERS_PER_GEN-2`` bits in ``page->flags``. In contrast to
+moving between generations, which requires list operations, moving
+between tiers only involves operations on ``page->flags`` and
+therefore has a negligible cost. A feedback loop modeled after the PID
+controller monitors the refault percentage across all tiers and
+decides when to protect pages from which tiers.
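
The mapping from access count to tier can be tabulated directly from the formula above. The sketch below is a model under two assumptions that the text does not spell out: access counts start at 1, and counts beyond the top tier are clamped to tier ``CONFIG_TIERS_PER_GEN-1``. The function names and the value 4 are illustrative.

```python
def order_base_2(n: int) -> int:
    """Smallest k such that 2**k >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1).bit_length()


def tier_of(accesses: int, tiers_per_gen: int = 4) -> int:
    """Tier for a page accessed `accesses` times via file descriptors.

    Clamping to the top tier is an assumption of this sketch.
    """
    return min(order_base_2(accesses), tiers_per_gen - 1)


def extra_flag_bits(tiers_per_gen: int) -> int:
    """Additional page->flags bits: CONFIG_TIERS_PER_GEN - 2."""
    return tiers_per_gen - 2


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, "accesses -> tier", tier_of(n))
```

Each tier therefore covers a range of access counts that doubles in width: one access, two accesses, three to four, five to eight, and so on.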
+
+The framework comprises two conceptually independent components: the
+aging and the eviction, which can be invoked separately from user
+space for the purpose of working set estimation and proactive reclaim.
+
+Aging
+-----
+The aging produces young generations. Given an ``lruvec``, the aging
+traverses ``lruvec_memcg()->mm_list`` and calls ``walk_page_range()``
+to scan PTEs for accessed pages (a ``mm_struct`` list is maintained
+for each ``memcg``). Upon finding one, the aging updates its
+generation number to ``max_seq`` (modulo ``CONFIG_NR_LRU_GENS``).
+After each round of traversal, the aging increments ``max_seq``. The
+aging is due when ``min_seq[]`` reaches ``max_seq-1``.
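
One round of aging can be modeled as a toy in user space. This is not how the kernel walks page tables. ``Page`` is a hypothetical stand-in for a page with an accessed bit, clearing that bit after the scan is an assumption, and ``aging_is_due`` assumes that either type's ``min_seq`` reaching ``max_seq-1`` is enough to trigger the aging.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Page:
    accessed: bool  # stands in for the accessed bit found by the PTE scan
    gen: int        # truncated generation number


def age(pages: List[Page], max_seq: int, nr_gens: int) -> int:
    """Run one aging round and return the incremented max_seq."""
    for page in pages:
        if page.accessed:
            page.gen = max_seq % nr_gens  # promote to the youngest generation
            page.accessed = False         # assumption: the bit is cleared
    return max_seq + 1


def aging_is_due(min_seq: Sequence[int], max_seq: int) -> bool:
    """Assumed reading: due once any type's min_seq reaches max_seq - 1."""
    return any(seq >= max_seq - 1 for seq in min_seq)


if __name__ == "__main__":
    pages = [Page(accessed=True, gen=0), Page(accessed=False, gen=1)]
    new_max = age(pages, max_seq=10, nr_gens=4)
    print(new_max, [p.gen for p in pages])
```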
+
+Eviction
+--------
+The eviction consumes old generations. Given an ``lruvec``, the
+eviction scans pages on the per-zone lists indexed by anon and file
+``min_seq[]`` (modulo ``CONFIG_NR_LRU_GENS``). It first tries to
+select a type based on the values of ``min_seq[]``. If they are
+equal, it selects the type that has the lower refault percentage. The
+eviction sorts a page according to its updated generation number if
+the aging has found this page accessed. It also moves a page to the
+next generation if this page is from an upper tier that has a higher
+refault percentage than the base tier. The eviction increments
+``min_seq[]`` of a selected type when it finds all the per-zone lists
+indexed by ``min_seq[]`` of this selected type are empty.
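
The type selection and the ``min_seq[]`` advance can be sketched as follows. The text does not say which direction the ``min_seq[]`` comparison goes, so this model assumes that the type with the smaller (older) ``min_seq`` is preferred, and that a tie on the refault figure falls back to the file type. All names are hypothetical.

```python
from typing import List, Sequence

ANON, FILE = 0, 1  # indices into min_seq[] for this sketch


def select_type(min_seq: Sequence[int], refault_pct: Sequence[float]) -> int:
    """Pick the type to evict from.

    Assumption: the type whose oldest generation is older is preferred;
    on equal min_seq, the type with the lower refault figure wins.
    """
    if min_seq[ANON] != min_seq[FILE]:
        return ANON if min_seq[ANON] < min_seq[FILE] else FILE
    return ANON if refault_pct[ANON] < refault_pct[FILE] else FILE


def maybe_advance_min_seq(min_seq: List[int], type_: int,
                          zone_lists: Sequence[Sequence[object]]) -> bool:
    """Increment min_seq[type_] once every per-zone list of its oldest
    generation is empty. Returns True if it advanced."""
    if all(len(lst) == 0 for lst in zone_lists):
        min_seq[type_] += 1
        return True
    return False


if __name__ == "__main__":
    min_seq = [8, 8]
    chosen = select_type(min_seq, refault_pct=[5.0, 20.0])
    print("evict from", "anon" if chosen == ANON else "file")
```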
+
+To-do List
+==========
+KVM Optimization
+----------------
+Support shadow page table walk.