mac80211: backport tx queueing bugfixes; add a bug fix for a rare crash
author Felix Fietkau <nbd@nbd.name>
Sat, 26 Mar 2022 23:05:49 +0000 (00:05 +0100)
committer Felix Fietkau <nbd@nbd.name>
Thu, 15 Sep 2022 15:52:28 +0000 (17:52 +0200)
Re-introduce the queue wake fix that was reverted due to a regression,
but this time with the follow-up fixes that take care of the regression.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
(cherry picked from commit 9a93b62f315ad4c9f021c414ed80ba337ab4a01e)
(cherry-picked from commit 8b804cae5e039142bc63896a75f15146eca3bebc)
(cherry-picked from commit 8b06e06832ebe757246582b65306ad2a2537741f)

package/kernel/mac80211/patches/subsys/328-mac80211-do-not-wake-queues-on-a-vif-that-is-being-s.patch [new file with mode: 0644]
package/kernel/mac80211/patches/subsys/340-wifi-mac80211-do-not-abuse-fq.lock-in-ieee80211_do_s.patch [new file with mode: 0644]
package/kernel/mac80211/patches/subsys/341-mac80211-Fix-deadlock-Don-t-start-TX-while-holding-f.patch [new file with mode: 0644]
package/kernel/mac80211/patches/subsys/342-mac80211-Ensure-vif-queues-are-operational-after-sta.patch [new file with mode: 0644]

diff --git a/package/kernel/mac80211/patches/subsys/328-mac80211-do-not-wake-queues-on-a-vif-that-is-being-s.patch b/package/kernel/mac80211/patches/subsys/328-mac80211-do-not-wake-queues-on-a-vif-that-is-being-s.patch
new file mode 100644
index 0000000..f0150dd
--- /dev/null
@@ -0,0 +1,38 @@
+From: Felix Fietkau <nbd@nbd.name>
+Date: Sat, 26 Mar 2022 23:58:35 +0100
+Subject: [PATCH] mac80211: do not wake queues on a vif that is being stopped
+
+When a vif is being removed and sdata->bss is cleared, __ieee80211_wake_txqs
+can still be called on it, which crashes as soon as sdata->bss is
+dereferenced.
+To fix this properly, check for SDATA_STATE_RUNNING before waking queues,
+and take the fq lock when setting it (to ensure that __ieee80211_wake_txqs
+observes the change when running on a different CPU).
+
+Signed-off-by: Felix Fietkau <nbd@nbd.name>
+---
+
+--- a/net/mac80211/iface.c
++++ b/net/mac80211/iface.c
+@@ -377,7 +377,9 @@ static void ieee80211_do_stop(struct iee
+       bool cancel_scan;
+       struct cfg80211_nan_func *func;
++      spin_lock_bh(&local->fq.lock);
+       clear_bit(SDATA_STATE_RUNNING, &sdata->state);
++      spin_unlock_bh(&local->fq.lock);
+       cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
+       if (cancel_scan)
+--- a/net/mac80211/util.c
++++ b/net/mac80211/util.c
+@@ -301,6 +301,9 @@ static void __ieee80211_wake_txqs(struct
+       local_bh_disable();
+       spin_lock(&fq->lock);
++      if (!test_bit(SDATA_STATE_RUNNING, &sdata->state))
++              goto out;
++
+       if (sdata->vif.type == NL80211_IFTYPE_AP)
+               ps = &sdata->bss->ps;
diff --git a/package/kernel/mac80211/patches/subsys/340-wifi-mac80211-do-not-abuse-fq.lock-in-ieee80211_do_s.patch b/package/kernel/mac80211/patches/subsys/340-wifi-mac80211-do-not-abuse-fq.lock-in-ieee80211_do_s.patch
new file mode 100644
index 0000000..82243f1
--- /dev/null
@@ -0,0 +1,46 @@
+From aa40d5a43526cca9439a2b45fcfdcd016594dece Mon Sep 17 00:00:00 2001
+From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
+Date: Sun, 17 Jul 2022 21:21:52 +0900
+Subject: [PATCH] wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+lockdep complains about use of an uninitialized spinlock in
+ieee80211_do_stop() [1], because commit f856373e2f31ffd3 ("wifi: mac80211:
+do not wake queues on a vif that is being stopped") guards clear_bit() with
+fq.lock before fq_init() from ieee80211_txq_setup_flows() initializes it.
+
+According to discussion [2], Toke was not happy with expanding usage of
+fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
+can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
+
+Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
+Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
+Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
+Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
+Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
+Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
+Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
+Signed-off-by: Kalle Valo <kvalo@kernel.org>
+Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp
+[ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next]
+Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+---
+ net/mac80211/iface.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+--- a/net/mac80211/iface.c
++++ b/net/mac80211/iface.c
+@@ -377,9 +377,8 @@ static void ieee80211_do_stop(struct iee
+       bool cancel_scan;
+       struct cfg80211_nan_func *func;
+-      spin_lock_bh(&local->fq.lock);
+       clear_bit(SDATA_STATE_RUNNING, &sdata->state);
+-      spin_unlock_bh(&local->fq.lock);
++      synchronize_rcu(); /* flush _ieee80211_wake_txqs() */
+       cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
+       if (cancel_scan)
diff --git a/package/kernel/mac80211/patches/subsys/341-mac80211-Fix-deadlock-Don-t-start-TX-while-holding-f.patch b/package/kernel/mac80211/patches/subsys/341-mac80211-Fix-deadlock-Don-t-start-TX-while-holding-f.patch
new file mode 100644
index 0000000..8c56acb
--- /dev/null
@@ -0,0 +1,40 @@
+From: Alexander Wetzel <alexander@wetzel-home.de>
+Date: Thu, 15 Sep 2022 14:41:20 +0200
+Subject: [PATCH] mac80211: Fix deadlock: Don't start TX while holding
+ fq->lock
+
+ieee80211_txq_purge() calls fq_tin_reset() and
+ieee80211_purge_tx_queue(); both then call
+ieee80211_free_txskb(), which can decide to TX the skb again.
+
+There are at least two ways to get a deadlock:
+
+1) When we have a TDLS teardown packet queued in either tin or frags
+   ieee80211_tdls_td_tx_handle() will call ieee80211_subif_start_xmit()
+   while we still hold fq->lock. ieee80211_txq_enqueue() will thus
+   deadlock.
+
+2) A variant of the above happens if aggregation is up and running:
+   In that case ieee80211_iface_work() will deadlock with the original
+   task: The original task already holds fq->lock and tries to get
+   sta->lock after kicking off ieee80211_iface_work(). But the worker
+   can get sta->lock prior to the original task and will then spin for
+   fq->lock.
+
+Avoid these deadlocks by not sending out any skbs when called via
+ieee80211_free_txskb().
+
+Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
+---
+
+--- a/net/mac80211/status.c
++++ b/net/mac80211/status.c
+@@ -698,7 +698,7 @@ static void ieee80211_report_used_skb(st
+               if (!sdata) {
+                       skb->dev = NULL;
+-              } else {
++              } else if (!dropped) {
+                       unsigned int hdr_size =
+                               ieee80211_hdrlen(hdr->frame_control);
diff --git a/package/kernel/mac80211/patches/subsys/342-mac80211-Ensure-vif-queues-are-operational-after-sta.patch b/package/kernel/mac80211/patches/subsys/342-mac80211-Ensure-vif-queues-are-operational-after-sta.patch
new file mode 100644
index 0000000..4310329
--- /dev/null
@@ -0,0 +1,47 @@
+From: Alexander Wetzel <alexander@wetzel-home.de>
+Date: Thu, 15 Sep 2022 15:09:46 +0200
+Subject: [PATCH] mac80211: Ensure vif queues are operational after start
+
+Make sure local->queue_stop_reasons and vif.txqs_stopped stay in sync.
+
+When a new vif is created the queues may end up in an inconsistent state
+and be inoperable:
+Communication not using iTXQ will work, allowing e.g. the association to
+complete. But the 4-way handshake will time out: the sta will not
+send out any skbs queued in iTXQs.
+
+All normal attempts to start the queues will fail when reaching this
+state.
+local->queue_stop_reasons will have marked all queues as operational but
+vif.txqs_stopped will still be set, creating an inconsistent internal
+state.
+
+In reality this seems to be a race between the mac80211 function
+ieee80211_do_open() setting SDATA_STATE_RUNNING and the wake_txqs_tasklet:
+Depending on the driver and the timing the queues may end up
+operational or not.
+
+Cc: stable@vger.kernel.org
+Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
+Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
+---
+
+--- a/net/mac80211/util.c
++++ b/net/mac80211/util.c
+@@ -301,14 +301,14 @@ static void __ieee80211_wake_txqs(struct
+       local_bh_disable();
+       spin_lock(&fq->lock);
++      sdata->vif.txqs_stopped[ac] = false;
++
+       if (!test_bit(SDATA_STATE_RUNNING, &sdata->state))
+               goto out;
+       if (sdata->vif.type == NL80211_IFTYPE_AP)
+               ps = &sdata->bss->ps;
+-      sdata->vif.txqs_stopped[ac] = false;
+-
+       list_for_each_entry_rcu(sta, &local->sta_list, list) {
+               if (sdata != sta->sdata)
+                       continue;
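
Illustration (not part of the commit): the reordering in patch 342 can be modelled in plain userspace C. This is only a sketch under simplifying assumptions: there is no locking, RCU, or tasklet, and struct vif_model, wake_txqs_old(), wake_txqs_new() and stuck_after_open() are hypothetical names that do not exist in mac80211. It shows why clearing txqs_stopped[ac] only after the SDATA_STATE_RUNNING check can leave a queue marked as stopped when a wake arrives before the interface is marked running.

```c
/* Userspace model only; every name here is hypothetical, not a mac80211 API. */
#include <assert.h>
#include <stdbool.h>

#define NUM_ACS 4

struct vif_model {
	bool running;               /* stands in for SDATA_STATE_RUNNING */
	bool txqs_stopped[NUM_ACS]; /* stands in for vif.txqs_stopped[] */
	int scheduled[NUM_ACS];     /* counts simulated per-AC TX kicks */
};

/* Ordering before patch 342: bail out first, clear the stop flag later. */
static void wake_txqs_old(struct vif_model *v, int ac)
{
	assert(ac >= 0 && ac < NUM_ACS);
	if (!v->running)
		return;
	v->txqs_stopped[ac] = false;
	v->scheduled[ac]++;
}

/* Ordering after patch 342: clear the stop flag first, then bail out. */
static void wake_txqs_new(struct vif_model *v, int ac)
{
	assert(ac >= 0 && ac < NUM_ACS);
	v->txqs_stopped[ac] = false;
	if (!v->running)
		return;
	v->scheduled[ac]++;
}

/*
 * Simulate a wake that runs before the interface is marked running.
 * Returns true if AC 0 is still flagged as stopped afterwards.
 */
static bool stuck_after_open(void (*wake)(struct vif_model *, int))
{
	struct vif_model v = { .running = false };

	v.txqs_stopped[0] = true;
	wake(&v, 0);      /* wake runs while the vif is not yet running */
	v.running = true; /* analogue of ieee80211_do_open() setting the bit */
	return v.txqs_stopped[0];
}
```

In this model stuck_after_open(wake_txqs_old) reports a queue that stays flagged as stopped, while stuck_after_open(wake_txqs_new) does not. The real race additionally depends on driver behaviour and timing, as the patch description notes.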