From afbf56b951967e8fa4d509e423fdcb11c27d40e2 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 14 Mar 2017 20:19:29 +0100
Subject: [PATCH 7/7] BUG/MAJOR: connection: update CO_FL_CONNECTED before
 calling the data layer

Matthias Fechner reported a regression in 1.7.3 brought by the backport
of commit 819efbf ("BUG/MEDIUM: tcp: don't poll for write when connect()
succeeds"), causing some connections to fail to establish once in a while.
While this commit itself was a fix for a bad sequencing of connection
events, it in fact unveiled a much deeper bug going back to the connection
rework era in v1.5-dev12: 8f8c92f ("MAJOR: connection: add a new
CO_FL_CONNECTED flag").

It's worth noting that in a lab reproducing an environment similar to
Matthias', only about 1 in every 19000 connections exhibits this behaviour,
making the issue hard to observe. A trick to make the problem
more observable consists in disabling non-blocking mode on the socket
before calling connect() and re-enabling it later, so that connect()
always succeeds immediately. Then it becomes 100% reproducible.

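The reproduction trick above can be sketched as follows. This is only an
illustration under stated assumptions: connect_blocking_then_restore() is a
hypothetical helper that is not part of HAProxy, and where such a call would
be wired into the connect path is left open. It only relies on the standard
fcntl() and connect() calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical helper, NOT HAProxy code: clear O_NONBLOCK, call connect(),
 * then put the original file status flags back. Returns the result of
 * connect(), or -1 if one of the fcntl() calls fails.
 */
int connect_blocking_then_restore(int fd, const struct sockaddr *addr, socklen_t len)
{
	int flags, ret, saved_errno;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0)
		return -1;

	/* blocking mode: connect() only returns once the attempt has finished */
	if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
		return -1;

	ret = connect(fd, addr, len);
	saved_errno = errno;

	/* restore whatever mode the caller had, including O_NONBLOCK if set */
	if (fcntl(fd, F_SETFL, flags) < 0)
		return -1;

	errno = saved_errno;
	return ret;
}
```

Because the socket is blocking during the call, connect() cannot return
EINPROGRESS, which is the situation the bug needs in order to show up.
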
The problem is that this CO_FL_CONNECTED flag is tested after deciding to
call the data layer (typically the stream interface but might be a health
check as well), and that the decision to call the data layer relies on a
change of one of the flags covered by the CO_FL_CONN_STATE set, which is
made of CO_FL_CONNECTED among others.

Before the fix above, this bug couldn't appear with TCP but it could
appear with Unix sockets. Indeed, connect() was always considered
blocking so the CO_FL_WAIT_L4_CONN connection flag was always set, and
polling for write events was always enabled. This used to guarantee that
the conn_fd_handler() could detect a change among the CO_FL_CONN_STATE
flags.

Now with the fix above, if a connect() immediately succeeds for a non-SSL
connection with send-proxy enabled, and no data in the buffer (thus TCP
mode only), the CO_FL_WAIT_L4_CONN flag is not set, the lack of data in
the buffer doesn't enable polling flags for the data layer, the
CO_FL_CONNECTED flag is not set due to send-proxy still being pending,
and once send-proxy is done, its completion doesn't cause the data layer
to be woken up due to the fact that CO_FL_CONNECTED is still not present
and that the CO_FL_SEND_PROXY flag is not watched in CO_FL_CONN_STATE.

Then no progress is made when data are received from the client (and
attempted to be forwarded), because a CF_WRITE_NULL (or CF_WRITE_PARTIAL)
flag is needed for the stream-interface state to turn from SI_ST_CON to
SI_ST_EST, allowing ->chk_snd() to be called when new data arrive. And
the only way to set this flag is to call the data layer of course.

After the connect timeout, the connection gets killed and if in the
meantime some data have accumulated in the buffer, the retry will succeed.

This patch fixes this situation by simply placing the update of
CO_FL_CONNECTED where it should have been, before the check for a flag
change needed to wake up the data layer and not after.

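The effect of the ordering can be seen in a small stand-alone model. This is
a sketch and not HAProxy code: the bit values, MODEL_CONN_STATE, struct
model_conn and the model_*() functions are all made up for illustration, and
the real CO_FL_CONN_STATE set covers more flags than the single one used here.

```c
/* Simplified model, NOT HAProxy code: bit values and types are made up. */
enum {
	CO_FL_WAIT_L4_CONN = 0x1,
	CO_FL_WAIT_L6_CONN = 0x2,
	CO_FL_CONNECTED    = 0x4,
};

/* The real CO_FL_CONN_STATE covers more flags; only this one matters here. */
#define MODEL_CONN_STATE CO_FL_CONNECTED

struct model_conn {
	unsigned int flags;
	int wake_calls;   /* how many times the data layer was woken up */
};

/* Same test as in the patch: nothing pending and not yet marked connected. */
static void model_update_connected(struct model_conn *c)
{
	if (!(c->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED)))
		c->flags |= CO_FL_CONNECTED;
}

/* One pass of the handler; update_before_wake selects the patched ordering. */
void model_handler(struct model_conn *c, int update_before_wake)
{
	unsigned int old = c->flags;   /* snapshot taken on entry */

	if (update_before_wake)
		model_update_connected(c);

	if ((c->flags ^ old) & MODEL_CONN_STATE)
		c->wake_calls++;       /* stands in for conn->data->wake() */

	if (!update_before_wake)
		model_update_connected(c);
}

/* Run several passes on a connection whose connect() succeeded at once. */
int model_run(int update_before_wake, int passes)
{
	struct model_conn c = { 0, 0 };

	while (passes-- > 0)
		model_handler(&c, update_before_wake);
	return c.wake_calls;
}
```

In this model, model_run(0, 3) returns 0: with the old ordering the flag is
set after the comparison, so it is already part of the snapshot on the next
pass and the change is never seen. model_run(1, 3) returns 1: with the
patched ordering the first pass notices the change and wakes the data layer.
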
This fix must be backported to 1.7, 1.6 and 1.5. Versions not having
the patch above are still affected for unix sockets.

Special thanks to Matthias Fechner who provided a very detailed bug
report with a bisection designating the faulty patch, and to Olivier
Houchard for providing full access to a pretty similar environment where
the issue could first be reproduced.
(cherry picked from commit 7bf3fa3c23f6a1b7ed1212783507ac50f7e27544)
---
 src/connection.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/connection.c b/src/connection.c
index 26fc5f6..1e4c9aa 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -131,6 +131,13 @@ void conn_fd_handler(int fd)
 	}
 
  leave:
+	/* Verify if the connection just established. The CO_FL_CONNECTED flag
+	 * being included in CO_FL_CONN_STATE, its change will be noticed by
+	 * the next block and be used to wake up the data layer.
+	 */
+	if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
+		conn->flags |= CO_FL_CONNECTED;
+
 	/* The wake callback may be used to process a critical error and abort the
 	 * connection. If so, we don't want to go further as the connection will
 	 * have been released and the FD destroyed.
@@ -140,10 +147,6 @@ void conn_fd_handler(int fd)
 	    conn->data->wake(conn) < 0)
 		return;
 
-	/* Last check, verify if the connection just established */
-	if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
-		conn->flags |= CO_FL_CONNECTED;
-
 	/* remove the events before leaving */
 	fdtab[fd].ev &= FD_POLL_STICKY;
 
--
2.10.2
