{"id":471,"date":"2024-05-23T16:09:20","date_gmt":"2024-05-23T04:09:20","guid":{"rendered":"https:\/\/micro.muppetz.com\/blog\/?p=471"},"modified":"2024-05-23T22:07:56","modified_gmt":"2024-05-23T10:07:56","slug":"vyos-and-the-mystery-conntrack-counter","status":"publish","type":"post","link":"https:\/\/micro.muppetz.com\/blog\/2024\/05\/23\/vyos-and-the-mystery-conntrack-counter\/","title":{"rendered":"Vyos and the mystery conntrack counter"},"content":{"rendered":"<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/encrypted-tbn0.gstatic.com\/images?q=tbn:ANd9GcQ1JOqj04-jrcTIOHdPS1YMaNZErAzzDoIOJhDysE1AYQ&amp;s\" alt=\"VyOS\" \/><\/p>\n<h2>Router Go Fast!<\/h2>\n<p>For those who know me, it&#8217;s no secret I&#8217;m a huge <a href=\"https:\/\/vyos.io\/\">Vyos<\/a> fan.\u00a0 I moved away from pfSense after they released a version that had issues with <a href=\"https:\/\/forum.netgate.com\/post\/908806\">more than 1 vCPU<\/a> and tried Vyos instead.\u00a0 Once I&#8217;d seen how much better it performed under Proxmox, I stayed on it and have never looked back.<br \/>\nI really feel the pfSense project lost its way when they fucked up WireGuard and then lashed out at everyone who was just trying to <a href=\"https:\/\/www.reddit.com\/r\/PFSENSE\/comments\/m6zcml\/netgate_appears_to_have_removed_scotts_latest\/grawx3r\/\">help get it into FreeBSD<\/a>.\u00a0 But I digress.<\/p>\n<h2>Flow Offload (Flowtable) Bug?<\/h2>\n<p>I recently upgraded from Vyos 1.3 to <a href=\"https:\/\/blog.vyos.io\/vyos-1.4.0-lts-release\">Vyos 1.4<\/a>. 
1.4 is a huge step forward for the project as it moves from iptables to nftables.\u00a0 It also brings some great new features, like <a href=\"https:\/\/docs.kernel.org\/networking\/nf_flowtable.html\">Flowtable<\/a> (software\/hardware flow offload).\u00a0 This means that once a flow has created a conntrack entry, all subsequent packets that match the flow take a fastpath via the flowtable, skipping most of the normal forwarding path, assuming you have a rule to allow this, like so:<\/p>\n<pre>firewall {\r\n    flowtable FastVyos {\r\n        description \"Vyos Fast Software Offload Table\"\r\n        interface eth1\r\n        interface eth0\r\n        offload software\r\n    }\r\n    ipv4 {\r\n        forward {\r\n            filter {\r\n                default-action accept\r\n                description \"Filter for packets being forwarded through the router\"\r\n                rule 10 {\r\n                    action offload\r\n                    description \"Offload Established TCP and UDP Traffic\"\r\n                    offload-target FastVyos\r\n                    protocol tcp_udp\r\n<strong>                    state established\r\n                    state related<\/strong>\r\n                }<\/pre>\n<p>This results in better performance, and on older\/slower hardware it will increase the number of packets per second the device can handle.\u00a0 This is a very good thing!\u00a0 You can see it&#8217;s working if you see <strong>[OFFLOAD]<\/strong> in your conntrack table:<\/p>\n<pre>conntrack -L -u offload\r\n&lt;snip&gt;\r\ntcp      6 src=x.x.x.x dst=x.x.x.x sport=43392 dport=x packets=2317 bytes=192092 src=192.168.0.5 dst=x.x.x.x sport=x dport=43392 packets=2644 bytes=149748 [<strong>OFFLOAD<\/strong>] mark=0 use=2\r\ntcp      6 src=x.x.x.x dst=x.x.x.x sport=55400 dport=x packets=1336 bytes=111233 src=192.168.0.5 dst=x.x.x.x sport=x dport=55400 packets=1535 bytes=88580 [<strong>OFFLOAD<\/strong>] mark=0 use=2\r\ntcp      6 src=x.x.x.x dst=x.x.x.x sport=49836 dport=x packets=850 
bytes=70559 src=192.168.0.5 dst=x.x.x.x sport=x dport=49836 packets=1060 bytes=58247 [<strong>OFFLOAD<\/strong>] mark=0 use=2\r\ntcp      6 src=x.x.x.x dst=x.x.x.x sport=60797 dport=x packets=10014 bytes=827040 src=192.168.0.5 dst=x.x.x.x sport=x dport=60797 packets=10125 bytes=570347 [<strong>OFFLOAD<\/strong>] mark=0 use=2\r\nconntrack v1.4.7 (conntrack-tools): 693 flow entries have been shown.\r\n<\/pre>\n<p>Once I&#8217;d turned on Flowtable though, I started to have issues with Firebase Cloud Messaging on my Android phones.\u00a0 It&#8217;d keep timing out and I wouldn&#8217;t get push notifications until I woke up my phone.\u00a0 I spent ages debugging, Wiresharking, and testing with Flowtable on and off.\u00a0 It would always work with Flowtable off, but would fail\/disconnect with Flowtable enabled. In the end, convinced I had found a bug in nftables (quite the accusation to make!), I <a href=\"https:\/\/bugzilla.netfilter.org\/show_bug.cgi?id=1743\">logged a bug<\/a> in the Netfilter BugTracker. It turns out I was actually correct: there <em>was<\/em> an issue with PPPoE encapsulation and Flowtable.\u00a0 I&#8217;d since switched ISPs to one that does DHCP (not because of the bug!) and hadn&#8217;t noticed the problem with DHCP, so it was good validation to see it really was a PPPoE + Flowtable bug.<br \/>\nI should point out for any PPPoE users out there, the <span style=\"text-decoration: underline;\">bug is fixed in Linux 6.6.30<\/span> onwards, which the latest Vyos 1.5 rolling images are using.<\/p>\n<h2>Conntrack Clashes?<\/h2>\n<p>So now my router was working perfectly, but at some stage I decided to look at the conntrack table statistics.\u00a0 I just like to see how things work &#8220;under the hood&#8221; I guess.<\/p>\n<p>Wait, what&#8217;s this? What the hell is <strong>clash_resolve<\/strong> in my conntrack statistics, and why is it going up by ~300 a minute? 
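<\/p>
<p>To put a number on the rate, here&#8217;s a quick shell sketch (the <code>sum_clash<\/code> helper is just a throwaway of mine, not a standard tool) that sums the per-CPU clash_resolve counters from <code>conntrack -S<\/code> and prints the delta over a minute:<\/p>

```shell
# Sketch: measure how fast clash_resolve grows.
# Assumes conntrack-tools is installed; conntrack -S needs root.

# sum_clash: sum the clash_resolve=N fields across all CPU lines on stdin.
sum_clash() { tr ' ' '\n' | awk -F= '$1 == "clash_resolve" { s += $2 } END { print s+0 }'; }

if command -v conntrack >/dev/null; then
    a=$(conntrack -S | sum_clash)   # first sample
    sleep 60
    b=$(conntrack -S | sum_clash)   # second sample, one minute later
    echo "clash_resolve per minute: $((b - a))"
fi
```

<p>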
That can&#8217;t be a good thing, can it?<\/p>\n<pre>tim@ferrari:~$ conntrack -S\r\ncpu=0 found=13872 invalid=64978 insert=0 insert_failed=2130 drop=2130 early_drop=0 error=1966 search_restart=0 clash_resolve=1091611 chaintoolong=0 \r\ncpu=1 found=13353 invalid=64876 insert=0 insert_failed=2164 drop=2164 early_drop=0 error=1760 search_restart=0 clash_resolve=1098408 chaintoolong=0<\/pre>\n<p>I spent a lot of time googling, but it was hard to find any real information about what it does.\u00a0 These were the <a href=\"https:\/\/patchwork.ozlabs.org\/project\/netfilter-devel\/cover\/20200825225245.8072-1-fw@strlen.de\/\">main<\/a> <a href=\"https:\/\/git.kernel.org\/pub\/scm\/linux\/kernel\/git\/torvalds\/linux.git\/commit\/?id=ed07d9a021df6da53456663a76999189badc432a\">links<\/a> I found that offered some insight.<\/p>\n<p>It turns out that a clash_resolve, <em>at least to my understanding<\/em>, happens when two packets of the same new flow are processed in parallel (say, on different CPUs) and both try to insert a conntrack entry for the same tuple [source IP, source port]:[destination IP, destination port].\u00a0 Rather than dropping the losing packet, the kernel resolves the clash by keeping the entry that won the race and attaching the packet to that.\u00a0 This is apparently common with unconnected UDP sockets firing off lots of requests, which is exactly what a busy DNS client does.\u00a0 I never could quite find a definitive write-up, so if you have a better plain-English explanation I&#8217;d welcome it!<\/p>\n<p>But I did find the cause of the problem.\u00a0 My Vyos router runs a caching nameserver; it&#8217;s my home router, so it makes sense for it to cache most DNS lookups.\u00a0 I have a <a href=\"https:\/\/www.zabbix.com\/\">Zabbix<\/a> server at home too, and it generates A LOT of DNS requests.\u00a0 I found that as soon as I turned off my Zabbix server, clash_resolve stopped incrementing.\u00a0 After looking at the conntrack table I realised there were <strong>hundreds<\/strong> of conntrack 
entries between the DNS Server on the router and my Zabbix server:<\/p>\n<pre>&lt;snip snip&gt;\r\nudp      17 19 src=192.168.0.253 dst=192.168.0.1 sport=48288 dport=53 packets=2 bytes=124 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=48288 packets=2 bytes=189 mark=0 use=1\r\nudp      17 12 src=192.168.0.253 dst=192.168.0.1 sport=56102 dport=53 packets=1 bytes=68 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=56102 packets=1 bytes=84 mark=0 use=1\r\nudp      17 12 src=192.168.0.253 dst=192.168.0.1 sport=56240 dport=53 packets=2 bytes=136 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=56240 packets=2 bytes=201 mark=0 use=1\r\nudp      17 28 src=192.168.0.253 dst=192.168.0.1 sport=38695 dport=53 packets=2 bytes=124 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=38695 packets=2 bytes=189 mark=0 use=1\r\nudp      17 10 src=192.168.0.253 dst=192.168.0.1 sport=33689 dport=53 packets=2 bytes=128 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=33689 packets=2 bytes=193 mark=0 use=1\r\nudp      17 13 src=192.168.0.253 dst=192.168.0.1 sport=36236 dport=53 packets=2 bytes=128 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=36236 packets=2 bytes=193 mark=0 use=1\r\nudp      17 10 src=192.168.0.253 dst=192.168.0.1 sport=49932 dport=53 packets=2 bytes=116 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=49932 packets=2 bytes=181 mark=0 use=1\r\nudp      17 3 src=192.168.0.253 dst=192.168.0.1 sport=54581 dport=53 packets=2 bytes=126 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=54581 packets=2 bytes=191 mark=0 use=1\r\nudp      17 1 src=192.168.0.253 dst=192.168.0.1 sport=40209 dport=53 packets=2 bytes=126 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=40209 packets=2 bytes=191 mark=0 use=2\r\nudp      17 12 src=192.168.0.253 dst=192.168.0.1 sport=48388 dport=53 packets=2 bytes=136 src=192.168.0.1 dst=192.168.0.253 sport=53 dport=48388 packets=2 bytes=201 mark=0 use=1\r\nudp      17 4 src=192.168.0.253 dst=192.168.0.1 sport=51367 dport=53 packets=2 bytes=128 
src=192.168.0.1 dst=192.168.0.253 sport=53 dport=51367 packets=2 bytes=193 mark=0 use=1\r\nconntrack v1.4.7 (conntrack-tools): 167 flow entries have been shown.\r\n<\/pre>\n<h2>Fixing Conntrack<\/h2>\n<p>And here&#8217;s the fix:\u00a0<strong>Those connections don&#8217;t need to be in conntrack!<\/strong> There&#8217;s no NAT going on and I&#8217;m not doing any firewalling.\u00a0 It&#8217;s local LAN traffic.\u00a0 So the fix is to put in an exception rule so that all traffic from my LAN talking to the DNS server on the router <strong>bypasses conntrack<\/strong>.\u00a0 That means there&#8217;s no state at all; each packet is just normally routed.<br \/>\nNote that you have to <em>create a rule in both directions<\/em>, otherwise the replies the router sends back still generate conntrack entries.<\/p>\n<p>The configuration looks like this, placed under the &#8220;system conntrack&#8221; stanza:<\/p>\n<pre>show system conntrack\r\n ignore {\r\n     ipv4 {\r\n         rule 10 {\r\n             description \"Ignore Conntrack for LAN DNS Requests to Router\"\r\n             destination {\r\n                 address 192.168.0.1\r\n                 port 53\r\n             }\r\n             inbound-interface eth1\r\n             protocol udp\r\n             source {\r\n                 address 192.168.0.0\/24\r\n             }\r\n         }\r\n         rule 200 {\r\n             description \"Ignore Conntrack for LAN DNS Replies from Router\"\r\n             destination {\r\n                 address 192.168.0.0\/24\r\n             }\r\n             protocol udp\r\n             source {\r\n                 address 192.168.0.1\r\n                 port 53\r\n             }\r\n         }\r\n     }\r\n }\r\n<\/pre>\n<p>With this in place, the clash_resolve statistic has stopped going up rapidly.\u00a0 I still see it increase a little, but that&#8217;s expected.\u00a0 In fact, clash_resolve isn&#8217;t even a problem as such; it just means a clash was noticed 
and resolved.<\/p>\n<p>I&#8217;ve also saved myself 650+ entries in the conntrack table that I didn&#8217;t need:<\/p>\n<pre>[Before the change above was made]\r\ntim@ferrari# conntrack -L -s 192.168.0.0\/24 -d 192.168.0.0\/24\r\n&lt;snip&gt;\r\nconntrack v1.4.7 (conntrack-tools): 665 flow entries have been shown.\r\n<\/pre>\n<p><strong>And my Vyos router is as performant as ever!<\/strong><\/p>\n<p>Tim<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Router Go Fast! For those that know me, it&#8217;s no secret I&#8217;m a huge Vyos fan.\u00a0 I moved away from pfSense after they released a version that had issues with more than 1 vCPU, instead trying Vyos.\u00a0 Once I&#8217;d seen how much better it performed under Proxmox I stayed on it and have never looked [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,11,10,2],"tags":[147,150,21,16,148,149,146],"class_list":["post-471","post","type-post","status-publish","format-standard","hentry","category-networking","category-routing","category-technical","category-thoughts","tag-conntrack","tag-firewall","tag-linux","tag-networking","tag-nftables","tag-router","tag-vyos"],"_links":{"self":[{"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/posts\/471","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/comments?post=471"}],"version-history":[{"count":13,"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/posts\/471\/revisions"}],"predecessor-version":[{"id":487,"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/
wp\/v2\/posts\/471\/revisions\/487"}],"wp:attachment":[{"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/media?parent=471"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/categories?post=471"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/micro.muppetz.com\/blog\/wp-json\/wp\/v2\/tags?post=471"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}