Basics of Integration using Monte Carlo

Recently, someone asked me about Monte Carlo, so I thought I should write this post to provide a basic introduction to performing integration using Monte Carlo. "Why integration?", you ask. Well, this is because integration is one of the main operations in computing the posterior probability distributions used in machine learning and probabilistic filtering (e.g. the Bayes filter). For example, consider the typical posterior probability expression in the Bayes filtering context, $$p(x_{t} \mid y_{1:t})$$, i.e. the probability of the hidden state $$x_{t}$$ given the observed sequence of measurements up to the current time point, $$y_{1:t}$$: $$p(x_{t} \mid y_{1:t}) = \frac{p(y_{t} \mid x_{t}) p(x_{t} \mid y_{1:t-1})}{p(y_{t} \mid y_{1:t-1})}$$ The term $$p(x_{t} \mid y_{1:t-1})$$ is of particular interest. It can be seen as the prediction of $$x_{t}$$ from previous observations $$y_{1:t-1}$$ and can be expressed as p(x_{t} \mid y_{1:t-1}) = \int p(x_{t} \mid x_{t-1}) p(x_{t-1}

Having fun with port mirroring, ntopng and traffic shaping

Out of interest, I recently managed to set up a monitoring system for my home network. I have been meaning to do this for quite some time, but never got the chance due to many other commitments. So, here is a personal note on how I did it, while having some fun with port mirroring (using iptables/netfilter), ntopng and traffic shaping (using tc).

First of all, my router (TP-Link TL-WR841N/ND v9) runs OpenWrt, with scripts written to monitor the relative bandwidth used by each housemate; iptables rules account for the cumulative number of bytes sent and received by each host in the network. Based on wrtbwmon by Peter Bailey, the scripts were further modified to include rules for ip6tables, so that IPv6 traffic is also taken into account. I have also augmented the display of the network statistics using the Google Charts API. The result is shown in Figure 1.

This has been the set-up for over a year and a half, and there is really nothing fancy about it; I have always wanted to do much more.

However, given the limited flash capacity of the router (see Figure 2), there is only so much that can be done when it comes to collecting more network-related statistics. While some routers have USB interfaces to which external storage could be connected, the router I have at hand does not expose any (the QCA953x SoC in the router does not support USB [1]). As such, the only option is to relay network traffic to a separate device. I figured this should be easy with a spare, unused Raspberry Pi.

Basically, the main idea is to have the Raspberry Pi hooked up to one of the router's LAN switch ports via ethernet, and then configure the router to mirror any traffic passing through it to the Raspberry Pi (via port mirroring). Alternatively, we could attach the Raspberry Pi to a hub (NOTE: not a switch) placed on the link between the router's WAN port and the modem, allowing it to sniff all ethernet frames to and from the modem indiscriminately. As I do not have a hub, I opted for the first option.

Port Mirroring (IPv4)

Using the TEE extension of iptables, port mirroring is an easy feat. We just need rules for mirroring any incoming traffic destined for the LAN, and any outgoing traffic originating from the LAN and heading towards the WAN interface. On the router, the rules to achieve this are:

iptables -t mangle -A POSTROUTING -o br-lan ! -s 192.168.1.0/24 -j TEE --gateway 192.168.1.238
iptables -t mangle -A POSTROUTING -o pppoe-wan -s 192.168.1.0/24 -j TEE --gateway 192.168.1.238

Both rules are placed in the POSTROUTING chain of the mangle table, though equivalent rules could also be written in the FORWARD chain of the same table. The main difference is that the mangle FORWARD chain is traversed before the FORWARD chain of the filter table, so rules placed there mirror packets even if the filter table subsequently drops them, whereas rules in POSTROUTING only see packets that have already been accepted. In any case, the rules shown above simply state that packets with a source address in 192.168.1.0/24 that have been routed to the WAN interface (i.e. pppoe-wan) are to be duplicated and sent to 192.168.1.238 (i.e. the Raspberry Pi), while packets originating from outside 192.168.1.0/24 but routed to the bridged LAN interface (i.e. br-lan) are likewise mirrored to the Raspberry Pi.

There is one glaring problem with how things are currently configured, however. That is, the Raspberry Pi would receive duplicates of any packets destined to it, and have every packet from it duplicated by the router upstream (see Figure 3). Although this may not seem to be a huge issue as duplicates are simply managed by the upper protocol stack, it needlessly increases the volume of traffic required to be processed by the router and the Raspberry Pi.

To that end, it would be nice to exclude packets to and from the Raspberry Pi from being mirrored. Doing so requires slight additions to the previous rules, like so:

# ------ Port mirroring section (IPv4) ------
iptables -t mangle -N wan_to_lan_tee
iptables -t mangle -N lan_to_wan_tee
iptables -t mangle -A wan_to_lan_tee -o br-lan ! -s 192.168.1.0/24 -j TEE --gateway 192.168.1.238
iptables -t mangle -A lan_to_wan_tee -o pppoe-wan -s 192.168.1.0/24 -j TEE --gateway 192.168.1.238

# Workaround to prevent traffic to and from the port mirroring gateway itself from being duplicated.
iptables -t mangle -I POSTROUTING -o br-lan ! -d 192.168.1.238 -j wan_to_lan_tee
iptables -t mangle -I POSTROUTING -o pppoe-wan ! -s 192.168.1.238 -j lan_to_wan_tee

The existing rules are now nested within two user-defined chains. The wan_to_lan_tee chain is only evaluated if the Raspberry Pi is not the destination of the packet, and the lan_to_wan_tee chain is only evaluated if it is not the source.
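Since these rules live only in the running kernel, it can be handy to have a way of undoing them while experimenting. The following is a minimal sketch rather than part of my actual set-up: the run wrapper, the DRY_RUN variable and the teardown_mirroring function are names made up for illustration, and by default the commands are only printed so they can be reviewed before being applied.

```shell
#!/bin/sh
# Hypothetical teardown helper. With DRY_RUN=1 (the default) the commands
# are printed instead of executed; set DRY_RUN=0 on the router to apply them.
DRY_RUN="${DRY_RUN:-1}"
MIRROR_IP="192.168.1.238"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_mirroring() {
    # Remove the jumps first: a user-defined chain cannot be deleted
    # while another chain still references it.
    run iptables -t mangle -D POSTROUTING -o br-lan ! -d "$MIRROR_IP" -j wan_to_lan_tee
    run iptables -t mangle -D POSTROUTING -o pppoe-wan ! -s "$MIRROR_IP" -j lan_to_wan_tee
    # Flush each user-defined chain, then delete it.
    for chain in wan_to_lan_tee lan_to_wan_tee; do
        run iptables -t mangle -F "$chain"
        run iptables -t mangle -X "$chain"
    done
}

teardown_mirroring
```

The order matters here: the -D commands must mirror the -I commands exactly, and the chains can only be deleted with -X once they are empty and unreferenced.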

With that, all the required iptables rules for IPv4 are in place, enabling the Raspberry Pi to capture any mirrored IPv4 traffic by listening on its ethernet interface in promiscuous mode (e.g. using tcpdump, Wireshark, ntopng etc.).
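A quick sanity check on the Raspberry Pi is to capture a handful of packets while hiding the Pi's own traffic, so that anything shown must have been mirrored by the router. The sketch below assumes the wired interface is called eth0 (this may differ on your system); capture_filter and RUN_CAPTURE are hypothetical names, and the command is only printed unless RUN_CAPTURE=1 is set, since tcpdump needs root privileges.

```shell
#!/bin/sh
# Assumed values: eth0 is the Pi's wired interface, 192.168.1.238 its address.
IFACE="${IFACE:-eth0}"
SELF_IP="${SELF_IP:-192.168.1.238}"

# Build a capture filter that excludes the Pi's own traffic (e.g. an SSH
# session), leaving only packets that were mirrored to it.
capture_filter() {
    echo "not host $1"
}

# -n: no name resolution, -i: interface, -c 20: stop after 20 packets.
CMD="tcpdump -n -i $IFACE -c 20 $(capture_filter "$SELF_IP")"

if [ "${RUN_CAPTURE:-0}" = "1" ]; then
    $CMD          # run as root on the Raspberry Pi
else
    echo "$CMD"   # default: just show the command
fi
```

If packets between LAN hosts and remote addresses show up, the mirroring rules are doing their job.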

Port Mirroring (IPv6)

In the case of IPv6, there is no conceptual difference in how the rules should be written. The only thing to note is the workaround for preventing packets destined to and originating from the Raspberry Pi from being duplicated. This is because IPv6 applications will typically use a generated, temporary IPv6 host id for communication (ref: privacy extensions from RFC 4941). Consequently, it is not clear how to write the IPv6 equivalent of the following:

iptables -t mangle -I POSTROUTING -o br-lan ! -d 192.168.1.238 -j wan_to_lan_tee
iptables -t mangle -I POSTROUTING -o pppoe-wan ! -s 192.168.1.238 -j lan_to_wan_tee

Specifically, what IPv6 address do we now use for each matching criterion, given that it is no longer guaranteed to be fixed? (Note, however, that the temporary host id is not an issue for the --gateway option of TEE, since the fixed SLAAC address could be used.)

Granted, I could turn off the RFC4941 privacy extensions, but I prefer not to do so. Therefore, a natural workaround is to use the mac module of iptables/ip6tables in such a way that connections initiated from the source MAC address of the Raspberry Pi are tracked and marked in the PREROUTING chain, allowing for rules in the POSTROUTING chain to be written with respect to the mark condition:

# ------ Port mirroring section (IPv6) ------
ip6tables -t mangle -N wan6_to_lan6_tee
ip6tables -t mangle -N lan6_to_wan6_tee
ip6tables -t mangle -A wan6_to_lan6_tee -o br-lan ! -s <IPv6 global /64 prefix>/64 -j TEE --gateway <SLAAC address>
ip6tables -t mangle -A lan6_to_wan6_tee -o pppoe-wan -s <IPv6 global /64 prefix>/64 -j TEE --gateway <SLAAC address>

# Workaround to prevent traffic to and from the port mirroring gateway itself from being duplicated.
ip6tables -t mangle -A PREROUTING -i br-lan -m state --state NEW -m mac --mac-source <Raspberry Pi MAC> -j CONNMARK --set-mark 5
ip6tables -t mangle -I POSTROUTING -o br-lan -m connmark ! --mark 5 -j wan6_to_lan6_tee
ip6tables -t mangle -I POSTROUTING -o pppoe-wan -m connmark ! --mark 5 -j lan6_to_wan6_tee

Of course, this assumes there are no unsolicited connections from any hosts located in remote networks. Otherwise, the rules will evaluate to true and the nested rules in the user-defined chains are traversed. Fortunately, such occurrences are non-existent and the assumption is valid since the FORWARD chain of the filter table has already been configured to drop unsolicited connections anyway (like any stateful firewall would).
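As a side note on the <SLAAC address> and <Raspberry Pi MAC> placeholders above: if the Raspberry Pi forms its fixed address using the classic modified EUI-64 scheme, the host part of the address can be derived from the MAC. That is an assumption, not something guaranteed by the set-up described here; many systems use stable-privacy identifiers instead, in which case the address should simply be read from the output of "ip -6 addr show dev eth0 scope global" on the Pi. The sketch below uses a hypothetical helper, mac_to_eui64, and a made-up MAC address.

```shell
#!/bin/sh
# Hypothetical helper: derive the modified EUI-64 interface identifier
# (the lower 64 bits of a classic SLAAC address) from a MAC address.
mac_to_eui64() {
    # Split aa:bb:cc:dd:ee:ff into six positional parameters.
    old_ifs="$IFS"
    IFS=':'
    set -- $1
    IFS="$old_ifs"
    # Flip the universal/local bit (0x02) of the first octet.
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe between the third and fourth octets.
    printf '%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

# Made-up example MAC; prepend the global /64 prefix to get the full address.
mac_to_eui64 b8:27:eb:12:34:56
```

Leading zeros within each group are not stripped, so the result may not be in canonical short form, but it is still a valid textual IPv6 suffix.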

ntopng

Having set up port mirroring, ntopng could be installed on the Raspberry Pi to analyze all mirrored traffic. Here are a few nice insights from ntopng about my network:

In the future, I am looking to collect the raw packets separately to build custom machine learning models out of them. Applications of interest are network fault classification and modelling of malicious connections from malware, among others.

Traffic shaping

Traffic shaping is required for my home network, given that some hosts are notorious for saturating the upload bandwidth of my internet connection, thereby causing other higher priority packets to be dropped and increasing latency. To that end, I have used tc to restrict the maximum allowable upload bandwidth for these hosts, such that a limit is imposed whenever other hosts require this bandwidth. Otherwise, no restrictions are applied.

The following achieves this:

#!/bin/sh
IN_DEV="br-lan"
OUT_DEV="pppoe-wan"
PREROUTING_MARK="8"
PACKET_MARK="6"

# Remember to install kmod-sched by running "opkg install kmod-sched"

tc qdisc delete dev $OUT_DEV root

# Default class of 1:10
tc qdisc add dev $OUT_DEV root handle 1:0 htb default 10

tc class add dev $OUT_DEV parent root classid 1:1 htb rate 2mbit ceil 2mbit burst 10mb cburst 10mb
tc class add dev $OUT_DEV parent 1:1 classid 1:10 htb rate 2mbit ceil 2mbit burst 10mb cburst 10mb
tc class add dev $OUT_DEV parent 1:1 classid 1:20 htb rate 350kbit ceil 2mbit

tc qdisc add dev $OUT_DEV parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev $OUT_DEV parent 1:20 handle 20: sfq perturb 10

# Assign packet marked with '6' to class 1:20
# (note: "protocol ip" matches IPv4 packets only)
tc filter add dev $OUT_DEV parent 1: protocol ip prio 1 handle $PACKET_MARK fw flowid 1:20

# Mark relevant packet before egress of qdisc
while read MAC_SOURCE; do
    PREROUTING_RULE="PREROUTING -i $IN_DEV -m mac --mac-source $MAC_SOURCE -j MARK --set-mark $PREROUTING_MARK"
    # Only append the rule if it does not exist yet (-C checks for it)
    iptables -t mangle -C $PREROUTING_RULE 2> /dev/null
    if [ $? -ne 0 ]; then
        iptables -t mangle -A $PREROUTING_RULE
    fi
    ip6tables -t mangle -C $PREROUTING_RULE 2> /dev/null
    if [ $? -ne 0 ]; then
        ip6tables -t mangle -A $PREROUTING_RULE
    fi
done < hosts_list.txt

POSTROUTING_RULE="POSTROUTING -o $OUT_DEV -m mark --mark $PREROUTING_MARK -j MARK --set-mark $PACKET_MARK"
iptables -t mangle -C $POSTROUTING_RULE 2> /dev/null
if [ $? -ne 0 ]; then
    iptables -t mangle -A $POSTROUTING_RULE
fi
ip6tables -t mangle -C $POSTROUTING_RULE 2> /dev/null
if [ $? -ne 0 ]; then
    ip6tables -t mangle -A $POSTROUTING_RULE
fi

Essentially, the script starts by creating a hierarchical token bucket (HTB) queuing discipline at the root and specifying the default class (1:10) to which packets not matched by any of the subsequent shaping criteria are assigned. It then defines a class directly under the root (i.e. 1:1) that encompasses all the other subdivisions of the allocated 2Mbit/s upload bandwidth. Out of the 2Mbit/s, packets of class 1:10 are permitted to use the full allocation, while packets of class 1:20 are limited to 350kbit/s and only allowed to borrow additional bandwidth up to 2Mbit/s as long as doing so does not jeopardize the transmission of packets of class 1:10. Classes 1:10 and 1:20 are then each given a classless sfq queuing discipline as their leaf. Finally, a filter is specified to assign packets marked by iptables to class 1:20.
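To put the numbers into perspective, here is a small back-of-the-envelope calculation. It relies on tc's rate units being decimal (1kbit = 1000 bit/s, 1mbit = 1000kbit); the helper names kbit_to_kbyte and share_percent are made up for this sketch.

```shell
#!/bin/sh
TOTAL_KBIT=2000     # ceil of class 1:1, i.e. 2mbit
LIMITED_KBIT=350    # rate of class 1:20

# Convert kbit/s to kilobytes/s (8 bits per byte).
kbit_to_kbyte() {
    LC_ALL=C awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 8 }'
}

# Share of the total taken up by a given rate, in percent.
share_percent() {
    LC_ALL=C awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

echo "class 1:20 rate: $(kbit_to_kbyte "$LIMITED_KBIT") kB/s"
echo "class 1:20 share of uplink: $(share_percent "$LIMITED_KBIT" "$TOTAL_KBIT") %"
echo "uplink ceiling: $(kbit_to_kbyte "$TOTAL_KBIT") kB/s"
```

In other words, under contention the restricted hosts are held to roughly 44 kB/s (17.5% of the uplink), while an otherwise idle link lets them borrow up to the full 250 kB/s.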

The rest of the shell script simply creates the iptables rules for marking packets originating from MAC addresses contained in the file hosts_list.txt.
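Since the loop reads one MAC address per line, a malformed entry in hosts_list.txt would be passed straight to iptables and rejected there. A possible safeguard, sketched below, is to validate the entries first. The is_mac and valid_macs helpers are hypothetical, as is the support for blank lines and '#' comments, and the addresses in the example are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for a MAC of the form aa:bb:cc:dd:ee:ff.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Read entries from stdin and print only the well-formed ones,
# skipping blank lines and lines starting with '#'.
valid_macs() {
    while read -r line; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        if is_mac "$line"; then
            echo "$line"
        else
            echo "skipping malformed entry: $line" >&2
        fi
    done
}

# Example input in the assumed one-MAC-per-line format.
valid_macs <<'EOF'
# housemate laptops
b8:27:eb:12:34:56
not-a-mac
AA:BB:CC:DD:EE:FF
EOF
```

The main loop could then read from the filtered output instead of the raw file, for instance by writing the result of valid_macs to a temporary file first.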