(EC2) Docker container virtual network overlay using VXLAN

At the SoftLayer Summit I saw the presentation
"DockerとOpenVNetを用いたSoftLayer VLAN上への仮想ネットワークオーバーレイ" (a virtual network overlay on SoftLayer VLANs using Docker and OpenVNet)
http://www.slideshare.net/cloudconductor/softlayer-summit-2015
and became interested in virtual network overlays for Docker.
While wondering whether something similar could be done with VXLAN, I found the article
"Connecting Docker containers between VMs with VXLAN"
http://blog.thestateofme.com/2014/06/08/connecting-docker-containers-between-vms-with-vxlan/
so here I use unicast VXLAN to build an overlay network for Docker containers on EC2.


The setup used here is as follows.
(It covers Amazon Linux, Ubuntu, and CentOS because those are the ones I expect to use myself later.)

Host | OS (kernel) | Local IP | Physical interface | Bridge IP | Container IP
A | Amazon Linux 2014.09.2 x86_64 (3.14.27-25.47.amzn1.x86_64) | 172.31.2.203 | eth0 | 192.168.0.11 | 192.168.0.101
B | ubuntu-trusty-14.04-amd64-server-20150123 (3.13.0-44-generic) | 172.31.15.69 | eth0 | 192.168.0.12 | 192.168.0.102
C | CentOS 7 x86_64 (2014_09_29) EBS (3.10.0-123.8.1.el7.x86_64) | 172.31.4.212 | eth0 | 192.168.0.13 | 192.168.0.103


One pitfall: if you add the host's own IP address with bridge fdb append,
the containers can no longer communicate properly with other hosts and containers, so be careful.
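
One way to avoid the pitfall above is to generate the fdb entries from a single list of all hosts and skip the local address. The following is only a minimal sketch: the function name peer_fdb_commands and the variable ALL_VTEPS are made up for illustration, the addresses are the three local IPs from the table, and the function prints the commands instead of running them.

```shell
# Hypothetical helper: print one "bridge fdb append" command per peer VTEP,
# skipping the local host's own address.
ALL_VTEPS="172.31.2.203 172.31.15.69 172.31.4.212"

peer_fdb_commands() {
    # $1 = local IP of this host, $2 = vxlan device name (default: vxlan10)
    self_ip=$1
    dev=${2:-vxlan10}
    for vtep in $ALL_VTEPS; do
        # Never add our own address: doing so breaks container traffic.
        if [ "$vtep" = "$self_ip" ]; then
            continue
        fi
        echo "bridge fdb append 00:00:00:00:00:00 dev $dev dst $vtep"
    done
}

# Example for node A: prints the entries for B and C only.
peer_fdb_commands 172.31.2.203
```

The same list can then be shared by all three hosts, and only the argument (the host's own local IP) changes per node.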


★Create the bridge and vxlan interface

###nodeA 172.31.2.203 amazon linux###

sudo su -
yum -y install git bison flex libdb-devel db4-devel gcc docker bridge-utils
git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
cd iproute2
./configure
make
make DESTDIR=/usr/share install

service docker start
chkconfig docker on

#delete docker0,vxlan10 if exists
/usr/share/sbin/ip link set docker0 down
brctl delbr docker0
/usr/share/sbin/ip link del vxlan10

#add docker bridge interface
brctl addbr docker0
#create virtual mac addr ref by http://d.ballade.jp/2008/03/vif-and-mac-address.html
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
/usr/share/sbin/ip link set docker0 address $vmac
/usr/share/sbin/ip address add 192.168.0.11/24 dev docker0

#add vxlan interface
/usr/share/sbin/ip link add vxlan10 type vxlan id 10 ttl 4 dev eth0
#get unique mac addr
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
/usr/share/sbin/ip link set vxlan10 address $vmac


brctl addif docker0 vxlan10
/usr/share/sbin/ip link set vxlan10 up
/usr/share/sbin/ip link set docker0 up

#add other hosts VTEP 
/usr/share/sbin/bridge fdb append 00:00:00:00:00:00 dev vxlan10 dst 172.31.15.69
/usr/share/sbin/bridge fdb append 00:00:00:00:00:00 dev vxlan10 dst 172.31.4.212
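
The perl one-liner used for vmac above can also be written without perl. Note that rand()*255 is always below 255, so the one-liner never produces the byte ff; that is harmless here. The sketch below is an alternative under the assumption that /dev/urandom, od, and awk are available on the host; the function name random_mac is made up for illustration.

```shell
# Hypothetical alternative to the perl one-liner: build a random MAC address
# under the same 00:16:3e prefix from 3 bytes read from /dev/urandom.
random_mac() {
    # od prints the 3 bytes as two-digit hex values separated by spaces
    od -An -N3 -tx1 /dev/urandom |
        awk '{ printf "00:16:3e:%s:%s:%s\n", $1, $2, $3 }'
}

vmac=$(random_mac)
echo "$vmac"
```

The result can be passed to ip link set ... address $vmac in the same way as before.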


###nodeB 172.31.15.69 ubuntu###

sudo su -
apt-get -y install docker.io bridge-utils


#delete docker0,vxlan10 if exists
ip link set docker0 down
brctl delbr docker0
ip link del vxlan10

brctl addbr docker0

#create virtual mac addr ref by http://d.ballade.jp/2008/03/vif-and-mac-address.html
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip link set docker0 address $vmac
ip address add 192.168.0.12/24 dev docker0

 
ip link add vxlan10 type vxlan id 10 ttl 4 dev eth0
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip link set vxlan10 address $vmac

brctl addif docker0 vxlan10
ip link set vxlan10 up
ip link set docker0 up

bridge fdb append 00:00:00:00:00:00 dev vxlan10 dst 172.31.2.203
bridge fdb append 00:00:00:00:00:00 dev vxlan10 dst 172.31.4.212

###nodeC 172.31.4.212 centos ###

sudo su -

#kernel 3.10.0-123.8.1.el7.x86_64 panics when unicast vxlan is used, so update the kernel first
yum -y update kernel
yum -y install git bison flex libdb-devel db4-devel gcc docker bridge-utils
reboot

#log in again after the reboot
sudo su -
service docker start
chkconfig docker on

#delete docker0,vxlan10 if exists
ip link set docker0 down
brctl delbr docker0
ip link del vxlan10

brctl addbr docker0
#create virtual mac addr ref by http://d.ballade.jp/2008/03/vif-and-mac-address.html
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip link set docker0 address $vmac
ip address add 192.168.0.13/24 dev docker0

ip link add vxlan10 type vxlan id 10 ttl 4 dev eth0
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip link set vxlan10 address $vmac

brctl addif docker0 vxlan10
ip link set vxlan10 up
ip link set docker0 up

bridge fdb append 00:00:00:00:00:00 dev vxlan10 dst 172.31.2.203
bridge fdb append 00:00:00:00:00:00 dev vxlan10 dst 172.31.15.69
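
Once the fdb entries are in place, it may be worth confirming that no host lists its own address. The sketch below reads fdb output on stdin and looks for a "dst" field equal to the local IP. The function name fdb_has_self is made up, and the sample lines are only an assumption about what bridge fdb show dev vxlan10 prints; the exact format may differ between iproute2 versions.

```shell
# Hypothetical check: read "bridge fdb show dev vxlan10" output on stdin and
# succeed (exit 0) if any entry has this host's own IP as its destination.
fdb_has_self() {
    # $1 = local IP of this host
    awk -v self="$1" '
        { for (i = 1; i < NF; i++) if ($i == "dst" && $(i + 1) == self) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Illustrative input only, modelled on node A after the commands above.
sample='00:00:00:00:00:00 dst 172.31.15.69 self permanent
00:00:00:00:00:00 dst 172.31.4.212 self permanent'

if printf '%s\n' "$sample" | fdb_has_self 172.31.2.203; then
    echo "NG: own address found in fdb"
else
    echo "OK: no own address in fdb"
fi
```

On a real host the sample would be replaced by the live output, for example bridge fdb show dev vxlan10 | fdb_has_self 172.31.2.203 on node A.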

★Create the containers and assign IP addresses

###nodeA 172.31.2.203 amazon linux###

gateway=`ip addr show docker0 | grep "inet " | awk -F '[/ ]' '{print $6}'`
id=`sudo docker run -i -t -d --net=none centos /bin/bash`
pid=`docker inspect --format {{.State.Pid}} ${id}`

# link the running Docker container's network namespace (under /proc) into /var/run/netns
mkdir -p /var/run/netns
ln -s /proc/${pid}/ns/net /var/run/netns/${pid}

# create a veth pair
ip link add veth1b type veth peer name veth1c

# attach one end of the veth pair to the docker0 bridge and bring it up
brctl addif docker0 veth1b
ip link set veth1b up

# move the other end of the veth pair into the Docker container and bring it up
ip link set veth1c netns ${pid}
ip netns exec ${pid} ip link set dev veth1c name eth0
ip netns exec ${pid} ip link set eth0 up
#assign a unique MAC address to eth0 inside the container
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip netns exec ${pid} ip link set eth0 address $vmac
#set the IP address
ip netns exec ${pid} ip addr add 192.168.0.101/24 dev eth0
#set the default gateway
ip netns exec ${pid} ip route add default via $gateway
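
The gateway= line at the top of this block depends on the exact layout of the ip addr show output: with the field separator '[/ ]', the four leading spaces of the "inet" line become empty fields, which is why the address ends up in $6. A variant that does not depend on the indentation could look like the sketch below. The function name first_ipv4 is made up, and the sample text is only an illustration of the docker0 output on node A.

```shell
# Hypothetical helper: print the first IPv4 address found in
# "ip addr show <dev>" output read from stdin, without the prefix length.
first_ipv4() {
    awk '$1 == "inet" && !seen { split($2, a, "/"); print a[1]; seen = 1 }'
}

# Illustrative input only (MAC and IPv6 addresses are made up).
sample='3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:16:3e:12:34:56 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/24 scope global docker0
    inet6 fe80::216:3eff:fe12:3456/64 scope link'

gateway=$(printf '%s\n' "$sample" | first_ipv4)
echo "$gateway"
```

On a real host this would be used as gateway=$(ip addr show docker0 | first_ipv4).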


###nodeB 172.31.15.69 ubuntu###

gateway=`ip addr show docker0 | grep "inet " | awk -F '[/ ]' '{print $6}'`
id=`sudo docker run -i -t -d --net=none centos /bin/bash`
pid=`docker inspect --format {{.State.Pid}} ${id}`


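# link the running Docker container's network namespace (under /proc) into /var/run/netns
mkdir -p /var/run/netns
ln -s /proc/${pid}/ns/net /var/run/netns/${pid}

# create a veth pair
ip link add veth1b type veth peer name veth1c

# attach one end of the veth pair to the docker0 bridge and bring it up
brctl addif docker0 veth1b
ip link set veth1b up

# move the other end of the veth pair into the Docker container and bring it up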
ip link set veth1c netns ${pid}
ip netns exec ${pid} ip link set dev veth1c name eth0
ip netns exec ${pid} ip link set eth0 up
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip netns exec ${pid} ip link set eth0 address $vmac
ip netns exec ${pid} ip addr add 192.168.0.102/24 dev eth0
ip netns exec ${pid} ip route add default via $gateway

###nodeC 172.31.4.212 centos ###

gateway=`ip addr show docker0 | grep "inet " | awk -F '[/ ]' '{print $6}'`
id=`sudo docker run -i -t -d --net=none centos /bin/bash`
pid=`docker inspect --format {{.State.Pid}} ${id}`

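# link the running Docker container's network namespace (under /proc) into /var/run/netns
mkdir -p /var/run/netns
ln -s /proc/${pid}/ns/net /var/run/netns/${pid}

# create a veth pair
ip link add veth1b type veth peer name veth1c

# attach one end of the veth pair to the docker0 bridge and bring it up
brctl addif docker0 veth1b
ip link set veth1b up

# move the other end of the veth pair into the Docker container and bring it up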
ip link set veth1c netns ${pid}
ip netns exec ${pid} ip link set dev veth1c name eth0
ip netns exec ${pid} ip link set eth0 up
vmac=`perl -e  'print sprintf("00:16:3e:%2.2x:%2.2x:%2.2x", rand()*255, rand()*255, rand()*255)'`
ip netns exec ${pid} ip link set eth0 address $vmac
ip netns exec ${pid} ip addr add 192.168.0.103/24 dev eth0
ip netns exec ${pid} ip route add default via $gateway
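
The three blocks above differ only in the container IP address, and because the veth names are fixed (veth1b/veth1c), a second container on the same host would collide with the first. The sketch below wraps the steps in one function that derives the veth names from the container PID and only prints the commands. The function name attach_commands and the example PID are made up, and the MAC address step is left out for brevity.

```shell
# Hypothetical helper: print the commands that wire one container into docker0.
# $1 = container PID, $2 = container IP with prefix length, $3 = gateway IP
attach_commands() {
    pid=$1; addr=$2; gw=$3
    host_if="veth${pid}b"   # host-side end, named after the PID to avoid clashes
    cont_if="veth${pid}c"   # container-side end, renamed to eth0 in the netns
    cat <<EOF
mkdir -p /var/run/netns
ln -s /proc/${pid}/ns/net /var/run/netns/${pid}
ip link add ${host_if} type veth peer name ${cont_if}
brctl addif docker0 ${host_if}
ip link set ${host_if} up
ip link set ${cont_if} netns ${pid}
ip netns exec ${pid} ip link set dev ${cont_if} name eth0
ip netns exec ${pid} ip link set eth0 up
ip netns exec ${pid} ip addr add ${addr} dev eth0
ip netns exec ${pid} ip route add default via ${gw}
EOF
}

# Dry run with made-up values; nothing is changed on the host.
attach_commands 12345 192.168.0.101/24 192.168.0.11
```

After reviewing the printed commands, they could be piped to sh -e as root to apply them.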

Verification
After running docker attach XX:

[root@6653ad538420 /]# ping 192.168.0.101
PING 192.168.0.101 (192.168.0.101) 56(84) bytes of data.
64 bytes from 192.168.0.101: icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from 192.168.0.101: icmp_seq=2 ttl=64 time=0.034 ms
64 bytes from 192.168.0.101: icmp_seq=3 ttl=64 time=0.032 ms

^C

--- 192.168.0.101 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.023/0.029/0.034/0.007 ms

[root@6653ad538420 /]# ping 192.168.0.102
PING 192.168.0.102 (192.168.0.102) 56(84) bytes of data.
64 bytes from 192.168.0.102: icmp_seq=1 ttl=64 time=0.760 ms
64 bytes from 192.168.0.102: icmp_seq=2 ttl=64 time=0.809 ms
64 bytes from 192.168.0.102: icmp_seq=3 ttl=64 time=0.899 ms
64 bytes from 192.168.0.102: icmp_seq=4 ttl=64 time=0.656 ms

^C

--- 192.168.0.102 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.656/0.781/0.899/0.087 ms

 

[root@6653ad538420 /]# ping 192.168.0.103
PING 192.168.0.103 (192.168.0.103) 56(84) bytes of data.
64 bytes from 192.168.0.103: icmp_seq=1 ttl=64 time=0.969 ms
64 bytes from 192.168.0.103: icmp_seq=2 ttl=64 time=0.536 ms
64 bytes from 192.168.0.103: icmp_seq=3 ttl=64 time=0.596 ms
64 bytes from 192.168.0.103: icmp_seq=4 ttl=64 time=0.606 ms

^C

--- 192.168.0.103 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.536/0.676/0.969/0.173 ms
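
The manual pings above can be repeated in one go with a small loop. This is only a sketch: the function name check_containers and the PING override variable are made up, and it assumes the iputils ping found in the centos image (the -c and -W options).

```shell
# Hypothetical check: ping each address given as an argument and report OK/NG.
# PING can be overridden (e.g. PING=true) to exercise the loop without a network.
check_containers() {
    rc=0
    for ip in "$@"; do
        if ${PING:-ping} -c 3 -W 1 "$ip" >/dev/null 2>&1; then
            echo "OK $ip"
        else
            echo "NG $ip"
            rc=1
        fi
    done
    return $rc
}

# Inside a container, after pasting the function into the shell:
#   check_containers 192.168.0.101 192.168.0.102 192.168.0.103
```

The function returns a non-zero status if any address is unreachable, so it can also be used from a script.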

References
https://docs.docker.com/articles/networking/
http://blog.thestateofme.com/2014/06/08/connecting-docker-containers-between-vms-with-vxlan/
http://qiita.com/nmatsui/items/2fee1d4a526a6ba3c887
http://www.slideshare.net/cloudconductor/softlayer-summit-2015