MP System Deployment

Node role types:

  • Kubespray Deployment Node

  • Kubernetes master node

  • Kubernetes worker node (5G CP)

  • Kubernetes worker node (5G UP)

Single-Node Deployment Topology

Resource Parameters

5GC AIO VM

  • Core: 8 vCPU, 16 vCPU (recommended)

  • Memory: 16 GB, 32 GB (recommended)

  • Disk: 200 GB

Guest OS: Ubuntu 18.04

Kernel: 4.15.0-64-generic

Python: 3.6

User: root

Kubespray: customized version

Infrastructure Software

Deployment Tools: customized Kubespray

  • Ansible

  • Kubeadm

  • Helm/Charts

CaaS: Kubernetes 1.17

  • Runtime: Docker CE; Docker Storage Option: overlay2

  • Image Registry: Harbor

  • CNI:

    • Flannel: Default CNI

    • Multus: Supports a second CNI for 5G NFs

      • SR-IOV: A Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function resources.

    • host-device: Use the physical interface directly.

    • macvlan: Configures multiple virtual network interfaces on top of a single physical interface on the host.

  • HTTP/HTTPS Access: Nginx Ingress

  • Storage Option: OpenEBS (https://openebs.io/)

Logging and Audit

  • Elasticsearch

  • Fluentd

  • Kibana

Monitoring and Alerting

  • Prometheus

  • Node Exporter

  • Alert Manager

MP System Software

MP System Components

  • Orchestrator Database

  • Orchestrator

  • MP

Docker Images

  • 5g-orch-db.tar: Orchestrator DB image (PostgreSQL).

  • 5g-orch.tar: Orchestrator image.

  • mp-0412.tar: MP image.

  • licenseutil: obtains the deployment node serial number, which is used to generate the license.

Playbook

  • cluster.yml: Install and setup Kubernetes cluster including MP system.

  • scale.yml: Add more nodes to Kubernetes cluster.

  • remove-node.yml: Remove nodes from Kubernetes cluster.

  • reset.yml: Delete the Kubernetes cluster from all nodes.

Inventory

  • 5g_support: hosts information and configuration for the MP system installation.

Role

  • 5g_support: the MP system installation is packaged as a standalone role added to Kubespray.

CLIs

  • Run the Kubernetes cluster playbook:

ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
  • To scale out: add one more node to inventory/5g_support/hosts.ini, then run:

ansible-playbook -i inventory/5g_support/hosts.ini scale.yml -b -v --private-key=~/.ssh/id_rsa
  • To scale in: remove a node from inventory/5g_support/hosts.ini, then run:

ansible-playbook -i inventory/5g_support/hosts.ini remove-node.yml -b -v --extra-vars "node=node4"
  • To reset cluster (DELETE EVERYTHING):

# If you need to reset the whole Kubernetes cluster, cd into the previously cloned kubespray directory and reset it.
ansible-playbook -i inventory/5g_support/hosts.ini reset.yml -b -v --private-key=~/.ssh/id_rsa

Deployment Execution

Pre-Deployment

Kubespray Deployment Node

  • Ensure the deployment node has unrestricted Internet access (a proxy or VPN may be required in mainland China).

  • Generate the SSH key pair and copy it to the 5G nodes:

$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gcp-node_ip_address>
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gup-node_ip_address>
  • Use a mainland-China APT mirror:

$ cp /etc/apt/sources.list /etc/apt/sources.list.backup

$ cat > /etc/apt/sources.list <<EOF
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF

$ apt-get update -y && apt-get upgrade -y
  • Use a mainland-China pip mirror:

$ mkdir ~/.pip

$ cat > ~/.pip/pip.conf << EOF 
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
EOF
  • Use a mainland-China Docker registry mirror:

# Install Docker CE
apt-get remove docker docker-engine docker-ce docker.io
apt-get install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y
apt-get install docker-ce -y
systemctl enable docker && systemctl start docker && systemctl status docker

$ vi /etc/docker/daemon.json
{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ]
}

systemctl daemon-reload && systemctl restart docker
  • Install Kubespray

# Get the customized Kubespray
$ cd kubespray

# Single-node
$ git checkout 5g_support_singlenode/2.13
# Multi-node
$ git checkout 5g_support/2.13

# Install dependencies
$ pip3 install -r requirements.txt
  • Load the MP images into Docker:

$ docker load --input 5g-orch-db.tar
$ docker load --input 5g-orch.tar
$ docker load --input mp-0412.tar

5G CP Node

NOTE: Do not enable HugePages on this node, otherwise the Harbor Pods will fail to start.
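
A quick check to confirm HugePages are not enabled on this node (a sketch; it assumes HugePages would be configured via vm.nr_hugepages, as in the UPF sections below):

$ cat /proc/meminfo | grep HugePages_Total
HugePages_Total:       0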

5G UP Node

NOTE: Kernel version must be 4.15.0.
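
To confirm the kernel matches the required version (output is from the reference environment described above):

$ uname -r
4.15.0-64-generic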

  • Add vfio Module

# Add "vfio-pci" to /etc/modules.
$ cat /etc/modules
vfio-pci

# Add "iommu=pt intel_iommu=on" to /etc/default/grub.
$ cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=on"
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1 intel_iommu=on iommu=pt"

# Update grub and Reboot Server.
$ sudo update-grub
$ reboot

$ lsmod | grep vfio
vfio_pci               45056  0
vfio_virqfd            16384  1 vfio_pci
vfio_iommu_type1       24576  0
vfio                   28672  3 vfio_iommu_type1,vfio_pci
irqbypass              16384  1 vfio_pci
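
You can also check the kernel log to confirm that the IOMMU was enabled by the new kernel command line (a sketch; the exact messages vary by platform):

$ dmesg | grep -e DMAR -e IOMMU
DMAR: IOMMU enabled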

Install Kubernetes Cluster and MP System

  • Edit download.yml: use the Aliyun mirrors.

$ vi kubespray/inventory/5g_support/group_vars/k8s-cluster/download.yml
...
# If the Internet connection is from mainland China, set aliyun_enable to true
aliyun_enable: true
  • (For all-in-one installation) Edit the Kubespray inventory configuration:

$ cat kubespray/inventory/5g_support/hosts.ini

# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## Set etcd_member_name for the etcd cluster. Nodes that are not etcd members do not need to set the value, or can set it to an empty string.

[all]
kube-cluster-1 ansible_host=172.18.22.220 etcd_member_name=etcd1

[kube-master]
kube-cluster-1

[kube-node]
kube-cluster-1

[k8s-cluster:children]
kube-master
kube-node

[etcd]
kube-cluster-1
  • (For multi-node installation) Edit the Kubespray inventory configuration:

# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## Set etcd_member_name for the etcd cluster. Nodes that are not etcd members do not need to set the value, or can set it to an empty string.
[all]
node1 ansible_host=172.18.17.60 etcd_member_name=etcd1
node2 ansible_host=172.18.17.61 etcd_member_name=etcd2
node3 ansible_host=172.18.17.62 etcd_member_name=etcd3

[kube-master]
node1
node2

[etcd]
node1
node2
node3

[kube-node]
node1
node2
node3

[k8s-cluster:children]
kube-master
kube-node
  • (For all-in-one installation) Edit kubespray/roles/5g_support/tasks/main.yml: change every storageClass=openebs-sc to storageClass=openebs-hostpath (a sed one-liner sketch follows the example below). EXAMPLE:

- name: SMARTCITY | Install Harbor Package
  command: "{{ bin_dir }}/helm install sco-harbor {{ helm_home_dir }}/packages/harbor-1.1.1.tgz --set persistence.persistentVolumeClaim.registry.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.jobservice.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.database.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.redis.storageClass=openebs-hostpath"
  • Install the Kubernetes cluster and MP system:

$ cd kubespray
$ sudo ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
  • If you use an insecure private registry for Docker images, set the docker_insecure_registries option and change the corresponding address in the following command:

ansible-playbook -i inventory/5g_support/hosts.ini --become --become-user=root cluster.yml -e '{"docker_insecure_registries":["172.18.22.220"]}' -vvv

Wait 10-20 minutes for Ansible to finish the installation. If errors occur during the installation, reset the cluster, solve the issue, and re-install.

  • Pod list:

$ kubectl get pods --all-namespaces
NAMESPACE            NAME                                                     READY   STATUS      RESTARTS   AGE
default              elasticsearch-master-0                                   0/1     Running     2          4d14h
default              es-curator-elasticsearch-curator-1623632400-hxljs        0/1     Completed   0          2d5h
default              es-curator-elasticsearch-curator-1623718800-wmf2w        0/1     Completed   0          29h
default              es-curator-elasticsearch-curator-1623805200-94647        0/1     Completed   0          3h37m
default              fluentd-46vts                                            1/1     Running     2          27h
default              harbor-harbor-core-69fd799fd4-5b7z8                      1/1     Running     7          23h
default              harbor-harbor-database-0                                 1/1     Running     2          23h
default              harbor-harbor-jobservice-788df69665-cjdp6                1/1     Running     6          23h
default              harbor-harbor-portal-78b68cfdb-sjb2b                     1/1     Running     2          4d13h
default              harbor-harbor-redis-0                                    1/1     Running     2          4d13h
default              harbor-harbor-registry-5554964584-q5kw4                  2/2     Running     4          4d13h
default              ingress-nginx-ingress-controller-6c748ccfb9-xdp2c        1/1     Running     2          4d14h
default              ingress-nginx-ingress-default-backend-76fb4dfd79-mtgd2   1/1     Running     2          28h
default              mp-9fcb8c7c4-7klsz                                       1/1     Running     2          22h
default              orch-7bcb9b8498-84whh                                    1/1     Running     5          22h
default              orch-db-7954664f45-j86cx                                 1/1     Running     2          3d1h
default              prom-adapter-prometheus-adapter-854457d445-9m8g7         1/1     Running     2          4d14h
default              prom-prometheus-kube-state-metrics-586fdb6d-dhpkl        1/1     Running     2          4d14h
default              prom-prometheus-node-exporter-lcsnp                      1/1     Running     2          4d14h
default              prom-prometheus-server-6c88db7cd7-6r7br                  2/2     Running     4          4d14h
kube-system          coredns-76798d84dd-xt4ll                                 1/1     Running     2          20h
kube-system          coredns-76798d84dd-zfwrj                                 0/1     Pending     0          20h
kube-system          dns-autoscaler-56549847b5-s8mmk                          1/1     Running     2          4d15h
kube-system          kube-apiserver-kube-cluster-1                            1/1     Running     2          4d15h
kube-system          kube-controller-manager-kube-cluster-1                   1/1     Running     5          4d15h
kube-system          kube-flannel-kxktp                                       1/1     Running     4          4d15h
kube-system          kube-multus-ds-amd64-rfdm8                               1/1     Running     2          4d15h
kube-system          kube-proxy-f6ncr                                         1/1     Running     2          20h
kube-system          kube-scheduler-kube-cluster-1                            1/1     Running     6          4d15h
kube-system          kube-sriov-device-plugin-amd64-9qmpn                     1/1     Running     2          4d15h
kube-system          kubernetes-dashboard-77475cf576-mhwv5                    1/1     Running     2          4d15h
kube-system          kubernetes-metrics-scraper-747b4fd5cd-4jtb5              1/1     Running     2          4d15h
kube-system          nodelocaldns-rqck7                                       1/1     Running     2          4d15h
local-path-storage   local-path-provisioner-7dfbb94d64-xdwbl                  1/1     Running     2          4d15h
openebs              cstor-disk-pool-0sls-68759bd4b4-dq5fq                    3/3     Running     6          4d14h
openebs              openebs-admission-server-db47b787f-kt75s                 1/1     Running     2          4d15h
openebs              openebs-apiserver-556ffff45c-6zk4s                       1/1     Running     7          4d15h
openebs              openebs-localpv-provisioner-c6bc845bb-c9svf              1/1     Running     3          4d15h
openebs              openebs-ndm-lgckq                                        1/1     Running     4          4d15h
openebs              openebs-ndm-operator-5f6c5497d7-zm85z                    1/1     Running     3          4d15h
openebs              openebs-provisioner-598c45dd4-2kgmv                      1/1     Running     2          4d15h
openebs              openebs-snapshot-operator-5f74599c8c-z4ffc               2/2     Running     4          4d15h

Post-Deployment

Create MP License

After the playbook finishes, you will see that the orch, orch-db, and mp Pods are still not running, because license.txt needs to be injected.

NOTE: The mp and orchestrator should run on the same node.

Symptom:

$ kubectl describe pod mp-9fcb8c7c4-mhl2q
...
MountVolume.SetUp failed for volume "sco-mp-license" : configmap "sco-mp-license" not found

Solution: generate and import the license, then restart the orch, orch-db, and mp Pods (a restart sketch follows step 2 below).

  1. Run licenseutil to get the serial string and send it to Astri, which generates the license.

$ sudo ./licenseutil
SGFyZHdhcmUgU2VyaWFsIG51bWJlcjpOb3QgU3BlY2lmaWVkLEN1cnJlbnRUaW1lOjIwMjEtMDYtMTUgMTQ6NTE6MzU=
  2. Save the license into license.txt and create the ConfigMap on the master node.

$ cat license.txt
Company Name: 99cloud
Expiry Date: 2021-08-15 12:59:59.999
Licensed Servers: All
License Key: d5fM8RG/LzwtbKpg94i/MAFs5we36IDq+GZ3rfIUdCx3CNUlEpy9QJ5Ov+NwcNJ89dQJAhzfSAKI0PfnVTbrF68g1MJO+e8sTriROt+3n1v9QdpSiRis0PtSr7UR9lf6ao+xr5LbttlKjXs0fSFq5NBw3WsywiBNtNvG+9DiRgs=

$ kubectl create configmap sco-mp-license --from-file=license.txt
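
A minimal sketch of restarting the orch, orch-db, and mp Pods by deleting them, so that their Deployments recreate them with the new ConfigMap (use the Pod names from your environment):

$ kubectl get pods | grep -E 'mp-|orch'
$ kubectl delete pod <mp_pod_name> <orch_pod_name> <orch_db_pod_name>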

Expose the Service to the External Network

To expose the orchestrator service, run the following commands on the 5gcp-node. Change <node_ip_address> to the 5gcp-node IP address.

# ClusterIP
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["<node_ip_address>"]}}'

# NodePort
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["172.18.22.220"], "externalTrafficPolicy": "Cluster", "type": "NodePort", "ports": [{"name": "websocket", "port": 80, "protocol": "TCP", "targetPort": 80, "nodePort": 32614}]}}'

Log In to the Web UI

  • Web UI: http://172.18.22.220:32614/sco/web/login

  • Username / password: root/root

Appendix

Kubectl Configuration

On the target-node:

sudo mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
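
Then verify that kubectl can reach the cluster:

$ kubectl get nodes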

Setup SRIOV for UPF Node

  • Setup VFs

# Find which PCI devices support SR-IOV
lspci -nnn | grep Ether

# Max Allowed VFs
cat /sys/class/net/<ifname>/device/sriov_totalvfs

# Set VF Number
echo 8 > /sys/class/net/<ifname>/device/sriov_numvfs

# Check VF
ip link show

# Find the PCI address of the interface
ethtool -i <ifname>
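
The PCI addresses of the newly created VFs can also be read directly from sysfs (each virtfn* symlink points to one VF device):

$ ls -l /sys/class/net/<ifname>/device/virtfn*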
  • Edit Config Map

$ kubectl -n kube-system edit cm sriovdp-config

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures. #
apiVersion: v1
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "sriovnetwork",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1521"],
            "drivers": ["ixgbevf"]
          }
        }
      ]
    }
  • Restart SRIOV Pods

kubectl -n kube-system get pod

# Check inside the Pod that requested the SR-IOV resource
printenv
PCIDEVICE_INTEL_COM_SRIOVNETWORK=0000:01:10.0
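
A sketch of the actual restart, by deleting the device plugin Pod so that it is recreated by its controller (use the Pod name shown by the get pod command above):

$ kubectl -n kube-system delete pod <kube-sriov-device-plugin-pod-name>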

Setup Hugepage for UPF Node

$ cat /etc/sysctl.conf
...
vm.nr_hugepages = 512

$ sysctl -p

$ sysctl -w vm.nr_hugepages=512

$ cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         1048576 kB
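
After a kubelet restart or node reboot, the HugePages capacity should also appear on the Kubernetes node object (the output is an example for 512 x 2 MB pages):

$ kubectl describe node <upf_node_name> | grep -i hugepages
  hugepages-2Mi:  1Gi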

Harbor Support for HugePages

$ kubectl get pv | grep harbor-database
pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c   1Gi        RWO            Delete           Bound    default/database-data-harbor-harbor-database-0        openebs-hostpath            3d16h

$ kubectl get pv pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c -o yaml | grep path
    openebs.io/cas-type: local-hostpath
    path: /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
  storageClassName: openebs-hostpath
  
$ cd /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c

$ vi postgresql.conf
...
huge_pages = off                        # on, off, or try

Then restart the 3 Harbor Pods.
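
A sketch of restarting them by deletion, so that their controllers recreate them (Pod names are examples from the listing earlier; adjust to your environment):

$ kubectl delete pod harbor-harbor-database-0
$ kubectl delete pod <harbor-harbor-core-pod_name> <harbor-harbor-jobservice-pod_name>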

Build CP Docker Images

  • AMF

$ cat amf/Dockerfile

FROM ubuntu:18.04
LABEL version=1.0.2

RUN apt-get update --fix-missing && \
	apt-get install -y iproute2 apt-utils net-tools iputils-ping tcpdump iptables gdb vim dmidecode dnsutils curl && \
	apt-get clean

COPY lib /usr/local/lib/amf/lib
COPY log /var/log/amf
COPY amf /usr/local/bin/amf
COPY etc/amf /etc/amf

WORKDIR /etc/amf/

$ sudo docker build -t 192.168.205.17/5g/amf:1.0.2 amf/
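
If the image should be published to the registry referenced by its tag, a push would follow (a sketch; it assumes you are logged in to that registry):

$ sudo docker login 192.168.205.17
$ sudo docker push 192.168.205.17/5g/amf:1.0.2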

Build UPF Docker Image

  • UPF

# 1. Go into the vpp (UPF) source code and create the build container.
$ git clone git@gitlab.sh.99cloud.net:5GS/upf-astri.git
$ cd upf-astri
$ git checkout XXX
$ git tag -a v1.0 -m "test" # if no tags
$ ./buildpack container
a215761ad1f101536885d5554640f7428f2a8487b41222398e7d5010eb04cc07
root@a215761ad1f1:/opt#
$ docker ps
CONTAINER ID        IMAGE                     COMMAND             CREATED             STATUS              PORTS               NAMES
a478be4fec3a        fastgate/ubuntu1804:upf   "/bin/bash"         29 minutes ago      Up 29 minutes                           root_upf

# 2. Compile deb package in container.
$ mkdir -p build/external/downloads/
$ cp /downloads/*tar* build/external/downloads/
$ make pkg-deb
$ mkdir dep-deb
$ cp /dep-deb/*.deb dep-deb/
$ mv build-root/*.deb .
$ mv build/external/*.deb .
$ tar -czf upf_deb.tgz Makefile *.deb dep-deb/

# 3. Copy the UPF deb package (e.g. upf_deb.tgz) into the vpp repo and extract it.
$ cd vpp
$ docker cp root_upf:/opt/upf_deb.tgz .
$ tar -zxvf upf_deb.tgz

# 4. Build the VPP image via the Dockerfile in the vpp/ folder.
$ cd vpp
$ sudo docker build -t 192.168.205.17/5g/upf:1.0.0 -f Dockerfile .
$ docker images
REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE
192.168.205.17/5g/upf   1.0.0               f29a82c4ed86        11 seconds ago      936MB
  • VPP Agent

# 1. Go into the vpp-agent repo and run docker/astri/dev/start.sh to enter the upf-agent-dev container.
git clone git@gitlab.sh.99cloud.net:5GS/vpp-agent.git
cd vpp-agent 
git checkout main

./docker/astri/dev/start.sh

# 2. Compile the deb package in the container.
make deb

# 3. Exit the container, copy the deb package into docker/astri/prod/ and build upf-agent image.
sudo cp docker/astri/*.deb docker/astri/prod/
cd docker/astri/prod
sudo docker build -t 192.168.205.17/5g/upf-agent22:1.0.0 -f Dockerfile .

Troubleshooting

Useful CLI in Troubleshooting

  • For each pod that is not up and running, try:

kubectl describe pod <pod_name>
  • If a pod is up and running but has internal errors, read the pod's logs:

kubectl logs -f <pod_name>
  • To restart the pod, simply run:

kubectl delete pod <pod_name>
  • To open a port on the host for troubleshooting a service:

For example, if you want to access a service that is only internally accessible in the Kubernetes cluster, try:

kubectl edit svc <svc_name>

Then change the type of the service from ClusterIP to NodePort; kube-proxy will then open a host port on every node in the Kubernetes cluster that forwards to the service port.

Use kubectl get svc to view the host port opened for that service.
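
The same change can be made non-interactively with kubectl patch, mirroring the orch example earlier in this document:

kubectl patch svc <svc_name> -p '{"spec": {"type": "NodePort"}}'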

Issue 1: Ansible Execution Fails

Could not detect which package manager to use. Try gathering facts or setting the ansible_pkg_mgr variable.

Solution: make Ansible use Python 3 on all nodes.

$ python --version
Python 3.6.9
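
If some hosts still default to Python 2, the interpreter can also be pinned in the inventory (ansible_python_interpreter is a standard Ansible variable; the path shown is an assumption for Ubuntu 18.04):

$ cat kubespray/inventory/5g_support/hosts.ini
...
[all:vars]
ansible_python_interpreter=/usr/bin/python3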

Issue 2: Harbor Pods Fail to Start

  • Symptom

# kubectl get pods --all-namespaces | grep har
default              harbor-harbor-core-69fd799fd4-rlrd5                      0/1     CrashLoopBackOff    886        2d4h
default              harbor-harbor-database-0                                 0/1     CrashLoopBackOff    54         4h17m
default              harbor-harbor-jobservice-788df69665-4msxt                0/1     CrashLoopBackOff    835        2d4h
default              harbor-harbor-portal-78b68cfdb-sjb2b                     1/1     Running             0          3d13h
default              harbor-harbor-redis-0                                    1/1     Running             0          3d13h
default              harbor-harbor-registry-5554964584-q5kw4                  2/2     Running             0          3d13h
  • Cause: the Harbor database uses PostgreSQL, which does not support HugePages by default.

  • Solution: disable HugePages and restart the Pods (a sketch follows), or keep HugePages enabled and apply the postgresql.conf workaround from the Harbor Support for HugePages appendix above.
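
A minimal sketch of the fix (it assumes HugePages were enabled via vm.nr_hugepages, as in the UPF section; Pod names are examples from this environment's listing):

$ sysctl -w vm.nr_hugepages=0
# Also remove or zero the vm.nr_hugepages line in /etc/sysctl.conf, then restart the failed Pods:
$ kubectl delete pod harbor-harbor-database-0 <harbor-harbor-core-pod_name> <harbor-harbor-jobservice-pod_name>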