MP System Deployment
Node role types:
Kubespray Deployment Node
Kubernetes master node
Kubernetes worker node (5G CP)
Kubernetes worker node (5G UP)
Single-Node Deployment Topology
Resource Parameters
5GC AIO VM:
Core: 8U, 16U (recommended)
Memory: 16G, 32G (recommended)
Disk: 200G
GuestOS: Ubuntu 18.04
Kernel: 4.15.0-64-generic
Python: 3.6
User: root
Customized Kubespray
Infrastructure Software
Deployment Tools: Customized Kubespray
Ansible
Kubeadm
Helm/Charts
CaaS: Kubernetes 1.17
Runtime: Docker CE; Docker Storage Option: Overlay2
Image Registry: Harbor
CNI:
Flannel: Default CNI
Multus: Supports a secondary CNI for 5G NFs (see the example after this list).
SR-IOV: It is a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function resources.
host-device: Use the physical interface directly.
macvlan: Configure multiple virtual network interfaces on a network interface on the host.
HTTP/HTTPS Access: Nginx Ingress
Storage Option: OpenEBS (https://openebs.io/)
Log Auditing:
Elasticsearch
Fluentd
Kibana
Monitoring & Alerting:
Prometheus
Node Exporter
Alert Manager
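For reference, a secondary network is attached through Multus by creating a NetworkAttachmentDefinition. The following is a minimal sketch using the macvlan CNI; the network name macvlan-net, the master interface ens3 and the subnet are illustrative assumptions, not values from this deployment.
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens3",
      "mode": "bridge",
      "ipam": { "type": "host-local", "subnet": "192.168.100.0/24" }
    }'
EOF
# A 5G NF Pod then requests the extra interface with the annotation
#   k8s.v1.cni.cncf.io/networks: macvlan-net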
MP System Software
MP System Components:
Orchestrator Database
Orchestrator
MP
Docker Images:
5g-orch-db.tar: Orchestrator DB image (PostgreSQL).
5g-orch.tar: Orchestrator image.
mp-0412.tar: MP image.
licenseutil: retrieves the serial number of the deployment node, which is used to generate the license.
Playbooks:
cluster.yml: Install and set up the Kubernetes cluster, including the MP system.
scale.yml: Add more nodes to the Kubernetes cluster.
remove-node.yml: Remove nodes from the Kubernetes cluster.
reset.yml: Delete the Kubernetes cluster from all nodes.
Inventory:
5g_support: Host information and configuration for the MP system installation.
Role:
5g_support: The MP system installation is packaged as a standalone role added to Kubespray.
CLIs:
Run the Kubernetes cluster playbook:
ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
To scale out: add one more node to inventory/5g_support/hosts.ini, then run:
ansible-playbook -i inventory/5g_support/hosts.ini scale.yml -b -v --private-key=~/.ssh/id_rsa
To scale in: remove a node from inventory/5g_support/hosts.ini, then run:
ansible-playbook -i inventory/5g_support/hosts.ini remove-node.yml -b -v --extra-vars "node=node4"
To reset cluster (DELETE EVERYTHING):
# If you need to reset the whole kubernetes clusters, cd into the kubespray directory previously cloned and reset it.
ansible-playbook -i inventory/5g_support/hosts.ini reset.yml -b -v --private-key=~/.ssh/id_rsa
Deployment Execution
Pre-Deployment
Kubespray Deployment Node
Ensure unrestricted internet access (in mainland China this typically requires a proxy).
Generate and Copy the Key Pair to 5G Nodes
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gcp-node_ip_address>
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gup-node_ip_address>
Use a mainland China APT mirror:
$ cp /etc/apt/sources.list /etc/apt/sources.list.backup
$ cat > /etc/apt/sources.list <<EOF
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF
$ apt-get update -y && apt-get upgrade -y
Use a mainland China PyPI mirror:
$ mkdir ~/.pip
$ cat > ~/.pip/pip.conf << EOF
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
EOF
Use a mainland China Docker registry mirror:
# Install Docker CE
apt-get remove docker docker-engine docker-ce docker.io
apt-get install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y
apt-get install docker-ce -y
systemctl enable docker && systemctl start docker && systemctl status docker
$ vi /etc/docker/daemon.json
{
"registry-mirrors": [
"https://hub-mirror.c.163.com",
"https://mirror.baidubce.com"
]
}
systemctl daemon-reload && systemctl restart docker
Install Kubespray
# Get the customized Kubespray
$ cd kubespray
# Single-node branch
$ git checkout 5g_support_singlenode/2.13
# Multi-node branch
$ git checkout 5g_support/2.13
# Install Python dependencies
$ pip3 install -r requirements.txt
Load the MP Images
$ docker load --input 5g-orch-db.tar
$ docker load --input 5g-orch.tar
$ docker load --input mp-0412.tar
5G CP Node
NOTE: Do not enable HugePages on this node, otherwise the Harbor Pods will fail to start. A quick check is shown below.
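A way to confirm HugePages are disabled on the CP node (a sketch; it assumes HugePages would only ever be enabled via the vm.nr_hugepages sysctl):
$ cat /proc/meminfo | grep HugePages_Total
HugePages_Total:       0
# If the value is not 0, disable it and remove any vm.nr_hugepages line from /etc/sysctl.conf
$ sysctl -w vm.nr_hugepages=0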
5G UP Node
NOTE: The kernel version must be 4.15.0 (see the check below).
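A quick sanity check of the kernel before continuing:
$ uname -r
4.15.0-64-generic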
Add vfio Module
# Add "vfio-pci" in /etc/modules.
$ cat /etc/modules
vfio-pci
# Add "iommu=pt intel_iommu=on" in /etc/default/grub
$ cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=on"
GRUB_CMDLINE_LINUX=""
# Additional kernel parameters used for the UP node (appended to the GRUB command line): cgroup_enable=memory swapaccount=1 intel_iommu=on iommu=pt
# Update grub and Reboot Server.
$ sudo update-grub
$ reboot
$ lsmod | grep vfio
vfio_pci 45056 0
vfio_virqfd 16384 1 vfio_pci
vfio_iommu_type1 24576 0
vfio 28672 3 vfio_iommu_type1,vfio_pci
irqbypass 16384 1 vfio_pci
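If a VF has to be driven by vfio-pci (e.g. for a DPDK-based UPF data plane) rather than the kernel VF driver, one common approach is to rebind it by PCI address through sysfs. This is only a sketch: the PCI address 0000:01:10.1 is a placeholder, and note that the SR-IOV device plugin configuration later in this document keeps the ixgbevf driver instead.
$ echo 0000:01:10.1 > /sys/bus/pci/devices/0000:01:10.1/driver/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000:01:10.1/driver_override
$ echo 0000:01:10.1 > /sys/bus/pci/drivers_probe
$ lspci -k -s 01:10.1 | grep "Kernel driver"
        Kernel driver in use: vfio-pci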
Install Kubernetes Clusters and MP System
Edit download.yml: use the aliyun mirror.
$ vi kubespray/inventory/5g_support/group_vars/k8s-cluster/download.yml
...
# If there is internet connection in mainland china, set aliyun_enable to true
aliyun_enable: true
(For all-in-one installation) Edit the Kubespray inventory configuration:
$ cat kubespray/inventory/5g_support/hosts.ini
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
kube-cluster-1 ansible_host=172.18.22.220 etcd_member_name=etcd1
[kube-master]
kube-cluster-1
[kube-node]
kube-cluster-1
[k8s-cluster:children]
kube-master
kube-node
[etcd]
kube-cluster-1
(For multi-node installation) Edit the Kubespray inventory configuration:
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
node1 ansible_host=172.18.17.60 etcd_member_name=etcd1
node2 ansible_host=172.18.17.61 etcd_member_name=etcd2
node3 ansible_host=172.18.17.62 etcd_member_name=etcd3
[kube-master]
node1
node2
[etcd]
node1
node2
node3
[kube-node]
node1
node2
node3
[k8s-cluster:children]
kube-master
kube-node
(For all-in-one installation) Edit kubespray/roles/5g_support/tasks/main.yml: change every storageClass=openebs-sc to storageClass=openebs-hostpath. EXAMPLE:
- name: SMARTCITY | Install Harbor Package
command: "{{ bin_dir }}/helm install sco-harbor {{ helm_home_dir }}/packages/harbor-1.1.1.tgz --set persistence.persistentVolumeClaim.registry.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.jobservice.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.database.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.redis.storageClass=openebs-hostpath"
Install the Kubernetes Cluster and MP System
$ cd kubespray
$ sudo ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
If you use an insecure private Docker registry, set the docker_insecure_registries option to the corresponding address in the following command:
ansible-playbook -i inventory/5g_support/hosts.ini --become --become-user=root cluster.yml -e '{"docker_insecure_registries":["172.18.22.220"]}' -vvv
Wait 10-20 minutes for Ansible to finish the installation. If errors occur during the installation, you can reset the cluster, fix the issue, and reinstall.
Pod List
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default elasticsearch-master-0 0/1 Running 2 4d14h
default es-curator-elasticsearch-curator-1623632400-hxljs 0/1 Completed 0 2d5h
default es-curator-elasticsearch-curator-1623718800-wmf2w 0/1 Completed 0 29h
default es-curator-elasticsearch-curator-1623805200-94647 0/1 Completed 0 3h37m
default fluentd-46vts 1/1 Running 2 27h
default harbor-harbor-core-69fd799fd4-5b7z8 1/1 Running 7 23h
default harbor-harbor-database-0 1/1 Running 2 23h
default harbor-harbor-jobservice-788df69665-cjdp6 1/1 Running 6 23h
default harbor-harbor-portal-78b68cfdb-sjb2b 1/1 Running 2 4d13h
default harbor-harbor-redis-0 1/1 Running 2 4d13h
default harbor-harbor-registry-5554964584-q5kw4 2/2 Running 4 4d13h
default ingress-nginx-ingress-controller-6c748ccfb9-xdp2c 1/1 Running 2 4d14h
default ingress-nginx-ingress-default-backend-76fb4dfd79-mtgd2 1/1 Running 2 28h
default mp-9fcb8c7c4-7klsz 1/1 Running 2 22h
default orch-7bcb9b8498-84whh 1/1 Running 5 22h
default orch-db-7954664f45-j86cx 1/1 Running 2 3d1h
default prom-adapter-prometheus-adapter-854457d445-9m8g7 1/1 Running 2 4d14h
default prom-prometheus-kube-state-metrics-586fdb6d-dhpkl 1/1 Running 2 4d14h
default prom-prometheus-node-exporter-lcsnp 1/1 Running 2 4d14h
default prom-prometheus-server-6c88db7cd7-6r7br 2/2 Running 4 4d14h
kube-system coredns-76798d84dd-xt4ll 1/1 Running 2 20h
kube-system coredns-76798d84dd-zfwrj 0/1 Pending 0 20h
kube-system dns-autoscaler-56549847b5-s8mmk 1/1 Running 2 4d15h
kube-system kube-apiserver-kube-cluster-1 1/1 Running 2 4d15h
kube-system kube-controller-manager-kube-cluster-1 1/1 Running 5 4d15h
kube-system kube-flannel-kxktp 1/1 Running 4 4d15h
kube-system kube-multus-ds-amd64-rfdm8 1/1 Running 2 4d15h
kube-system kube-proxy-f6ncr 1/1 Running 2 20h
kube-system kube-scheduler-kube-cluster-1 1/1 Running 6 4d15h
kube-system kube-sriov-device-plugin-amd64-9qmpn 1/1 Running 2 4d15h
kube-system kubernetes-dashboard-77475cf576-mhwv5 1/1 Running 2 4d15h
kube-system kubernetes-metrics-scraper-747b4fd5cd-4jtb5 1/1 Running 2 4d15h
kube-system nodelocaldns-rqck7 1/1 Running 2 4d15h
local-path-storage local-path-provisioner-7dfbb94d64-xdwbl 1/1 Running 2 4d15h
openebs cstor-disk-pool-0sls-68759bd4b4-dq5fq 3/3 Running 6 4d14h
openebs openebs-admission-server-db47b787f-kt75s 1/1 Running 2 4d15h
openebs openebs-apiserver-556ffff45c-6zk4s 1/1 Running 7 4d15h
openebs openebs-localpv-provisioner-c6bc845bb-c9svf 1/1 Running 3 4d15h
openebs openebs-ndm-lgckq 1/1 Running 4 4d15h
openebs openebs-ndm-operator-5f6c5497d7-zm85z 1/1 Running 3 4d15h
openebs openebs-provisioner-598c45dd4-2kgmv 1/1 Running 2 4d15h
openebs openebs-snapshot-operator-5f74599c8c-z4ffc 2/2 Running 4 4d15h
Post-Deployment
Create MP License
After the playbook finishes, the orch, orch-db and mp Pods will still not be running, because license.txt needs to be injected.
NOTE: The mp and orchestrator should run on the same node.
Symptom:
$ kubectl describe pod mp-9fcb8c7c4-mhl2q
...
MountVolume.SetUp failed for volume "sco-mp-license" : configmap "sco-mp-license" not found
Solution: generate and import the license, then restart the orch, orch-db and mp Pods (see the sketch after the license commands below).
Send the string to Astri, who will generate the license.
$ sudo ./licenseutil
SGFyZHdhcmUgU2VyaWFsIG51bWJlcjpOb3QgU3BlY2lmaWVkLEN1cnJlbnRUaW1lOjIwMjEtMDYtMTUgMTQ6NTE6MzU=
Save the license into license.txt and create a ConfigMap on the master node.
$ cat license.txt
Company Name: 99cloud
Expiry Date: 2021-08-15 12:59:59.999
Licensed Servers: All
License Key: d5fM8RG/LzwtbKpg94i/MAFs5we36IDq+GZ3rfIUdCx3CNUlEpy9QJ5Ov+NwcNJ89dQJAhzfSAKI0PfnVTbrF68g1MJO+e8sTriROt+3n1v9QdpSiRis0PtSr7UR9lf6ao+xr5LbttlKjXs0fSFq5NBw3WsywiBNtNvG+9DiRgs=
$ kubectl create configmap sco-mp-license --from-file=license.txt
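Once the ConfigMap exists, restart the orch, orch-db and mp Pods so they pick up the license. A minimal sketch, assuming they are managed by Deployments named orch, orch-db and mp (as the Pod names in the Pod list above suggest):
$ kubectl rollout restart deployment orch orch-db mp
# or delete the Pods directly and let their controllers recreate them:
$ kubectl delete pod <orch-pod> <orch-db-pod> <mp-pod>
$ kubectl get pods | grep -E 'orch|mp'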
Expose Service to Outside Network
To expose the orchestrator service, run the following command on the 5gcp-node, replacing <node_ip_address> with the 5gcp-node IP address.
# ClusterIP
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["<node_ip_address>"]}}'
# NodePort
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["172.18.22.220"], "externalTrafficPolicy": "Cluster", "type": "NodePort", "ports": [{"name": "websocket", "port": 80, "protocol": "TCP", "targetPort": 80, "nodePort": 32614}]}}'
Login Web UI
Web UI: http://172.18.22.220:32614/sco/web/login
Default credentials: root/root
Appendix
Kubectl Configuration
On the target-node:
sudo mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Setup SRIOV for UPF Node
Setup VFs
# Find which PCI devices support SR-IOV
lspci -nnn | grep Ether
# Max Allowed VFs
cat /sys/class/net/<ifname>/device/sriov_totalvfs
# Set VF Number
echo 8 > /sys/class/net/<ifname>/device/sriov_numvfs
# Check VF
ip link show
# Find pci-addr of Interface
ethtool -i <ifname>
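Note that sriov_numvfs is reset on reboot. One possible way to make it persistent (an assumption, not part of the original procedure) is a small systemd unit; <ifname> is a placeholder:
$ cat > /etc/systemd/system/sriov-vfs.service <<EOF
[Unit]
Description=Create SR-IOV VFs
After=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 8 > /sys/class/net/<ifname>/device/sriov_numvfs'

[Install]
WantedBy=multi-user.target
EOF
$ systemctl daemon-reload && systemctl enable sriov-vfs.service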
Edit Config Map
$ kubectl -n kube-system edit cm sriovdp-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "sriovnetwork",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1521"],
            "drivers": ["ixgbevf"]
          }
        }
      ]
    }
Restart SRIOV Pods
kubectl -n kube-system get pod
# Check the injected environment variable inside a Pod that requests the resource
printenv
PCIDEVICE_INTEL_COM_SRIOVNETWORK=0000:01:10.0
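To actually restart the device plugin, delete its Pod (the DaemonSet recreates it) and then confirm the node advertises the intel.com/sriovnetwork resource. The Pod hash and node name below are placeholders:
$ kubectl -n kube-system delete pod kube-sriov-device-plugin-amd64-<hash>
$ kubectl get node <node_name> -o json | grep sriovnetwork
    "intel.com/sriovnetwork": "8",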
Setup Hugepage for UPF Node
$ cat /etc/sysctl.conf
...
vm.nr_hugepages = 512
$ sysctl -p
$ sysctl -w vm.nr_hugepages=512
$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 512
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 1048576 kB
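For reference, a workload consumes these hugepages by requesting them explicitly in its Pod spec. The following is a minimal sketch only (the image, sizes and Pod name are placeholders, not the actual UPF manifest):
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hugepage-test
spec:
  containers:
  - name: app
    image: ubuntu:18.04
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
    resources:
      requests:
        memory: 1Gi
        hugepages-2Mi: 256Mi
      limits:
        memory: 1Gi
        hugepages-2Mi: 256Mi
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages
EOF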
Make Harbor Work with HugePages Enabled
$ kubectl get pv | grep harbor-database
pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c 1Gi RWO Delete Bound default/database-data-harbor-harbor-database-0 openebs-hostpath 3d16h
$ kubectl get pv pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c -o yaml | grep path
openebs.io/cas-type: local-hostpath
path: /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
storageClassName: openebs-hostpath
$ cd /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
$ vi postgresql.conf
...
huge_pages = off # on, off, or try
# Restart the three Harbor Pods (see the sketch below).
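A sketch of the restart, assuming the three Pods in question are the database, core and jobservice Pods (the hashes are placeholders):
$ kubectl delete pod harbor-harbor-database-0
$ kubectl delete pod harbor-harbor-core-<hash> harbor-harbor-jobservice-<hash>
$ kubectl get pods | grep harbor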
Build CP Docker Images
AMF
$ cat amf/Dockerfile
FROM ubuntu:18.04
LABEL version=1.0.2
RUN apt-get update --fix-missing && \
apt-get install -y iproute2 apt-utils net-tools iputils-ping tcpdump iptables gdb vim dmidecode dnsutils curl && \
apt-get clean
COPY lib /usr/local/lib/amf/lib
COPY log /var/log/amf
COPY amf /usr/local/bin/amf
COPY etc/amf /etc/amf
WORKDIR /etc/amf/
$ sudo docker build -t 192.168.205.17/5g/amf:1.0.2 amf/
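After the build, the image would typically be pushed to the registry referenced in its tag. A sketch, assuming the registry at 192.168.205.17 is reachable and you have credentials for it:
$ sudo docker login 192.168.205.17
$ sudo docker push 192.168.205.17/5g/amf:1.0.2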
Build UPF Docker Image
UPF
# 1. Clone the UPF source code and create the build container.
$ git clone git@gitlab.sh.99cloud.net:5GS/upf-astri.git
$ cd upf-astri
$ git checkout XXX
$ git tag -a v1.0 -m "test" # if no tags
$ ./buildpack container
a215761ad1f101536885d5554640f7428f2a8487b41222398e7d5010eb04cc07
root@a215761ad1f1:/opt#
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a478be4fec3a fastgate/ubuntu1804:upf "/bin/bash" 29 minutes ago Up 29 minutes root_upf
# 2. Compile the deb packages inside the container.
$ mkdir -p build/external/downloads/
$ cp /downloads/*tar* build/external/downloads/
$ make pkg-deb
$ mkdir dep-deb
$ cp /dep-deb/*.deb dep-deb/
$ mv build-root/*.deb .
$ mv build/external/*.deb .
$ tar -czf upf_deb.tgz Makefile *.deb dep-deb/
# 3. Copy the UPF deb package (e.g. upf_deb.tgz) into the vpp repo and extract it.
$ cd vpp
$ docker cp root_upf:/opt/upf_deb.tgz .
$ tar -zxvf upf_deb.tgz
# 4. Build the VPP image using the Dockerfile in the vpp/ folder.
$ cd vpp
$ sudo docker build -t 192.168.205.17/5g/upf:1.0.0 -f Dockerfile .
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
192.168.205.17/5g/upf 1.0.0 f29a82c4ed86 11 seconds ago 936MB
VPP Agent
# 1. Enter the vpp-agent repo and run docker/astri/dev/start.sh to enter the upf-agent-dev container.
git clone git@gitlab.sh.99cloud.net:5GS/vpp-agent.git
cd vpp-agent
git checkout main
./docker/astri/dev/start.sh
# 2. Compile the deb package inside the container.
make deb
# 3. Exit the container, copy the deb package into docker/astri/prod/ and build the upf-agent image.
sudo cp docker/astri/*.deb docker/astri/prod/
cd docker/astri/prod
sudo docker build -t 192.168.205.17/5g/upf-agent22:1.0.0 -f Dockerfile .
Troubleshooting
Useful CLI in Troubleshooting
For each individual Pod that is not up and running, try:
kubectl describe pod <pod_name>
If a Pod is up and running but shows internal errors, read the Pod logs:
kubectl logs -f <pod_name>
To restart the pod, simply run:
kubectl delete pod <pod_name>
To open a port on the host for troubleshooting a service:
For example, if you want to access a service that is only reachable inside the Kubernetes cluster, try:
kubectl edit svc <svc_name>
Then change the service type from ClusterIP to NodePort; kube-proxy will then open a host port on every Kubernetes node that forwards to the service port.
You can use kubectl get svc to view the host port opened for that service, as in the example below.
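For example (a sketch; <svc_name> is a placeholder), the same change can be made non-interactively:
$ kubectl patch svc <svc_name> -p '{"spec": {"type": "NodePort"}}'
$ kubectl get svc <svc_name>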
Issue 1: Ansible Execution Fails
Could not detect which package manager to use. Try gathering facts or settin
Solution: make Ansible use Python 3 consistently (one possible sketch below).
$ python --version
Python 3.6.9
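One way to force Python 3 (a sketch, not necessarily how this environment was fixed) is to pin the Ansible interpreter in the inventory or per run:
# In inventory/5g_support/hosts.ini
[all:vars]
ansible_python_interpreter=/usr/bin/python3
# Or on the command line:
$ ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v -e ansible_python_interpreter=/usr/bin/python3 --private-key=~/.ssh/id_rsa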
Issue 2: Harbor Pods Fail to Start
Symptom:
# kubectl get pods --all-namespaces | grep har
default harbor-harbor-core-69fd799fd4-rlrd5 0/1 CrashLoopBackOff 886 2d4h
default harbor-harbor-database-0 0/1 CrashLoopBackOff 54 4h17m
default harbor-harbor-jobservice-788df69665-4msxt 0/1 CrashLoopBackOff 835 2d4h
default harbor-harbor-portal-78b68cfdb-sjb2b 1/1 Running 0 3d13h
default harbor-harbor-redis-0 1/1 Running 0 3d13h
default harbor-harbor-registry-5554964584-q5kw4 2/2 Running 0 3d13h
Cause: the Harbor database uses PostgreSQL, which does not support HugePages by default.
Solution: disable HugePages on the node and restart the Pods (sketch below).
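A minimal sketch of the fix, assuming HugePages were enabled through the vm.nr_hugepages sysctl as in the UPF node setup above (Pod hashes are placeholders):
$ sysctl -w vm.nr_hugepages=0
# Also remove the vm.nr_hugepages line from /etc/sysctl.conf, then restart the failing Pods:
$ kubectl delete pod harbor-harbor-database-0 harbor-harbor-core-<hash> harbor-harbor-jobservice-<hash>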