# MP System Deployment

Node roles:

- Kubespray Deployment Node
- Kubernetes master node
- Kubernetes worker node (5G CP)
- Kubernetes worker node (5G UP)

## Single-Node Deployment Topology

### Resource Parameters

**5GC AIO VM**:

- Core: 8 vCPU, 16 vCPU (recommended)
- Memory: 16 GB, 32 GB (recommended)
- Disk: 200 GB

**GuestOS**: Ubuntu 18.04

**Kernel**: 4.15.0-64-generic

**Python**: 3.6

**User**: root

## Customized Kubespray

### Infrastructure Software

**Deployment Tools**: customized Kubespray

- Ansible
- Kubeadm
- Helm/Charts

**CaaS**: Kubernetes 1.17

- Runtime: Docker CE, Docker storage option: Overlay2
- Image Registry: Harbor
- CNI:
  - Flannel: default CNI
  - Multus: supports a second CNI for 5G NFs
  - SR-IOV: a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function resources
  - host-device: uses the physical interface directly
  - macvlan: configures multiple virtual network interfaces on top of a host network interface
- HTTP/HTTPS Access: Nginx Ingress
- Storage Option: OpenEBS (https://openebs.io/)

**Log Auditing**:

- Elasticsearch
- Fluentd
- Kibana

**Monitoring and Alerting**:

- Prometheus
- Node Exporter
- Alert Manager

### MP System Software

**MP System Components**:

- Orchestrator Database
- Orchestrator
- MP

**Docker Images**:

- 5g-orch-db.tar: Orchestrator DB image (PostgreSQL)
- 5g-orch.tar: Orchestrator image
- mp-0412.tar: MP image
- licenseutil: obtains the deployment node's serial number, used to generate the license

**Playbooks**:

- cluster.yml: install and set up the Kubernetes cluster, including the MP system
- scale.yml: add more nodes to the Kubernetes cluster
- remove-node.yml: remove nodes from the Kubernetes cluster
- reset.yml: delete the Kubernetes cluster from all nodes

**Inventory**:

- 5g_support: host information and configuration for the MP system installation

**Role**:

- 5g_support: the MP system installation is packaged as a standalone role added to Kubespray

**CLIs**:

- Run the Kubernetes cluster playbook:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
```

- To scale out: add one more node to inventory/5g_support/hosts.ini, then run:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini scale.yml -b -v --private-key=~/.ssh/id_rsa
```

- To scale in: remove a node from inventory/5g_support/hosts.ini and from the cluster:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini remove-node.yml -b -v --extra-vars "node=node4"
```

- To reset the cluster (DELETE EVERYTHING):

```bash
# If you need to reset the whole Kubernetes cluster, cd into the previously cloned kubespray directory and reset it.
ansible-playbook -i inventory/5g_support/hosts.ini reset.yml -b -v --private-key=~/.ssh/id_rsa
```
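Before running any of the playbooks above, it can help to confirm that Ansible can actually reach every host in the inventory. A minimal connectivity check, assuming the same inventory path and SSH key used in the commands above:

```bash
# Ad-hoc ping of all inventory hosts; every host should report "pong"
ansible -i inventory/5g_support/hosts.ini all -m ping -b --private-key=~/.ssh/id_rsa
```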
## Deployment Execution

### Pre-Deployment

#### Kubespray Deployment Node

- Ensure unrestricted Internet access from the deployment node (a proxy may be required in mainland China).
- Generate and copy the key pair to the 5G nodes:

```bash
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gcp-node_ip_address>
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gup-node_ip_address>
```

- Use domestic APT and PyPI mirrors:

```bash
$ cp /etc/apt/sources.list /etc/apt/sources.list.backup
$ cat > /etc/apt/sources.list << EOF
# Example: Aliyun mirror for Ubuntu 18.04 (bionic); substitute your preferred domestic mirror
deb https://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF

$ cat > ~/.pip/pip.conf << EOF
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
EOF
```

- Use a domestic Docker image mirror:

```bash
# Install Docker CE
apt-get remove docker docker-engine docker-ce docker.io
apt-get install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y
apt-get install docker-ce -y
systemctl enable docker && systemctl start docker && systemctl status docker

$ vi /etc/docker/daemon.json
{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ]
}

systemctl daemon-reload && systemctl restart docker
```

- Install Kubespray:

```bash
# Obtain the customized Kubespray
$ cd kubespray

# Single-node
$ git checkout 5g_support_singlenode/2.13
# Multi-node
$ git checkout 5g_support/2.13

# Install dependencies
$ pip3 install -r requirements.txt
```

- Load the MP images:

```bash
$ docker load --input 5g-orch-db.tar
$ docker load --input 5g-orch.tar
$ docker load --input mp-0412.tar
```

#### 5G CP Node

**NOTE**: Do not enable HugePage, otherwise the Harbor Pods will fail to start.

#### 5G UP Node

**NOTE**: Kernel version must be 4.15.0.

- Add the vfio module:

```bash
# Add "vfio-pci" in /etc/modules.
$ cat /etc/modules
vfio-pci

# Add "iommu=pt intel_iommu=on" in /etc/default/grub
$ cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=on"
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1 intel_iommu=on iommu=pt"

# Update grub and reboot the server.
$ sudo update-grub
$ reboot

$ lsmod | grep vfio
vfio_pci               45056  0
vfio_virqfd            16384  1 vfio_pci
vfio_iommu_type1       24576  0
vfio                   28672  3 vfio_iommu_type1,vfio_pci
irqbypass              16384  1 vfio_pci
```
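After the reboot it is worth confirming that the IOMMU really came up before relying on vfio-pci. A quick sanity check using standard kernel interfaces (the exact output varies by platform):

```bash
# The kernel command line should now contain iommu=pt intel_iommu=on
cat /proc/cmdline

# DMAR/IOMMU messages indicate the IOMMU initialized
dmesg | grep -i -e DMAR -e IOMMU

# Non-empty output means IOMMU groups exist and devices can be handed to vfio-pci
ls /sys/kernel/iommu_groups/
```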
### Install Kubernetes Clusters and MP System

- Edit download.yml to use the Aliyun image mirrors:

```bash
$ vi kubespray/inventory/5g_support/group_vars/k8s-cluster/download.yml
...
# If there is internet connection in mainland china, set aliyun_enable to true
aliyun_enable: true
```

- (For all-in-one installation) Edit the Kubespray inventory configuration:

```bash
$ cat kubespray/inventory/5g_support/hosts.ini
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for the etcd cluster. A node that is not an etcd
# ## member does not need to set the value, or can set an empty string.
[all]
kube-cluster-1 ansible_host=172.18.22.220 etcd_member_name=etcd1

[kube-master]
kube-cluster-1

[kube-node]
kube-cluster-1

[k8s-cluster:children]
kube-master
kube-node

[etcd]
kube-cluster-1
```

- (For multi-node deployment) Edit the Kubespray inventory configuration:

```bash
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for the etcd cluster. A node that is not an etcd
# ## member does not need to set the value, or can set an empty string.
[all]
node1 ansible_host=172.18.17.60 etcd_member_name=etcd1
node2 ansible_host=172.18.17.61 etcd_member_name=etcd2
node3 ansible_host=172.18.17.62 etcd_member_name=etcd3

[kube-master]
node1
node2

[etcd]
node1
node2
node3

[kube-node]
node1
node2
node3

[k8s-cluster:children]
kube-master
kube-node
```

- (For all-in-one installation) Edit `kubespray/roles/5g_support/tasks/main.yml`: change every `storageClass=openebs-sc` to `storageClass=openebs-hostpath`. Example:

```bash
- name: SMARTCITY | Install Harbor Package
  command: "{{ bin_dir }}/helm install sco-harbor {{ helm_home_dir }}/packages/harbor-1.1.1.tgz --set persistence.persistentVolumeClaim.registry.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.jobservice.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.database.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.redis.storageClass=openebs-hostpath"
```

- Install the Kubernetes cluster and MP system:

```bash
$ cd kubespray
$ sudo ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
```

- If you use an insecure private registry for Docker images, set the Docker registry option and change the corresponding address in the following command:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini --become --become-user=root cluster.yml -e '{"docker_insecure_registries":["172.18.22.220"]}' -vvv
```

Wait 10-20 minutes for Ansible to finish the installation. If errors occur during installation, reset the cluster, fix the issue, and re-install.

- Pod list:

```bash
$ kubectl get pods --all-namespaces
NAMESPACE            NAME                                                      READY   STATUS      RESTARTS   AGE
default              elasticsearch-master-0                                    0/1     Running     2          4d14h
default              es-curator-elasticsearch-curator-1623632400-hxljs         0/1     Completed   0          2d5h
default              es-curator-elasticsearch-curator-1623718800-wmf2w         0/1     Completed   0          29h
default              es-curator-elasticsearch-curator-1623805200-94647         0/1     Completed   0          3h37m
default              fluentd-46vts                                             1/1     Running     2          27h
default              harbor-harbor-core-69fd799fd4-5b7z8                       1/1     Running     7          23h
default              harbor-harbor-database-0                                  1/1     Running     2          23h
default              harbor-harbor-jobservice-788df69665-cjdp6                 1/1     Running     6          23h
default              harbor-harbor-portal-78b68cfdb-sjb2b                      1/1     Running     2          4d13h
default              harbor-harbor-redis-0                                     1/1     Running     2          4d13h
default              harbor-harbor-registry-5554964584-q5kw4                   2/2     Running     4          4d13h
default              ingress-nginx-ingress-controller-6c748ccfb9-xdp2c         1/1     Running     2          4d14h
default              ingress-nginx-ingress-default-backend-76fb4dfd79-mtgd2    1/1     Running     2          28h
default              mp-9fcb8c7c4-7klsz                                        1/1     Running     2          22h
default              orch-7bcb9b8498-84whh                                     1/1     Running     5          22h
default              orch-db-7954664f45-j86cx                                  1/1     Running     2          3d1h
default              prom-adapter-prometheus-adapter-854457d445-9m8g7          1/1     Running     2          4d14h
default              prom-prometheus-kube-state-metrics-586fdb6d-dhpkl         1/1     Running     2          4d14h
default              prom-prometheus-node-exporter-lcsnp                       1/1     Running     2          4d14h
default              prom-prometheus-server-6c88db7cd7-6r7br                   2/2     Running     4          4d14h
kube-system          coredns-76798d84dd-xt4ll                                  1/1     Running     2          20h
kube-system          coredns-76798d84dd-zfwrj                                  0/1     Pending     0          20h
kube-system          dns-autoscaler-56549847b5-s8mmk                           1/1     Running     2          4d15h
kube-system          kube-apiserver-kube-cluster-1                             1/1     Running     2          4d15h
kube-system          kube-controller-manager-kube-cluster-1                    1/1     Running     5          4d15h
kube-system          kube-flannel-kxktp                                        1/1     Running     4          4d15h
kube-system          kube-multus-ds-amd64-rfdm8                                1/1     Running     2          4d15h
kube-system          kube-proxy-f6ncr                                          1/1     Running     2          20h
kube-system          kube-scheduler-kube-cluster-1                             1/1     Running     6          4d15h
kube-system          kube-sriov-device-plugin-amd64-9qmpn                      1/1     Running     2          4d15h
kube-system          kubernetes-dashboard-77475cf576-mhwv5                     1/1     Running     2          4d15h
kube-system          kubernetes-metrics-scraper-747b4fd5cd-4jtb5               1/1     Running     2          4d15h
kube-system          nodelocaldns-rqck7                                        1/1     Running     2          4d15h
local-path-storage   local-path-provisioner-7dfbb94d64-xdwbl                   1/1     Running     2          4d15h
openebs              cstor-disk-pool-0sls-68759bd4b4-dq5fq                     3/3     Running     6          4d14h
openebs              openebs-admission-server-db47b787f-kt75s                  1/1     Running     2          4d15h
openebs              openebs-apiserver-556ffff45c-6zk4s                        1/1     Running     7          4d15h
openebs              openebs-localpv-provisioner-c6bc845bb-c9svf               1/1     Running     3          4d15h
openebs              openebs-ndm-lgckq                                         1/1     Running     4          4d15h
openebs              openebs-ndm-operator-5f6c5497d7-zm85z                     1/1     Running     3          4d15h
openebs              openebs-provisioner-598c45dd4-2kgmv                       1/1     Running     2          4d15h
openebs              openebs-snapshot-operator-5f74599c8c-z4ffc                2/2     Running     4          4d15h
```
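Before moving on to Post-Deployment, it can be useful to check the MP system workloads specifically. A quick filter over the default namespace, using the pod name prefixes from the listing above:

```bash
# orch, orch-db and mp remain not Running until the license ConfigMap is created below
kubectl get pods | grep -E '^(mp-|orch)'
```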
### Post-Deployment

#### Create MP License

After the playbook finishes, the orch, orch-db, and mp Pods will still not be Running, because license.txt has to be injected.

NOTE: The mp and orchestrator should run on the same node.

Symptom:

```bash
$ kubectl describe pod mp-9fcb8c7c4-mhl2q
...
MountVolume.SetUp failed for volume "sco-mp-license" : configmap "sco-mp-license" not found
```

Solution: generate and import the license, then restart the orch, orch-db, and mp Pods.

1. Send the string to Astri, who generates the license.

```bash
$ sudo ./licenseutil
SGFyZHdhcmUgU2VyaWFsIG51bWJlcjpOb3QgU3BlY2lmaWVkLEN1cnJlbnRUaW1lOjIwMjEtMDYtMTUgMTQ6NTE6MzU=
```

2. Save the license into license.txt and create the ConfigMap on the master node.

```bash
$ cat license.txt
Company Name: 99cloud
Expiry Date: 2021-08-15 12:59:59.999
Licensed Servers: All
License Key: d5fM8RG/LzwtbKpg94i/MAFs5we36IDq+GZ3rfIUdCx3CNUlEpy9QJ5Ov+NwcNJ89dQJAhzfSAKI0PfnVTbrF68g1MJO+e8sTriROt+3n1v9QdpSiRis0PtSr7UR9lf6ao+xr5LbttlKjXs0fSFq5NBw3WsywiBNtNvG+9DiRgs=

$ kubectl create configmap sco-mp-license --from-file=license.txt
```

#### Expose Service to Outside Network

To expose the orchestrator service, run the following command on the 5gcp-node. Replace `<5gcp-node_ip_address>` with the 5gcp-node IP address.

```bash
# ClusterIP
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["<5gcp-node_ip_address>"]}}'

# NodePort
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["172.18.22.220"], "externalTrafficPolicy": "Cluster", "type": "NodePort", "ports": [{"name": "websocket", "port": 80, "protocol": "TCP", "targetPort": 80, "nodePort": 32614}]}}'
```

### Login Web UI

- Web UI: http://172.18.22.220:32614/sco/web/login
- Credentials: root/root

## Appendix

### Kubectl Configuration

On the target node:

```bash
sudo mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

### Setup SRIOV for UPF Node

- Setup VFs

```bash
# Find which PCI devices support SR-IOV
lspci -nn | grep Ether

# Max allowed VFs
cat /sys/class/net/<interface>/device/sriov_totalvfs

# Set the VF number
echo 8 > /sys/class/net/<interface>/device/sriov_numvfs

# Check the VFs
ip link show <interface>

# Find the PCI address of the interface
ethtool -i <interface>
```

- Edit Config Map

```bash
$ kubectl -n kube-system edit cm sriovdp-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "sriovnetwork",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1521"],
            "drivers": ["ixgbevf"]
          }
        }
      ]
    }
```

- Restart SRIOV Pods

```bash
kubectl -n kube-system get pod
# Restart the SR-IOV device plugin Pod so it re-reads the ConfigMap
kubectl -n kube-system delete pod kube-sriov-device-plugin-amd64-<hash>

# Check inside a Pod that requested the resource
printenv
PCIDEVICE_INTEL_COM_SRIOVNETWORK=0000:01:10.0
```
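The ConfigMap above only makes the device plugin advertise the VFs as `intel.com/sriovnetwork`. To attach a VF to a 5G NF Pod, a Multus NetworkAttachmentDefinition is normally needed as well. A minimal sketch, assuming the SR-IOV CNI plugin is present on the node; the attachment name and subnet are illustrative:

```bash
# Hypothetical NetworkAttachmentDefinition consuming the intel.com/sriovnetwork resource
cat <<'EOF' | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriovnetwork
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.100.0/24"
    }
  }'
EOF
```

A Pod would then reference the attachment via the annotation `k8s.v1.cni.cncf.io/networks: sriov-net1` and request `intel.com/sriovnetwork` in its resource limits.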
### Setup Hugepage for UPF Node

```bash
$ cat /etc/sysctl.conf
...
vm.nr_hugepages = 512

$ sysctl -p
$ sysctl -w vm.nr_hugepages=512

$ cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         1048576 kB
```

### Harbor with HugePage Enabled

```bash
$ kubectl get pv | grep harbor-database
pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c   1Gi   RWO   Delete   Bound   default/database-data-harbor-harbor-database-0   openebs-hostpath   3d16h

$ kubectl get pv pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c -o yaml | grep path
    openebs.io/cas-type: local-hostpath
    path: /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
  storageClassName: openebs-hostpath

$ cd /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
$ vi postgresql.conf
...
huge_pages = off    # on, off, or try

# Restart Harbor's 3 Pods
```

### Build CP Docker Images

- AMF

```bash
$ cat amf/Dockerfile
FROM ubuntu:18.04
LABEL version=1.0.2
RUN apt-get update --fix-missing && \
    apt-get install -y iproute2 apt-utils net-tools iputils-ping tcpdump iptables gdb vim dmidecode dnsutils curl && \
    apt-get clean
COPY lib /usr/local/lib/amf/lib
COPY log /var/log/amf
COPY amf /usr/local/bin/amf
COPY etc/amf /etc/amf
WORKDIR /etc/amf/

$ sudo docker build -t 192.168.205.17/5g/amf:1.0.2 amf/
```

### Build UPF Docker Image

- UPF

```bash
# 1. Clone the UPF (vpp) source code and create the build container.
$ git clone git@gitlab.sh.99cloud.net:5GS/upf-astri.git
$ cd upf-astri
$ git checkout XXX
$ git tag -a v1.0 -m "test"    # if no tags
$ ./buildpack container
a215761ad1f101536885d5554640f7428f2a8487b41222398e7d5010eb04cc07
root@a215761ad1f1:/opt#

$ docker ps
CONTAINER ID   IMAGE                     COMMAND       CREATED          STATUS          PORTS   NAMES
a478be4fec3a   fastgate/ubuntu1804:upf   "/bin/bash"   29 minutes ago   Up 29 minutes           root_upf

# 2. Compile the deb package in the container.
$ mkdir -p build/external/downloads/
$ cp /downloads/*tar* build/external/downloads/
$ make pkg-deb
$ mkdir dep-deb
$ cp /dep-deb/*.deb dep-deb/
$ mv build-root/*.deb .
$ mv build/external/*.deb .
$ tar -czf upf_deb.tgz Makefile *.deb dep-deb/

# 3. Copy the UPF deb package (e.g. upf_deb.tgz) into the vpp repo and unpack it.
$ cd vpp
$ docker cp root_upf:/opt/upf_deb.tgz .
$ tar -zxvf upf_deb.tgz

# 4. Build the VPP image via the Dockerfile in the vpp/ folder.
$ cd vpp
$ sudo docker build -t 192.168.205.17/5g/upf:1.0.0 -f Dockerfile .
$ docker images
REPOSITORY              TAG     IMAGE ID       CREATED          SIZE
192.168.205.17/5g/upf   1.0.0   f29a82c4ed86   11 seconds ago   936MB
```

- VPP Agent

```bash
# 1. Enter the vpp-agent repo and run docker/astri/dev/start.sh to enter the upf-agent-dev container.
git clone git@gitlab.sh.99cloud.net:5GS/vpp-agent.git
cd vpp-agent
git checkout main
./docker/astri/dev/start.sh

# 2. Compile the deb package in the container.
make cmd
make deb

# 3. Exit the container, copy the deb package into docker/astri/prod/ and build the upf-agent image.
sudo cp docker/astri/*.deb docker/astri/prod/
cd docker/astri/prod
sudo docker build -t 192.168.205.17/5g/upf-agent22:1.0.0 -f Dockerfile .
```
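The build steps above only tag the images with the Harbor registry address; for the cluster to pull them they still have to be pushed. A sketch, assuming 192.168.205.17 is the Harbor instance used in the tags, a `5g` project exists there, and the registry is reachable (add it to Docker's insecure-registries if it is served over plain HTTP):

```bash
# Push the freshly built NF images to the Harbor registry
docker login 192.168.205.17
docker push 192.168.205.17/5g/amf:1.0.2
docker push 192.168.205.17/5g/upf:1.0.0
docker push 192.168.205.17/5g/upf-agent22:1.0.0
```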
## Troubleshooting

### Useful CLIs for Troubleshooting

- For an individual pod that is not up and running, try:

```bash
kubectl describe pod <pod_name>
```

- If the pod is up and running but something is failing inside it, read the pod logs:

```bash
kubectl logs -f <pod_name>
```

- To restart a pod, simply run:

```bash
kubectl delete pod <pod_name>
```

- To open a port on the host for troubleshooting a service: for example, if you want to access a service that is only reachable inside the Kubernetes cluster, try:

```bash
kubectl edit svc <svc_name>
```

Then change the service type from ClusterIP to NodePort; kube-proxy will create a host port on every node that forwards to the service port. Use `kubectl get svc` to see which host port was opened for that service.

### Issue 1: Ansible Execution Fails

```bash
Could not detect which package manager to use. Try gathering facts or setting the "ansible_pkg_mgr" variable.
```

Solution: make Ansible use Python 3 consistently.

```bash
$ python --version
Python 3.6.9
```

### Issue 2: Harbor Pods Fail to Start

- Symptom:

```bash
# kubectl get pods --all-namespaces | grep har
default   harbor-harbor-core-69fd799fd4-rlrd5         0/1   CrashLoopBackOff   886   2d4h
default   harbor-harbor-database-0                    0/1   CrashLoopBackOff   54    4h17m
default   harbor-harbor-jobservice-788df69665-4msxt   0/1   CrashLoopBackOff   835   2d4h
default   harbor-harbor-portal-78b68cfdb-sjb2b        1/1   Running            0     3d13h
default   harbor-harbor-redis-0                       1/1   Running            0     3d13h
default   harbor-harbor-registry-5554964584-q5kw4     2/2   Running            0     3d13h
```

- Cause: the Harbor database uses PostgreSQL, which does not support HugePage by default.
- Solution: disable HugePage (or set `huge_pages = off` in the Harbor database's postgresql.conf, as described in the appendix), then restart the Pods.
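Once HugePage has been disabled (or `huge_pages = off` has been applied as in the appendix), the crashed Harbor Pods have to be restarted. For example, with `<hash>` standing in for the actual pod name suffixes shown by `kubectl get pods`:

```bash
# Delete the crashing Harbor Pods so their Deployments/StatefulSets recreate them
kubectl delete pod harbor-harbor-core-<hash> harbor-harbor-database-0 harbor-harbor-jobservice-<hash>

# Watch them come back up
kubectl get pods | grep harbor
```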