MP System Deployment
Node role types:
Kubespray Deployment Node
Kubernetes master node
Kubernetes worker node (5G CP)
Kubernetes worker node (5G UP)
Single-Node Deployment Topology
Resource Parameters
5GC AIO VM:
Core: 8U, 16U (recommended)
Memory: 16G, 32G (recommended)
Disk: 200G
GuestOS: Ubuntu 18.04
Kernel: 4.15.0-64-generic
Python: 3.6
User: root
Customized Kubespray
Infrastructure Software
Deployment Tools: Customized Kubespray
Ansible
Kubeadm
Helm/Charts
CaaS: Kubernetes 1.17
Runtime: Docker CE; Docker Storage Option: Overlay2
Image Registry: Harbor
CNI:
Flannel: Default CNI
Multus: Supports a secondary CNI for 5G NFs (see the example after this list).
SR-IOV: It is a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function resources.
host-device: Use the physical interface directly.
macvlan: Configure multiple virtual network interfaces on a network interface on the host.
HTTP/HTTPS Access: Nginx Ingress
Storage Option: OpenEBS (https://openebs.io/)
Log Auditing:
Elasticsearch
Fluentd
Kibana
Monitoring & Alerting:
Prometheus
Node Exporter
Alert Manager
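For reference, a secondary network is attached through Multus by creating a NetworkAttachmentDefinition. The following is a minimal sketch using the macvlan CNI; the network name macvlan-net, the master interface ens3 and the subnet are illustrative assumptions, not values from this deployment.
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens3",
      "mode": "bridge",
      "ipam": { "type": "host-local", "subnet": "192.168.100.0/24" }
    }'
EOF
# A 5G NF Pod then requests the extra interface with the annotation
#   k8s.v1.cni.cncf.io/networks: macvlan-net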
MP System Software
MP System Components:
Orchestrator Database
Orchestrator
MP
Docker Images:
5g-orch-db.tar: Orchestrator DB image (PostgreSQL).
5g-orch.tar: Orchestrator image.
mp-0412.tar: MP image.
licenseutil: retrieves the serial number of the deployment node, which is used to generate the license.
Playbooks:
cluster.yml: Install and set up the Kubernetes cluster, including the MP system.
scale.yml: Add more nodes to the Kubernetes cluster.
remove-node.yml: Remove nodes from the Kubernetes cluster.
reset.yml: Delete the Kubernetes cluster from all nodes.
Inventory:
5g_support: Host information and configuration for the MP system installation.
Role:
5g_support: The MP system installation is packaged as a standalone role added to Kubespray.
CLIs:
Run the Kubernetes cluster playbook:
ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
To scale out: add one more node to inventory/5g_support/hosts.ini, then run:
ansible-playbook -i inventory/5g_support/hosts.ini scale.yml -b -v --private-key=~/.ssh/id_rsa
To scale in: remove a node from inventory/5g_support/hosts.ini, then run:
ansible-playbook -i inventory/5g_support/hosts.ini remove-node.yml -b -v --extra-vars "node=node4"
To reset cluster (DELETE EVERYTHING):
# If you need to reset the whole kubernetes clusters, cd into the kubespray directory previously cloned and reset it.
ansible-playbook -i inventory/5g_support/hosts.ini reset.yml -b -v --private-key=~/.ssh/id_rsa
Deployment Execution
Pre-Deployment
Kubespray Deployment Node
Ensure unrestricted internet access (in mainland China this typically requires a proxy).
Generate and Copy the Key Pair to 5G Nodes
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gcp-node_ip_address>
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gup-node_ip_address>
Use a mainland China APT mirror:
$ cp /etc/apt/sources.list /etc/apt/sources.list.backup
$ cat > /etc/apt/sources.list <<EOF
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF
$ apt-get update -y && apt-get upgrade -y
Use a mainland China PyPI mirror:
$ mkdir ~/.pip
$ cat > ~/.pip/pip.conf << EOF
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
EOF
Use a mainland China Docker registry mirror:
# Install Docker CE
apt-get remove docker docker-engine docker-ce docker.io
apt-get install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y
apt-get install docker-ce -y
systemctl enable docker && systemctl start docker && systemctl status docker
$ vi /etc/docker/daemon.json
{
"registry-mirrors": [
"https://hub-mirror.c.163.com",
"https://mirror.baidubce.com"
]
}
systemctl daemon-reload && systemctl restart docker
Install Kubespray
# Get the customized Kubespray
$ cd kubespray
# Single-node branch
$ git checkout 5g_support_singlenode/2.13
# Multi-node branch
$ git checkout 5g_support/2.13
# Install Python dependencies
$ pip3 install -r requirements.txt
Load the MP Images
$ docker load --input 5g-orch-db.tar
$ docker load --input 5g-orch.tar
$ docker load --input mp-0412.tar
5G CP Node
NOTE: Do not enable HugePages on this node, otherwise the Harbor Pods will fail to start. A quick check is shown below.
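A way to confirm HugePages are disabled on the CP node (a sketch; it assumes HugePages would only ever be enabled via the vm.nr_hugepages sysctl):
$ cat /proc/meminfo | grep HugePages_Total
HugePages_Total:       0
# If the value is not 0, disable it and remove any vm.nr_hugepages line from /etc/sysctl.conf
$ sysctl -w vm.nr_hugepages=0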
5G UP Node
NOTE: The kernel version must be 4.15.0 (see the check below).
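A quick sanity check of the kernel before continuing:
$ uname -r
4.15.0-64-generic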
Add vfio Module
# Add "vfio-pci" in /etc/modules.
$ cat /etc/modules
vfio-pci
# Add "iommu=pt intel_iommu=on" in /etc/default/grub
$ cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=on"
GRUB_CMDLINE_LINUX=""
# Additional kernel parameters used for the UP node (appended to the GRUB command line): cgroup_enable=memory swapaccount=1 intel_iommu=on iommu=pt
# Update grub and Reboot Server.
$ sudo update-grub
$ reboot
$ lsmod | grep vfio
vfio_pci 45056 0
vfio_virqfd 16384 1 vfio_pci
vfio_iommu_type1 24576 0
vfio 28672 3 vfio_iommu_type1,vfio_pci
irqbypass 16384 1 vfio_pci
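If a VF has to be driven by vfio-pci (e.g. for a DPDK-based UPF data plane) rather than the kernel VF driver, one common approach is to rebind it by PCI address through sysfs. This is only a sketch: the PCI address 0000:01:10.1 is a placeholder, and note that the SR-IOV device plugin configuration later in this document keeps the ixgbevf driver instead.
$ echo 0000:01:10.1 > /sys/bus/pci/devices/0000:01:10.1/driver/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000:01:10.1/driver_override
$ echo 0000:01:10.1 > /sys/bus/pci/drivers_probe
$ lspci -k -s 01:10.1 | grep "Kernel driver"
        Kernel driver in use: vfio-pci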
Install Kubernetes Clusters and MP System
Edit download.yml: use the aliyun mirror.
$ vi kubespray/inventory/5g_support/group_vars/k8s-cluster/download.yml
...
# If there is internet connection in mainland china, set aliyun_enable to true
aliyun_enable: true
(For all-in-one installation) Edit the Kubespray inventory configuration:
$ cat kubespray/inventory/5g_support/hosts.ini
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
kube-cluster-1 ansible_host=172.18.22.220 etcd_member_name=etcd1
[kube-master]
kube-cluster-1
[kube-node]
kube-cluster-1
[k8s-cluster:children]
kube-master
kube-node
[etcd]
kube-cluster-1
(For multi-node installation) Edit the Kubespray inventory configuration:
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
node1 ansible_host=172.18.17.60 etcd_member_name=etcd1
node2 ansible_host=172.18.17.61 etcd_member_name=etcd2
node3 ansible_host=172.18.17.62 etcd_member_name=etcd3
[kube-master]
node1
node2
[etcd]
node1
node2
node3
[kube-node]
node1
node2
node3
[k8s-cluster:children]
kube-master
kube-node
(For all-in-one installation) Edit kubespray/roles/5g_support/tasks/main.yml: change every storageClass=openebs-sc to storageClass=openebs-hostpath. EXAMPLE:
- name: SMARTCITY | Install Harbor Package
command: "{{ bin_dir }}/helm install sco-harbor {{ helm_home_dir }}/packages/harbor-1.1.1.tgz --set persistence.persistentVolumeClaim.registry.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.jobservice.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.database.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.redis.storageClass=openebs-hostpath"
Install the Kubernetes Cluster and MP System
$ cd kubespray
$ sudo ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
If you use an insecure private Docker registry, set the docker_insecure_registries option to the corresponding address in the following command:
ansible-playbook -i inventory/5g_support/hosts.ini --become --become-user=root cluster.yml -e '{"docker_insecure_registries":["172.18.22.220"]}' -vvv
Wait 10-20 minutes for Ansible to finish the installation. If errors occur during the installation, you can reset the cluster, fix the issue, and reinstall.
Pod List
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default elasticsearch-master-0 0/1 Running 2 4d14h
default es-curator-elasticsearch-curator-1623632400-hxljs 0/1 Completed 0 2d5h
default es-curator-elasticsearch-curator-1623718800-wmf2w 0/1 Completed 0 29h
default es-curator-elasticsearch-curator-1623805200-94647 0/1 Completed 0 3h37m
default fluentd-46vts 1/1 Running 2 27h
default harbor-harbor-core-69fd799fd4-5b7z8 1/1 Running 7 23h
default harbor-harbor-database-0 1/1 Running 2 23h
default harbor-harbor-jobservice-788df69665-cjdp6 1/1 Running 6 23h
default harbor-harbor-portal-78b68cfdb-sjb2b 1/1 Running 2 4d13h
default harbor-harbor-redis-0 1/1 Running 2 4d13h
default harbor-harbor-registry-5554964584-q5kw4 2/2 Running 4 4d13h
default ingress-nginx-ingress-controller-6c748ccfb9-xdp2c 1/1 Running 2 4d14h
default ingress-nginx-ingress-default-backend-76fb4dfd79-mtgd2 1/1 Running 2 28h
default mp-9fcb8c7c4-7klsz 1/1 Running 2 22h
default orch-7bcb9b8498-84whh 1/1 Running 5 22h
default orch-db-7954664f45-j86cx 1/1 Running 2 3d1h
default prom-adapter-prometheus-adapter-854457d445-9m8g7 1/1 Running 2 4d14h
default prom-prometheus-kube-state-metrics-586fdb6d-dhpkl 1/1 Running 2 4d14h
default prom-prometheus-node-exporter-lcsnp 1/1 Running 2 4d14h
default prom-prometheus-server-6c88db7cd7-6r7br 2/2 Running 4 4d14h
kube-system coredns-76798d84dd-xt4ll 1/1 Running 2 20h
kube-system coredns-76798d84dd-zfwrj 0/1 Pending 0 20h
kube-system dns-autoscaler-56549847b5-s8mmk 1/1 Running 2 4d15h
kube-system kube-apiserver-kube-cluster-1 1/1 Running 2 4d15h
kube-system kube-controller-manager-kube-cluster-1 1/1 Running 5 4d15h
kube-system kube-flannel-kxktp 1/1 Running 4 4d15h
kube-system kube-multus-ds-amd64-rfdm8 1/1 Running 2 4d15h
kube-system kube-proxy-f6ncr 1/1 Running 2 20h
kube-system kube-scheduler-kube-cluster-1 1/1 Running 6 4d15h
kube-system kube-sriov-device-plugin-amd64-9qmpn 1/1 Running 2 4d15h
kube-system kubernetes-dashboard-77475cf576-mhwv5 1/1 Running 2 4d15h
kube-system kubernetes-metrics-scraper-747b4fd5cd-4jtb5 1/1 Running 2 4d15h
kube-system nodelocaldns-rqck7 1/1 Running 2 4d15h
local-path-storage local-path-provisioner-7dfbb94d64-xdwbl 1/1 Running 2 4d15h
openebs cstor-disk-pool-0sls-68759bd4b4-dq5fq 3/3 Running 6 4d14h
openebs openebs-admission-server-db47b787f-kt75s 1/1 Running 2 4d15h
openebs openebs-apiserver-556ffff45c-6zk4s 1/1 Running 7 4d15h
openebs openebs-localpv-provisioner-c6bc845bb-c9svf 1/1 Running 3 4d15h
openebs openebs-ndm-lgckq 1/1 Running 4 4d15h
openebs openebs-ndm-operator-5f6c5497d7-zm85z 1/1 Running 3 4d15h
openebs openebs-provisioner-598c45dd4-2kgmv 1/1 Running 2 4d15h
openebs openebs-snapshot-operator-5f74599c8c-z4ffc 2/2 Running 4 4d15h
Post-Deployment
Create MP License
After the playbook finishes, the orch, orch-db and mp Pods will still not be running, because license.txt needs to be injected.
NOTE: The mp and orchestrator should run on the same node.
Symptom:
$ kubectl describe pod mp-9fcb8c7c4-mhl2q
...
MountVolume.SetUp failed for volume "sco-mp-license" : configmap "sco-mp-license" not found
Solution: generate and import the license, then restart the orch, orch-db and mp Pods (see the sketch after the license commands below).
Send the string to Astri, who will generate the license.
$ sudo ./licenseutil
SGFyZHdhcmUgU2VyaWFsIG51bWJlcjpOb3QgU3BlY2lmaWVkLEN1cnJlbnRUaW1lOjIwMjEtMDYtMTUgMTQ6NTE6MzU=
Save the license into license.txt and create a ConfigMap on the master node.
$ cat license.txt
Company Name: 99cloud
Expiry Date: 2021-08-15 12:59:59.999
Licensed Servers: All
License Key: d5fM8RG/LzwtbKpg94i/MAFs5we36IDq+GZ3rfIUdCx3CNUlEpy9QJ5Ov+NwcNJ89dQJAhzfSAKI0PfnVTbrF68g1MJO+e8sTriROt+3n1v9QdpSiRis0PtSr7UR9lf6ao+xr5LbttlKjXs0fSFq5NBw3WsywiBNtNvG+9DiRgs=
$ kubectl create configmap sco-mp-license --from-file=license.txt
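Once the ConfigMap exists, restart the orch, orch-db and mp Pods so they pick up the license. A minimal sketch, assuming they are managed by Deployments named orch, orch-db and mp (as the Pod names in the Pod list above suggest):
$ kubectl rollout restart deployment orch orch-db mp
# or delete the Pods directly and let their controllers recreate them:
$ kubectl delete pod <orch-pod> <orch-db-pod> <mp-pod>
$ kubectl get pods | grep -E 'orch|mp'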
Expose Service to Outside Network
To expose the orchestrator service, run the following command on the 5gcp-node, replacing <node_ip_address> with the 5gcp-node IP address.
# ClusterIP
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["<node_ip_address>"]}}'
# NodePort
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["172.18.22.220"], "externalTrafficPolicy": "Cluster", "type": "NodePort", "ports": [{"name": "websocket", "port": 80, "protocol": "TCP", "targetPort": 80, "nodePort": 32614}]}}'
Login Web UI
Web UI: http://172.18.22.220:32614/sco/web/login
Default credentials: root/root
Appendix
Kubectl Configuration
On the target-node:
sudo mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Setup SRIOV for UPF Node
Setup VFs
# Find which PCI devices support SR-IOV
lspci -nnn | grep Ether
# Max Allowed VFs
cat /sys/class/net/<ifname>/device/sriov_totalvfs
# Set VF Number
echo 8 > /sys/class/net/<ifname>/device/sriov_numvfs
# Check VF
ip link show
# Find pci-addr of Interface
ethtool -i <ifname>
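Note that sriov_numvfs is reset on reboot. One possible way to make it persistent (an assumption, not part of the original procedure) is a small systemd unit; <ifname> is a placeholder:
$ cat > /etc/systemd/system/sriov-vfs.service <<EOF
[Unit]
Description=Create SR-IOV VFs
After=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 8 > /sys/class/net/<ifname>/device/sriov_numvfs'

[Install]
WantedBy=multi-user.target
EOF
$ systemctl daemon-reload && systemctl enable sriov-vfs.service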
Edit Config Map
$ kubectl -n kube-system edit cm sriovdp-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "sriovnetwork",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1521"],
            "drivers": ["ixgbevf"]
          }
        }
      ]
    }
Restart SRIOV Pods
kubectl -n kube-system get pod
# Check the injected environment variable inside a Pod that requests the resource
printenv
PCIDEVICE_INTEL_COM_SRIOVNETWORK=0000:01:10.0
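To actually restart the device plugin, delete its Pod (the DaemonSet recreates it) and then confirm the node advertises the intel.com/sriovnetwork resource. The Pod hash and node name below are placeholders:
$ kubectl -n kube-system delete pod kube-sriov-device-plugin-amd64-<hash>
$ kubectl get node <node_name> -o json | grep sriovnetwork
    "intel.com/sriovnetwork": "8",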
Setup Hugepage for UPF Node
$ cat /etc/sysctl.conf
...
vm.nr_hugepages = 512
$ sysctl -p
$ sysctl -w vm.nr_hugepages=512
$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 512
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 1048576 kB
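For reference, a workload consumes these hugepages by requesting them explicitly in its Pod spec. The following is a minimal sketch only (the image, sizes and Pod name are placeholders, not the actual UPF manifest):
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hugepage-test
spec:
  containers:
  - name: app
    image: ubuntu:18.04
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
    resources:
      requests:
        memory: 1Gi
        hugepages-2Mi: 256Mi
      limits:
        memory: 1Gi
        hugepages-2Mi: 256Mi
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages
EOF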
Make Harbor Work with HugePages Enabled
$ kubectl get pv | grep harbor-database
pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c 1Gi RWO Delete Bound default/database-data-harbor-harbor-database-0 openebs-hostpath 3d16h
$ kubectl get pv pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c -o yaml | grep path
openebs.io/cas-type: local-hostpath
path: /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
storageClassName: openebs-hostpath
$ cd /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
$ vi postgresql.conf
...
huge_pages = off # on, off, or try
# Restart the three Harbor Pods (see the sketch below).
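A sketch of the restart, assuming the three Pods in question are the database, core and jobservice Pods (the hashes are placeholders):
$ kubectl delete pod harbor-harbor-database-0
$ kubectl delete pod harbor-harbor-core-<hash> harbor-harbor-jobservice-<hash>
$ kubectl get pods | grep harbor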
Build CP Docker Images
AMF
$ cat amf/Dockerfile
FROM ubuntu:18.04
LABEL version=1.0.2
RUN apt-get update --fix-missing && \
apt-get install -y iproute2 apt-utils net-tools iputils-ping tcpdump iptables gdb vim dmidecode dnsutils curl && \
apt-get clean
COPY lib /usr/local/lib/amf/lib
COPY log /var/log/amf
COPY amf /usr/local/bin/amf
COPY etc/amf /etc/amf
WORKDIR /etc/amf/
$ sudo docker build -t 192.168.205.17/5g/amf:1.0.2 amf/
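After the build, the image would typically be pushed to the registry referenced in its tag. A sketch, assuming the registry at 192.168.205.17 is reachable and you have credentials for it:
$ sudo docker login 192.168.205.17
$ sudo docker push 192.168.205.17/5g/amf:1.0.2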
Build UPF Docker Image
UPF
# 1. Clone the UPF source code and create the build container.
$ git clone git@gitlab.sh.99cloud.net:5GS/upf-astri.git
$ cd upf-astri
$ git checkout XXX
$ git tag -a v1.0 -m "test" # if no tags
$ ./buildpack container
a215761ad1f101536885d5554640f7428f2a8487b41222398e7d5010eb04cc07
root@a215761ad1f1:/opt#
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a478be4fec3a fastgate/ubuntu1804:upf "/bin/bash" 29 minutes ago Up 29 minutes root_upf
# 2. Compile the deb packages inside the container.
$ mkdir -p build/external/downloads/
$ cp /downloads/*tar* build/external/downloads/
$ make pkg-deb
$ mkdir dep-deb
$ cp /dep-deb/*.deb dep-deb/
$ mv build-root/*.deb .
$ mv build/external/*.deb .
$ tar -czf upf_deb.tgz Makefile *.deb dep-deb/
# 3. Copy the UPF deb package (e.g. upf_deb.tgz) into the vpp repo and extract it.
$ cd vpp
$ docker cp root_upf:/opt/upf_deb.tgz .
$ tar -zxvf upf_deb.tgz
# 4. Build the VPP image using the Dockerfile in the vpp/ folder.
$ cd vpp
$ sudo docker build -t 192.168.205.17/5g/upf:1.0.0 -f Dockerfile .
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
192.168.205.17/5g/upf 1.0.0 f29a82c4ed86 11 seconds ago 936MB
VPP Agent
# 1. Enter the vpp-agent repo and run docker/astri/dev/start.sh to enter the upf-agent-dev container.
git clone git@gitlab.sh.99cloud.net:5GS/vpp-agent.git
cd vpp-agent
git checkout main
./docker/astri/dev/start.sh
# 2. Compile the deb package inside the container.
make deb
# 3. Exit the container, copy the deb package into docker/astri/prod/ and build the upf-agent image.
sudo cp docker/astri/*.deb docker/astri/prod/
cd docker/astri/prod
sudo docker build -t 192.168.205.17/5g/upf-agent22:1.0.0 -f Dockerfile .
Troubleshooting
Useful CLI in Troubleshooting
For each individual Pod that is not up and running, try:
kubectl describe pod <pod_name>
If a Pod is up and running but shows internal errors, read the Pod logs:
kubectl logs -f <pod_name>
To restart the pod, simply run:
kubectl delete pod <pod_name>
To open a port on the host for troubleshooting a service:
For example, if you want to access a service that is only reachable inside the Kubernetes cluster, try:
kubectl edit svc <svc_name>
Then change the service type from ClusterIP to NodePort; kube-proxy will then open a host port on every Kubernetes node that forwards to the service port.
You can use kubectl get svc to view the host port opened for that service, as in the example below.
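For example (a sketch; <svc_name> is a placeholder), the same change can be made non-interactively:
$ kubectl patch svc <svc_name> -p '{"spec": {"type": "NodePort"}}'
$ kubectl get svc <svc_name>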
Issue 1: Ansible Execution Fails
Could not detect which package manager to use. Try gathering facts or settin
Solution: make Ansible use Python 3 consistently (one possible sketch below).
$ python --version
Python 3.6.9
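One way to force Python 3 (a sketch, not necessarily how this environment was fixed) is to pin the Ansible interpreter in the inventory or per run:
# In inventory/5g_support/hosts.ini
[all:vars]
ansible_python_interpreter=/usr/bin/python3
# Or on the command line:
$ ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v -e ansible_python_interpreter=/usr/bin/python3 --private-key=~/.ssh/id_rsa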
Issue 2: Harbor Pods Fail to Start
Symptom:
# kubectl get pods --all-namespaces | grep har
default harbor-harbor-core-69fd799fd4-rlrd5 0/1 CrashLoopBackOff 886 2d4h
default harbor-harbor-database-0 0/1 CrashLoopBackOff 54 4h17m
default harbor-harbor-jobservice-788df69665-4msxt 0/1 CrashLoopBackOff 835 2d4h
default harbor-harbor-portal-78b68cfdb-sjb2b 1/1 Running 0 3d13h
default harbor-harbor-redis-0 1/1 Running 0 3d13h
default harbor-harbor-registry-5554964584-q5kw4 2/2 Running 0 3d13h
Cause: the Harbor database uses PostgreSQL, which does not support HugePages by default.
Solution: disable HugePages on the node and restart the Pods (sketch below).
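A minimal sketch of the fix, assuming HugePages were enabled through the vm.nr_hugepages sysctl as in the UPF node setup above (Pod hashes are placeholders):
$ sysctl -w vm.nr_hugepages=0
# Also remove the vm.nr_hugepages line from /etc/sysctl.conf, then restart the failing Pods:
$ kubectl delete pod harbor-harbor-database-0 harbor-harbor-core-<hash> harbor-harbor-jobservice-<hash>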