Maintenance / Extensions done on SWITCHengines
- 2019-08-27: SWITCHengines ZH upgrade to Ocata
- 2018-08-13: SWITCHengines LS upgrade to Ocata
WHAT / WHEN
We are upgrading our ZH region to a new release of OpenStack (the Ocata release, for those interested) on 2019-08-27. We will shut down the control plane for the duration of the upgrade, which means that you will not be able to administer running virtual machines or create new ones. The VMs themselves are not affected and will continue to run as normal.
In addition, we will perform preliminary operations on the network nodes on 2019-08-20 so that the actual upgrade a week later can be completed more quickly.
SWITCHengines is built with OpenStack, an open source cloud orchestration framework. OpenStack has a 6-month release cycle and is currently on the “Stein” (April 2019) release. We have been running on “Newton” for almost 2 years and are now updating to “Ocata” (February 2017). There are very few changes visible to end users, so you will probably not notice any differences. We gain a number of bug and security fixes and can start work on the next upgrade (to the “Pike” version).
In addition to this upgrade, we have also completely replaced the underlying architecture of the so-called “Control Plane” - the services you interact with when you do cloud operations. Until now, each of the various services that make up OpenStack (they take care of virtualisation, image handling, disk management, network virtualisation, identity and others) has been running in a virtual machine on dedicated hardware in each of our two regions. As of now (LS) and in two weeks (ZH), these services run in containers on a bare-metal Kubernetes cluster in each of the regions. This approach gives us a number of advantages compared to the old way:
- All services are redundant (previously, only one instance of each service was running), and Kubernetes takes care of keeping those services up and running at all times.
- The isolation that containers provide makes it a lot easier for us to upgrade individual services (and these upgrades can be done without downtime).
- It is very easy for us to scale the services to handle higher loads should the need arise.
We have been running these Kubernetes clusters for a long time now, and two of the services (the identity service and the Web UI) have been running on them for the same amount of time. In addition, our development cluster has been running on a Kubernetes infrastructure for over a year, and the staging cluster has been running on it for several months. We are confident that the setup is ready for production.
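To illustrate the redundancy and no-downtime upgrade points above, here is a minimal sketch of how a containerised service could be described to Kubernetes. It is not our actual configuration: the service name `keystone`, the image `registry.example.org/keystone:ocata` and the helper `build_deployment` are hypothetical and chosen only for illustration. The manifest fields themselves (`replicas`, `strategy`, `rollingUpdate`) are standard fields of a Kubernetes `apps/v1` Deployment.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Return a Kubernetes Deployment manifest (apps/v1) as a plain dict.

    Hypothetical helper for illustration only.
    """
    if replicas < 2:
        # A single instance would reproduce the old, non-redundant setup.
        raise ValueError("redundancy needs at least two replicas")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Kubernetes keeps this many instances running at all times.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Replace instances one at a time and never drop below the
            # desired count, so an upgrade does not cause downtime.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Assumed example values; not the real image or service layout.
    manifest = build_deployment("keystone", "registry.example.org/keystone:ocata")
    print(json.dumps(manifest, indent=2))
```

Scaling the service to handle higher load then amounts to raising `replicas`. Upgrading it amounts to changing `image` and letting the rolling update replace the instances one by one. The printed JSON is in a form that `kubectl apply -f -` accepts.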
What: We need to do some urgent network maintenance on our cluster in Zurich on Tuesday, August 14th from 10:00 onwards.
Impact: Network redundancy will be reduced during maintenance. No outage is expected, but there is an elevated risk due to reduced redundancy. We will be closely monitoring the network and the cluster during the work and will respond immediately should a problem arise.
Background: We need to swap out two switches as we gear up to expand our cluster.
Update: We had a ~25-minute OpenStack Keystone endpoint outage because of a firewall misconfiguration. Some users were not able to log in to the Dashboard.
- 2018-03-07: Upgrade of Quagga Software on storage servers. Performance impacts expected
- 2018-02-20: Upgrade of Storage Software in ZH Region from Ceph "Jewel" to Ceph "Luminous". No downtime planned
- 2017-12-12: Upgrade of Storage Software in LS Region from Ceph "Jewel" to Ceph "Luminous". No downtime planned
- 2017-03-04: Upgrade of Storage Software in LS Region from Ceph "Hammer" to Ceph "Jewel". Planned downtime of RadosGW (which serves the Object Storage via the S3 API)
- April 2017: Upgrade of LS region to OpenStack Newton
- May 2017: Upgrade of storage software in ZH to Ceph Jewel