Adding New OpenStack Roles to an Existing High Availability Cluster

Example: Adding New OpenStack or Contrail Roles to an Existing High Availability Cluster

This section provides an example of adding new nodes or roles to an existing cluster with high availability enabled. It is organized into the following sections:

  • Adding New OpenStack or Contrail Roles to an Existing High Availability Cluster
  • Purging a Controller From an Existing Cluster
  • Replacing a Node With a Node That Has the Same IP Address
  • Known Limitations and Configuration Guidelines
  • Understanding How the System Adds a New Node to an Existing Cluster

Adding New OpenStack or Contrail Roles to an Existing High Availability Cluster

To add new nodes or roles to an existing cluster with high availability enabled:

  1. Install the new server with the base OS image (currently, Ubuntu 14.04 is supported).
  2. Download the latest Contrail installer package to the new server.
  3. Run the dpkg -i <contrail_install_deb.deb> command on the new server.
  4. Run the /opt/contrail/contrail_packages/setup.sh script on the new server to install the Contrail repository.
  5. Modify the testbed.py file on the build server as follows (a hypothetical fragment is sketched after this procedure):

    1. Add the new host to the list of hosts as host<x>.
    2. In the env.roledefs list, add the new node to the appropriate roles. For example, add the node to the openstack role list so that it is configured as an OpenStack server. Each role can be assigned independently of the others.
    3. Add the hostname of the new server in the env.hostnames field.
    4. Add the password for the new server in the env.passwords field.
    5. Add the control and data interface details for the new server, if applicable.
  6. On the build server, go to the /opt/contrail/utils directory and use the fab install_new_contrail:new_ctrl='root@<host_ip>' command to install all the required packages on the new server, based on the roles configured in the testbed.py file.
  7. If the server has two interfaces, a control and data interface and a management interface, the control and data interface can be set up using the fab setup_interface_node:'root@<host_ip>' command.
  8. On the build server, go to the /opt/contrail/utils directory and use the fab join_cluster:'root@<host_ip>' command. This adds the new server to the corresponding cluster based on the roles configured in the testbed.py file.

    The new node is added to the existing cluster. See Understanding How the System Adds a New Node to an Existing Cluster.
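
For reference, the following is a minimal, hypothetical testbed.py fragment for this procedure. The host addresses, hostnames, and passwords are placeholders, and keys such as the control_data dictionary and the exact role names may differ in your deployment; treat this as a sketch rather than a complete file.

    from fabric.api import env

    # Existing controllers and the new server being added (placeholder addresses).
    host1 = 'root@10.10.10.1'
    host2 = 'root@10.10.10.2'
    host3 = 'root@10.10.10.3'
    host4 = 'root@10.10.10.4'   # new server, appended at the end of each list

    env.roledefs = {
        'all': [host1, host2, host3, host4],
        'openstack': [host1, host2, host3, host4],  # new node added to the openstack role
        'cfgm': [host1, host2, host3],
        'control': [host1, host2, host3],
        'database': [host1, host2, host3],
        # other roles (compute, collector, webui, build) omitted for brevity
    }

    env.hostnames = {'all': ['node1', 'node2', 'node3', 'node4']}

    env.passwords = {
        host1: 'password1',
        host2: 'password2',
        host3: 'password3',
        host4: 'password4',   # password for the new server
    }

    # Control and data interface for the new server, if applicable (key names may vary):
    control_data = {
        host4: {'ip': '192.168.10.4/24', 'gw': '192.168.10.254', 'device': 'eth1'},
    }

    # Then, from /opt/contrail/utils on the build server:
    #   fab install_new_contrail:new_ctrl='root@10.10.10.4'
    #   fab setup_interface_node:'root@10.10.10.4'   # only if a separate control and data interface is used
    #   fab join_cluster:'root@10.10.10.4'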

Purging a Controller From an Existing Cluster

To purge a controller node from an existing cluster with high availability enabled:

  1. Open the testbed.py file, which contains the topology information of the cluster.
  2. In the env.roledefs list, remove the node from the appropriate role. For example, if the node should no longer act as an OpenStack controller, remove it from the openstack role list (see the sketch after this procedure).

    Note: Each role can be removed independently of the others. However, the OpenStack and database roles have minimum node requirements (at least three nodes each); if the remaining cluster would not meet these requirements after the node is purged, deleting the node is not allowed.

    Note: Do not delete the node from the host list (or from env.passwords); remove it only from the env.roledefs list. The node can be removed from the host list after the purge operation is completed.

  3. On the build server, go to the /opt/contrail/utils directory and use the fab purge_node:'root@<ip_address>' command. This removes all the configuration related to the node and stops the relevant services on the node that you need to purge.
  4. Remove the rest of the configuration related to the node from the testbed.py file after the previous command completes.

    Caution: If the node to be deleted is already down (non-operational), do not bring it back up; it would rejoin the cluster, with unpredictable consequences.
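
To make the purge steps concrete, here is a hypothetical before-and-after view of env.roledefs (placeholder addresses) in which host4 is removed from the openstack role. Note that host4 stays in the host list, env.hostnames, and env.passwords until the purge completes, and the remaining three OpenStack nodes still satisfy the minimum node requirement.

    from fabric.api import env

    host1 = 'root@10.10.10.1'
    host2 = 'root@10.10.10.2'
    host3 = 'root@10.10.10.3'
    host4 = 'root@10.10.10.4'   # node being purged from the openstack role

    # Before the purge, host4 was listed in the openstack role:
    #   'openstack': [host1, host2, host3, host4],

    env.roledefs = {
        'all': [host1, host2, host3, host4],   # host4 is NOT removed from the host list yet
        'openstack': [host1, host2, host3],    # host4 removed from the role list only
        # other roles omitted for brevity
    }

    # env.hostnames and env.passwords still include host4 at this point.

    # Then, from /opt/contrail/utils on the build server:
    #   fab purge_node:'root@10.10.10.4'
    # After the command completes, remove the remaining host4 entries from testbed.py.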

Replacing a Node With a Node That Has the Same IP Address

To replace a node in an existing cluster that has high availability enabled with a node that has the same IP address:

  1. Make sure that the cluster continues to operate when the node being replaced is taken out of the cluster (for example, that the remaining nodes still meet the minimum failure-tolerance requirements).
  2. Reimage the node that is being replaced and make sure it gets the same IP address.
  3. Follow the steps to add a node to an existing cluster.

Known Limitations and Configuration Guidelines

The following are known limitations and configuration guidelines for adding new nodes or roles to an existing cluster with high availability enabled:

  • Adding a new node to an existing high availability cluster is only supported in Contrail Release 2.21 and later.
  • Converting a single node or cluster that does not have high availability enabled to a cluster that does have high availability enabled is not supported.
  • New nodes must be appended at the end of the existing node lists in the testbed.py file.
  • We recommend maintaining a cluster with an odd number of controllers, because high availability is based on a quorum: to support n failures, (2n + 1) nodes are required (see the short example after this list).
  • The new node must run the same Contrail release as the other nodes in the cluster.
  • Run the nodetool cleanup command on the existing nodes after a new node joins the Cassandra cluster. You can safely schedule this during low-usage hours to avoid disrupting cluster operation.
  • When deleting a node from an existing cluster, the remaining cluster must be operational and meet the high availability requirements; otherwise, purging the node is not allowed.
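
As a short illustration of the quorum guideline above (this is arithmetic only, not part of the Contrail tooling): to survive n simultaneous controller failures, a quorum-based cluster needs 2n + 1 members, which is why adding an even-numbered node does not increase failure tolerance.

    # Quorum arithmetic for a high availability controller cluster (illustration only).
    def nodes_required(failures_to_tolerate):
        """Controllers needed to survive the given number of simultaneous failures."""
        return 2 * failures_to_tolerate + 1

    def failures_tolerated(cluster_size):
        """Failures a cluster of this size can survive while keeping quorum."""
        return (cluster_size - 1) // 2

    print(nodes_required(1))       # 3 -> the minimum high availability cluster
    print(failures_tolerated(4))   # 1 -> a fourth node adds no extra tolerance
    print(failures_tolerated(5))   # 2 -> the next useful step is five nodes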

Understanding How the System Adds a New Node to an Existing Cluster

The following lists the actions the system takes when adding a new node to an existing cluster with high availability enabled:

  • If a new OpenStack server is configured (see the first sketch after this list), the system:
    • Adds the new node as a participant in the VRRP mastership election.
      • Generates a keepalived configuration similar to other nodes in the existing cluster.
      • Restarts the keepalived process in all nodes so the configuration takes effect.
    • Modifies the haproxy configuration in all the existing nodes to add the new node as another backend for roles like keystone, nova, glance, and cinder.
    • Restarts the haproxy process in all the nodes so that the new configuration takes effect.
    • Modifies the MySQL configuration in the /etc/mysql/conf.d/wsrep.conf file on all the existing controllers to add the new node to the Galera cluster.
      • Restarts MySQL on every node sequentially and waits until each comes back up.
      • Generates a MySQL configuration similar to that of the other existing controllers and waits until the new OpenStack server syncs all the data from the existing MySQL donors.
    • Installs CMON and generates the CMON configuration.
    • Adds a CMON user to all the MySQL databases so that CMON can monitor the MySQL servers, and regenerates the CMON configuration on all the nodes so that the CMON instances on the other nodes can also monitor the new MySQL database.
    • Generates cmon_param files based on the new configuration and invokes monitor scripts.
    • Instructs the new node to join the RabbitMQ cluster (if RabbitMQ is not externally managed).
    • Modifies the rabbitmq.config file in all the existing nodes to include the new node and restarts the RabbitMQ server sequentially so that the new node joins the cluster.
  • If a new database node is configured (see the second sketch after this list), the system:

    • Generates the database configuration (cassandra.yaml file) for the new node and uses the nodetool command to make the new node join the existing cluster. (Uses the fab setup_database_node command)
    • Generates a new zookeeper configuration on the existing nodes, adds the new node into the existing zookeeper configuration, and restarts the zookeeper nodes sequentially.
    • Starts the Cassandra database in the new node.
  • If a new Configuration Manager node is configured, the system:

    • Generates all the configuration required for a new Configuration Manager (cfgm) node and starts the server. (Uses the fab setup_config_node command)
    • Modifies the HAProxy configuration to add the new node into the existing list of backends.
    • If required, modifies the zookeeper and Cassandra server list in the /etc/contrail/contrail-api.conf file in the existing config nodes.
    • Restarts the config nodes so that the new configuration takes effect.

  • If a new controller node is configured, the system:

    • Generates all the configuration required for a new control node and starts the control node. (Uses the fab setup_control_node command)
    • Instructs the control node to peer with the existing control nodes.

  • If a new collector, analytics, or WebUI node is configured, the system:

    • Generates all the configuration required for a new collector node and starts the collector node. (Uses the fab setup_collector_node and fab setup_webui_node commands)
    • Updates the existing configuration in all the new nodes to add a new database node if required.
    • Starts the collector process in the new node.
    • Starts the webui process in the new node.
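
As a simplified, hypothetical illustration of the OpenStack case above (this is not the actual fab implementation; the addresses, backend name, and port number are placeholders), the following sketch shows how the Galera member list in /etc/mysql/conf.d/wsrep.conf and a haproxy backend grow when a new controller joins:

    # Simplified illustration only -- not the actual Contrail provisioning code.
    existing_controllers = ['10.10.10.1', '10.10.10.2', '10.10.10.3']
    new_node = '10.10.10.4'
    controllers = existing_controllers + [new_node]

    # Galera cluster address written to /etc/mysql/conf.d/wsrep.conf on every controller:
    wsrep_line = 'wsrep_cluster_address=gcomm://' + ','.join(controllers)

    # One extra "server" line per haproxy backend (keystone shown; nova, glance, and
    # cinder get similar stanzas; backend name and port are placeholders):
    keystone_backend = ['backend keystone-backend'] + [
        '    server node%d %s:5000 check' % (i + 1, ip)
        for i, ip in enumerate(controllers)
    ]

    print(wsrep_line)
    print('\n'.join(keystone_backend))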
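
Similarly, for the database case, the following hypothetical sketch (again not the actual fab implementation; the addresses are placeholders and the ZooKeeper ports shown are the defaults) illustrates the kind of entries that reference the enlarged node list in the Cassandra and zookeeper configurations:

    # Simplified illustration only -- not the actual Contrail provisioning code.
    database_nodes = ['10.10.10.1', '10.10.10.2', '10.10.10.3']
    new_db_node = '10.10.10.4'

    # The seeds parameter in the new node's cassandra.yaml points at existing nodes,
    # and the zookeeper server list grows by one entry (2888/3888 are the defaults):
    cassandra_seeds = 'seeds: "%s"' % ','.join(database_nodes)
    zookeeper_servers = [
        'server.%d=%s:2888:3888' % (i + 1, ip)
        for i, ip in enumerate(database_nodes + [new_db_node])
    ]

    print(cassandra_seeds)
    print('\n'.join(zookeeper_servers))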

Modified: 2015-10-02

SOURCE: https://www.juniper.net/techpubs/en_US/contrail2.21/topics/task/configuration/high-availability-node-adding.html
