Create a Kapacitor Enterprise Cluster

This is archived documentation for InfluxData product versions that are no longer maintained. For newer documentation, see the latest InfluxData documentation.

Kapacitor clusters provide a high-availability solution for capturing, manipulating, and acting on time-series data. Kapacitor Enterprise provides secure communication between InfluxDB and your Kapacitor servers as well as deduplication of alerts generated by the cluster.

This article covers the following:

  • Terminology
  • Kapacitor cluster architecture
  • Set up a Kapacitor Enterprise cluster
  • Remove a member from a Kapacitor cluster
  • Cluster data ingestion
  • Cluster awareness and replication
  • Eventual consistency
  • Troubleshooting

Terminology

The following terms are used frequently throughout this guide.

  • Member - An instance of the Kapacitor Enterprise process, typically running on a host or in a container.
  • Cluster - A set of members aware of each other.

Kapacitor cluster architecture

Kapacitor Enterprise clusters have only a single type of member, meaning every member of a cluster performs the same function. Essentially each member runs as a standalone Kapacitor instance, but is aware of and shares information with other members of the cluster.

When planning your cluster architecture, there are a few general rules to follow:

1. Directly accessible members

Members of the cluster must be directly accessible to other members of the cluster via TCP or UDP. Members must also be accessible via HTTP, HTTPS, or UDP from the InfluxDB instance or cluster from which data is received.

2. No load balancers

Members of a Kapacitor cluster should not be placed behind a load-balancer. InfluxDB needs direct access to all members of the cluster in order to copy written data to each. The Kapacitor subscriptions documentation outlines how InfluxDB sends data to Kapacitor via subscriptions.

3. Know the size of your cluster before starting

In the current release of Kapacitor, adding and removing members from a cluster dynamically can cause the cluster to get out of sync. To prevent synchronization issues, decide in advance how many members you want to run. You can add or remove members once a cluster is running, but this must be done correctly. See Removing a member from the cluster below for details.

Set up a Kapacitor Enterprise cluster

The basic installation steps for a Kapacitor Enterprise cluster are:

  1. Configure Kapacitor Enterprise
  2. Start each member of the cluster
  3. Add members to a cluster
  4. Start using the Kapacitor cluster

Step 1: Configure Kapacitor Enterprise

Hostname configuration

In order for Kapacitor Enterprise members to communicate with each other, they need to be able to resolve each other’s addresses. The hostname setting for each Kapacitor Enterprise member is the DNS name or IP address of that member. All other Kapacitor Enterprise members must be able to resolve and access that address.

If your network has members with different addresses on public and private networks, each of the respective services has an advertise-address configuration setting.
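
As a sketch, a member whose private and public addresses differ might set its hostname to the private address and advertise a publicly resolvable address for the cluster service (both addresses below are placeholders):

hostname = "10.0.1.11"

[cluster]
  bind-address = ":9090"
  # Advertise the address other members should use to reach this member.
  advertise-address = "kapacitor-1.example.com:9090"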

Cluster configuration

Kapacitor Enterprise uses a gossip protocol to maintain cluster membership and facilitate communication within the cluster. In Kapacitor Enterprise’s kapacitor.conf, the additional [cluster] section includes options specific to clusters. These options define the network settings and tunable parameters for the gossip protocol.

In most cases the defaults are sufficient.

[cluster]
  bind-address = ":9090"
  advertise-address = ""
  roles = ["worker"]
  gossip-members = 0
  gossip-interval = "0s"
  gossip-sync-interval = "0s"

bind-address

The bind-address is a host:port pair to which the cluster gossip communication can bind. If only the port is specified, the host is inherited from the hostname configuration. The address is bound using both UDP and TCP protocols.

advertise-address

The advertise-address is the address advertised to other members of the cluster for this member. If empty, it defaults to the bind-address.

gossip-members

The gossip-members setting is the number of neighboring members to whom gossip messages are sent. In the configuration file, the default setting of gossip-members = 0 results in a default value designed for use within a typical LAN network (currently 3).

A higher count results in faster convergence but also increases network bandwidth.

gossip-interval

The gossip-interval is the time between gossip messages. In the configuration file, the default setting of gossip-interval = "0s" results in a default value designed for use within a typical LAN network (currently 200ms).

A shorter interval means faster convergence but increased network bandwidth.

gossip-sync-interval

The gossip-sync-interval is the time between full TCP syncs of the cluster gossip state. In the configuration file, the default setting of gossip-sync-interval = "0s" results in a default value designed for use within a typical LAN network (currently 30s).

A shorter interval means faster convergence but more network bandwidth.

Alerting configuration

Kapacitor Enterprise deduplicates alerts generated from tasks running on multiple members. The [alert] configuration section includes the following options:

[alert]
  redundancy = 0
  delay-per-member = "10s"
  full-sync-interval = "5m0s"

redundancy

The redundancy setting is the number of redundant servers assigned ownership of each alert topic. The default value is 0. Set this to the number of members on which you plan to replicate each task. For example, if you plan to run each task on 2 members, set redundancy = 2.

delay-per-member

The delay-per-member setting is the duration each member is given to process an event. If the specified duration elapses without notification that the event has been handled, the next member in line assumes responsibility for the event. The default is 10s. Decreasing the value reduces the long-tail latency of alerts at the cost of a higher probability of duplicate alerts.

full-sync-interval

The full-sync-interval is the interval at which the full alert state is synced across the cluster. The duration specifies an upper bound on the amount of drift that can occur. The default value is 5m0s.

Increasing redundancy duplicates more work within the cluster but decreases the likelihood that a failure causes an alert to be lost; an alert is only dropped if all of the redundant members handling it fail together. Increasing delay-per-member reduces the probability of duplicate alerts in the case of a partial failure, but it also increases how late an alert may arrive.
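
For example, a cluster in which each task runs on two members might keep the default timings and raise only the redundancy (mirroring the two-member example above):

[alert]
  redundancy = 2
  delay-per-member = "10s"
  full-sync-interval = "5m0s"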

InfluxDB configuration

The [influxdb] section of the kapacitor.conf has a subscription-mode option which should be set to server when running Kapacitor Enterprise as a cluster. This allows each server within the cluster to create its own subscription to InfluxDB through which it receives copies of all written data.
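
A minimal sketch of that setting in the [influxdb] section, assuming a single InfluxDB endpoint (the URL is a placeholder):

[[influxdb]]
  enabled = true
  urls = ["http://influxdb.example.com:8086"]
  # "server" makes each cluster member create its own subscription,
  # so every member receives a copy of all written data.
  subscription-mode = "server"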

Step 2: Start each member of the cluster

Once Kapacitor Enterprise is installed and the necessary configuration settings have been set, start Kapacitor on each member of your cluster.

The following example walks through setting up a Kapacitor cluster with two members: kapacitor-1 and kapacitor-2. This process can be easily extended to more than two members.

Start Kapacitor on kapacitor-1:

kapacitor-1$ kapacitord -config /path/to/kapacitor-1/kapacitor.conf

Use the kapacitorctl member list command to list the members of the cluster. The list will have only one entry for kapacitor-1 since no other members have been added to the cluster yet.

kapacitor-1$ kapacitorctl member list
State: uninitialized
Cluster ID: 876ddfb4-1879-4f40-87e2-4080c04d3096
Local Member ID: f74f3547-efaf-4e6e-8b05-fb12b19f8287
Member ID                               Gossip Address     RPC Address        API Address        Roles  Status
f74f3547-efaf-4e6e-8b05-fb12b19f8287    kapacitor-1:9090   kapacitor-1:9091   kapacitor-1:9092   worker alive

The output includes the addresses needed to add this Kapacitor server to a cluster. Each address exposes a service. Below is a table outlining the purpose of each service.

Service   Public/Private   Default Port   Network Protocol   Description
Gossip    Private          9090           TCP and UDP        Kapacitor uses a gossip protocol to maintain cluster membership and otherwise communicate.
RPC       Private          9091           TCP                Kapacitor uses the RPC service for peer-to-peer communication between members.
API       Public           9092           TCP                Kapacitor exposes an HTTP REST API; all external systems communicate with Kapacitor via this service.

Services marked “Private” do not need to be exposed to any systems other than the other Kapacitor members; “Private” means private to the cluster.

Start the next member

Start Kapacitor on the second member, kapacitor-2.

kapacitor-2$ kapacitord -config /path/to/kapacitor-2/kapacitor.conf

Use the kapacitorctl member list command to view the information for this Kapacitor member.

kapacitor-2$ kapacitorctl member list
State: uninitialized
Cluster ID: 9acd33e6-ed88-4601-98df-6b73c1c78427
Local Member ID: 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
Member ID                               Gossip Address     RPC Address        API Address        Roles  Status
13eeefdd-41b5-453f-928e-cb9c55fd2a5d    kapacitor-2:9090   kapacitor-2:9091   kapacitor-2:9092   worker alive

Step 3: Add members to a cluster

With both kapacitor-1 and kapacitor-2 running independently, add them together to form a single cluster. When adding members to a cluster, you must have the RPC address of the member being added. This is included in the kapacitorctl member list output.

On kapacitor-1, use the kapacitorctl member add command and kapacitor-2’s RPC address to add kapacitor-2 to the cluster.

kapacitor-1$ kapacitorctl member add kapacitor-2:9091

kapacitor-1 will initiate a connection to kapacitor-2 over the RPC service and join it to the cluster. Use the kapacitorctl member list command to check that both members know about each other.

Member list from kapacitor-1

kapacitor-1$ kapacitorctl member list
State: initialized
Cluster ID: 876ddfb4-1879-4f40-87e2-4080c04d3096
Local Member ID: f74f3547-efaf-4e6e-8b05-fb12b19f8287
Member ID                               Gossip Address     RPC Address        API Address        Roles  Status
f74f3547-efaf-4e6e-8b05-fb12b19f8287    kapacitor-1:9090   kapacitor-1:9091   kapacitor-1:9092   worker alive
13eeefdd-41b5-453f-928e-cb9c55fd2a5d    kapacitor-2:9090   kapacitor-2:9091   kapacitor-2:9092   worker alive

Member list from kapacitor-2

kapacitor-2$ kapacitorctl member list
State: initialized
Cluster ID: 876ddfb4-1879-4f40-87e2-4080c04d3096
Local Member ID: 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
Member ID                               Gossip Address     RPC Address        API Address        Roles  Status
f74f3547-efaf-4e6e-8b05-fb12b19f8287    kapacitor-1:9090   kapacitor-1:9091   kapacitor-1:9092   worker alive
13eeefdd-41b5-453f-928e-cb9c55fd2a5d    kapacitor-2:9090   kapacitor-2:9091   kapacitor-2:9092   worker alive

Notice that the cluster state is initialized and cluster IDs are the same for both members. Repeat the process above to add additional members to the cluster.

Step 4: Start using the Kapacitor cluster

Kapacitor clustering is designed to duplicate task work while deduplicating the alerts those tasks generate. If one member fails, the other members in the cluster running the task continue to generate alerts. To leverage Kapacitor Enterprise’s high-availability features, define tasks on multiple members that publish to the same alert topic, then define alert handlers that handle alerts from that topic. The Alerts in a Kapacitor cluster documentation provides more information.
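
As a sketch of that workflow (the task name, TICKscript file, handler file, and database are placeholders, and the TICKscript is assumed to publish its alerts to a topic with .topic()):

# Define and enable the same task on every member that should run it.
kapacitor-1$ kapacitor define cpu_alert -type stream -tick cpu_alert.tick -dbrp telegraf.autogen
kapacitor-1$ kapacitor enable cpu_alert
kapacitor-2$ kapacitor define cpu_alert -type stream -tick cpu_alert.tick -dbrp telegraf.autogen
kapacitor-2$ kapacitor enable cpu_alert

# Define the topic handler once; Kapacitor Enterprise replicates it across the cluster.
kapacitor-1$ kapacitor define-topic-handler ./cpu_handler.yml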

Remove a member from a Kapacitor cluster

Members can be removed from a cluster as needed; however, they should not simply be shut down without first being removed. Doing so causes synchronization issues throughout the remaining members of the cluster. To remove a member, use the kapacitorctl member remove command.

Using the example cluster setup above, to remove kapacitor-2, get the Member ID of kapacitor-2 from the kapacitorctl member list output. Run the kapacitorctl member remove command on either kapacitor-1 or kapacitor-2:

kapacitor-1$ kapacitorctl member remove 13eeefdd-41b5-453f-928e-cb9c55fd2a5d

kapacitor-2 will enter an uninitialized state with a new cluster ID.

kapacitor-2$ kapacitorctl member list
State: uninitialized
Cluster ID: bcaf2098-f79a-4a62-96e4-e2cf83441561
Local Member ID: 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
Member ID                               Gossip Address     RPC Address        API Address        Roles  Status
13eeefdd-41b5-453f-928e-cb9c55fd2a5d    kapacitor-2:9090   kapacitor-2:9091   kapacitor-2:9092   worker alive

If decommissioning the Kapacitor server, be sure to remove its subscription from InfluxDB. InfluxDB does not know whether the server will come back and will continue attempting to send data to the removed member until its subscription is manually removed.
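
A sketch of that cleanup using the influx CLI; the subscription name, database, and retention policy below are placeholders, so run SHOW SUBSCRIPTIONS first to find the actual values:

> SHOW SUBSCRIPTIONS
> DROP SUBSCRIPTION "kapacitor-2-sub" ON "telegraf"."autogen"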

Cluster data ingestion

The primary methods for writing data to Kapacitor are using subscriptions or directly using the Kapacitor API /kapacitor/v1/write endpoint.

Sequentially or in parallel

When writing directly to the Kapacitor API, send requests to all nodes in your Kapacitor Enterprise cluster. You can send data in parallel (to all nodes at once) or sequentially (to one node after another). Either method is acceptable, but when writing sequentially, ensure each point carries the same timestamp in every request and that the requests arrive close together. Each node processes the data at a slightly different time, but as long as the points have the same timestamp, each Kapacitor node computes the same result.
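
For example, the same point, carrying the same timestamp, could be written to both members of the example cluster through the API service on port 9092 (database, retention policy, and measurement are placeholders):

curl -XPOST 'http://kapacitor-1:9092/kapacitor/v1/write?db=telegraf&rp=autogen' \
  --data-binary 'cpu,host=server01 usage_idle=98.2 1514764800000000000'
curl -XPOST 'http://kapacitor-2:9092/kapacitor/v1/write?db=telegraf&rp=autogen' \
  --data-binary 'cpu,host=server01 usage_idle=98.2 1514764800000000000'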

The delay-per-member configuration option is the time interval that allows nodes to process data at slightly different times. As long as the write requests to the first node and the last node arrive no more than the delay-per-member interval apart, you will not have any issues.

The default value of delay-per-member is 10s. You can increase the value, but this will delay alert events by that same amount.

Cluster awareness and replication

The current release of Kapacitor Enterprise is only partially “cluster-aware,” meaning that some commands and information on the cluster will be automatically replicated throughout the cluster while other commands and information are not shared or need to be explicitly run on each member.

What is shared?

Host and networking information
Information required to connect to and communicate with other members of the cluster such as gossip, RPC, and API addresses.

Cluster member information
Information about each member in the cluster such as member ID, member role, and member status.

Alert topics and published alerts
As multiple Kapacitor instances publish to an alert topic, the published alerts are deduplicated across the cluster to prevent duplicate alerts being triggered.

“Standalone” alert handlers
Kapacitor Enterprise manages the replication of “Standalone” alert handlers so they only need to be defined on a single member. These should not be confused with Kapacitor tasks or TICKscripts.

kapacitor define-topic-handler ./handler-file.yml
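
The handler file passed to kapacitor define-topic-handler might look like the following sketch; the topic, handler ID, kind, and options are placeholders for your own alert endpoint:

topic: cpu
id: slack-cpu
kind: slack
options:
  channel: '#alerts'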

What is not shared?

Tasks and task templates
TICKscripts are used to define Kapacitor tasks and task templates. Kapacitor does not replicate tasks defined on one member to other members of the cluster. Tasks must be defined manually on each member on which you intend them to run.

Inbound or transformed time series data (InfluxDB data)
All data written to InfluxDB is copied to the members of your Kapacitor cluster, where Kapacitor can manipulate, transform, and act on it. If a task transforms data, those changes are isolated to that specific member unless they are written back to InfluxDB. Anything written to InfluxDB propagates to the other members of your Kapacitor cluster.
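
For example, a hypothetical task could write its computed results back to InfluxDB with an influxDBOut node, making them visible cluster-wide (the measurement, database, and retention policy names are placeholders):

stream
    |from()
        .measurement('cpu')
    |window()
        .period(1m)
        .every(1m)
    |mean('usage_idle')
        .as('mean_idle')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('cpu_mean')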

Eventual consistency

Members of a Kapacitor Enterprise cluster communicate in an eventually consistent way: information starts at one member and then spreads to the rest of the cluster. As a result, two members may temporarily have different information because one of them has not learned it yet. Under normal operation, the time for information to spread to the entire cluster is minimal, so the probability of being affected by out-of-sync data is low.

Troubleshooting

Out of sync members

If you find that information is frequently out of date on some of the members, try modifying the cluster configuration options. The following changes to the options will result in information spreading faster throughout the cluster at the expense of additional bandwidth:
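
For example, the following [cluster] settings increase the gossip fan-out and shorten both intervals; the specific values are illustrative rather than recommendations:

[cluster]
  gossip-members = 5
  gossip-interval = "100ms"
  gossip-sync-interval = "15s"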