This is archived documentation for InfluxData product versions that are no longer maintained. For newer documentation, see the latest InfluxData documentation.
Kapacitor clusters provide a high-availability solution for capturing, manipulating, and acting on time-series data. Kapacitor Enterprise provides secure communication between InfluxDB and your Kapacitor servers as well as deduplication of alerts generated by the cluster.
This article covers the following:
- Terminology
- Kapacitor cluster architecture
- Set up a Kapacitor Enterprise cluster
- Remove a member from a Kapacitor cluster
- Cluster data ingestion
- Cluster awareness and replication
- Troubleshooting
Terminology
The following terms are used frequently throughout this guide.
- Member - An instance of the Kapacitor Enterprise process typically running in a host or container.
- Cluster - A set of members aware of each other.
Kapacitor cluster architecture
Kapacitor Enterprise clusters have only a single type of member, meaning every member of a cluster performs the same function. Essentially each member runs as a standalone Kapacitor instance, but is aware of and shares information with other members of the cluster.
When planning your cluster architecture, there are a few general rules to follow:
1. Directly accessible members
Members of the cluster must be directly accessible to other members of the cluster via TCP or UDP. Members must also be accessible via HTTP, HTTPS, or UDP from the InfluxDB instance or cluster from which data is received.
2. No load balancers
Members of a Kapacitor cluster should not be placed behind a load-balancer. InfluxDB needs direct access to all members of the cluster in order to copy written data to each. The Kapacitor subscriptions documentation outlines how InfluxDB sends data to Kapacitor via subscriptions.
3. Know the size of your cluster before starting
In the current release of Kapacitor, adding and removing members from a cluster dynamically can cause the cluster to get out of sync. To prevent synchronization issues, decide in advance how many members you want to run. You can add or remove members once a cluster is running, but this must be done correctly. See Remove a member from a Kapacitor cluster below for details.
Set up a Kapacitor Enterprise cluster
The basic installation steps for Kapacitor Enterprise cluster are:
- Configure Kapacitor Enterprise
- Start each member of the cluster
- Add members to a cluster
- Start using the Kapacitor cluster
Step 1: Configure Kapacitor Enterprise
Hostname configuration
In order for Kapacitor Enterprise members to communicate with each other, they need to be able to resolve each other's addresses. The hostname setting for each Kapacitor Enterprise member is the DNS name or IP address of the member. All other Kapacitor Enterprise members need to be able to resolve and access that address. If your network gives members different addresses on public and private networks, each of the respective services has an advertise-address configuration setting.
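As a sketch, a member whose private and public addresses differ might use settings like the following (the addresses are hypothetical placeholders, and exact key placement should be checked against your kapacitor.conf):

```toml
# Hypothetical kapacitor.conf fragment; hostnames are placeholders.
hostname = "kapacitor-1"

[cluster]
  # Local address the gossip service binds to.
  bind-address = ":9090"
  # Address other cluster members should use to reach this member.
  advertise-address = "kapacitor-1.example.com:9090"
```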
Cluster configuration
Kapacitor Enterprise uses a gossip protocol to maintain cluster membership and facilitate communication within the cluster. In Kapacitor Enterprise's kapacitor.conf, the additional [cluster] section includes options specific to clusters. These options define the network settings and tunable parameters for the gossip protocol. In most cases the defaults are sufficient.
[cluster]
bind-address = ":9090"
advertise-address = ""
roles = ["worker"]
gossip-members = 3
gossip-interval = "0s"
gossip-sync-interval = "0s"
bind-address
The bind-address is a host:port pair to which the cluster gossip communication binds. If only the port is specified, the host is inherited from the hostname configuration. The address is bound using both UDP and TCP protocols.
advertise-address
The advertise-address is the address advertised to other members of the cluster for this member. If empty, it defaults to the bind-address.
gossip-members
The gossip-members setting is the number of neighboring members to whom gossip messages are sent. In the configuration file, the default setting of gossip-members = 0 results in a default value designed for use within a typical LAN network (currently 3). A higher count results in faster convergence but also increases network bandwidth.
gossip-interval
The gossip-interval is the time between gossip messages. In the configuration file, the default setting of gossip-interval = "0s" results in a default value designed for use within a typical LAN network (currently 200ms). A shorter interval means faster convergence but increased network bandwidth.
gossip-sync-interval
The gossip-sync-interval is the time between full TCP syncs of the cluster gossip state. In the configuration file, the default setting of gossip-sync-interval = "0s" results in a default value designed for use within a typical LAN network (currently 30s). A shorter interval means faster convergence but more network bandwidth.
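To build intuition for these tunables, the following is a rough back-of-the-envelope model, not Kapacitor's actual algorithm: treat each gossip interval as one round in which every informed member forwards to gossip-members neighbors, so information spreads roughly geometrically.

```python
import math

def rough_convergence_time(n_members, gossip_members=3, gossip_interval_ms=200):
    """Rough estimate: each interval, every informed member forwards to
    gossip_members neighbors, so the informed set grows by a factor of
    roughly (gossip_members + 1) per round until it covers the cluster."""
    if n_members <= 1:
        return 0
    rounds = math.ceil(math.log(n_members, gossip_members + 1))
    return rounds * gossip_interval_ms

# A 9-member cluster with the LAN defaults (fanout 3, 200ms interval)
# converges in about 2 rounds under this model.
print(rough_convergence_time(9))  # → 400
```

The logarithmic growth is why raising gossip-members or lowering gossip-interval speeds convergence at the cost of bandwidth.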
Alerting configuration
Kapacitor Enterprise deduplicates alerts generated by tasks running on multiple members. The [alert] configuration section includes the following options:
[alert]
redundancy = 0
delay-per-member = "10s"
full-sync-interval = "5m0s"
redundancy
The redundancy setting is the number of redundant servers assigned ownership of each alert topic. The default value is 0. Set this to the number of members on which you plan to replicate tasks. For example, if you plan to run each task on 2 members, set redundancy = 2.
delay-per-member
The delay-per-member setting is the amount of time each member is given to process an event. If the specified duration elapses without notification that the event completed, the next member in line assumes responsibility for the event. The default is 10s. Decreasing the value reduces the long-tail latency of alerts at the cost of a higher probability of duplicate alerts.
full-sync-interval
The full-sync-interval is the interval at which full alert state is synced. This value specifies an upper bound on the amount of drift that can occur. The default value is 5m0s.
Increasing redundancy means more work is duplicated within the cluster, but it decreases the likelihood that a failure causes an alert to be lost. An alert is only dropped if all redundant members handling the alert fail together. Increasing the delay-per-member can reduce the probability of duplicate alerts in the case of a partial failure, but it also increases how late an alert may arrive.
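A small sketch of the tradeoff, assuming the handoff semantics described above (each failed member in line is given delay-per-member before the next takes over):

```python
def worst_case_alert_delay(redundancy, delay_per_member_s=10):
    """If each member in line is given delay_per_member_s seconds before
    the next takes over, the last of `redundancy` members fires the alert
    at most (redundancy - 1) * delay_per_member_s seconds late."""
    if redundancy <= 1:
        return 0
    return (redundancy - 1) * delay_per_member_s

# With redundancy = 2 and the default 10s delay, an alert may arrive
# up to 10 seconds late if the primary member fails; with redundancy = 3,
# up to 20 seconds late if two members fail.
print(worst_case_alert_delay(2))  # → 10
print(worst_case_alert_delay(3))  # → 20
```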
InfluxDB configuration
The [influxdb] section of the kapacitor.conf has a subscription-mode option, which should be set to "server" when running Kapacitor Enterprise as a cluster. This allows each server within the cluster to create its own subscription to InfluxDB, through which it receives copies of all written data.
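A sketch of the relevant fragment (the InfluxDB URL is a hypothetical placeholder):

```toml
# kapacitor.conf fragment: each cluster member creates its own subscription.
[[influxdb]]
  urls = ["http://influxdb.example.com:8086"]
  subscription-mode = "server"
```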
Step 2: Start each member of the cluster
Once Kapacitor Enterprise is installed and the necessary configuration settings have been set, start Kapacitor on each member of your cluster.
The following example walks through setting up a Kapacitor cluster with two members: kapacitor-1 and kapacitor-2. This process can easily be extended to more than two members.
Start Kapacitor on kapacitor-1:
kapacitor-1$ kapacitord -config /path/to/kapacitor-1/kapacitor.conf
Use the kapacitorctl member list command to list the members of the cluster. The list will have only one entry for kapacitor-1 since no other members have been added to the cluster yet.
kapacitor-1$ kapacitorctl member list
State: uninitialized
Cluster ID: 876ddfb4-1879-4f40-87e2-4080c04d3096
Local Member ID: f74f3547-efaf-4e6e-8b05-fb12b19f8287
Member ID Gossip Address RPC Address API Address Roles Status
f74f3547-efaf-4e6e-8b05-fb12b19f8287 kapacitor-1:9090 kapacitor-1:9091 kapacitor-1:9092 worker alive
The output includes addresses necessary for adding this Kapacitor server to a cluster. Each address exposes a service. Below is a table outlining the purpose for each service.
| Service | Public/Private | Default Port | Network Protocol | Description |
|---|---|---|---|---|
| Gossip | Private | 9090 | TCP and UDP | Kapacitor uses a gossip protocol to maintain cluster membership and otherwise communicate. |
| RPC | Private | 9091 | TCP | Kapacitor uses the RPC service for peer-to-peer communication between members. |
| API | Public | 9092 | TCP | Kapacitor exposes an HTTP REST API; all external systems communicate with Kapacitor via this service. |
Services marked “Private” do not need to be exposed to any other systems; only to other Kapacitor members. “Private” means private to the cluster.
Start the next member
Start Kapacitor on the second member, kapacitor-2.
kapacitor-2$ kapacitord -config /path/to/kapacitor-2/kapacitor.conf
Use the kapacitorctl member list command to view the information for this Kapacitor member.
kapacitor-2$ kapacitorctl member list
State: uninitialized
Cluster ID: 9acd33e6-ed88-4601-98df-6b73c1c78427
Local Member ID: 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
Member ID Gossip Address RPC Address API Address Roles Status
13eeefdd-41b5-453f-928e-cb9c55fd2a5d kapacitor-2:9090 kapacitor-2:9091 kapacitor-2:9092 worker alive
Step 3: Add members to a cluster
With both kapacitor-1 and kapacitor-2 running independently, add them together to form a single cluster. When adding members to a cluster, you must have the RPC address of the member being added. This is included in the kapacitorctl member list output.
On kapacitor-1, use the kapacitorctl member add command and kapacitor-2's RPC address to add kapacitor-2 to the cluster.
kapacitor-1$ kapacitorctl member add kapacitor-2:9091
kapacitor-1 will initiate a connection to kapacitor-2 over the RPC service and join it to the cluster. Use the kapacitorctl member list command to check that both members know about each other.
Member list from kapacitor-1
kapacitor-1$ kapacitorctl member list
State: initialized
Cluster ID: 876ddfb4-1879-4f40-87e2-4080c04d3096
Local Member ID: f74f3547-efaf-4e6e-8b05-fb12b19f8287
Member ID Gossip Address RPC Address API Address Roles Status
f74f3547-efaf-4e6e-8b05-fb12b19f8287 kapacitor-1:9090 kapacitor-1:9091 kapacitor-1:9092 worker alive
13eeefdd-41b5-453f-928e-cb9c55fd2a5d kapacitor-2:9090 kapacitor-2:9091 kapacitor-2:9092 worker alive
Member list from kapacitor-2
kapacitor-2$ kapacitorctl member list
State: initialized
Cluster ID: 876ddfb4-1879-4f40-87e2-4080c04d3096
Local Member ID: 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
Member ID Gossip Address RPC Address API Address Roles Status
f74f3547-efaf-4e6e-8b05-fb12b19f8287 kapacitor-1:9090 kapacitor-1:9091 kapacitor-1:9092 worker alive
13eeefdd-41b5-453f-928e-cb9c55fd2a5d kapacitor-2:9090 kapacitor-2:9091 kapacitor-2:9092 worker alive
Notice that the cluster state is initialized and the cluster IDs are the same for both members. Repeat the process above to add additional members to the cluster.
Step 4: Start using the Kapacitor cluster
Kapacitor clustering is designed to duplicate task work while deduplicating alerts generated from tasks. If one member fails, other members in the cluster running the task will continue to generate alerts. To leverage Kapacitor Enterprise's high-availability features, define tasks on multiple members that publish to a topic, then define alert handlers that handle alerts from that topic. The Alerts in Kapacitor clusters documentation provides more information.
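For illustration, a task that publishes to a shared topic might look like the following hypothetical TICKscript (the measurement, threshold, and topic name are placeholders). Defining it on two members, with redundancy = 2 in the [alert] section, gives failover without duplicate notifications.

```js
// Hypothetical TICKscript: alert on low idle CPU and publish
// to the shared topic 'cpu'. Define this task on each member
// that should run it.
stream
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" < 10)
        .topic('cpu')
```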
Remove a member from a Kapacitor cluster
Members can be removed from a cluster as needed; however, they should not simply be shut down without first being removed. Doing so will cause synchronization issues throughout the remaining members of the cluster. To remove a member, use the kapacitorctl member remove command.
Using the example cluster setup above, to remove kapacitor-2, get the Member ID of kapacitor-2 from the kapacitorctl member list output. Run the kapacitorctl member remove command on either kapacitor-1 or kapacitor-2:
kapacitor-1$ kapacitorctl member remove 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
kapacitor-2 will enter an uninitialized state with a new cluster ID.
kapacitor-2$ kapacitorctl member list
State: uninitialized
Cluster ID: bcaf2098-f79a-4a62-96e4-e2cf83441561
Local Member ID: 13eeefdd-41b5-453f-928e-cb9c55fd2a5d
Member ID Gossip Address RPC Address API Address Roles Status
13eeefdd-41b5-453f-928e-cb9c55fd2a5d kapacitor-2:9090 kapacitor-2:9091 kapacitor-2:9092 worker alive
If decommissioning the Kapacitor server, be sure to remove its subscription from InfluxDB. InfluxDB does not know whether the server will come back, and it will continue to attempt to send data to the removed member unless its subscription is manually removed.
Cluster data ingestion
The primary methods for writing data to Kapacitor are using subscriptions or writing directly to the Kapacitor API /kapacitor/v1/write endpoint.
Sequentially or in parallel
When writing directly to the Kapacitor API, send requests to all nodes in your Kapacitor Enterprise cluster. You can send data in parallel (to all nodes at once) or sequentially (to one node after the other). Either method is acceptable, but when writing sequentially, ensure each point has the same timestamp when writing to each node and that write requests arrive quickly. Each node will process data at slightly different times, but if points have the same timestamp, each Kapacitor node will compute the same result.
The delay-per-member configuration option is a time interval that lets nodes process data at slightly different times. As long as the write requests to the first node and the last node are separated by no more than the delay-per-member interval, you will not have any issues. The default value of delay-per-member is 10s. You can increase the value, but this will delay alert events by that same amount.
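A minimal sketch of writing the same point to each node in sequence. The node URLs are hypothetical placeholders, and the HTTP call itself is omitted; the key point is setting the timestamp explicitly before the first write, so every node receives an identical point and computes the same result.

```python
import time

def line_protocol(measurement, fields, timestamp_ns):
    """Build an InfluxDB line-protocol point with an explicit timestamp,
    so every Kapacitor node receives an identical point."""
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement} {field_str} {timestamp_ns}"

# Hypothetical cluster member API addresses.
nodes = ["http://kapacitor-1:9092", "http://kapacitor-2:9092"]

# Fix the timestamp once, before any write is sent.
ts = time.time_ns()
point = line_protocol("cpu", {"usage_idle": 8.5}, ts)

for url in nodes:
    # e.g. POST `point` to f"{url}/kapacitor/v1/write?db=mydb&rp=autogen"
    # (network call omitted in this sketch). Because the timestamp is
    # fixed up front, sequential writes that all land within the
    # delay-per-member window produce the same computation on every node.
    pass
```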
Cluster awareness and replication
The current release of Kapacitor Enterprise is only partially “cluster-aware,” meaning that some commands and information on the cluster will be automatically replicated throughout the cluster while other commands and information are not shared or need to be explicitly run on each member.
What is shared?
Host and networking information
Information required to connect to and communicate with other members of the cluster, such as gossip, RPC, and API addresses.
Cluster member information
Information about each member in the cluster such as member ID, member role, and member status.
Alert topics and published alerts
As multiple Kapacitor instances publish to an alert topic, the published alerts are deduplicated across the cluster to prevent duplicate alerts from being triggered.
“Standalone” alert handlers
Kapacitor Enterprise manages the replication of "Standalone" alert handlers, so they only need to be defined on a single member. These should not be confused with Kapacitor tasks or TICKscripts.
kapacitor define-topic-handler ./handler-file.yml
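The handler file itself might look like the following sketch (the topic, handler id, and Slack channel are hypothetical placeholders):

```yaml
# Hypothetical handler-file.yml: define once on any member;
# Kapacitor Enterprise replicates standalone handlers cluster-wide.
topic: cpu
id: slack-cpu
kind: slack
options:
  channel: '#alerts'
```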
What is not shared?
Tasks and task templates
TICKscripts are used to define Kapacitor tasks and task templates.
Kapacitor does not replicate tasks defined on one member to other members of the cluster.
Tasks must be defined manually on each member on which you intend them to run.
Inbound or transformed time series data (InfluxDB data)
All data written to InfluxDB is copied to members of your Kapacitor cluster, where Kapacitor can manipulate, transform, and act on it. If a task transforms data, those changes are isolated to that specific member unless they are written back to InfluxDB. Anything written to InfluxDB will propagate to other members of your Kapacitor cluster.
Eventual consistency
Members of a Kapacitor Enterprise cluster communicate in an eventually consistent way: information starts at one member, then spreads to the rest of the cluster. As a result, two different members may briefly hold different information because one has not yet learned it. Under normal operation, the time for information to spread to the entire cluster is minimal, so the probability of being affected by out-of-sync data is low.
Troubleshooting
Out of sync members
If you find that information is frequently out of date on some of the members, try modifying the cluster configuration options. The following changes will result in information spreading faster throughout the cluster at the expense of additional bandwidth:
- Increase the gossip-members option.
- Decrease the gossip-interval option.
- Decrease the gossip-sync-interval option.