Differences Between InfluxDB 0.8 and InfluxDB 0.9 | InfluxData Documentation Archive

There are significant breaking changes between the 0.8.x versions of InfluxDB and the 0.9.x versions. The API, the schema, the data store, and the clustering algorithm have all been significantly altered and in incompatible ways. The InfluxQL syntax has remained largely the same, although there are functions which exist only in one or the other version, and a few functions are implemented differently in the two versions.

This page aims to ease the transition to InfluxDB 0.9 for those more familiar with 0.8. It is not a comprehensive list of the differences, that would be prohibitively large. Less than half of the 0.8 code is retained in 0.9.0.

Schema

The most noticeable difference between InfluxDB 0.8 and InfluxDB 0.9 is the way series are defined and organized. In InfluxDB 0.8 a series had an explicit name that incorporated all the metadata about the value, similar to Graphite output. The recommended pattern for a series name in InfluxDB 0.8 is <tagName>.<tagValue>[.<tagName>.<tagValue>].<measurement>. For example: az.us-west-1.host.serverA.cpu.

In InfluxDB 0.9 a series is still defined by the measurement name and the complete tag set, but series are not explicitly defined at write time. Instead of writing a point to a particular series, a point is written with a measurement name and a set of tag key-value pairs. The recommended pattern in InfluxDB 0.9 is superficially similar: <measurement>,<tagName>=<tagValue>[,<tagName>=<tagValue>]. For example: cpu,az=us-west-1,host=serverA.

The real change is evident when querying. In InfluxDB 0.8, querying multiple series requires using a regular expression. For example, to query the cpu measurement where the az is us-west-1 for all hostnames, the syntax is SELECT * FROM /az.us-west-1.host..*.cpu$/ .... To query all series in the cpu measurement the syntax is SELECT * FROM /cpu$/ ....

In InfluxDB 0.9, queries operate directly on measurements, not on series. All series in a measurement are returned by default, unless filtered out by tags in the WHERE clause. For example, to query the cpu measurement where the az is us-west-1 for all hostnames, the syntax is SELECT * FROM cpu WHERE az='us-west-1' .... To query all series in the cpu measurement the syntax is SELECT * FROM cpu ....

The change means that as series incorporate additional tags all queries will continue to function as expected. Because all series are returned unless explicitly filtered, new tags do not have any effect on existing queries. The existing queries do not include the new tag and therefore will ignore its existence when selecting results. No longer does schema design need to include a prolonged session of predicting all future metadata and ensuring it can be represented within the desired regular expression syntax. In InfluxDB 0.9 filtering by tags is very efficient, whereas in InfluxDB 0.8 filtering by column in the WHERE clause is always an expensive operation.

In InfluxDB 0.8 a series has a name and one or more measured values. The values can be passed to functions or used in a GROUP BY clause. The series name is indexed but the values are not. In InfluxDB 0.9 a series has a measurement name, a tag set, and a field set. The measurement name plus the tag set are equivalent to the series name from InfluxDB 0.8. The field set in InfluxDB 0.9 is equivalent to the series values in InfluxDB 0.8.

When designing a schema for InfluxDB 0.9 is it important to understand the differences between tags and fields. Tags are indexed, fields are not. Both can be used as filters in the WHERE clause, but filtering by tags only requires a quick index lookup. Filtering by fields requires a full scan of all points on disk that match the other filters and is thus much less performant. In InfluxDB 0.9, fields can be passed into functions but tags cannot. Tags can be used in a GROUP BY clause but fields cannot. (In InfluxDB 0.8 series values – equivalent to fields – can be used in the GROUP BY clause.) In InfluxDB 0.9, measurement names and tags can be matched using regular expressions but regular expressions are not valid for matching field keys or values.

Joins

InfluxDB 0.8 allows both MERGE and JOIN operations between series. The MERGE operation returns a single aggregation of two or more series. The JOIN operation allows values from two different series to appear in the same select statement. JOIN is often used to perform mathematical operations on values from two different series, such as dividing two values to find a ratio. For example: SELECT errors_per_minute.value / pages_per_minute.value FROM errors_per_minute INNER JOIN pages_per_minute.

In InfluxDB 0.9 neither the MERGE nor JOIN operations are supported. The MERGE operation is no longer needed. All series within a measurement are automatically merged when queried, unless explicitly excluded by tags in the WHERE clause.

The lack of a JOIN operation means that fields from different measurements cannot be directly referenced in the same SELECT clause. It is no longer possible to run a query like SELECT errors_per_minute.value / pages_per_minute.value FROM errors_per_minute INNER JOIN pages_per_minute. It is possible to query more than one measurement at once using regular expressions. However, every field referenced in the query must be present in each measurement and cannot be filtered by measurement.

Data Retention

In InfluxDB 0.8 data retention is handled by shard groups. Shard groups include a duration for each individual shard within the group and when the duration is exceeded, the shard may be dropped. A given series writes to the first shard group whose regular expression matches the series name. Shards and shard groups are directly exposed in the Admin UI and can be manipulated by an end user.

In InfluxDB 0.9 it is no longer necessary to parse or define regular expressions to determine the retention of a particular point. Instead, data retention is handled by retention policies. Retention policies belong to databases and define the duration for each point written to the retention policy. When the duration is exceeded, the underlying file may be dropped. A point is written directly to a particular retention policy, declared at write time. If no retention policy is explicitly provided with the write, the point is written to the default retention policy. In InfluxDB 0.9 shards and shard groups are purely internal and have no user-facing features or settings.

Points With Identical Timestamps

In InfluxDB 0.8 a point is uniquely identified by the series name, timestamp, and sequence number. The sequence number is an internally generated ID that provides uniqueness in the event the series name and timestamp are identical.

In InfluxDB 0.9 a point is uniquely identified by the measurement name, full tag set, and the timestamp. Only one point can be stored for a given set of those values. If two points are written with identical measurement names, tag sets, and timestamps, the second point will silently overwrite the first. If you need to store multiple points with the same measurement name and timestamp, they must have different tag sets. This can be accomplished by adding a new tag with different values for each point that is otherwise identical. Uniqueness tags used in this manner will not significantly affect write or query performance.

Write Protocols

In InfluxDB 0.8 the primary write method is a JSON protocol. In InfluxDB 0.9 the JSON write protocol is deprecated and replaced by the line protocol.

The JSON write protocol was deprecated because JSON marshalling, unmarshalling, and parsing was consuming well over half of the total CPU used to process writes.

API Endpoints

InfluxDB 0.8 has multiple endpoints for the API, often encoding concepts such as the database and shard space in the HTTP endpoint. InfluxDB 0.9 simplifies the common API interactions, using a common endpoint /query for all queries, and /write for all writes, regardless of database or retention policy. The database and retention policy are passed as query string parameters in InfluxDB 0.9.

This makes the API easier to incorporate into libraries and other code, as it does not require any knowledge of the data structure to make valid calls to the API endpoints.

Command Line Interface

InfluxDB 0.9 introduces a command line interface (CLI) making it even easier to interact with the database. Rather than rely on the Admin UI or manually crafting HTTP requests, InfluxDB 0.9 users can send writes and queries in a pseudo-interactive shell, with query output available in multiple formats, including JSON, CSV, and columnar.

The CLI does not interact directly with the influxd process, it is a wrapper around the Go client and uses the same HTTP API as any other client or library.

Admin UI

In InfluxDB 0.8 the Administration User Interface supported a number of configuration and administration features, some of which could not be performed any other way. In InfluxDB 0.9 all database administration, configuration, and information is available via the HTTP API and the Admin UI has been substantially simplified.

InfluxQL re-written

As a result of the new schema, the query language was substantially overhauled. A number of functions available in InfluxDB 0.8 have not yet been re-implemented in InfluxDB 0.9, including MODE, HISTOGRAM, DIFFERENCE, and BOTTOM. Links go to the open issues to re-implement each function or behavior.

The query parser in InfluxDB 0.8 was implemented with a C version of Yacc and Bison. Re-writing the query parser in Go means the code is much more portable to systems other than Linux on i686, such as ARM or Windows builds.

Continuous Query Engine Redone

To incorporate the retention, clustering, and InfluxQL changes the continuous query implementation has been reworked. The continuous query engine in InfluxDB 0.9 still has rough edges that are due for significant work in the upcoming 0.9.5 release.

Clustering

InfluxDB 0.9 follows a hybrid model for maintaining availability of the data while guaranteeing consistency of the metadata. See CEO and co-founder Paul Dix’s clustering blog post for more details.

Data Storage Engine

In InfluxDB 0.8 users could choose from a selection of underlying storage engines, with RocksDB, LevelDB, HyperLevelDB, and LMDB as available options. In InfluxDB 0.9 the underlying storage engine is always BoltDB, a copy-on-write B+tree.

Selecting a storage engine written in Go simplifies the build process, making the code more portable to multiple architectures and operating systems. In addition, BoltDB greatly simplifies the overhead of internal file handles, a significant source of issues in the InfluxDB 0.8 version.