Schema Design

This is archived documentation for InfluxData product versions that are no longer maintained. For newer documentation, see the latest InfluxData documentation.

Every InfluxDB use case is special and your schema will reflect that uniqueness. There are, however, general guidelines to follow and pitfalls to avoid when designing your schema.

Encouraged Schema Design

In no particular order, we recommend that you:

  • Encode metadata in tags

    Because tags are indexed and fields are not, queries that filter on tags perform better than queries that filter on fields.

    In general, your queries should guide what you store as a tag and what you store as a field:

    • Store data in tags if they’re commonly queried metadata
    • Store data in tags if you plan to use them with GROUP BY
    • Store data in fields if you plan to use them with an InfluxQL function
    • Store data in fields if you need them to be something other than a string; tag values are always interpreted as strings
  • Avoid using InfluxQL Keywords as identifier names

    This isn’t necessary, but it simplifies writing queries; you won’t have to wrap those identifiers in double quotes. Identifiers are database names, retention policy names, user names, measurement names, tag keys, and field keys. See InfluxQL Keywords for words to avoid.

    Note that you will also need to wrap identifiers in double quotes in queries if they begin with a digit or contain characters other than ASCII letters, digits, and underscores.
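
The identifier rules above can be sketched in Python as a small helper that wraps an identifier in double quotes only when it needs them: when it is not a plain ASCII word or when it is a keyword. The names `quote_identifier`, `needs_quotes`, and `KEYWORDS_SUBSET` are hypothetical, and the keyword set is deliberately incomplete; check the InfluxQL Keywords list for every word to avoid. The backslash escaping of embedded double quotes is an assumption of this sketch.

```python
import re

# Illustrative SUBSET of InfluxQL keywords, not the complete list.
KEYWORDS_SUBSET = frozenset({
    "ALL", "ALTER", "AS", "ASC", "BY", "CREATE", "DATABASE", "DELETE", "DESC",
    "DROP", "DURATION", "FIELD", "FROM", "GROUP", "INTO", "KEY", "LIMIT",
    "MEASUREMENT", "ON", "ORDER", "POLICY", "SELECT", "SERIES", "SHOW",
    "TAG", "USER", "WHERE",
})

# Unquoted identifiers: start with an ASCII letter or underscore,
# followed only by ASCII letters, digits, and underscores.
_UNQUOTED_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def needs_quotes(name: str) -> bool:
    """Return True if `name` must be double-quoted in a query."""
    # Keywords are compared case-insensitively.
    return _UNQUOTED_IDENT.fullmatch(name) is None or name.upper() in KEYWORDS_SUBSET


def quote_identifier(name: str) -> str:
    """Double-quote `name` only when needed (hypothetical helper)."""
    if not needs_quotes(name):
        return name
    # Assumption: an embedded double quote is escaped with a backslash.
    return '"' + name.replace('"', '\\"') + '"'


query = (
    f"SELECT mean({quote_identifier('value')}) "
    f"FROM {quote_identifier('blueberries')} "
    f"WHERE {quote_identifier('region')} = 'north'"
)
print(query)  # SELECT mean(value) FROM blueberries WHERE region = 'north'
print(quote_identifier("field-1"), quote_identifier("duration"))  # "field-1" "duration"
```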

Discouraged Schema Design

In no particular order, we recommend that you:

  • Don’t have too many series

    See Hardware Sizing Guidelines for series cardinality recommendations based on your hardware.

    Tags that specify highly variable information like UUIDs, hashes, and random strings can push your series cardinality beyond what your hardware can comfortably handle, because every distinct tag value creates a new series. If you need that information in your database, consider storing the high-cardinality data as a field rather than a tag (note that queries that filter on that data will be slower, because fields are not indexed).

  • Don’t differentiate data with measurement names

    In general, following this guideline will simplify your queries. InfluxDB queries merge data that fall within the same measurement, so it’s better to differentiate data with tags than with detailed measurement names.

    Example:

    Schema 1                                           Schema 2
    Measurement: blueberries.field-1.region-north      Measurement: blueberries; Tags: field = 1 and region = north
    Measurement: blueberries.field-2.region-midwest    Measurement: blueberries; Tags: field = 2 and region = midwest

    Assume that each measurement contains a single field key called value. The following queries calculate the average of value across all blueberry fields and all regions. Notice that, even at this small scale, this is harder to do under Schema 1.

    Schema 1

    > SELECT mean(value) FROM /^blueberries/
    name: blueberries.field-1.region-north
    --------------------------------------
    time                    mean
    1970-01-01T00:00:00Z    444
    
    name: blueberries.field-2.region-midwest
    ----------------------------------------
    time                    mean
    1970-01-01T00:00:00Z    33766.666666666664
    

    Then calculate the overall mean yourself from these per-measurement results.

    Schema 2

    > SELECT mean(value) FROM blueberries
    name: blueberries
    -----------------
    time                    mean
    1970-01-01T00:00:00Z    17105.333333333332
    
  • Don’t put more than one piece of information in one tag

    As with the previous guideline, following this one will simplify your queries by reducing your need for regular expressions.

    Example:

    Schema 1 tag set                     Schema 2 tag set
    location = field-1.region-north      field = 1 and region = north
    location = field-2.region-north      field = 2 and region = north
    location = field-2.region-midwest    field = 2 and region = midwest

    Assume that each tag set falls in the measurement blueberries and is associated with a field called value. The following queries calculate the average of value for blueberries in the north region. Both queries are relatively simple here, but the regular expression could get much more complicated if Schema 1 contained a more complex tag value.

    Schema 1

    > SELECT mean(value) FROM blueberries WHERE location =~ /north/
    

    Schema 2

    > SELECT mean(value) FROM blueberries WHERE region = 'north'
    
  • Don’t use the same name for a field key and a tag key

    If a tag key has the same name as a field key in your schema, you won’t be able to query the tag key. Be sure to give your tag keys and field keys different names.

    See GitHub Issue [#6519](https://github.com/influxdata/influxdb/issues/6519) for more information.
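
To illustrate the "Don’t have too many series" guideline, the following minimal Python sketch counts series the way the term is used on this page: within one retention policy, a series is a unique combination of measurement and tag set. The `requests` measurement, the `host` and `request_id` keys, and the 1,000-point workload are hypothetical. The sketch shows how moving a random ID from a tag to a field collapses the series count.

```python
import uuid
from typing import Iterable, Mapping, Tuple

# (measurement, tags, fields)
Point = Tuple[str, Mapping[str, str], Mapping[str, object]]


def series_key(measurement: str, tags: Mapping[str, str]) -> str:
    # One measurement plus one complete tag set; sorting the tags makes the
    # key independent of the order in which they were supplied.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}" if tag_part else measurement


def series_cardinality(points: Iterable[Point]) -> int:
    # Field keys and values never contribute to the series key.
    return len({series_key(m, tags) for m, tags, _fields in points})


# Hypothetical workload: 1,000 requests spread over 4 hosts, each with a random ID.
request_ids = [str(uuid.uuid4()) for _ in range(1000)]

id_as_tag = [
    ("requests", {"host": f"host-{i % 4}", "request_id": rid}, {"latency_ms": 12.5})
    for i, rid in enumerate(request_ids)
]
id_as_field = [
    ("requests", {"host": f"host-{i % 4}"}, {"latency_ms": 12.5, "request_id": rid})
    for i, rid in enumerate(request_ids)
]

print(series_cardinality(id_as_tag))    # 1000: one new series per request
print(series_cardinality(id_as_field))  # 4: one series per host
```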
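
The "Don’t differentiate data with measurement names" example ends by asking you to calculate the overall mean yourself from the per-measurement results. A plain average of the means is correct only when every measurement holds the same number of points; in general, each mean has to be weighted by its point count, which you would also need to query (for example with InfluxQL's count() function). In the Schema 1 output above, (444 + 33766.666...) / 2 ≈ 17105.33 matches the Schema 2 result, which is consistent with both measurements holding the same number of points. Below is a minimal Python sketch of the weighted step; `combine_means` and its `(mean, count)` input format are hypothetical.

```python
from typing import Iterable, Tuple


def combine_means(groups: Iterable[Tuple[float, int]]) -> float:
    """Combine per-measurement (mean, count) pairs into one overall mean."""
    total = 0.0
    n = 0
    for mean, count in groups:
        total += mean * count  # recover each group's sum from its mean
        n += count
    if n == 0:
        raise ValueError("no points to average")
    return total / n


print(combine_means([(2.0, 1), (5.0, 3)]))  # 4.25, not the unweighted 3.5
```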