Knowledge Base: Performance Tuning

  • What are the hardware recommendations for my cluster?

    Summary

    You want to allocate resources for a QuasarDB cluster and need an estimate of the hardware requirements in terms of storage, cache and speed.

    Getting started

    When determining the minimal cluster size, the dataset and how you interact with it are the driving factors behind the decision. We will look at the following variables:

    • Ingestion speed
    • Data retention period
    • Querying patterns

    Throughout this article we use a trading firm as an example, but the same strategy can be applied to any use case.
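
    As a rough illustration of how ingestion speed and retention period combine into a storage figure, the sketch below multiplies them with an average row size. The function name and the workload numbers are hypothetical, chosen only for this example. The result is a raw lower bound that ignores compression, replication and index overhead.

```python
def estimate_raw_storage_bytes(rows_per_second, avg_row_size_bytes, retention_days):
    """Raw bytes needed to keep `retention_days` of data at a constant ingestion rate.

    Simplified model (an assumption for illustration): no compression,
    no replication, no index overhead, and ingestion runs around the clock.
    """
    seconds_retained = retention_days * 24 * 60 * 60
    return rows_per_second * avg_row_size_bytes * seconds_retained


# Hypothetical trading-firm workload: 50,000 rows/s, 64-byte rows, 30 days retained.
raw_bytes = estimate_raw_storage_bytes(50_000, 64, 30)
raw_tib = raw_bytes / 1024**4

print(f"{raw_bytes} bytes, about {raw_tib:.2f} TiB")
```

    A workload that only ingests during market hours would need a correspondingly smaller `seconds_retained`; the model above deliberately assumes the worst case.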

    Row size

    The average size (in bytes) of a row is the

  • The performance of my inserts degrades over time

    Symptoms

    While loading data into QuasarDB, the performance of similar insert operations slowly degrades over time. This correlates with the QuasarDB daemon showing relatively high CPU usage.

    Cause

    QuasarDB organizes timeseries data in shards, which are dynamically created or updated when appropriate. When adding data to a shard, QuasarDB has to reindex all data in that shard, which is a complex operation that requires a lot of CPU resources.
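
    To see why this cost grows as a shard fills up, consider a deliberately simplified model in which every batch appended to a shard triggers a reindex of all rows currently stored in it. This is a sketch of the scaling behaviour only, not QuasarDB's actual indexing code; the function name and the numbers are hypothetical.

```python
def total_reindex_work(rows_in_shard, batch_size):
    """Rows touched while filling one shard in batches.

    Assumed cost model: after each batch is appended, every row
    currently in the shard is reindexed once.
    """
    work = 0
    stored = 0
    while stored < rows_in_shard:
        stored += min(batch_size, rows_in_shard - stored)
        work += stored  # reindex everything currently in the shard
    return work


small_shard = total_reindex_work(1_000, 100)
large_shard = total_reindex_work(10_000, 100)

print(small_shard, large_shard, large_shard / small_shard)
```

    Under this model, a shard that holds ten times as many rows costs roughly ninety times as much reindexing work for the same batch size, which matches the symptom of inserts slowing down as more data lands in the same shard.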

    Resolution

    Option 1: tune your shard sizes

    The most frequent scenario is that the shard size of a timeseries has been misconfigured. For example, if you have a shard size

  • Measuring performance

    Summary

    Measuring performance can be useful for a number of reasons:

    • You want accurate numbers on how long it takes for the QuasarDB daemon to insert a certain amount of data;
    • You want to run a live analysis on where exactly the QuasarDB daemon is spending its time;
    • You want an online, programmable response to certain actions inside QuasarDB.

    For this purpose, the Linux build of QuasarDB is instrumented with SystemTap probes. This document provides an example SystemTap script you can use to monitor various latencies.

  • My timeseries is slow when I use many columns

    Symptoms

    You are using a timeseries that has many different columns (more than 1,000). The insert performance of this timeseries is considerably worse than that of a timeseries with few columns.

    Cause

    QuasarDB implements MVCC transactions to ensure data consistency. References to these transactions are maintained in a map with O(log n) complexity. With a large number of columns, the upkeep of the data structure that maintains the transaction references becomes a bottleneck.

    Resolution

    Unlike other databases, QuasarDB isn't limited to a certain number of timeseries, and we encourage the use of many different timeseries; millions of timeseries is appropriate for