You want to allocate resources for a QuasarDB cluster and need an estimate of the hardware requirements in terms of storage, cache, and speed.
When determining the minimal cluster size, the dataset and how you interact with it are the driving forces behind the decision. We will be looking at the following variables:
- Ingestion speed
- Data retention period
- Querying patterns
Throughout this article we use a trading firm as an example, but the same strategy can be applied to any use case.
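As a rough illustration of how these variables combine, the sketch below turns an ingestion speed, an average row size, and a retention period into a raw storage figure, and a "hot" window of recently queried data into a cache figure. It is a back-of-the-envelope model only: the names (`Workload`, `hot_days`) and the example numbers are hypothetical, and the result ignores compression, replication, and index overhead.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    rows_per_second: int   # sustained ingestion speed
    avg_row_bytes: int     # average size of one row, in bytes
    retention_days: int    # how long data is kept
    hot_days: int          # assumption: queries mostly touch the last N days


def bytes_per_day(w: Workload) -> int:
    """Raw bytes ingested per day, before compression or replication."""
    return w.rows_per_second * w.avg_row_bytes * SECONDS_PER_DAY


def raw_storage_bytes(w: Workload) -> int:
    """Raw bytes held on disk once the retention period is full."""
    return bytes_per_day(w) * w.retention_days


def hot_set_bytes(w: Workload) -> int:
    """Raw bytes in the window that queries touch most often."""
    return bytes_per_day(w) * min(w.hot_days, w.retention_days)


if __name__ == "__main__":
    # Hypothetical trading workload: 10,000 rows/s of 64-byte rows,
    # kept for one year, with queries focused on the last 5 days.
    w = Workload(rows_per_second=10_000, avg_row_bytes=64,
                 retention_days=365, hot_days=5)
    print(f"storage: {raw_storage_bytes(w) / 1e12:.2f} TB")
    print(f"hot set: {hot_set_bytes(w) / 1e9:.2f} GB")
```

With these example numbers the raw storage comes to roughly 20 TB and the hot set to roughly 276 GB; treat both as starting points to refine against measurements, not as sizing guarantees.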
The average size (in bytes) of a row is the
While loading data into QuasarDB, the performance of similar insert operations slowly degrades over time. This correlates with the QuasarDB daemon showing relatively high CPU usage.
QuasarDB organizes timeseries data in shards, which are dynamically created or updated when appropriate. When adding data to a shard, QuasarDB has to reindex all data in that shard, which is a complex operation that requires a lot of CPU resources.
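To see why this can make inserts slow down over time, consider a toy model in which every insert causes the whole target shard to be reindexed. The function below is not QuasarDB code; `shard_rows` is a hypothetical stand-in for the number of rows that fall inside one shard (real shards are defined by a time range), and the model only counts how many rows are touched in total.

```python
def rows_reindexed(total_rows: int, batch_rows: int, shard_rows: int) -> int:
    """Count rows touched when each insert reindexes its whole shard.

    Toy model: a shard holds at most `shard_rows` rows; once it is full,
    the next insert starts a new, empty shard.
    """
    touched = 0
    in_shard = 0
    remaining = total_rows
    while remaining > 0:
        room = shard_rows - in_shard
        n = min(batch_rows, remaining, room)
        in_shard += n
        touched += in_shard       # the entire shard is reindexed
        remaining -= n
        if in_shard == shard_rows:
            in_shard = 0          # shard is full; move on to a new one
    return touched


if __name__ == "__main__":
    # Same data, same batch size, different shard sizes.
    for shard_rows in (1_000, 200, 100):
        print(shard_rows, rows_reindexed(1_000, 100, shard_rows))
```

In this model, loading 1,000 rows in batches of 100 touches 5,500 rows when everything lands in a single shard, but only 1,000 rows when each batch fills its own shard. The model says nothing about the overhead of managing very many small shards, so it illustrates the trend rather than recommending a specific size.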
Option 1: tune your shard sizes
The most frequent scenario is that the shard size of a timeseries has been misconfigured. For example, if you have a shard size
Measuring performance can be useful for a number of reasons:
- You want accurate numbers on how long it takes for the QuasarDB daemon to insert a certain amount of data;
- You want to run a live analysis of where exactly the QuasarDB daemon is spending its time;
- You want a programmable, online response to certain actions inside QuasarDB.
For this purpose, the Linux build of QuasarDB is instrumented with SystemTap probes. This document provides an example SystemTap script you can use to monitor various latencies.
You are using a timeseries that has many different columns (more than 1,000). The insert performance of this timeseries is considerably worse than that of a timeseries with few columns.
QuasarDB implements MVCC transactions to ensure data consistency. References to these transactions are maintained in a map with O(log n) complexity. With a large number of columns, the upkeep of the data structure that maintains the transaction references becomes a bottleneck.
Unlike other databases, QuasarDB isn't limited to a certain number of timeseries, and we encourage the use of many different timeseries; millions of timeseries is appropriate for