While loading data into QuasarDB, the performance of similar insert operations slowly degrades over time. This correlates with the QuasarDB daemon showing relatively high CPU usage.
QuasarDB organizes timeseries data in shards, which are dynamically created or updated when appropriate. When adding data to a shard, QuasarDB has to reindex all data in that shard, a complex operation that requires a lot of CPU resources.
Option 1: tune your shard sizes
The most frequent scenario is that the shard size of a timeseries has been misconfigured. For example, if you have a shard size of 1 day and a sustained insert rate of 3,000 data points every 3 seconds, by the end of the day a single shard holds more than 86 million data points, all of which need to be reindexed on every insert operation.
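The arithmetic above can be sketched with a small helper. This is an illustration only; `points_per_shard` is a hypothetical name, not part of any QuasarDB API:

```python
def points_per_shard(points_per_second: float, shard_size_seconds: int) -> int:
    """Data points that accumulate in one shard at a steady insert rate."""
    return int(points_per_second * shard_size_seconds)

# 3000 data points every 3 seconds = 1000 points per second.
rate = 3000 / 3

# With a 1-day shard, a single shard accumulates 86.4 million points:
print(points_per_shard(rate, 24 * 60 * 60))  # 86400000

# With a 1-hour shard, it is 3.6 million:
print(points_per_shard(rate, 60 * 60))       # 3600000
```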
We recommend between 50,000 and 500,000 data points per shard. This can be achieved by either:
- Using a smaller shard size for your timeseries. In the example above, a shard size of 1 hour would be appropriate.
- Distributing your data over many different timeseries. This is a best practice for modeling your data with QuasarDB, and it is very common to model your data using millions of different timeseries.
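To pick a shard size from your insert rate, you can invert the calculation. The sketch below targets the middle of the recommended 50,000–500,000 range; the 250,000 target and the function name are assumptions for illustration:

```python
def recommended_shard_seconds(points_per_second: float,
                              target_points: int = 250_000) -> float:
    """Shard duration (in seconds) that yields roughly target_points per shard."""
    return target_points / points_per_second

# At 1000 points/second into a single timeseries, a shard should cover
# about 250 seconds (~4 minutes):
print(recommended_shard_seconds(1000))  # 250.0

# Spreading the same load evenly over 100 timeseries cuts each series'
# rate to 10 points/second, so each shard can cover ~7 hours instead:
print(recommended_shard_seconds(1000 / 100))  # 25000.0
```

This also shows why the two remedies combine well: distributing data over more timeseries lets each one use a larger, more convenient shard size while staying in the recommended range.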
Option 2: align client-side buffering with shard size
Another best practice is to ensure that data is loaded into each shard exactly once, and is never updated. This is achieved by aligning the client-side flushing of the bulk insert with the server-side shard size. You can determine the shard boundaries with this simple calculation:
timestamp % shard_size == 0
When you align the flush of your bulk insert operation with this boundary, the index operation for each shard is executed once and only once.
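The following sketch shows one way to implement this client-side, assuming rows arrive sorted by timestamp. It buffers rows and flushes whenever the boundary above is crossed; `push` stands in for your actual bulk-insert call, and timestamps and shard size share the same unit (seconds here):

```python
SHARD_SIZE = 3600  # 1-hour shards, expressed in seconds

def shard_of(timestamp: int, shard_size: int = SHARD_SIZE) -> int:
    # Each shard starts where timestamp % shard_size == 0; integer
    # division maps a timestamp to the start of its shard.
    return (timestamp // shard_size) * shard_size

def flush_aligned(rows, push):
    """Group (timestamp, value) rows by shard and push each group once."""
    buffer, current_shard = [], None
    for ts, value in rows:
        shard = shard_of(ts)
        if current_shard is not None and shard != current_shard:
            push(buffer)  # crossed a shard boundary: flush the batch
            buffer = []
        current_shard = shard
        buffer.append((ts, value))
    if buffer:
        push(buffer)      # flush the final partial shard

batches = []
flush_aligned([(3599, 1.0), (3600, 2.0), (7200, 3.0)], batches.append)
print(len(batches))  # 3 -- one push per shard touched
```

Because every flush contains rows for exactly one shard, the server reindexes each shard a single time instead of on every insert.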