The performance of my inserts degrades over time

Symptoms

While loading data into QuasarDB, the performance of similar insert operations slowly degrades over time. This correlates with relatively high CPU usage on the QuasarDB daemon.

Cause

QuasarDB organizes timeseries data in shards, which are dynamically created or updated when appropriate. When adding data to a shard, QuasarDB has to reindex all data in that shard, which is a complex operation that requires a lot of CPU resources.
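To see why this degrades over time rather than staying constant, consider a toy model in Python: if every insert reindexes all data already in the shard, the cost of each insert grows with the shard's size. This is an illustration of the scaling behavior only, not QuasarDB code:

    # Toy model: each insert into a shard reindexes everything already in it,
    # so per-insert cost grows linearly as the shard fills up.
    points_per_insert = 3_000
    shard_points = 0
    for i in range(1, 6):
        shard_points += points_per_insert
        print(f"insert {i}: shard holds {shard_points:,} points, "
              f"reindex touches all {shard_points:,} of them")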

Resolution

Option 1: tune your shard sizes

The most frequent scenario is that the shard size of a timeseries has been misconfigured. For example, with a shard size of 1 day and a sustained insert rate of 3,000 data points every 3 seconds, by the end of the day a single shard holds over 86 million data points, all of which must be reindexed on every insert operation.
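To make the arithmetic concrete, a short Python calculation using the rate and shard size from the example above:

    seconds_per_day = 24 * 60 * 60       # shard size of 1 day, in seconds
    points_per_second = 3_000 / 3        # 3,000 data points every 3 seconds
    points_per_shard = points_per_second * seconds_per_day
    print(f"{points_per_shard:,.0f}")    # 86,400,000 data points in one shard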

We recommend between 50,000 and 500,000 data points per shard. This can be achieved by either of the following (a small sizing sketch follows the list):

  • Using a smaller shard size for your timeseries. In the example above, a shard size of 5 minutes yields 300,000 data points per shard, which falls within the recommended range.
  • Distributing your data over many different timeseries. This is a best practice for modeling your data with QuasarDB, and it is very common to model your data using millions of different timeseries.
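As a rough sizing aid, here is a short Python sketch that derives the largest shard length from a sustained insert rate. The function name and the target of 500,000 points per shard are illustrative assumptions, not part of any QuasarDB API:

    def shard_size_seconds(points_per_second: float,
                           target_points_per_shard: int = 500_000) -> float:
        """Largest shard length (in seconds) that stays at or below the target."""
        return target_points_per_shard / points_per_second

    # Example from above: 3,000 points every 3 seconds = 1,000 points/second.
    print(shard_size_seconds(1_000))   # 500.0 seconds, i.e. just over 8 minutes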

Option 2: align client-side buffering with shard size

Another best practice is to ensure that data is loaded into each shard exactly once and never updated afterwards. This is achieved by aligning the client-side flushing of the bulk insert with the server-side shard size. You can determine the shard boundaries with this simple test:

timestamp % shard_size == 0

When you align the flushes of your bulk insert operation with this boundary, the index operation for each shard is executed once and only once.
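A minimal sketch of this alignment in Python; the flush callback and the epoch-second timestamps are assumptions for illustration, not the QuasarDB client API:

    from typing import Callable, List, Tuple

    Point = Tuple[int, float]  # (timestamp in seconds since epoch, value)

    def insert_aligned(points: List[Point],
                       shard_size: int,
                       flush: Callable[[List[Point]], None]) -> None:
        """Buffer points and flush whenever a shard boundary is crossed,
        so each shard is written (and therefore indexed) exactly once."""
        buffer: List[Point] = []
        current_shard = None
        for ts, value in sorted(points):
            shard = ts - (ts % shard_size)   # shard start: timestamp % shard_size == 0
            if current_shard is not None and shard != current_shard:
                flush(buffer)                # crossed a boundary: flush previous shard
                buffer = []
            current_shard = shard
            buffer.append((ts, value))
        if buffer:
            flush(buffer)

    # Example: 5-minute shards (300 seconds); flush just prints the batch size.
    data = [(t, float(t)) for t in range(0, 900, 30)]
    insert_aligned(data, shard_size=300, flush=lambda b: print(len(b), "points"))

Because the buffer is flushed exactly at each shard boundary, no shard receives a second write, and the server never has to reindex a shard it has already indexed.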
