Copying data between two QuasarDB clusters

Summary

You want to copy data from one QuasarDB cluster to another. This is a common requirement when:

  • Upgrading a QuasarDB cluster version;
  • Safeguarding against data loss or data corruption by creating backups on a secondary cluster;
  • Creating a snapshot copy for use in a staging or development environment.

While QuasarDB does not provide native cluster-to-cluster data migration, we do provide you with the tools to do this yourself.

Choose a strategy

There are two different strategies you can take when copying data:

Clone snapshot

This is the simplest approach: the entire contents of the primary cluster are copied to the secondary cluster. Writes made to the primary cluster while the copy is in progress are not propagated.

This approach is recommended when it is feasible to temporarily take the cluster offline, or when full synchronization between primary and secondary is not required (such as when snapshotting a production cluster to a development environment).

Copy-on-write snapshot

This process is more involved, but it gives you the assurance that the data on the primary and secondary clusters is completely in sync.

This approach is suitable for situations where consistency is important, such as upgrading a QuasarDB cluster.

Clone snapshot

A clone snapshot is created using the following strategy:

1. Establish a connection to both the primary and the secondary cluster
2. On the secondary, create timeseries identical to those on the primary
3. Start a streaming bulk read of each timeseries on the primary
4. Write the rows to the corresponding timeseries on the secondary

An example of what the code could look like is provided with our Python API. You can find it at https://github.com/bureau14/qdb-api-python/blob/master/examples/cluster_sync.py

This will take some time to complete, depending upon the size of your dataset. After the process has completed, a single consistent snapshot will be available on your secondary cluster.
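The four steps above can be illustrated with a minimal Python sketch. It deliberately avoids the real client library: `InMemoryCluster` and its `create_table`, `read_batches` and `write_rows` methods are hypothetical stand-ins for the connection, table-creation, streaming bulk-read and bulk-insert calls of the QuasarDB API you use; refer to the official `cluster_sync.py` example for the actual calls. What the sketch shows is the structure of the copy: recreate each timeseries schema on the secondary, then move rows in fixed-size batches so that, with a real client, a whole table never has to be held in memory at once.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# A row is (timestamp in nanoseconds, {column name: value}).
Row = Tuple[int, Dict[str, float]]


@dataclass
class InMemoryCluster:
    """Hypothetical stand-in for a cluster connection (NOT the quasardb API)."""

    schemas: Dict[str, List[str]] = field(default_factory=dict)
    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def create_table(self, name: str, columns: List[str]) -> None:
        # Register the schema and an empty row store for the timeseries.
        self.schemas[name] = list(columns)
        self.tables.setdefault(name, [])

    def read_batches(self, name: str, batch_size: int) -> Iterator[List[Row]]:
        # Yield rows in fixed-size chunks, the way a streaming bulk reader would.
        rows = self.tables[name]
        for start in range(0, len(rows), batch_size):
            yield rows[start:start + batch_size]

    def write_rows(self, name: str, rows: List[Row]) -> None:
        # Append a batch of rows, the way a bulk inserter would.
        self.tables[name].extend(rows)


def clone_snapshot(primary: InMemoryCluster, secondary: InMemoryCluster,
                   batch_size: int = 1000) -> Dict[str, int]:
    """Copy every timeseries from primary to secondary; return rows copied per table.

    Step 1 (connecting to both clusters) is represented by the two
    objects passed in.
    """
    copied: Dict[str, int] = {}
    for name, columns in primary.schemas.items():
        # Step 2: create a timeseries with the same schema on the secondary.
        secondary.create_table(name, columns)
        count = 0
        # Step 3: stream the primary timeseries in batches...
        for batch in primary.read_batches(name, batch_size):
            # Step 4: ...and write each batch to the secondary.
            secondary.write_rows(name, batch)
            count += len(batch)
        copied[name] = count
    return copied
```

For instance, cloning a timeseries of 2,500 rows with `batch_size=1000` performs three batched writes of 1,000, 1,000 and 500 rows. The batch size trades memory use against the number of round trips to the secondary.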

Copy-on-write snapshot

A copy-on-write snapshot involves modifying your production code to stream writes to both the primary and the secondary cluster. When you combine this with a clone snapshot, your primary and secondary clusters will always be in sync, and you can safely fail over between them.

A copy-on-write snapshot is created using the following strategy:

1. Establish a connection to both the primary and the secondary cluster in your production code
2. Send all writes to both clusters at all times
3. Create a clone snapshot as documented above

After completing these steps, you will have a secondary cluster that is updated in real time.
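The following minimal sketch, again in Python and again not the real QuasarDB API, shows one way to combine the steps above; `DualWriter`, `backfill` and `InMemoryCluster` are hypothetical names. One detail the steps leave open is the overlap between the two mechanisms: once dual writes are enabled, cloning everything on the primary would copy recent rows a second time. The sketch avoids this by restricting the clone to rows older than a cutoff timestamp recorded when dual writes began. This assumes that no row written after that moment carries an older timestamp (for example, no late-arriving backdated data).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A row is (timestamp in nanoseconds, {column name: value}).
Row = Tuple[int, Dict[str, float]]


@dataclass
class InMemoryCluster:
    """Hypothetical stand-in for a cluster connection (NOT the quasardb API)."""

    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def write_rows(self, name: str, rows: List[Row]) -> None:
        # Append a batch of rows, creating the timeseries on first write.
        self.tables.setdefault(name, []).extend(rows)


class DualWriter:
    """Step 2: forward every write to both clusters (hypothetical helper).

    The primary is written first. If the secondary write raises, the
    exception propagates to the caller, which can retry or alert rather
    than let the clusters silently diverge.
    """

    def __init__(self, primary: InMemoryCluster, secondary: InMemoryCluster):
        self.primary = primary
        self.secondary = secondary

    def write_rows(self, name: str, rows: List[Row]) -> None:
        self.primary.write_rows(name, rows)
        self.secondary.write_rows(name, rows)


def backfill(primary: InMemoryCluster, secondary: InMemoryCluster,
             name: str, cutoff_ns: int) -> int:
    """Step 3: clone only rows older than cutoff_ns; return how many were copied.

    cutoff_ns is the timestamp from which DualWriter was active. Assumes
    every row written after that point has a timestamp >= cutoff_ns, so
    those rows already reached the secondary and are skipped here to
    avoid duplicates.
    """
    old_rows = [row for row in primary.tables.get(name, []) if row[0] < cutoff_ns]
    secondary.write_rows(name, old_rows)
    return len(old_rows)
```

Writing to the primary first keeps it authoritative: whichever write fails, the secondary never holds rows that the primary is missing, and a failed secondary write can be retried with the same batch.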
