# Migrating from Self-Managed ClickHouse to ClickHouse Cloud Using Backup Commands
## Overview
There are two primary methods to migrate from self-managed ClickHouse (OSS) to ClickHouse Cloud:

- Using the `remoteSecure()` function, in which data is pulled or pushed directly
- Using `BACKUP`/`RESTORE` commands via cloud object storage

This migration guide focuses on the `BACKUP`/`RESTORE` approach and offers a practical example of migrating a database or a full service from open-source ClickHouse to Cloud via an S3 bucket.
## Prerequisites

- You have Docker installed
- You have an S3 bucket and an IAM user
- You are able to create a new ClickHouse Cloud service
To make the steps in this guide easy to follow and reproducible, we'll use one of the Docker Compose recipes for a ClickHouse cluster with 2 shards and 2 replicas.
It is necessary to use a ClickHouse cluster rather than a single instance because you will need to convert tables of the MergeTree engine type to ReplicatedMergeTree.
If you want to back up tables from a single instance, consider following the steps in "Migrating between self-managed ClickHouse and ClickHouse Cloud using remoteSecure" instead.
## OSS preparation

- Clone the examples repository to your local machine
- From your terminal, `cd` into `examples/docker-compose-recipes/recipes/cluster_2S_2R`
- From the root of the `cluster_2S_2R` folder, make sure Docker is running. You can now start the ClickHouse cluster:
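Assuming the recipe uses a standard Docker Compose file, the cluster can be started with:

```bash
# Start all containers defined by the recipe in the background
docker compose up -d
```

The `-d` flag runs the containers detached so the terminal stays free for the next steps.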
You should see output indicating that the containers have started successfully.
From a new terminal window at the root of the folder run the following command to connect to the first node of the cluster:
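The first node's container is assumed here to be named `clickhouse-01` (verify the actual name with `docker compose ps` if it differs in your checkout):

```bash
# Open an interactive clickhouse-client session inside the first node
docker exec -it clickhouse-01 clickhouse-client
```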
For the purposes of this guide, we'll create one of the tables from our sample datasets. Follow the first two steps of the New York taxi data guide.
Run the following commands to create a new database and insert data from an S3 bucket into a new table:
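A sketch with a simplified schema (the full column list is in the New York taxi data guide; the cluster name `cluster_2S_2R` and the dataset URL are assumed from the recipe and the sample-dataset guide):

```sql
-- Create the database on every node of the cluster
CREATE DATABASE IF NOT EXISTS nyc_taxi ON CLUSTER cluster_2S_2R;

-- Simplified illustrative schema; note the plain MergeTree engine,
-- which we will convert to ReplicatedMergeTree below
CREATE TABLE nyc_taxi.trips ON CLUSTER cluster_2S_2R
(
    trip_id          UInt32,
    pickup_datetime  DateTime,
    dropoff_datetime DateTime,
    pickup_longitude Nullable(Float64),
    pickup_latitude  Nullable(Float64),
    passenger_count  UInt8,
    total_amount     Float32
)
ENGINE = MergeTree
PRIMARY KEY (pickup_datetime, dropoff_datetime);

-- Load a slice of the sample dataset from the public S3 bucket
INSERT INTO nyc_taxi.trips
SELECT trip_id, pickup_datetime, dropoff_datetime,
       pickup_longitude, pickup_latitude, passenger_count, total_amount
FROM s3(
    'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz',
    'TabSeparatedWithNames'
);
```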
In the CREATE TABLE DDL statement we specified the table engine type as MergeTree; however, ClickHouse Cloud works with SharedMergeTree.
When restoring a backup, ClickHouse automatically converts ReplicatedMergeTree to SharedMergeTree, but we first need to convert any MergeTree tables to ReplicatedMergeTree for this to work.
Run the following command to DETACH the table.
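A sketch, assuming the `nyc_taxi.trips` table created earlier:

```sql
-- Detach the table so its engine can be changed on re-attach
DETACH TABLE nyc_taxi.trips;
```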
Then attach it as replicated:
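Recent ClickHouse releases support re-attaching a detached MergeTree table directly as replicated; a sketch for the example table:

```sql
-- Re-attach the table, converting its engine to ReplicatedMergeTree
ATTACH TABLE nyc_taxi.trips AS REPLICATED;
```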
Finally, restore the replica metadata:
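For the example table, this recreates the replica's metadata in Keeper:

```sql
-- Recreate the replication metadata for the newly replicated table
SYSTEM RESTORE REPLICA nyc_taxi.trips;
```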
Check that it was converted to ReplicatedMergeTree:
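One way to check, using the `system.tables` system table:

```sql
-- The engine column should now read ReplicatedMergeTree
SELECT engine
FROM system.tables
WHERE database = 'nyc_taxi' AND name = 'trips';
```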
You are now ready to proceed with setting up your Cloud service in preparation for later restoring a backup from your S3 bucket.
## Cloud preparation

You will be restoring your data into a new Cloud service. Follow the steps below to create one:

- Create a new service
- Choose your desired region and configuration, then click `Create service`
- Open the SQL console
Next you will need to create an access role. These steps are detailed in the guide "Accessing S3 data securely". Follow the steps in that guide to obtain an access role ARN.
In "How to create an S3 bucket and IAM role" you created a policy for your S3 bucket. You'll now need to add the ARN you obtained in "Accessing S3 data securely" (from the output of the created stack) to your bucket policy.
Your updated policy for the S3 bucket will look something like this:
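As a sketch (the account ID, role name, user name, and bucket name below are placeholders for your own values):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111111111111:user/your-iam-user",
          "arn:aws:iam::111111111111:role/your-clickhouse-access-role"
        ]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-backup-bucket",
        "arn:aws:s3:::your-backup-bucket/*"
      ]
    }
  ]
}
```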
Specifying both the user ARN and the ClickHouse Cloud access role ensures that you can both back up to the S3 bucket and later restore from it using the Cloud access role.
## Taking the backup (on OSS)
To make a backup of a single database, run the following command from clickhouse-client connected to your OSS deployment:
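A sketch, assuming the `nyc_taxi` database from the example above; `BUCKET_URL` is the full endpoint of a folder in your bucket (for example `https://<bucket>.s3.amazonaws.com/backups/`):

```sql
-- Back up a single database to S3
BACKUP DATABASE nyc_taxi TO S3('BUCKET_URL', 'KEY_ID', 'SECRET_KEY');
```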
Replace `BUCKET_URL`, `KEY_ID`, and `SECRET_KEY` with your own AWS credentials.
The guide "How to create an S3 bucket and IAM role" shows you how to obtain these if you do not yet have them.
If everything is correctly configured, the response will contain a unique id assigned to the backup and the backup's status.
If you check your previously empty S3 bucket you will now see some folders have appeared:
If you are performing a full migration then you can run the following command to backup the entire server:
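A sketch, using the same credential placeholders as above:

```sql
-- Back up the entire server, including databases, users, and access entities
BACKUP ALL TO S3('BUCKET_URL', 'KEY_ID', 'SECRET_KEY');
```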
The command above backs up:
- All user databases and tables
- User accounts and passwords
- Roles and permissions
- Settings profiles
- Row policies
- Quotas
- User-defined functions
If you are using a different CSP, you can use the `TO S3()` syntax (for both AWS and GCP) or the `TO AzureBlobStorage()` syntax.
For very large databases, consider using `ASYNC` to run the backup in the background:
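For the example database, the asynchronous variant would be:

```sql
-- Return immediately with a backup id; the backup runs in the background
BACKUP DATABASE nyc_taxi TO S3('BUCKET_URL', 'KEY_ID', 'SECRET_KEY') ASYNC;
```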
The backup id can then be used to monitor the progress of the backup:
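A sketch, querying the `system.backups` table with the id returned by the asynchronous command:

```sql
-- Substitute the id returned by BACKUP ... ASYNC
SELECT id, status, error
FROM system.backups
WHERE id = 'backup_id';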
It is also possible to take incremental backups. For more detail on backups in general, the reader is referred to the documentation for backup and restore.
## Restore to ClickHouse Cloud

To restore a single database, run the following query from your Cloud service, substituting your own AWS credentials and setting `ROLE_ARN` equal to the value you obtained as output of the steps detailed in "Accessing S3 data securely":
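A sketch of the restore, using the role-based credentials syntax (the bucket URL and role ARN are placeholders):

```sql
-- Restore the backed-up database using the Cloud access role
RESTORE DATABASE nyc_taxi
FROM S3('BUCKET_URL', extra_credentials(role_arn = 'ROLE_ARN'));
```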
You can do a full service restore in a similar manner:
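Assuming the same placeholders, a full-service restore would look like:

```sql
-- Restore everything captured by BACKUP ALL
RESTORE ALL
FROM S3('BUCKET_URL', extra_credentials(role_arn = 'ROLE_ARN'));
```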
If you now run the following query in Cloud, you can see that the database and table have been successfully restored:
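For example, assuming the `nyc_taxi` database from this guide:

```sql
-- Confirm the database exists and the table contains rows
SHOW DATABASES;
SELECT count() FROM nyc_taxi.trips;
```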