
12 May Complex SQL Server Migration? Distributed Availability Groups Could Be Your Answer
Upgrading SQL Server is just one of those things you’ll have to do every once in a while, as a DBA. Whether it’s because a version is nearing its end-of-support date, you’re refreshing hardware, beefing up security, or chasing better performance, it always involves a careful balancing act. You need to minimise risk, keep downtime to an absolute minimum, and make sure you’re using your resources effectively.
At the recent SeaSQL Saturday event in the Netherlands, our own Christophe Platteeuw hosted a session where he showed how Distributed Availability Groups (DAGs) can be a real game-changer for low-downtime migrations. Even when you’re dealing with different SQL Server versions, clusters, or domains!
Traditional migration techniques: the usual suspects
Before we get into DAGs, let’s quickly look at some of the more traditional ways to migrate SQL Server.
For standalone servers
If you’re dealing with smaller, isolated SQL Server environments, you’ve got a few straightfoward options. Each of these has its pros and cons when it comes to complexity, downtime, and how much you can automate.
- In-place upgrade: this means upgrading the SQL instance on the same hardware. It’s rarely used in production because it means complete downtime and there’s no easy way to roll back if things go wrong.
- Detach/attach or backup/restore: These are handy for smaller databases with not too much activity. Tools like Brent Ozar’s sp_DatabaseRestore and dbatools can help automate these processes.
- Log shipping: This offers a bit more continuity with a delayed cutover. It’s slower, but it lets you test things beforehand, and you can have readable secondary replicas.
For high-availability environments
When you’re working with clustered or Always On setups, migrations get a bit more involved:
- OS upgrade (SQL version stays the same): you add new nodes to your Windows Server Failover Cluster (WSFC), fail over your SQL Server resources, and then remove the old nodes. This is good for refreshing your infrastructure with minimal impact on applications, but you can usually only upgrade to the “next” version of Windows.
- SQL upgrade (same or new OS): This needs a careful sequence: add new nodes, join them to the Availability Group (AG), fail over, and then retire the old nodes. It involves some downtime and operational overhead, but you avoid a full rebuild. You also can’t always jump multiple SQL Server versions at once this way.
Distributed Availability Groups: a smart strategy for enterprise migrations
So, what makes Distributed Availability Groups (DAGs) different and why are they a good option for bigger, more complex migrations? Let’s start by looking at what they are.
What exactly is a Distributed Availability Group?
A Distributed Availability Group, or DAG, is a special type of availability group that was introduced in SQL Server 2016. Unlike standard availability groups, which work within a single Windows Server Failover Cluster (WSFC), a DAG connects two separate availability groups. These AGs can be on different clusters, in different domains, or even in different geographical locations.
DAGs were originally designed for things like disaster recovery and replicating data to the cloud (like Azure SQL). As it turns out, they also provide a great way to upgrade large SQL Server installations without a lot of downtime.
Why use DAGs for your migration?
DAGs offer some significant advantages for migrations:
- Near-zero downtime: because the data is continuously synchronised, the actual cutover time can be cut down to just seconds.
- Flexibility: you can migrate across different hardware, operating system versions, SQL Server versions (even skipping versions), and Windows domains.
- Parallel validation: the new environment runs alongside the old one, so you can test everything thoroughly before the final switch.
- Rollback safety: if something goes wrong during or after the cutover, your applications can reconnect to the original environment without losing data.
Technical deep dive: the migration process
Let’s get into the technical details. Assume you have the following setup:
- A production AG (let’s call it OTM-SQL19AG) running AdventureWorks on your old cluster.
- A new cluster set up with a new AG (e.g., OTM-SQL22AG) and a test database.
- All your server logins have been synchronised (using sp_help_revlogin, for example).
- Both clusters and their availability groups are working fine individually.
Here’s how you’d proceed:
-
Configure the endpoint
To allow the clusters to talk to each other, each SQL Server service account needs CONNECT permission on the remote endpoint.
— On SQL 2019 nodes:
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [OTOMIZER\gMSA-SQL22svc$];
— On SQL 2022 nodes:
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [OTOMIZER\gMSA-SQL19svc$];
-
Create the Distributed Availability Group
Execute this on the primary replica of your old (current production) AG:
CREATE AVAILABILITY GROUP [MyDistributedAG]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
‘OTM-SQL19AG’ WITH (
LISTENER_URL = ‘TCP://OTM-SQL19LS.otomizer.local:5022’,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
),
‘OTM-SQL22AG’ WITH (
LISTENER_URL = ‘TCP://OTM-SQL22LS.otomizer.local:5022’,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
);
-
Join the DAG from the forwarder
Now, execute this on the primary replica of your new cluster’s AG:
ALTER AVAILABILITY GROUP [MyDistributedAG]
JOIN
AVAILABILITY GROUP ON
‘OTM-SQL19AG’ WITH (
LISTENER_URL = ‘TCP://OTM-SQL19LS.otomizer.local:5022’,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
),
‘OTM-SQL22AG’ WITH (
LISTENER_URL = ‘TCP://OTM-SQL22LS.otomizer.local:5022’,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
);
-
Seed the database manually (especially if you’re skipping versions)
You’ll likely need to seed the database manually.
— Backup from the old AG’s primary:
BACKUP DATABASE [AdventureWorks]
TO DISK = N’\\otomizer.local\DFS\SqlBackup$\AdventureWorks.bak’
WITH COPY_ONLY, INIT;
— Restore on the new AG’s primary and secondary replicas (with NORECOVERY):
RESTORE DATABASE [AdventureWorks]
FROM DISK = N’\\otomizer.local\DFS\SqlBackup$\AdventureWorks.bak’
-
Join the database to the New AG
On both servers in your new AG:
ALTER DATABASE [AdventureWorks]
SET HADR AVAILABILITY GROUP = [OTM-SQL22AG];
-
Switch to synchronous mode
Once seeding is complete and data is flowing, switch to synchronous commit mode on both clusters for the DAG:
— On both clusters (primary replicas of each AG in the DAG):
ALTER AVAILABILITY GROUP [MyDistributedAG]
MODIFY AVAILABILITY GROUP ON
‘OTM-SQL19AG’ WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
ALTER AVAILABILITY GROUP [MyDistributedAG]
MODIFY AVAILABILITY GROUP ON
‘OTM-SQL22AG’ WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
-
Final cutover
Once sys.dm_hadr_database_replica_states (a Dynamic Management View, or DMV) confirms everything is synchronised and all client activity is stopped, you can force the failover. Important: The FORCE_FAILOVER_ALLOW_DATA_LOSS option is the only way to fail over a DAG. So, be absolutely sure to check the status of the last Log Sequence Numbers (LSNs) using this DMV before proceeding.
— On the old AG’s primary (disconnects it from the DAG):
ALTER AVAILABILITY GROUP [MyDistributedAG] SET (ROLE = SECONDARY);
— On the new AG’s primary (completes the migration):
ALTER AVAILABILITY GROUP [MyDistributedAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
Post-migration tasks: the finishing touches
Once the cutover is done, don’t forget these steps:
- Update your application connection strings or DNS aliases to point to the new environment.
- Raise the database compatibility level on the new database to match your new SQL Server version.
- Once you’re confident everything is working perfectly, remove the distributed AG and decommission the old environment.
- Keep a close eye on application logs and the SQL Server error log for any unusual behaviour.
Best practices and lessons learned
To wrap things up, here are a few tips from Christophe’s experience:
- Setting up CNAME records in DNS for your listeners beforehand makes future migrations much smoother and more predictable.
- DAGs offer rollback safety, but only if you’ve actually tested and validated your fallback plan. Make sure you know how your applications will behave.
- Cutovers, even very short ones, can be disruptive if you don’t coordinate with the teams who own the applications.
DAGs: a powerful tool for modern SQL server environments
Distributed Availability Groups are a big step forward. They give you a way to handle complex migrations with a high degree of control and safety. You’ll need careful planning and a good technical understanding, but the benefits are hard to ignore. With DAGs in your toolkit, you’re much better equipped to modernise your SQL Server environments, no matter how large or complex they are.
For the full slide deck and example scripts from Christophe’s presentation, visit his GitHub page.