How to Approach High Availability and Disaster Recovery for SQL Server in Azure

Martin Wambui

June 26th, 2025

In today's always-on business environment, downtime isn’t just an inconvenience; it can mean lost revenue, reduced productivity, and even reputational damage. As such, planning a robust High Availability and Disaster Recovery (HADR) strategy is critical for any organization running SQL Server in Azure. Whether you're managing an Infrastructure as a Service (IaaS) deployment or leveraging Platform as a Service (PaaS), Azure provides a range of tools to meet your resilience goals. This article explores a structured approach to designing and implementing HADR for SQL Server in Azure.

1. Understand the Fundamentals: RTO and RPO

Before selecting a HADR solution, it's essential to define your Recovery Time Objective (RTO) and Recovery Point Objective (RPO):

RTO refers to how quickly your system must be restored after an outage to avoid significant business impact.
RPO defines how much data loss is acceptable in terms of time.

These objectives serve as the foundation of your HADR planning. They're determined by business requirements, application criticality, and risk appetite.
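
To make the two objectives concrete, here is a minimal Python sketch that screens candidate HADR options against stated targets. The class names, the function names, and every number in the usage example are hypothetical placeholders, not measured figures for any Azure feature; real estimates should come from your own failover and restore tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objectives:
    """Business targets, both expressed in minutes."""
    rto_minutes: float
    rpo_minutes: float


@dataclass(frozen=True)
class Candidate:
    """A HADR option with estimated (not guaranteed) recovery figures."""
    name: str
    expected_recovery_minutes: float
    expected_data_loss_minutes: float


def meets_objectives(candidate: Candidate, target: Objectives) -> bool:
    """Return True when both estimates fit inside the stated targets."""
    return (
        candidate.expected_recovery_minutes <= target.rto_minutes
        and candidate.expected_data_loss_minutes <= target.rpo_minutes
    )


def shortlist(candidates: list[Candidate], target: Objectives) -> list[str]:
    """Names of the candidates that satisfy both RTO and RPO, in input order."""
    return [c.name for c in candidates if meets_objectives(c, target)]


if __name__ == "__main__":
    # Illustrative numbers only; replace them with results from your own drills.
    target = Objectives(rto_minutes=15, rpo_minutes=5)
    options = [
        Candidate("option-a", expected_recovery_minutes=2, expected_data_loss_minutes=0),
        Candidate("option-b", expected_recovery_minutes=60, expected_data_loss_minutes=15),
    ]
    print(shortlist(options, target))
```

Writing the targets down in this form keeps the later technology choices tied to business requirements rather than to feature lists.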

2. Choose Between IaaS and PaaS

Azure supports both IaaS (e.g., SQL Server on VMs) and PaaS (e.g., Azure SQL Database, Managed Instance) deployment models. Each has unique implications for HADR:

IaaS gives full control over SQL Server configuration and architecture. You're responsible for setting up clustering, backups, monitoring, and failover.
PaaS abstracts much of the complexity. HADR features are built-in, requiring minimal configuration while providing enterprise-grade reliability.

3. High Availability Options for IaaS (SQL Server on Azure VMs)

IaaS deployments require hands-on configuration but offer great flexibility. Below are the primary options:

Always On Availability Groups (AG)

An Availability Group in a single region

AGs use a Windows Server Failover Cluster (WSFC) under the hood and, in Azure, typically an internal load balancer to route listener traffic to the current primary. They are optimal when you need database-level replication, fast failover, and flexible scaling of read-only replicas. However, objects outside the database, such as logins and SQL Server Agent jobs, must be synchronized manually.

Protection Level: Database-level.
Failover: Automatic with synchronous-commit replicas; manual with asynchronous-commit replicas.
Replicas: Multiple readable secondaries.
Storage: No shared storage needed.
Use Case: Ideal for high availability and read-scale workloads.

Failover Cluster Instances (FCI)

FCIs maintain one copy of each database, which simplifies storage but makes that shared storage a single point of failure. They require AD DS, DNS, and a load balancer. FCIs can be paired with storage replication to enhance resilience.

An FCI deployment using Storage Spaces Direct

Protection Level: Instance-level.
Failover: Full stop/start of the SQL instance.
Replicas: One active node at a time.
Storage: Requires shared storage (Azure Shared Disks, iSCSI, etc.).
Use Case: Good for legacy apps or where instance-level protection is needed.

Log Shipping

Log shipping is based on backup, copy, and restore. While it lacks automation and real-time failover, it is highly tolerant of high-latency networks and is simple to implement.

Configuration showing backup, copy, & restore jobs

Protection Level: Database-level.
Failover: Manual.
Replicas: Warm standby server.
Storage: Independent storage.
Use Case: DR for less-critical databases or where low cost and simplicity are key.
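
Because log shipping runs on scheduled backup, copy, and restore jobs, its data-loss exposure can be estimated from the job intervals. The Python sketch below is a simplified back-of-the-envelope model: it assumes each job runs on a fixed interval and ignores how long the jobs themselves take. The function and parameter names are hypothetical.

```python
def worst_case_data_loss_minutes(backup_interval: float,
                                 copy_interval: float,
                                 backup_share_survives: bool) -> float:
    """Rough upper bound on lost work if the primary fails just before the next job run.

    If the backup share outlives the primary, only changes made since the last
    log backup are lost. If the share is lost together with the primary, log
    backups that had not yet been copied to the secondary are lost as well.
    """
    if backup_interval <= 0 or copy_interval <= 0:
        raise ValueError("intervals must be positive")
    if backup_share_survives:
        return backup_interval
    return backup_interval + copy_interval


def worst_case_standby_lag_minutes(backup_interval: float,
                                   copy_interval: float,
                                   restore_interval: float) -> float:
    """Rough upper bound on how far the warm standby trails the primary."""
    if min(backup_interval, copy_interval, restore_interval) <= 0:
        raise ValueError("intervals must be positive")
    return backup_interval + copy_interval + restore_interval


if __name__ == "__main__":
    # Example: all three jobs scheduled every 15 minutes.
    print(worst_case_data_loss_minutes(15, 15, backup_share_survives=False))
    print(worst_case_standby_lag_minutes(15, 15, 15))
```

Comparing these bounds with your RPO shows quickly whether shorter job intervals, or a backup share that is independent of the primary, are needed.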

Azure Site Recovery (ASR)

ASR replicates disk-level changes from one Azure region to another. It doesn't track SQL transactions but can offer a rapid recovery path in large-scale failures or ransomware scenarios.

Replication of disks configured to use Azure Site Recovery

Protection Level: VM-level.
Failover: Manual (or orchestrated).
Awareness: Not SQL-aware.
Use Case: For disaster recovery of entire VMs when database-level options aren't feasible.

4. High Availability for PaaS Deployments

Azure SQL Database and Azure SQL Managed Instance come with built-in HADR capabilities, simplifying deployment while meeting enterprise-grade requirements.

Auto-Failover Groups

Scope: SQL Database and Managed Instance.
Features: Multi-database failover, read-write and read-only listeners, automatic DNS redirection.
Failover: Automatic or manual.
Use Case: Seamless DR with minimal intervention.

This is the PaaS equivalent of an AG. Applications connect using a listener that automatically points to the active region. You can customize failover policies including data-loss grace periods.
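
As an illustration of connecting through the listener rather than an individual server, the following Python sketch builds ODBC connection strings for the read-write and read-only endpoints of a failover group on Azure SQL Database. The failover group and database names are hypothetical, credentials are deliberately left out, and the host-name pattern shown applies to SQL Database; Managed Instance listeners also include a DNS zone identifier, so treat this as a sketch rather than a universal template.

```python
def listener_host(failover_group: str, read_only: bool = False) -> str:
    """Build the failover group listener host name for Azure SQL Database."""
    suffix = ".secondary.database.windows.net" if read_only else ".database.windows.net"
    return f"{failover_group}{suffix}"


def odbc_connection_string(failover_group: str, database: str,
                           read_only: bool = False) -> str:
    """Return an ODBC connection string without credentials.

    The caller is expected to append authentication settings that suit
    their environment.
    """
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{listener_host(failover_group, read_only)},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Read-only routing relies on the client declaring its intent.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # "fog-demo" and "SalesDb" are placeholder names.
    print(odbc_connection_string("fog-demo", "SalesDb"))
    print(odbc_connection_string("fog-demo", "SalesDb", read_only=True))
```

Because the application only ever references the listener name, a failover to the other region requires no connection string change.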

Active Geo-Replication

Active Geo-Replication enables regionally distributed read-only replicas, which support read-heavy workloads and global applications. While it doesn’t offer automatic failover, failover is fast and supported via API or portal.

Screenshot of active Geo-Replication for Azure SQL Database.

Scope: Azure SQL Database only.
Features: Asynchronous replication to up to 4 readable secondaries.
Failover: Manual.
Use Case: Read-scale and cross-region disaster recovery.

Accelerated Database Recovery (ADR)

Scope: Enabled by default in Azure SQL Database and Azure SQL Managed Instance.
Features: Fast transaction rollback and crash recovery.
Use Case: Reduces recovery time after unexpected outages.

ADR uses a persisted version store to improve database availability, especially under long-running transactions. It also aggressively truncates the transaction log, improving performance and storage management.

Zone Redundancy

Scope: SQL Database (Premium/Business Critical) and Managed Instance.
Features: Automatic replication across Availability Zones.
Use Case: Protection against data center-level outages within a region.

By distributing replicas across zones, Zone Redundancy ensures continuity during power or hardware failures. It complements auto-failover groups or geo-replication in a layered DR strategy.

5. Backups: Your Last Line of Defense

No matter how solid your high availability (HA) or disaster recovery (DR) architecture is, it won’t save you from every situation. Human errors, ransomware attacks, or silent data corruption can bypass even the most resilient setups. That’s why backups are—and always will be—your ultimate fallback.

Think of them as your business continuity insurance: you hope you never need to use them, but when you do, they need to work.

IaaS Backup Strategies (SQL Server on Azure VMs)

For virtual machines running SQL Server, you get complete control over how and where your backups live:

SQL Native Backups: Schedule full, differential, and transaction log backups to disk. They support point-in-time restore and give you fine-grained control.
Backup to URL: Use Azure Blob Storage as a destination. It’s secure, offsite, and scalable.
Azure Backup: This platform service provides VM-level, application-consistent snapshots.
Automated SQL Backups (via IaaS Extension): Define policies in the Azure portal to automate SQL backups and retention management with minimal effort.

A key consideration here is your SQL recovery model. For point-in-time recovery, the FULL recovery model is a must. Also, ensure backups don’t sit on ephemeral (temporary) disks, whose contents can be lost when the VM is deallocated, redeployed, or moved to another host.
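
For the Backup to URL option, the statement itself is short. The Python sketch below only generates the T-SQL text with a timestamped file name; it does not connect to SQL Server. The storage account URL, container, and file-naming scheme are hypothetical, and the sketch assumes a credential for the container already exists on the instance and that the database name is safe to use in a blob name.

```python
from datetime import datetime, timezone


def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def backup_to_url_statement(database: str, container_url: str,
                            taken_at: datetime, log: bool = False) -> str:
    """Build a BACKUP ... TO URL statement for a full or log backup."""
    if taken_at.tzinfo is None:
        raise ValueError("taken_at must be timezone-aware")
    kind = "LOG" if log else "DATABASE"
    extension = "trn" if log else "bak"
    # Use a UTC timestamp so file names sort chronologically.
    stamp = taken_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    url = f"{container_url.rstrip('/')}/{database}_{stamp}.{extension}"
    url_literal = url.replace("'", "''")  # escape quotes inside the string literal
    return (
        f"BACKUP {kind} {quote_identifier(database)} "
        f"TO URL = N'{url_literal}' WITH COMPRESSION, CHECKSUM;"
    )


if __name__ == "__main__":
    # Placeholder storage account and container.
    container = "https://examplestorage.blob.core.windows.net/sqlbackups"
    print(backup_to_url_statement("SalesDb", container, datetime.now(timezone.utc)))
    print(backup_to_url_statement("SalesDb", container, datetime.now(timezone.utc), log=True))
```

CHECKSUM is included so that page checksums are validated while the backup is written, which gives restore tests a stronger starting point.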

PaaS Backup Strategies (Azure SQL Database & Managed Instance)

In PaaS environments, Microsoft handles backups for you—but that doesn’t mean you ignore them:

Automated Backups: Full backups are taken weekly, differentials every 12 hours, and transaction logs every 5–10 minutes.
Point-in-Time Restore (PITR): Restore any database to a specific time within the past 7–35 days.
Long-Term Retention (LTR): Keep backups for months or even years to meet compliance or audit needs.
Geo-Redundant Storage: By default, backups are stored in RA-GRS (read-access geo-redundant storage), giving you another layer of resilience.

You can’t schedule your own backups in PaaS—but you can initiate restores, monitor backup status, and configure retention via the Azure portal, PowerShell, or CLI.
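
Before requesting a point-in-time restore, it can help to check that the target time falls inside the configured retention window. The Python sketch below does only that arithmetic. The function names are hypothetical, and the earliest restore point the service actually reports may differ from this simple calculation, for example for a recently created database.

```python
from datetime import datetime, timedelta, timezone


def restore_window_start(now: datetime, retention_days: int) -> datetime:
    """Earliest point in time covered by a retention period ending at `now`."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return now - timedelta(days=retention_days)


def is_within_restore_window(restore_point: datetime, now: datetime,
                             retention_days: int) -> bool:
    """True when the requested restore point lies inside the retention window."""
    return restore_window_start(now, retention_days) <= restore_point <= now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    yesterday = now - timedelta(days=1)
    last_month = now - timedelta(days=30)
    print(is_within_restore_window(yesterday, now, retention_days=7))
    print(is_within_restore_window(last_month, now, retention_days=7))
```

A check like this is most useful in runbooks, where it can flag early that a request needs Long-Term Retention backups instead of PITR.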

One Rule: Test, Don’t Assume

A backup isn’t a backup until it’s tested. Routinely validate your backups by restoring them in a sandbox. At Armely, we recommend periodic drills to verify not just the files but also your team’s ability to recover under pressure.

Backups are your last defense in a worst-case scenario. Make them count.

6. Monitoring, Testing, and Application Readiness

Implement continuous monitoring using:

Azure Monitor and Service Health
DMVs like sys.dm_hadr_database_replica_states (SQL Server availability groups) and sys.dm_database_replica_states (Azure SQL Database)
PowerShell/CLI for status automation
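
As a sketch of what status automation might look like, the Python function below inspects rows that have already been fetched from a replica-state DMV and returns human-readable warnings. It assumes each row is a dictionary keyed by the column names database_id, synchronization_health_desc, and log_send_queue_size (in KB); the function name and the queue threshold are hypothetical, and running the query itself is left to whatever database driver you use.

```python
from typing import Any, Iterable, Mapping


def find_replica_issues(rows: Iterable[Mapping[str, Any]],
                        max_log_send_queue_kb: int = 10240) -> list[str]:
    """Return one warning per problem found in the replica-state rows."""
    issues: list[str] = []
    for row in rows:
        label = f"database_id={row['database_id']}"
        health = row["synchronization_health_desc"]
        if health != "HEALTHY":
            issues.append(f"{label}: health is {health}")
        # The queue size can be NULL (None) in some states; treat that as zero.
        queue_kb = row.get("log_send_queue_size") or 0
        if queue_kb > max_log_send_queue_kb:
            issues.append(
                f"{label}: log send queue {queue_kb} KB exceeds {max_log_send_queue_kb} KB"
            )
    return issues


if __name__ == "__main__":
    sample = [
        {"database_id": 5, "synchronization_health_desc": "HEALTHY",
         "log_send_queue_size": 12},
        {"database_id": 6, "synchronization_health_desc": "NOT_HEALTHY",
         "log_send_queue_size": 50000},
    ]
    for warning in find_replica_issues(sample):
        print(warning)
```

Keeping the evaluation logic separate from the query makes it easy to unit test and to reuse from a scheduled job or an alerting pipeline.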

Test your failover scenarios regularly. Also, ensure applications implement retry logic so that they handle transient failures, such as connections dropped during a failover, gracefully.
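
One common way to handle transient failures is to retry with exponential backoff and a little random jitter. The Python sketch below shows the pattern in isolation. TransientError is a hypothetical placeholder: in a real application you would map your driver's connection and timeout errors to it, and the delay values are illustrative defaults rather than recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Placeholder for errors worth retrying, such as a dropped connection."""


def call_with_retry(operation: Callable[[], T],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so many clients do not reconnect at the same moment.
            sleep(delay + random.uniform(0, delay / 2))
            attempt += 1


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query() -> str:
        # Simulates two failures during a failover, then success.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientError("connection lost")
        return "ok"

    print(call_with_retry(flaky_query, base_delay=0.1))
```

Passing `sleep` as a parameter is a small design choice that lets tests run the retry loop without actually waiting.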

7. Hybrid and Multi-Region Architectures

Hybrid HADR configurations extend your resilience posture:

Use AGs with a secondary in Azure for DR from on-prem.
Use transactional replication from on-prem to PaaS.
Connect sites over ExpressRoute or a VPN for secure, low-latency communication.

At Armely, we help you go beyond theory. From designing resilient architectures to implementing real-world recovery drills, we partner with you to build HADR strategies that work when it matters most.

Don’t wait for a failure to test your strategy. Engage Armely and get it right from the start.