- 01What a SQL Server health check should cover
- 02When to run a SQL Server health check
- 03How to start a SQL Server health check
- 04SQL Server health check for version, uptime, and configuration
- 05SQL Server health check for databases and files
- 06SQL Server health check for tempdb and storage latency
- 07SQL Server health check for backups and restore history
- 08SQL Server health check for SQL Agent jobs and maintenance
- 09SQL Server health check for waits, blocking, and active requests
- 10SQL Server health check for error logs and corruption signals
- 11SQL Server health check for access, security, and ownership
- 12SQL Server health check for HA, DR, and monitoring coverage
- 13What not to change after a SQL Server health check
- 14When to request a SQL Server health audit
sql server hub / health check guide
SQL Serverhealth check guide
A SQL Server health check should show what is healthy, what is weak, what is missing, and what should be fixed first.
Use this guide to check version, configuration, files, tempdb, backups, restores, jobs, waits, blocking, error logs, access, HA, and monitoring with concrete SQL Server output.
What a SQL Server health check should cover
A SQL Server health check should move from context to risk. First understand the instance, then check recovery and routine operations, then read workload symptoms, then decide what needs action.
It is not a generic score, and it is not a full performance tuning engagement. The useful result is a short findings list backed by current SQL Server output, with enough context to decide what is healthy, what is weak, what is missing, and what needs a planned change.
Baseline state
Start with the facts that make every later reading useful: SQL Server version, uptime, edition, configuration, databases, file layout, growth settings, and tempdb shape.
Recovery and routine operations
Check whether backups, restore history, SQL Agent jobs, CHECKDB, maintenance, cleanup, and monitoring are actually running in a way production can rely on.
Current workload and failure clues
Use waits, active requests, blocking, file latency, tempdb use, error logs, and suspect pages to see where the server is currently under pressure or quietly failing.
Access, HA, and follow-up
Finish by checking privileged access, ownership, HA/DR state, alert coverage, and what should happen next. The output should be a short list of findings, not a pile of raw DMV results.
When to run a SQL Server health check
Run a health check when the SQL Server state matters for a decision, not only when something is already broken. The best timing is usually before a risky change, after a meaningful change, or when nobody can explain the current setup cleanly.
Some checks are ad hoc: a one-time review before a migration, patch, upgrade, failover drill, handover, or incident review. Others should be regular: monthly or quarterly checks for backups, jobs, CHECKDB, file growth, waits, access, patch level, and monitoring gaps.
New or handed-over environment
Use a health check before trusting old jobs, backup history, access, monitoring, or HA notes.
Migration or upgrade planning
Check version, compatibility, databases, jobs, linked dependencies, backups, restore path, and rollback risk before the work starts.
Patching and cumulative updates
Run a smaller pre-check before patching and a post-check after patching to catch failed jobs, service changes, errors, or AG issues.
After configuration or infrastructure changes
Review files, storage latency, tempdb, waits, error logs, jobs, and monitoring after moving storage, changing memory, changing service accounts, or modifying HA.
After incidents or repeated slow periods
Use the health check to separate current symptoms from older drift: failed jobs, stale backups, high waits, blocking, errors, or missing alerts.
Regular operations review
Run a lighter monthly or quarterly check so backup, CHECKDB, job, storage, access, patch, and monitoring problems do not wait for the next emergency.
How to start a SQL Server health check
Start by collecting read-only facts before changing anything. A health check needs enough context to separate configuration drift, missing operations, current workload pressure, and design work that belongs in a larger project.
Run these checks from an account with enough metadata access. Some DMVs require `VIEW SERVER STATE`, SQL Server 2022 and newer can require `VIEW SERVER PERFORMANCE STATE` or `VIEW SERVER SECURITY STATE`, error-log access can require `VIEW ANY ERROR LOG`, and `msdb` history depends on permissions. Low-permission output can look cleaner than the server really is.
| Step | What to do |
|---|---|
| 1. Set the reason | Decide whether this is regular review, pre-change, post-change, incident follow-up, migration, patching, or a handover check. |
| 2. Collect the baseline | Capture version, uptime, configuration, databases, files, tempdb, and storage readings before interpreting symptoms. |
| 3. Check recovery and operations | Read backup history, restore history, SQL Agent jobs, CHECKDB, maintenance, cleanup, and monitoring gaps. |
| 4. Read the workload | Review waits, active requests, blocking, file latency, tempdb use, error logs, and suspect pages with timing context. |
| 5. Finish with control points | Check privileged access, ownership, HA/DR state, alert coverage, and what needs a quick fix or planned change. |
SQL Server health check for version, uptime, and configuration
Version, edition, uptime, CPU count, memory, cluster state, and HADR state frame the rest of the review. Configuration checks should focus on resource control and operational behavior, not changing settings because a generic report says so.
How to check SQL Server version and uptime
Reads server identity, version, edition, startup time, CPU, memory, clustered state, and HADR state.
sqlserver_start_time tells you how long DMV counters have been accumulating. cpu_count, scheduler_count, and memory values help interpret pressure checks. IsClustered and IsHadrEnabled show whether cluster or AG checks may apply. Use version and update level when assessing support and patch posture.
How to check SQL Server configuration
Reads common server settings and whether configured values match running values.
value and value_in_use can differ when a setting has not taken effect. max server memory should leave room for the operating system and other services. MAXDOP and cost threshold need workload context before changing. xp_cmdshell and remote admin connections deserve a deliberate access decision.
SQL Server health check for databases and files
Database state, recovery model, compatibility level, Query Store, ownership, size, file layout, and growth settings show what the instance is responsible for. They also decide which recovery, capacity, and security checks matter most.
How to inventory databases
Lists databases with state, recovery model, compatibility level, Query Store flag, owner, size, and file count.
state_desc should be online for normal production databases. recovery_model_desc affects backup and restore expectations. compatibility_level can affect query behavior and upgrade planning. Large or critical databases deserve separate recovery and performance checks.
How to review database files and growth settings
Lists database file paths, sizes, human-readable max size, and growth settings.
physical_name shows where data and log files actually live. max_size_setting converts SQL Server page counts into readable limits. Percent growth can create unpredictable growth events on large files. Data and log files usually need different review questions. File changes need free-space, backup, maintenance, and rollback context.
SQL Server health check for tempdb and storage latency
File latency and tempdb usage show whether storage and temporary workspace are part of the current health picture. Treat these readings as starting points, then compare them with workload timing, job windows, spills, version store use, and recent incidents.
How to check file latency
Reads file-level reads, writes, and average I/O stalls from sys.dm_io_virtual_file_stats.
avg_read_ms and avg_write_ms are starting points for storage review. Log files with high write latency need separate attention. High latency during backup or maintenance windows can mislead a one-time check. A single snapshot needs comparison with workload timing before changing storage.
How to check tempdb space usage
Run in tempdb to review file sizes, free space, user objects, internal objects, version store, and growth settings.
version_store_mb matters when row versioning or long transactions are active. internal_object_mb can rise with sorts, hashes, spools, and spills. growth_setting should be predictable and large enough for production use. Use this with the tempdb guide when pressure is recurring.
SQL Server health check for backups and restore history
A health check should read backup history beside restore history. Successful jobs matter, but they do not answer whether the files, log chain, permissions, keys, and restore timing are good enough.
How to check recent backups
Shows the latest non-copy-only full, latest copy-only full, differential, and log backup per database from msdb backup history.
last_full_backup excludes copy-only full backups so the normal backup chain is clearer. last_copy_only_full_backup can explain ad hoc backups that should not be mistaken for the usual differential base. last_log_backup matters for databases in full or bulk-logged recovery model. Backup history does not replace a real restore test.
How to check restore history
Shows recent restore operations, source backup timing, recovery flag, and backup file path.
restore_date shows whether restore testing has happened recently. destination_database_name separates test restores from production restores. physical_device_name helps identify which backup source was used. No restore history may mean the restore path has not been validated on this instance.
SQL Server health check for SQL Agent jobs and maintenance
SQL Agent history often shows the parts of SQL Server nobody is watching: failed backup jobs, stale CHECKDB, long maintenance windows, cleanup problems, and old job owners. Health checks should include this before recommending changes.
How to check SQL Agent job failures
Reads recent SQL Agent history for failed, long-running, backup, integrity, and maintenance jobs.
run_status identifies failed or retrying job steps. job_owner should be valid and intentional. Long runtime can matter when jobs overlap business hours or maintenance windows. Job failures outside the main application can still affect recovery and stability.
How to check CHECKDB history
Looks for recent CHECKDB or integrity-check job history in SQL Agent.
Job naming varies, so this is a starting point rather than a complete guarantee. run_status and message show whether the integrity job completed or reported errors. No rows may mean the job has another name, history was cleaned, or integrity checks are missing. Use job output files or monitoring data when SQL Agent history is incomplete.
SQL Server health check for waits, blocking, and active requests
Wait stats and active requests show where SQL Server has been spending time and what is happening now. They should point to the next check, not become a verdict by themselves.
How to check top wait stats
Lists top cumulative waits while excluding common background waits.
wait_seconds helps identify the largest cumulative waits since startup. signal_wait_time_ms can point toward scheduler pressure. High wait totals need workload timing and baseline context. Do not reset wait stats only to make a report look cleaner.
How to check active requests and blocking
Shows current requests with blocking session, wait details, resource use, database, and statement text.
blocking_session_id identifies live blocking chains. wait_type and wait_resource show what the request is waiting on right now. logical_reads, reads, writes, and cpu_time help prioritize active work. Use this with the blocking guide when blocking is active.
SQL Server health check for error logs and corruption signals
Error logs and suspect pages catch issues that do not always show up in a simple dashboard: I/O errors, login failures, failover messages, corruption warnings, failed recovery, and repeated service-level problems.
How to read recent SQL Server error log entries
Reads the current SQL Server error log for recent entries. Filter the output for errors, I/O, corruption, login failures, AG, failover, and startup messages.
sp_readerrorlog reads SQL Server error logs; access may be restricted. Review severity and repetition, not only the newest line. Look for I/O, checksum, stack dump, login failure, recovery, AG, and failover patterns. A noisy log needs filtering, but do not ignore repeated infrastructure messages.
How to check suspect pages
Reads msdb suspect page history for recorded page-level problems.
Rows in suspect_pages deserve immediate backup, CHECKDB, storage, and error-log review. event_type_desc gives the broad reason SQL Server recorded the page. Old rows may have been repaired or restored, but still explain past incidents. No rows does not replace CHECKDB history or storage monitoring.
SQL Server health check for access, security, and ownership
A health check is not a full hardening audit, but it should catch broad privilege, disabled login, sysadmin, service account, job owner, and ownership problems that affect production support.
How to review privileged access
Lists SQL, Windows, group, and external logins with disabled state, default database, and server role membership without using SQL Server 2017-only aggregation.
sysadmin membership should be short and explainable. External login and group principals can matter in newer or cloud-connected environments. Disabled logins can still indicate stale access management. Default databases should exist and be reachable. Service accounts and automation logins need ownership and purpose.
SQL Server health check for HA, DR, and monitoring coverage
HA and DR health depends on tested behavior: replica health, failover readiness, restore sequence, monitoring alerts, SQL Agent behavior, application validation, and ownership during an incident.
How to check Availability Group health
Shows AG primary, replica mode, failover mode, connection state, recovery health, and synchronization health.
primary_replica and role_desc show the current AG layout. availability_mode_desc and failover_mode_desc show what failover behavior is configured. connected_state_desc and synchronization health need review before relying on HA. No AG rows means this check does not apply to that instance.
| Coverage check | What to confirm |
|---|---|
| Monitoring | Backup failures, job failures, disk growth, error log severity, blocking, AG health, and service state generate alerts that someone reads. |
| Recovery | Restore sequence, file access, encryption keys, and timing have been tested recently enough for the business need. |
| HA | Failover mode, synchronization health, listener behavior, jobs, backups, and application validation are understood. |
| Ownership | The people responsible for alerts, access changes, maintenance, and incident decisions are known. |
What not to change after a SQL Server health check
Do not treat a health check as a generic score.
Do not change server settings from one snapshot alone.
Do not reset wait stats casually in production.
Do not assume successful backup jobs mean recovery works.
Do not drop indexes or change maintenance from one report.
Do not ignore SQL Agent failures that look unrelated to the main application.
Do not call HA or DR healthy without failover or restore validation.
Do not turn a health check into a large refactor before ranking the risks.
When to request a SQL Server health audit
A SQL Server health audit makes sense when the server is changing, support ownership is unclear, backups have not been restored recently, incidents repeat, monitoring does not explain enough, or the next upgrade or migration needs a cleaner findings list.
Send version output, database inventory, configuration output, file and tempdb checks, backup history, restore history, job history, CHECKDB history, wait stats, blocking examples, error-log details, login list, AG health, monitoring screenshots, and the main production concern.
Next step
If the review needs findings, priorities, and a follow-up path, use the SQL Server health audit page or request the audit above.
Next useful reads: the SQL Server monitoring guide for ongoing checks, the SQL Server backup guide for recovery inputs, the SQL Server recovery guide for restore validation, the SQL Server hardening guide for security posture, and the SQL Server failover guide for HA checks.
