sql server hub / health check guide

SQL Server health check guide

A SQL Server health check should show what is healthy, what is weak, what is missing, and what should be fixed first.

Use this guide to check version, configuration, files, tempdb, backups, restores, jobs, waits, blocking, error logs, access, HA, and monitoring with concrete SQL Server output.

Guide

Guide~16 min readUpdated 4 Jul 2026

LinkedIn X Email

01What a SQL Server health check should cover
02What is included in a SQL Server health check
03SQL Server instance review
04How do I know if my SQL Server is configured correctly?
05When to run a SQL Server health check
06How to start a SQL Server health check
07SQL Server health check for version, uptime, and configuration
08SQL Server health check for databases and files
09SQL Server health check for tempdb and storage latency
10SQL Server health check for backups and restore history
11SQL Server health check for SQL Agent jobs and maintenance
12SQL Server health check for waits, blocking, and active requests
13SQL Server health check for error logs and corruption signals
14SQL Server health check for access, security, and ownership
15SQL Server health check for HA, DR, and monitoring coverage
16What not to change after a SQL Server health check
17When to request a SQL Server health audit

What a SQL Server health check should cover

A SQL Server health check should move from context to risk. First understand the instance, then check recovery and routine operations, then read workload symptoms, then decide what needs action.

A good check answers the practical question: is this SQL Server configured well enough for the work it is doing? The answer comes from backup history, restore history, job output, wait stats, file and tempdb data, error logs, access checks, HA state, and monitoring data.

The useful result is a short findings list backed by current SQL Server output, with enough context to decide what is healthy, what is weak, what is missing, and what needs a planned change.

If you need a SQL Server health check service instead of a do-it-yourself review, use the SQL Server health audit. This guide explains what the check should include before you ask for help or run the checklist yourself.

Baseline state

Start with the facts that make every later reading useful: SQL Server version, uptime, edition, configuration, databases, file layout, growth settings, and tempdb shape.

Recovery and routine operations

Check whether backups, restore history, SQL Agent jobs, CHECKDB, maintenance, cleanup, and monitoring are actually running in a way production can rely on.

Current workload and failure clues

Use waits, active requests, blocking, file latency, tempdb use, error logs, and suspect pages to see where the server is currently under pressure or quietly failing.

Access, HA, and follow-up

Finish by checking privileged access, ownership, HA/DR state, alert coverage, and what should happen next. The output should be a short list of findings, not a pile of raw DMV results.

What is included in a SQL Server health check

A SQL Server health check should include the parts that decide whether the instance can be operated safely: recovery, jobs, configuration, workload symptoms, access, HA or DR, and monitoring. It should also show which findings need action and which ones are only background notes.

The important part is the connection between checks. A failed backup job affects recovery. Weak monitoring affects incident response. Bad file growth can show up as slow queries. Old sysadmin access can affect security and change control.

The table below is the short version of a SQL Server audit checklist. Use the SQL Server health check checklist when you want the checklist form, and use the deeper sections below when you need SQL output.

Area	What to check
Instance identity	Version, edition, build, uptime, CPU, memory, clustered state, HADR state, and support context.
Configuration	Max server memory, MAXDOP, cost threshold, backup compression, ad hoc workload settings, and risky enabled features.
Databases and files	Database state, recovery model, compatibility level, owners, data and log file layout, growth settings, and size.
Backups and restore history	Full, differential, and log backup coverage, restore history, restore-test age, retention, and recovery gaps.
SQL Agent and maintenance	Failed jobs, disabled jobs, long-running jobs, CHECKDB history, cleanup, index/statistics jobs, and notification setup.
Workload symptoms	Wait stats, blocking, active requests, file latency, tempdb pressure, Query Store clues, deadlocks, and repeated slow periods.
Error logs and corruption clues	SQL Server error log entries, suspect pages, checksum errors, I/O errors, severity errors, and restart messages.
Security and ownership	Sysadmin membership, disabled or old logins, service accounts, linked servers, Agent proxies, and who owns each follow-up.
HA, DR, and monitoring	Availability Group or cluster state, failover assumptions, alert routing, monitoring coverage, and response paths.

SQL Server instance review

A SQL Server instance review checks the server as it runs today. It should connect configuration, backups, jobs, workload symptoms, security, HA, and monitoring instead of reading each item as a separate checklist box.

This is also the safest way to answer whether SQL Server is configured correctly. A setting can be technically valid and still wrong for the workload, the recovery target, or the people who have to support it.

A SQL Server assessment service should do the same thing with a second pair of eyes: read the current output, separate real risk from noise, and turn the findings into a short list of next steps.

Review question	What to check
Is the instance identifiable?	Record server name, machine name, SQL Server version, edition, CU level, uptime, clustered state, and HADR state.
Is it configured for the workload?	Check memory, parallelism, backup compression, tempdb, file growth, storage latency, and obvious risky settings.
Can it recover?	Check backup history, log backup patterns, restore history, restore-test age, DBCC CHECKDB, and where backup alerts go.
Is scheduled work healthy?	Check SQL Agent jobs, owners, schedules, failures, long runs, disabled jobs, maintenance, cleanup, and notification settings.
Does monitoring explain incidents?	Check alerts, error logs, waits, blocking, deadlocks, Query Store, file latency, and whether history exists for real incident windows.
Is access controlled?	Check sysadmin membership, service accounts, login age, disabled logins, linked servers, Agent proxies, and ownership of privileged work.
Are HA and DR claims usable?	Check AG or cluster health, failover behavior, jobs after failover, listener behavior, restore path, and runbook freshness.

How do I know if my SQL Server is configured correctly?

You know SQL Server is configured correctly when the settings match the workload, the recovery target, and the way the server is supported. A clean configuration is not just a set of recommended values.

Start with memory, MAXDOP, cost threshold, tempdb, file growth, backup coverage, SQL Agent jobs, access, monitoring, and HA or DR state. Then compare those checks with the business use of the instance: busy hours, reporting windows, recovery targets, vendor support rules, and who responds when alerts fire.

A server can be online and still be poorly configured. The usual warning signs are missing restore tests, SQL Agent jobs nobody owns, tempdb pressure, repeated blocking, file autogrowth during business hours, noisy alerts, or admin access that nobody can explain.

Area	Configured correctly means
Memory	SQL Server has enough memory for the workload, and max server memory leaves room for Windows, agents, backup tools, and monitoring.
Parallelism	MAXDOP and cost threshold fit the CPU layout and query patterns. The goal is stable query behavior, not copying a setting from another server.
tempdb	File count, file size, growth settings, free space, and version-store behavior fit the workload. Recurring tempdb pressure usually needs deeper review.
Files and storage	Data and log files have sensible growth settings, enough space, predictable placement, and latency that matches the workload timing.
Backups and restores	Backups run on the expected schedule, log backups protect the right databases, restore history or restore tests exist, and alerts reach someone.
SQL Agent	Critical jobs are enabled, owned by the right account, finish inside their window, keep enough history, and notify the right people on failure.
Access	Sysadmin access is explainable, service accounts have a purpose, linked servers and proxies are known, and old logins do not quietly keep admin rights.
Monitoring	Monitoring catches backup failures, job failures, disk growth, severe errors, blocking, deadlocks, and HA problems with enough history to investigate.
HA and DR	Availability Groups, clustering, restore plans, failover steps, jobs, logins, and application checks have been tested recently enough for the business need.

When to run a SQL Server health check

Run a health check when the SQL Server state matters for a decision, not only when something is already broken. The best timing is usually before a risky change, after a meaningful change, or when nobody can explain the current setup cleanly.

Some checks are ad hoc: a one-time review before a migration, patch, upgrade, failover drill, handover, or incident review. Others should be regular: monthly or quarterly checks for backups, jobs, CHECKDB, file growth, waits, access, patch level, and monitoring gaps.

New or handed-over environment

Use a health check before trusting old jobs, backup history, access, monitoring, or HA notes.

Migration or upgrade planning

Check version, compatibility, databases, jobs, linked dependencies, backups, restore path, and rollback risk before the work starts.

Patching and cumulative updates

Run a smaller pre-check before patching and a post-check after patching to catch failed jobs, service changes, errors, or AG issues.

After configuration or infrastructure changes

Review files, storage latency, tempdb, waits, error logs, jobs, and monitoring after moving storage, changing memory, changing service accounts, or modifying HA.

After incidents or repeated slow periods

Use the health check to separate current symptoms from older drift: failed jobs, stale backups, high waits, blocking, errors, or missing alerts.

Regular operations review

Run a lighter monthly or quarterly check so backup, CHECKDB, job, storage, access, patch, and monitoring problems do not wait for the next emergency.

How to start a SQL Server health check

Start by collecting read-only facts before changing anything. A health check needs enough context to separate configuration drift, missing operations, current workload pressure, and design work that belongs in a larger project.

Run these checks from an account with enough metadata access. Some DMVs require `VIEW SERVER STATE`, SQL Server 2022 and newer can require `VIEW SERVER PERFORMANCE STATE` or `VIEW SERVER SECURITY STATE`, error-log access can require `VIEW ANY ERROR LOG`, and `msdb` history depends on permissions. Low-permission output can look cleaner than the server really is.

Step	What to do
1. Set the reason	Decide whether this is regular review, pre-change, post-change, incident follow-up, migration, patching, or a handover check.
2. Collect the baseline	Capture version, uptime, configuration, databases, files, tempdb, and storage readings before interpreting symptoms.
3. Check recovery and operations	Read backup history, restore history, SQL Agent jobs, CHECKDB, maintenance, cleanup, and monitoring gaps.
4. Read the workload	Review waits, active requests, blocking, file latency, tempdb use, error logs, and suspect pages with timing context.
5. Finish with control points	Check privileged access, ownership, HA/DR state, alert coverage, and what needs a quick fix or planned change.

SQL Server health check for version, uptime, and configuration

Version, edition, uptime, CPU count, memory, cluster state, and HADR state frame the rest of the review. Configuration checks should focus on resource control and operational behavior, not changing settings because a generic report says so.

How to check SQL Server version and uptime

Reads server identity, version, edition, startup time, CPU, memory, clustered state, and HADR state.

SELECT
    SERVERPROPERTY('ServerName') AS server_name,
    SERVERPROPERTY('MachineName') AS machine_name,
    SERVERPROPERTY('Edition') AS edition,
    SERVERPROPERTY('ProductVersion') AS product_version,
    SERVERPROPERTY('ProductLevel') AS product_level,
    SERVERPROPERTY('ProductUpdateLevel') AS product_update_level,
    SERVERPROPERTY('IsClustered') AS is_clustered,
    SERVERPROPERTY('IsHadrEnabled') AS is_hadr_enabled,
    SERVERPROPERTY('HadrManagerStatus') AS hadr_manager_status,
    si.sqlserver_start_time,
    si.cpu_count,
    si.scheduler_count,
    si.physical_memory_kb / 1024 AS physical_memory_mb,
    si.committed_kb / 1024 AS sql_committed_mb,
    si.committed_target_kb / 1024 AS sql_target_mb
FROM sys.dm_os_sys_info AS si;

sqlserver_start_time tells you how long DMV counters have been accumulating. cpu_count, scheduler_count, and memory values help interpret pressure checks. IsClustered and IsHadrEnabled show whether cluster or AG checks may apply. Use version and update level when assessing support and patch posture.

How to check SQL Server configuration

Reads common server settings and whether configured values match running values.

SELECT
    name,
    value,
    value_in_use,
    is_dynamic,
    is_advanced,
    description
FROM sys.configurations
WHERE name IN (
    N'max server memory (MB)',
    N'min server memory (MB)',
    N'max degree of parallelism',
    N'cost threshold for parallelism',
    N'backup compression default',
    N'optimize for ad hoc workloads',
    N'remote admin connections',
    N'xp_cmdshell'
)
ORDER BY name;

value and value_in_use can differ when a setting has not taken effect. max server memory should leave room for the operating system and other services. MAXDOP and cost threshold need workload context before changing. xp_cmdshell and remote admin connections deserve a deliberate access decision.

SQL Server health check for databases and files

Database state, recovery model, compatibility level, Query Store, ownership, size, file layout, and growth settings show what the instance is responsible for. They also decide which recovery, capacity, and security checks matter most.

How to inventory databases

Lists databases with state, recovery model, compatibility level, Query Store flag, owner, size, and file count.

SELECT
    d.name AS database_name,
    d.state_desc,
    d.recovery_model_desc,
    d.compatibility_level,
    d.is_read_only,
    d.is_query_store_on,
    SUSER_SNAME(d.owner_sid) AS database_owner,
    CONVERT(decimal(18,2), SUM(mf.size) * 8.0 / 1024.0) AS size_mb,
    COUNT(mf.file_id) AS file_count
FROM sys.databases AS d
JOIN sys.master_files AS mf
    ON mf.database_id = d.database_id
WHERE d.name <> N'tempdb'
GROUP BY
    d.name,
    d.state_desc,
    d.recovery_model_desc,
    d.compatibility_level,
    d.is_read_only,
    d.is_query_store_on,
    d.owner_sid
ORDER BY size_mb DESC;

state_desc should be online for normal production databases. recovery_model_desc affects backup and restore expectations. compatibility_level can affect query behavior and upgrade planning. Large or critical databases deserve separate recovery and performance checks.

How to review database files and growth settings

Lists database file paths, sizes, human-readable max size, and growth settings.

SELECT
    DB_NAME(mf.database_id) AS database_name,
    mf.type_desc,
    mf.name AS logical_file_name,
    mf.physical_name,
    CONVERT(decimal(18,2), mf.size * 8.0 / 1024.0) AS size_mb,
    CASE
        WHEN mf.is_percent_growth = 1 THEN CONCAT(mf.growth, N'%')
        ELSE CONCAT(CONVERT(decimal(18,2), mf.growth * 8.0 / 1024.0), N' MB')
    END AS growth_setting,
    CASE
        WHEN mf.max_size = -1 THEN N'Unlimited'
        WHEN mf.max_size = 0 THEN N'No growth'
        WHEN mf.type_desc = N'LOG' AND mf.max_size = 268435456 THEN N'2 TB'
        ELSE CONCAT(CONVERT(decimal(18,2), mf.max_size * 8.0 / 1024.0), N' MB')
    END AS max_size_setting,
    mf.is_percent_growth
FROM sys.master_files AS mf
WHERE DB_NAME(mf.database_id) <> N'tempdb'
ORDER BY database_name, mf.type_desc, mf.file_id;

physical_name shows where data and log files actually live. max_size_setting converts SQL Server page counts into readable limits. Percent growth can create unpredictable growth events on large files. Data and log files usually need different review questions. File changes need free-space, backup, maintenance, and rollback context.

SQL Server health check for tempdb and storage latency

File latency and tempdb usage show whether storage and temporary workspace are part of the current health picture. Treat these readings as starting points, then compare them with workload timing, job windows, spills, version store use, and recent incidents.

How to check file latency

Reads file-level reads, writes, and average I/O stalls from sys.dm_io_virtual_file_stats.

SELECT
    DB_NAME(mf.database_id) AS database_name,
    mf.type_desc,
    mf.name AS logical_file_name,
    mf.physical_name,
    CONVERT(decimal(18,2), mf.size * 8.0 / 1024.0) AS size_mb,
    vfs.num_of_reads,
    vfs.num_of_writes,
    CONVERT(decimal(18,2), vfs.io_stall_read_ms * 1.0 / NULLIF(vfs.num_of_reads, 0)) AS avg_read_ms,
    CONVERT(decimal(18,2), vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0)) AS avg_write_ms,
    CONVERT(decimal(18,2), vfs.io_stall * 1.0 / NULLIF(vfs.num_of_reads + vfs.num_of_writes, 0)) AS avg_io_ms
FROM sys.master_files AS mf
CROSS APPLY sys.dm_io_virtual_file_stats(mf.database_id, mf.file_id) AS vfs
WHERE DB_NAME(mf.database_id) <> N'tempdb'
ORDER BY avg_io_ms DESC, database_name, mf.file_id;

avg_read_ms and avg_write_ms are starting points for storage review. Log files with high write latency need separate attention. High latency during backup or maintenance windows can mislead a one-time check. A single snapshot needs comparison with workload timing before changing storage.

How to check tempdb space usage

Run in tempdb to review file sizes, free space, user objects, internal objects, version store, and growth settings.

USE tempdb;

SELECT
    df.file_id,
    df.name AS logical_file_name,
    df.type_desc,
    df.physical_name,
    CONVERT(decimal(18,2), df.size * 8.0 / 1024.0) AS file_size_mb,
    CONVERT(decimal(18,2), fsu.unallocated_extent_page_count * 8.0 / 1024.0) AS free_space_mb,
    CONVERT(decimal(18,2), fsu.user_object_reserved_page_count * 8.0 / 1024.0) AS user_object_mb,
    CONVERT(decimal(18,2), fsu.internal_object_reserved_page_count * 8.0 / 1024.0) AS internal_object_mb,
    CONVERT(decimal(18,2), fsu.version_store_reserved_page_count * 8.0 / 1024.0) AS version_store_mb,
    CASE
        WHEN df.is_percent_growth = 1 THEN CONCAT(df.growth, N'%')
        ELSE CONCAT(CONVERT(decimal(18,2), df.growth * 8.0 / 1024.0), N' MB')
    END AS growth_setting
FROM tempdb.sys.database_files AS df
LEFT JOIN sys.dm_db_file_space_usage AS fsu
    ON fsu.file_id = df.file_id
ORDER BY df.file_id;

version_store_mb matters when row versioning or long transactions are active. internal_object_mb can rise with sorts, hashes, spools, and spills. growth_setting should be predictable and large enough for production use. Use this with the tempdb guide when pressure is recurring.

SQL Server health check for backups and restore history

A health check should read backup history beside restore history. Successful jobs matter, but they do not answer whether the files, log chain, permissions, keys, and restore timing are good enough.

How to check recent backups

Shows the latest non-copy-only full, latest copy-only full, differential, and log backup per database from msdb backup history.

WITH LastBackups AS (
    SELECT
        bs.database_name,
        bs.type,
        MAX(bs.backup_finish_date) AS last_backup_finish_date,
        MAX(CASE WHEN bs.is_copy_only = 0 THEN bs.backup_finish_date END) AS last_non_copy_only_finish_date,
        MAX(CASE WHEN bs.is_copy_only = 1 THEN bs.backup_finish_date END) AS last_copy_only_finish_date
    FROM msdb.dbo.backupset AS bs
    WHERE bs.type IN ('D', 'I', 'L')
    GROUP BY bs.database_name, bs.type
)
SELECT
    d.name AS database_name,
    d.recovery_model_desc,
    d.state_desc,
    MAX(CASE WHEN lb.type = 'D' THEN lb.last_non_copy_only_finish_date END) AS last_full_backup,
    MAX(CASE WHEN lb.type = 'D' THEN lb.last_copy_only_finish_date END) AS last_copy_only_full_backup,
    MAX(CASE WHEN lb.type = 'I' THEN lb.last_backup_finish_date END) AS last_differential_backup,
    MAX(CASE WHEN lb.type = 'L' THEN lb.last_backup_finish_date END) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN LastBackups AS lb
    ON lb.database_name = d.name
WHERE d.name <> N'tempdb'
GROUP BY d.name, d.recovery_model_desc, d.state_desc
ORDER BY d.name;

last_full_backup excludes copy-only full backups so the normal backup chain is clearer. last_copy_only_full_backup can explain ad hoc backups that should not be mistaken for the usual differential base. last_log_backup matters for databases in full or bulk-logged recovery model. Backup history does not replace a real restore test.

How to check restore history

Shows recent restore operations, source backup timing, recovery flag, and backup file path.

SELECT TOP (100)
    rh.destination_database_name,
    rh.restore_date,
    rh.restore_type,
    rh.replace,
    rh.recovery,
    bs.database_name AS source_database_name,
    bs.backup_finish_date,
    bmf.physical_device_name
FROM msdb.dbo.restorehistory AS rh
LEFT JOIN msdb.dbo.backupset AS bs
    ON bs.backup_set_id = rh.backup_set_id
LEFT JOIN msdb.dbo.backupmediafamily AS bmf
    ON bmf.media_set_id = bs.media_set_id
ORDER BY rh.restore_date DESC;

restore_date shows whether restore testing has happened recently. destination_database_name separates test restores from production restores. physical_device_name helps identify which backup source was used. No restore history may mean the restore path has not been validated on this instance.

SQL Server health check for SQL Agent jobs and maintenance

SQL Agent history often shows the parts of SQL Server nobody is watching: failed backup jobs, stale CHECKDB, long maintenance windows, cleanup problems, and old job owners. Health checks should include this before recommending changes.

How to check SQL Agent job failures

Reads recent SQL Agent history for failed, long-running, backup, integrity, and maintenance jobs.

SELECT TOP (120)
    j.name AS job_name,
    j.enabled,
    SUSER_SNAME(j.owner_sid) AS job_owner,
    h.step_id,
    h.step_name,
    h.run_status,
    DATETIMEFROMPARTS(
        h.run_date / 10000,
        (h.run_date % 10000) / 100,
        h.run_date % 100,
        h.run_time / 10000,
        (h.run_time % 10000) / 100,
        h.run_time % 100,
        0
    ) AS run_start_time,
    ((h.run_duration / 10000) * 3600)
      + (((h.run_duration % 10000) / 100) * 60)
      + (h.run_duration % 100) AS run_duration_seconds,
    h.message
FROM msdb.dbo.sysjobs AS j
LEFT JOIN msdb.dbo.sysjobhistory AS h
    ON h.job_id = j.job_id
WHERE h.run_date >= CONVERT(int, CONVERT(char(8), DATEADD(day, -14, SYSDATETIME()), 112))
  AND (
      h.run_status <> 1
      OR h.run_duration >= 10000
      OR j.name LIKE N'%backup%'
      OR j.name LIKE N'%checkdb%'
      OR j.name LIKE N'%maintenance%'
  )
ORDER BY run_start_time DESC, j.name, h.step_id;

run_status identifies failed or retrying job steps. job_owner should be valid and intentional. Long runtime can matter when jobs overlap business hours or maintenance windows. Job failures outside the main application can still affect recovery and stability.

How to check CHECKDB history

Looks for recent CHECKDB or integrity-check job history in SQL Agent.

SELECT TOP (100)
    j.name AS job_name,
    j.enabled,
    h.step_id,
    h.step_name,
    h.run_status,
    DATETIMEFROMPARTS(
        h.run_date / 10000,
        (h.run_date % 10000) / 100,
        h.run_date % 100,
        h.run_time / 10000,
        (h.run_time % 10000) / 100,
        h.run_time % 100,
        0
    ) AS run_start_time,
    ((h.run_duration / 10000) * 3600)
      + (((h.run_duration % 10000) / 100) * 60)
      + (h.run_duration % 100) AS run_duration_seconds,
    h.message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
    ON h.job_id = j.job_id
WHERE h.run_date >= CONVERT(int, CONVERT(char(8), DATEADD(day, -30, SYSDATETIME()), 112))
  AND (
      j.name LIKE N'%checkdb%'
      OR j.name LIKE N'%integrity%'
      OR h.step_name LIKE N'%checkdb%'
      OR h.message LIKE N'%DBCC CHECKDB%'
  )
ORDER BY run_start_time DESC, j.name, h.step_id;

Job naming varies, so this is a starting point rather than a complete guarantee. run_status and message show whether the integrity job completed or reported errors. No rows may mean the job has another name, history was cleaned, or integrity checks are missing. Use job output files or monitoring data when SQL Agent history is incomplete.

SQL Server health check for waits, blocking, and active requests

Wait stats and active requests show where SQL Server has been spending time and what is happening now. They should point to the next check, not become a verdict by themselves.

How to check top wait stats

Lists top cumulative waits while excluding common background waits.

SELECT TOP (25)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms,
    CONVERT(decimal(18,2), wait_time_ms / 1000.0) AS wait_seconds
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
    N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',
    N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
    N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'LAZYWRITER_SLEEP',
    N'LOGMGR_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
    N'SERVER_IDLE_CHECK', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK',
    N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR', N'XE_DISPATCHER_WAIT',
    N'XE_TIMER_EVENT'
)
ORDER BY wait_time_ms DESC;

wait_seconds helps identify the largest cumulative waits since startup. signal_wait_time_ms can point toward scheduler pressure. High wait totals need workload timing and baseline context. Do not reset wait stats only to make a report look cleaner.

How to check active requests and blocking

Shows current requests with blocking session, wait details, resource use, database, and statement text.

SELECT
    r.session_id,
    s.login_name,
    s.host_name,
    s.program_name,
    r.status,
    r.command,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    r.wait_resource,
    r.cpu_time,
    r.logical_reads,
    r.reads,
    r.writes,
    DB_NAME(r.database_id) AS database_name,
    SUBSTRING(
        st.text,
        (r.statement_start_offset / 2) + 1,
        ((CASE r.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE r.statement_end_offset
          END - r.statement_start_offset) / 2) + 1
    ) AS current_statement
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id <> @@SPID
ORDER BY r.blocking_session_id DESC, r.cpu_time DESC, r.logical_reads DESC;

blocking_session_id identifies live blocking chains. wait_type and wait_resource show what the request is waiting on right now. logical_reads, reads, writes, and cpu_time help prioritize active work. Use this with the blocking guide when blocking is active.

SQL Server health check for error logs and corruption signals

Error logs and suspect pages catch issues that do not always show up in a simple dashboard: I/O errors, login failures, failover messages, corruption warnings, failed recovery, and repeated service-level problems.

How to read recent SQL Server error log entries

Reads the current SQL Server error log for recent entries. Filter the output for errors, I/O, corruption, login failures, AG, failover, and startup messages.

EXEC sp_readerrorlog 0, 1;
EXEC sp_readerrorlog 0, 1, N'error';
EXEC sp_readerrorlog 0, 1, N'fail';
EXEC sp_readerrorlog 0, 1, N'I/O';
EXEC sp_readerrorlog 0, 1, N'checksum';

sp_readerrorlog reads SQL Server error logs; access may be restricted. Review severity and repetition, not only the newest line. Look for I/O, checksum, stack dump, login failure, recovery, AG, and failover patterns. A noisy log needs filtering, but do not ignore repeated infrastructure messages.

How to check suspect pages

Reads msdb suspect page history for recorded page-level problems.

SELECT
    DB_NAME(database_id) AS database_name,
    file_id,
    page_id,
    event_type,
    CASE event_type
        WHEN 1 THEN '823 error caused by an operating system CRC error or torn page'
        WHEN 2 THEN 'Bad checksum'
        WHEN 3 THEN 'Torn page'
        WHEN 4 THEN 'Restored'
        WHEN 5 THEN 'Repaired'
        WHEN 7 THEN 'Deallocated by DBCC'
    END AS event_type_desc,
    error_count,
    last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC, database_name, file_id, page_id;

Rows in suspect_pages deserve immediate backup, CHECKDB, storage, and error-log review. event_type_desc gives the broad reason SQL Server recorded the page. Old rows may have been repaired or restored, but still explain past incidents. No rows does not replace CHECKDB history or storage monitoring.

SQL Server health check for access, security, and ownership

A health check is not a full hardening audit, but it should catch broad privilege, disabled login, sysadmin, service account, job owner, and ownership problems that affect production support.

How to review privileged access

Lists SQL, Windows, group, and external logins with disabled state, default database, and server role membership without using SQL Server 2017-only aggregation.

SELECT
    sp.name AS login_name,
    sp.type_desc,
    sp.is_disabled,
    sp.default_database_name,
    sp.create_date,
    sp.modify_date,
    role_principal.name AS server_role
FROM sys.server_principals AS sp
LEFT JOIN sys.server_role_members AS srm
    ON srm.member_principal_id = sp.principal_id
LEFT JOIN sys.server_principals AS role_principal
    ON role_principal.principal_id = srm.role_principal_id
WHERE sp.type IN ('S', 'U', 'G', 'E', 'X')
  AND sp.name NOT LIKE N'##MS_%'
ORDER BY
    CASE WHEN role_principal.name = N'sysadmin' THEN 0 ELSE 1 END,
    sp.is_disabled,
    sp.name,
    role_principal.name;

sysadmin membership should be short and explainable. External login and group principals can matter in newer or cloud-connected environments. Disabled logins can still indicate stale access management. Default databases should exist and be reachable. Service accounts and automation logins need ownership and purpose.

SQL Server health check for HA, DR, and monitoring coverage

HA and DR health depends on tested behavior: replica health, failover readiness, restore sequence, monitoring alerts, SQL Agent behavior, application validation, and ownership during an incident.

How to check Availability Group health

Shows AG primary, replica mode, failover mode, connection state, recovery health, and synchronization health.

SELECT
    ag.name AS availability_group_name,
    ags.primary_replica,
    ags.primary_recovery_health_desc,
    ags.secondary_recovery_health_desc,
    ags.synchronization_health_desc,
    ar.replica_server_name,
    ar.availability_mode_desc,
    ar.failover_mode_desc,
    ars.role_desc,
    ars.connected_state_desc,
    ars.recovery_health_desc,
    ars.synchronization_health_desc AS replica_synchronization_health_desc
FROM sys.availability_groups AS ag
LEFT JOIN sys.dm_hadr_availability_group_states AS ags
    ON ags.group_id = ag.group_id
LEFT JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;

primary_replica and role_desc show the current AG layout. availability_mode_desc and failover_mode_desc show what failover behavior is configured. connected_state_desc and synchronization health need review before relying on HA. No AG rows means this check does not apply to that instance.

Coverage check	What to confirm
Monitoring	Backup failures, job failures, disk growth, error log severity, blocking, AG health, and service state generate alerts that someone reads.
Recovery	Restore sequence, file access, encryption keys, and timing have been tested recently enough for the business need.
HA	Failover mode, synchronization health, listener behavior, jobs, backups, and application validation are understood.
Ownership	The people responsible for alerts, access changes, maintenance, and incident decisions are known.

What not to change after a SQL Server health check

Do not treat a health check as a generic score.

Do not change server settings from one snapshot alone.

Do not reset wait stats casually in production.

Do not assume successful backup jobs mean recovery works.

Do not drop indexes or change maintenance from one report.

Do not ignore SQL Agent failures that look unrelated to the main application.

Do not call HA or DR healthy without failover or restore validation.

Do not turn a health check into a large refactor before ranking the risks.

When to request a SQL Server health audit

A SQL Server health audit makes sense when the server is changing, support ownership is unclear, backups have not been restored recently, incidents repeat, or the next upgrade or migration needs a clean findings list.

Use it as the service version of this guide: a SQL Server health check service for companies that want the instance reviewed and explained, not just a checklist filled in.

Start with the SQL Server version, instance count, main concern, and any backup, job, monitoring, or error-log details you already have. That is enough for a first review.

Request health audit

Next step

If the review needs findings, priorities, and a follow-up path, use the SQL Server health audit page or request the audit above.

Use the SQL Server health check checklist when you want the checklist version of this guide.

Next useful reads: the SQL Server monitoring guide for ongoing checks, SQL Server Configuration Manager for services and TCP/IP settings, the SQL Server backup guide for recovery inputs, the SQL Server recovery guide for restore validation, the SQL Server hardening guide for security posture, and the SQL Server failover guide for HA checks.