| 01 | Production instance and database scope | Scope | You cannot review SQL Server health until the production instances, databases, and dependencies are clear. | Instance names, environments, critical databases, listeners, applications, reporting loads, vendor systems, and DR targets. |
| 02 | Business-critical databases | Scope | Not every database deserves the same attention. Critical databases need the tightest recovery, monitoring, and change checks. | Revenue, operations, customer-facing, identity, integration, reporting, and batch databases that would hurt the business if unavailable. |
| 03 | SQL Server ownership and escalation path | Ownership | Backup failures, alerts, failed jobs, and incidents go unhandled when nobody is clearly responsible for acting on them. | Named SQL owner, change owner, backup responder, incident contact, vendor boundary, and after-hours path. |
| 04 | Recent full backup history | Recovery | Full backups are the base of most restore plans. Missing or stale full backups make the rest of recovery harder. | Last full backup per important database, failed runs, copy-only surprises, duration changes, and skipped databases. |
| 05 | Differential and log backup coverage | Recovery | Differential and log backups usually decide how much data the company loses after a restore. | Recovery model, last differential backup, last log backup, broken log-chain patterns, and databases in FULL without log backups. |
| 06 | Restore test history | Recovery | Backup success does not mean restore success. Restore testing is where many backup plans stop looking good. | Last restore test, target server, restored database name, duration, CHECKDB after restore, and who ran it. |
| 07 | DBCC CHECKDB or integrity-check coverage | Integrity | Corruption is rare until it is not. Integrity checks tell you whether backups may contain damaged data. | Recent DBCC CHECKDB history, failures, skipped large databases, physical-only shortcuts, and where results are logged. |
| 08 | Backup destination, retention, and free space | Recovery | Backups can be technically successful and still useless if storage fills, cleanup deletes too early, or files never leave the server. | Backup path, free space, retention period, cleanup job, off-server copy, encryption, compression, and access control. |
| 09 | SQL Agent failed jobs | Jobs | Failed jobs often hide backup, ETL, reporting, maintenance, and cleanup problems. | Failed jobs, failed steps, retry loops, disabled jobs, long-running jobs, and jobs that stopped writing useful history. |
| 10 | SQL Agent notifications and operators | Jobs | A job failure only leads to a fix if the right person finds out quickly enough. | SQL Agent operators, Database Mail, notification settings, alert recipients, stale email addresses, and failed mail delivery. |
| 11 | Critical maintenance jobs | Maintenance | Maintenance jobs can keep an environment stable, or they can waste I/O and hide stale assumptions. | Backup jobs, CHECKDB jobs, index/statistics jobs, cleanup jobs, history retention, schedule overlap, and runtime drift. |
| 12 | Disk free space and file growth | Capacity | Low disk space and poor growth settings turn normal workload changes into outages or long pauses. | Drive free space, database file sizes, growth increments, percent growth, max size, file placement, and recent autogrowth events. |
| 13 | Database log growth and reuse waits | Capacity | Transaction log problems usually point to backup gaps, long transactions, replication, AG queues, or workload changes. | Log size, VLF count and size, log growth, log reuse wait reason, long transactions, and last log backup. |
| 14 | tempdb size, growth, and pressure | Capacity | tempdb pressure affects sorts, spills, row versioning, and many production workloads. | File count, size, growth, free space, version store, spills, allocation waits, and sudden growth. |
| 15 | SQL Server error log warnings | Logs | The error log often shows the first clear signs of failed backups, failed logins, I/O issues, memory pressure, and startup problems. | Recent errors, warnings, login failures, stack dumps, I/O messages, memory messages, and repeated startup entries. |
| 16 | Severity errors and corruption messages | Logs | High-severity errors and corruption messages can mean damaged data, storage issues, or failing components. | Severity 17-25 messages, errors 823/824/825, CHECKDB failures, suspect pages, and stack dumps. |
| 17 | Blocking, deadlocks, and long transactions | Performance | Concurrency problems can make a healthy-looking server feel broken to users. | Head blockers, deadlock graphs, long transactions, lock waits, blocked sessions, and application timing. |
| 18 | Top waits and resource pressure | Performance | Waits help separate CPU, memory, I/O, locking, logging, and network pressure. | Top waits, signal wait time, resource waits, active requests, CPU pressure, memory grants, and I/O waits. |
| 19 | Query Store status and plan regressions | Performance | Query Store helps find plan changes and expensive queries without guessing from one live moment. | Query Store enabled state, read-write state, capture mode, size limits, top regressions, and forced plans. |
| 20 | Max server memory, MAXDOP, and cost threshold | Config | Core instance settings can create avoidable memory pressure or parallelism noise when left at poor defaults. | Max server memory, min server memory, MAXDOP, cost threshold for parallelism, edition limits, CPU count, and workload type. |
| 21 | Storage latency and I/O stalls | Storage | Slow storage affects backups, restores, queries, tempdb, logs, and maintenance windows. | Virtual file stats, read latency, write latency, log write waits, file placement, and storage alerts. |
| 22 | Patch level and support status | Security | Unsupported or badly outdated builds make troubleshooting, security, and upgrade planning harder. | SQL Server version, product level, CU/GDR level, edition, Windows version, support lifecycle, and pending patches. |
| 23 | Sysadmin access and elevated logins | Security | Too many sysadmins make change control, incident review, and security cleanup much harder. | Sysadmin members, server roles, database owners, shared accounts, old admin logins, and application logins with elevated rights. |
| 24 | Service accounts, proxies, and credentials | Security | Automation often runs with more access than it needs, and broken credentials can quietly break jobs. | SQL Server service accounts, SQL Agent service account, proxies, credentials, job owners, and external resource access. |
| 25 | Unsafe enabled features and linked servers | Security | Optional features and linked servers can widen exposure if nobody remembers why they are enabled. | xp_cmdshell, Ole Automation, CLR, external scripts, ad hoc distributed queries, linked servers, and remote login mappings. |
| 26 | Monitoring alerts and routing | Monitoring | Monitoring only helps when alerts reach people who can act and include enough detail to start diagnosis. | Backup alerts, failed-job alerts, disk alerts, service alerts, severity alerts, deadlock alerts, and recipient routing. |
| 27 | Alert noise and ignored warnings | Monitoring | Too much noise teaches people to ignore the system, including the alerts that matter. | Muted alerts, repeated alerts, ignored warnings, unclear thresholds, dashboards nobody checks, and tickets closed without action. |
| 28 | HA/DR status and replica health | HA / DR | A green HA screen does not mean failover or recovery will work cleanly. | AG health, cluster health, replica synchronization, send queue, redo queue, failover mode, backup preference, and DR lag. |
| 29 | Failover readiness: jobs, logins, DNS, app behavior | HA / DR | Failover breaks in the boring places: missing jobs, missing logins, DNS, firewall rules, and application connection behavior. | Jobs on secondary nodes, logins, linked servers, credentials, operators, listener DNS, firewall rules, and application validation steps. |
| 30 | Final action list: fix now, schedule, review deeper, watch, accept | Output | A health check is useful only when it turns findings into the next work list. | Items that need immediate work, next maintenance window work, deeper review, trend watching, or documented acceptance. |
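
As a rough illustration of how findings from rows 04 and 05 could feed the row 30 action list, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `BackupStatus` record, the `triage` function, and the two age thresholds are hypothetical names and values, not part of any SQL Server tooling. It assumes the last backup times were already collected elsewhere (for example from backup history) and only shows the sorting step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Hypothetical thresholds; derive real ones from your own recovery objectives.
MAX_FULL_AGE = timedelta(days=7)
MAX_LOG_AGE = timedelta(hours=1)


@dataclass
class BackupStatus:
    """One database's backup picture (hypothetical record for this sketch)."""
    database: str
    recovery_model: str            # "FULL", "BULK_LOGGED", or "SIMPLE"
    last_full: Optional[datetime]  # None means no full backup on record
    last_log: Optional[datetime]   # None means no log backup on record
    critical: bool                 # business-critical per row 02


def triage(status: BackupStatus, now: datetime) -> Tuple[str, str]:
    """Return (bucket, reason), where bucket is one of the row 30 categories."""
    if status.last_full is None:
        return ("fix now", "no full backup on record")

    # Stale findings on critical databases are urgent; others can wait
    # for the next maintenance window.
    urgency = "fix now" if status.critical else "schedule"

    if now - status.last_full > MAX_FULL_AGE:
        return (urgency, "full backup older than threshold")

    # Log backups only apply outside the SIMPLE recovery model.
    if status.recovery_model != "SIMPLE":
        if status.last_log is None:
            return ("fix now", "FULL or BULK_LOGGED recovery model has no log backups")
        if now - status.last_log > MAX_LOG_AGE:
            return (urgency, "log backup older than threshold")

    return ("watch", "backups within thresholds")


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    sample = BackupStatus("Sales", "FULL", datetime(2024, 1, 9), None, True)
    print(sample.database, triage(sample, now))
```

The sketch only uses three of the five row 30 buckets. "Review deeper" and "accept" usually need a human decision, so they are left out of the automated pass on purpose.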