Why HANA Performance Problems Are Different
Most SAP teams approach HANA performance tuning the same way they approached Oracle or DB2. They add indexes. They force join strategies with optimizer hints. They rewrite WHERE clauses to avoid full table scans. And then they wonder why the query got slower.
HANA is not a traditional relational database with an in-memory cache bolted on top. It is a fundamentally different architecture. The data lives in memory, organized in columns, compressed, and processed through a column engine that operates on entire vectors of data at once. The optimization paradigm is not about reducing disk I/O. Disk I/O is almost never the bottleneck. The bottlenecks are CPU utilization, memory allocation, column engine fallback, and unnecessary data movement between engines.
This distinction matters because the wrong mental model produces the wrong interventions:
| Traditional RDBMS Tuning | HANA Tuning |
|---|---|
| Add indexes to avoid full table scans | Remove unnecessary indexes (they consume memory and slow inserts) |
| Optimize disk I/O patterns | Optimize CPU and memory allocation |
| Use optimizer hints to force join order | Let the HANA optimizer choose, fix the data model instead |
| Denormalize to reduce joins | Normalize and let the column engine handle joins efficiently |
| Partition to spread I/O across disks | Partition for pruning, not for I/O distribution |
| Cache frequently accessed data | Data is already in memory — focus on reducing engine fallback |
We see this in every HANA optimization engagement we run. The team has already tried the obvious things. They added indexes. They reorganized tables. Performance did not improve, or it got worse. The reason is always the same: they were solving for the wrong bottleneck.
The Diagnostic Toolkit — Where to Start
Before tuning anything, you need to know what is actually slow and why. This sounds obvious. In practice, most teams skip this step entirely. They hear "the report is slow" and immediately start adding indexes or rewriting queries without understanding the root cause.
HANA provides a deep diagnostic toolkit. The challenge is knowing which tools to use and in what order.
HANA System Views for Performance Analysis
Three system views form the foundation of any performance investigation:
M_SQL_PLAN_CACHE stores execution statistics for every SQL statement the system has processed. Sort by TOTAL_EXECUTION_TIME descending to find the statements consuming the most cumulative resources. A statement that runs in 200ms but executes 50,000 times per day is a bigger problem than a report that takes 30 seconds but runs once a week.
M_EXPENSIVE_STATEMENTS captures individual executions that exceed configurable thresholds (default: 60 seconds execution time or 1GB memory consumption). This is where you find the outliers — the queries that spike during month-end close, the reports that time out during peak hours, the batch jobs that consume all available statement memory.
M_SERVICE_MEMORY shows memory consumption by service (indexserver, nameserver, compileserver). When the indexserver is consuming 95% of the global allocation limit, every query is competing for memory. Performance tuning at the query level is pointless until you resolve the memory pressure.
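As a concrete starting point, all three views can be queried from any SQL console. A sketch (column names follow the standard monitoring views; verify against your HANA revision):

```sql
-- Top statements by cumulative execution time (the aggregate resource hogs)
SELECT TOP 20
       STATEMENT_STRING,
       EXECUTION_COUNT,
       TOTAL_EXECUTION_TIME,
       TOTAL_EXECUTION_TIME / NULLIF(EXECUTION_COUNT, 0) AS AVG_EXECUTION_TIME
FROM   M_SQL_PLAN_CACHE
ORDER BY TOTAL_EXECUTION_TIME DESC;

-- Individual outliers that crossed the expensive-statement thresholds
SELECT TOP 20
       START_TIME, DURATION_MICROSEC, MEMORY_SIZE, STATEMENT_STRING
FROM   M_EXPENSIVE_STATEMENTS
ORDER BY DURATION_MICROSEC DESC;

-- Memory consumption per service: is the indexserver near its limit?
SELECT SERVICE_NAME,
       ROUND(TOTAL_MEMORY_USED_SIZE   / 1024 / 1024 / 1024, 1) AS USED_GB,
       ROUND(EFFECTIVE_ALLOCATION_LIMIT / 1024 / 1024 / 1024, 1) AS LIMIT_GB
FROM   M_SERVICE_MEMORY;
```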
HANA SQL Analyzer (Plan Visualizer)
The Plan Visualizer is the single most important diagnostic tool for query-level performance analysis. It shows you the execution plan as a visual graph, with each operator showing its type, cardinality estimates, actual row counts, and execution time.
What to look for:
- Column engine vs row engine operators. Column engine operators (JEAssembleResults, JEAggregation, JEHashJoin) are fast. Row engine operators (BwJoinMerge, BwLoopJoin) are slow. If your query shows row engine operators, that is almost always the performance problem.
- Full columnstore scans. A columnstore scan is not inherently bad — HANA is designed for them. But a full scan on a 500-million-row table when you only need 1,000 rows indicates missing partition pruning or a WHERE clause that cannot be pushed down.
- Hash join spills. When a hash join exceeds available memory, it spills to disk. In the Plan Visualizer, this appears as a "temporary table" node. Hash join spills can turn a 2-second query into a 2-minute query.
- Unnecessary materializations. The optimizer sometimes materializes intermediate results when it should stream them. This appears as MaterializeScan nodes with large row counts.
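When the graphical tool is not at hand, EXPLAIN PLAN surfaces the same engine-level information in text form. A sketch, using an invented custom table zsales_items as the statement under investigation:

```sql
-- Capture the plan under a name of our choosing
EXPLAIN PLAN SET STATEMENT_NAME = 'slow_report' FOR
  SELECT company_code, SUM(amount)
  FROM   zsales_items          -- hypothetical custom table
  GROUP BY company_code;

-- EXECUTION_ENGINE shows COLUMN vs ROW per operator, which is the
-- first thing to check before reading anything else in the plan
SELECT OPERATOR_ID, OPERATOR_NAME, EXECUTION_ENGINE, OUTPUT_SIZE
FROM   EXPLAIN_PLAN_TABLE
WHERE  STATEMENT_NAME = 'slow_report'
ORDER BY OPERATOR_ID;
```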
ST04 and DBACOCKPIT in S/4HANA
For Basis administrators, Transaction ST04 and DBACOCKPIT are the entry points from the ABAP stack. ST04 provides the SQL cache analysis, table statistics, and buffer hit ratios. DBACOCKPIT connects to the HANA-native diagnostics and gives you access to the expensive statements trace, the SQL plan cache, and the performance monitors — all without leaving the SAP GUI.
The key workflow is: identify the slow statement in ST04 → pull the execution plan from DBACOCKPIT → analyze in Plan Visualizer → trace root cause → fix.
The critical point: do not skip diagnostics. Every hour spent understanding the problem saves ten hours of trial-and-error tuning.
Query Optimization Patterns That Deliver 10x-100x Gains
Once you have identified the slow query and understand its execution plan, you can apply targeted optimizations. These are the patterns we see delivering the largest improvements in real S/4HANA environments.
Pushing Down to the Column Engine
This is the single most important HANA optimization concept. The column engine processes data in compressed columnar format using SIMD (Single Instruction, Multiple Data) operations. It can aggregate a billion rows in seconds. The row engine processes data row-by-row, like a traditional database. The same query can run 100x slower in the row engine than in the column engine.
What causes fallback to the row engine:
- Scalar UDFs (User-Defined Functions). A scalar UDF in a SELECT or WHERE clause forces the entire query to the row engine. Replace with SQL expressions or table UDFs where possible.
- Unsupported SQL functions. Certain string manipulation functions, some date functions, and locale-specific operations cannot execute in the column engine. Check SAP Note 2000002 for the current list.
- Certain data types. LOB columns, certain legacy ABAP data types, and columns with non-standard encodings can force row engine processing.
- Correlated subqueries. A correlated subquery that references the outer query row-by-row forces row engine execution. Rewrite as a JOIN or use window functions.
The fix is almost always the same: rewrite the SQL to use only column-engine-compatible operations. The Plan Visualizer tells you exactly where the fallback occurs. Fix that single operator and the entire query stays in the column engine.
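As an illustration of the correlated-subquery case, with invented table and column names, a row-by-row subquery can usually be rewritten as a pre-aggregated join:

```sql
-- Before: the subquery is re-evaluated per outer row, forcing the row engine
SELECT h.order_id, h.company_code
FROM   orders h
WHERE  h.net_value > (SELECT AVG(i.net_value)
                      FROM   orders i
                      WHERE  i.company_code = h.company_code);

-- After: the average is computed once per company code and joined,
-- so the whole plan stays in the column engine
SELECT h.order_id, h.company_code
FROM   orders h
JOIN  (SELECT company_code, AVG(net_value) AS avg_value
       FROM   orders
       GROUP BY company_code) a
  ON   a.company_code = h.company_code
WHERE  h.net_value > a.avg_value;
```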
Partition Pruning
HANA supports three partitioning strategies: hash, range, and round-robin. Each serves a different purpose:
| Strategy | Use Case | Pruning Behavior |
|---|---|---|
| Hash | Even data distribution across partitions | Prunes when WHERE clause uses equality on partition column |
| Range | Time-based data (fiscal year, posting date) | Prunes when WHERE clause uses range on partition column |
| Round-robin | Load balancing across nodes | No pruning — never use for query performance |
A partitioned table that does not prune is slower than an unpartitioned table. Each partition is a separate data structure. Without pruning, HANA must scan every partition and merge the results. You are adding overhead, not removing it.
Verify pruning in the execution plan. A partitioned table access should show PARTITION_COUNT less than the total number of partitions. If PARTITION_COUNT equals the total partitions, your WHERE clause is not enabling pruning, and the partitioning strategy is hurting performance.
The most common mistake: partitioning a table by hash on a column that is never used in WHERE clauses. The table is now split across partitions with no way to prune. Remove the partitioning or change the partition column.
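A minimal sketch of range partitioning aimed at pruning, using a hypothetical history table zacdoc_history:

```sql
-- Range-partition by fiscal year so a fiscal_year predicate touches one partition
ALTER TABLE zacdoc_history
  PARTITION BY RANGE (fiscal_year)
  (PARTITION '2023' <= VALUES < '2024',
   PARTITION '2024' <= VALUES < '2025',
   PARTITION OTHERS);

-- This query can prune: one partition scanned
SELECT SUM(amount) FROM zacdoc_history WHERE fiscal_year = '2024';

-- This one cannot: a function on the partition column disables pruning,
-- so every partition is scanned and merged
SELECT SUM(amount) FROM zacdoc_history WHERE TO_INT(fiscal_year) = 2024;
```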
Join Optimization
HANA uses two primary join strategies: hash joins and nested loop joins. The optimizer generally chooses well, but there are patterns where it makes suboptimal decisions.
Hash joins build a hash table from the smaller input and probe it with the larger input. They are fast and parallelizable. But if the "smaller" input is actually large (millions of rows), the hash table consumes significant memory and may spill to disk.
Nested loop joins iterate through the outer table and look up each row in the inner table. They are efficient when the inner table has a selective index and the outer table is small. They are catastrophic when both tables are large.
In star-schema joins (common in BW and analytical scenarios), place the smaller dimension table on the left side of the join. HANA builds the hash table from the left input. A 10-million-row fact table joined to a 100-row cost center table should have the cost center table on the left.
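A sketch of that ordering convention with invented names (the optimizer normally picks the build side itself; writing the join this way matters mainly when the plan shows the build side chosen badly):

```sql
-- Small dimension on the left: the hash table is built from ~100 cost centers
-- and probed with the 10-million-row fact table, not the other way around
SELECT c.cost_center_name, SUM(f.amount) AS total
FROM   cost_centers    c      -- ~100 rows: hash build side
JOIN   fact_line_items f      -- ~10 million rows: probe side
  ON   f.cost_center = c.cost_center
GROUP BY c.cost_center_name;
```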
Aggregation Push-Down
Aggregation push-down means performing GROUP BY operations before joins rather than after. This can reduce the number of rows flowing through the join by orders of magnitude.
Consider a query that joins a 100-million-row line item table to a 50,000-row header table and then aggregates by company code. Without push-down, HANA joins all 100 million rows to the header, then aggregates. With push-down, HANA aggregates the line items by company code first (reducing 100 million rows to perhaps 200), then joins 200 rows to the header table.
CDS view patterns that prevent aggregation push-down:
- Calculated fields that reference columns from multiple tables, placed before the aggregation
- Certain CASE expressions that reference both sides of a join
- HAVING clauses that reference non-aggregated columns from joined tables
A CDS view that calculates then aggregates can be 50x slower than one that aggregates first. Restructure the view to aggregate at the lowest level, then calculate on the aggregated result.
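The same push-down can be expressed directly in SQL. A sketch with invented names, pre-aggregating per document so only one row per document crosses the join:

```sql
-- Before: all 100M line items flow through the join, then aggregate
SELECT h.company_code, SUM(i.amount) AS total
FROM   line_items  i
JOIN   doc_headers h ON h.doc_id = i.doc_id
GROUP BY h.company_code;

-- After: aggregate first (100M rows -> one row per document),
-- then join the small intermediate result and re-aggregate
SELECT h.company_code, SUM(agg.doc_total) AS total
FROM  (SELECT doc_id, SUM(amount) AS doc_total
       FROM   line_items
       GROUP BY doc_id) agg
JOIN   doc_headers h ON h.doc_id = agg.doc_id
GROUP BY h.company_code;
```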
CDS View Performance Anti-Patterns
CDS views are the standard data modeling approach in S/4HANA and Clean Core architecture. They are powerful, readable, and — when built wrong — devastatingly slow. These are the anti-patterns we encounter most frequently.
The Nested View Problem
The most common CDS performance problem is view stacking. A consumption view sits on top of a composite view, which sits on top of three interface views, each of which joins to two basic views. The result is a query that traverses 5 to 8 layers of views, accumulating JOINs at every level.
HANA tries to flatten nested views into a single execution plan. It does not always succeed. Complex view stacks with UNION, DISTINCT, or certain aggregation patterns prevent flattening. When flattening fails, each view layer becomes a separate subquery with its own materialization step.
Guideline: 2 to 3 layers maximum. Basic views at the bottom, composite views in the middle, consumption views at the top. If you need more layers, you probably need to rethink the data model. Five layers of views is not a data model. It is a code smell.
How to detect it: enable the SQL trace (ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') SET ('sqltrace', 'trace') = 'on' WITH RECONFIGURE), run the CDS view, and examine the generated SQL. If the SQL is 200 lines of nested subqueries, the view stack is the problem.
Association Abuse
CDS associations are a powerful abstraction. They declare relationships between entities without forcing a JOIN unless the association is actually consumed. In theory, this means only the necessary JOINs are generated.
In practice, associations generate LEFT OUTER JOINs by default. A view with 15 associations where the consumer only needs data from 3 of them should generate 3 JOINs. But if the view exposes calculated fields that reference multiple associations, or if the consumer selects *, all 15 LEFT OUTER JOINs are generated.
LEFT OUTER JOINs are more expensive than INNER JOINs because HANA must preserve all rows from the left side, even when no match exists on the right. If the relationship is always 1:1 and referential integrity is guaranteed, an INNER JOIN is both faster and semantically correct.
Audit your associations. Enable the HANA SQL trace, run the CDS view from its consumption point (Fiori app, OData service, ABAP report), and examine the generated SQL. Count the JOINs. Compare to what is actually needed. The gap is your optimization opportunity.
Calculated Fields in WHERE Clauses
A WHERE clause like WHERE CONCAT(company_code, fiscal_year) = '10002025' prevents HANA from using any index or dictionary-based filtering on company_code or fiscal_year individually. The column engine must evaluate the CONCAT function for every row before filtering. On a 500-million-row table, this is the difference between a 50ms dictionary lookup and a 30-second full scan.
The fix: redesign the WHERE clause to use the raw columns (WHERE company_code = '1000' AND fiscal_year = '2025') or add a calculated column to the table that is maintained on insert and indexed.
Calculated fields in CDS view WHERE clauses are particularly insidious because the developer does not see the generated SQL. The CDS syntax looks clean. The generated SQL is a performance disaster.
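Side by side, with a hypothetical table zledger_items:

```sql
-- Slow: CONCAT must be evaluated for every row before any filtering
SELECT * FROM zledger_items
WHERE  CONCAT(company_code, fiscal_year) = '10002025';

-- Fast: raw columns allow dictionary-based filtering on each column
SELECT * FROM zledger_items
WHERE  company_code = '1000'
  AND  fiscal_year  = '2025';
```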
Missing Annotations for Analytical Queries
CDS views intended for analytical consumption (reporting, dashboards, KPIs) must carry the correct annotations:
```
@Analytics.query: true
@Aggregation.default: #SUM
```

Without @Analytics.query: true, the HANA optimizer treats the view as a transactional query. This means no star-join optimization, no automatic aggregation push-down, and no analytical engine processing. The same data model, the same query, the same WHERE clause — but a fundamentally different execution path.
We have seen analytical CDS views running 20x slower than expected simply because the @Analytics.query annotation was missing. The fix took 30 seconds. The performance investigation that preceded it took two days.
Memory Management — The Silent Performance Killer
HANA performance tuning is incomplete without addressing memory. A perfectly optimized query will still run slowly if the system is under memory pressure. And memory pressure in HANA is not a gradual degradation. It is a cliff.
Understanding HANA Memory Allocation
HANA memory management operates on three levels:
Global Allocation Limit (GAL): The maximum memory HANA can consume. Typically set to 90% of physical RAM. When HANA approaches the GAL, it begins aggressive countermeasures: column unloads, statement cancellation, and ultimately OOM (Out of Memory) errors.
Statement Memory Limit: The maximum memory a single SQL statement can consume. Default is 0 (unlimited). Without a statement memory limit, a single poorly written query can consume all available memory, triggering column unloads that impact every other user on the system.
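On systems where runaway statements are a risk, the cap can be set via the memorymanager section of global.ini. A sketch; the 50 GB value is an example, not a recommendation:

```sql
-- Cap any single statement at 50 GB (the parameter value is in GB);
-- statements exceeding the cap are cancelled instead of starving the system
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('memorymanager', 'statement_memory_limit') = '50'
  WITH RECONFIGURE;
```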
Column Unloads: When memory pressure exceeds configurable thresholds, HANA unloads column tables from memory using an LRU (Least Recently Used) algorithm. The data is not lost — it is reloaded from disk on next access. But that reload can take seconds to minutes for large tables.
The cascade is predictable: memory pressure → column unloads → next query hits an unloaded table → disk I/O for reload → slow query → more memory pressure from the reload → more unloads. This feedback loop is responsible for the "everything is slow and we do not know why" incidents that hit SAP systems during peak periods.
Column Unload Behavior and Table Pinning
Column unloads are the most common cause of intermittent, hard-to-reproduce performance problems. A report runs in 3 seconds in the morning and 45 seconds in the afternoon. The difference: in the afternoon, the table was unloaded due to memory pressure from batch jobs, and the first access triggers a full reload from disk.
Identifying thrashing tables: Query M_CS_UNLOADS to find tables that have been unloaded more than 10 times in the past 24 hours. These tables are loading, being used, getting unloaded under memory pressure, and then reloading on the next access. Every load/unload cycle wastes CPU and I/O.
Pinning critical tables: Use ALTER TABLE <table> PRELOAD ALL to pin tables in memory. Pinned tables are loaded at system startup and are the last to be unloaded under memory pressure. Pin your most frequently accessed tables: ACDOCA, BSEG, VBAP, EKPO, and any custom tables central to your reporting. But be conservative — pinning too many tables leaves insufficient memory for everything else.
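A sketch of both steps, finding thrashing tables and then pinning one (the 10-unload threshold follows the guidance above):

```sql
-- Tables unloaded repeatedly in the last 24 hours: thrashing candidates
SELECT TABLE_NAME, COUNT(*) AS UNLOAD_COUNT
FROM   M_CS_UNLOADS
WHERE  UNLOAD_TIME > ADD_DAYS(CURRENT_TIMESTAMP, -1)
GROUP BY TABLE_NAME
HAVING COUNT(*) > 10
ORDER BY UNLOAD_COUNT DESC;

-- Pin a critical table: loaded at startup, last to be unloaded
ALTER TABLE ACDOCA PRELOAD ALL;

-- Load it into memory now rather than waiting for the next restart or access
LOAD ACDOCA ALL;
```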
Delta Merge Optimization
Every column table in HANA has two storage structures: the main store (compressed, read-optimized, columnar) and the delta store (uncompressed, write-optimized, row-based). Inserts and updates go to the delta store. The delta merge process moves data from the delta store to the main store.
Why this matters for performance: When HANA queries a column table, it runs two separate scans — one on the main store (fast, compressed, columnar) and one on the delta store (slow, uncompressed, row-based). A table with 50 million rows in the main store and 10 million rows in the delta store effectively runs at row-store speed for 20% of its data on every query.
Auto-merge triggers when the delta store exceeds certain thresholds (default: 10,000 rows or 10% of main store size, whichever is larger). But auto-merge can fall behind during high-insert workloads. Common scenarios:
- Batch data loads that insert millions of rows faster than auto-merge can process them
- Tables with frequent updates (each update creates a new version in the delta store)
- Auto-merge disabled or misconfigured
Monitor delta store sizes with M_CS_ALL_COLUMNS (check MEMORY_SIZE_IN_DELTA). If any table has a delta store larger than 20% of its main store, investigate why merge is not keeping up and consider manual merge or adjusting auto-merge thresholds.
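The per-table figures are also available in M_CS_TABLES, which this sketch uses for the 20% check; the MERGE DELTA target is a hypothetical table:

```sql
-- Tables whose delta store exceeds 20% of the main store
SELECT SCHEMA_NAME, TABLE_NAME,
       MEMORY_SIZE_IN_MAIN, MEMORY_SIZE_IN_DELTA,
       ROUND(100.0 * MEMORY_SIZE_IN_DELTA / MEMORY_SIZE_IN_MAIN, 1) AS DELTA_PCT
FROM   M_CS_TABLES
WHERE  MEMORY_SIZE_IN_MAIN > 0
  AND  MEMORY_SIZE_IN_DELTA > 0.2 * MEMORY_SIZE_IN_MAIN
ORDER BY MEMORY_SIZE_IN_DELTA DESC;

-- Force an immediate merge for a table whose delta has grown too large
MERGE DELTA OF zsales_items;  -- hypothetical table
```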
HANA Cloud vs On-Premise — Performance Implications
The migration from HANA on-premise to HANA Cloud (or S/4HANA Cloud) introduces performance characteristics that most teams do not anticipate. This is not a lift-and-shift from a performance perspective.
Multi-Tier Storage and Warm Data
HANA Cloud introduces multi-tier storage: hot data in memory (like on-premise HANA), warm data on disk-based storage (SAP NSE — Native Storage Extension), and cold data in data lake. The cost savings are significant. The performance implications are equally significant.
Warm data queries run 5 to 10x slower than hot data queries. The data is not in memory. It must be loaded from disk, decompressed, and processed. For historical reporting (query last quarter's financial data, pull three-year trend analysis), this is usually acceptable. For operational queries (check current inventory, pull today's orders), it is not.
The challenge: deciding what stays hot and what goes warm. Get it wrong and your operational users experience multi-second response times on queries that used to return in milliseconds. The tiering strategy must be driven by actual access patterns, not assumptions.
Network Latency and Chatty ABAP Patterns
On-premise HANA deployments typically have sub-millisecond network latency between the application server and the database. HANA Cloud introduces network hops — 1 to 5ms per round trip in the same region, more across regions.
This is invisible for queries that make a single database call and process the result. It is catastrophic for chatty ABAP patterns: row-by-row processing with SELECT SINGLE in a loop, nested RFC calls that hit the database hundreds of times per transaction, custom reports that execute thousands of small queries instead of one large one.
A SELECT SINGLE in a LOOP that executes 10,000 times adds 10,000 network round trips. At 0.1ms latency on-premise, that is 1 second of overhead. At 3ms in the cloud, it is 30 seconds. The same code, the same data, 30x slower because of network physics.
Fix the chatty patterns before migrating to the cloud. This means array operations (SELECT FOR ALL ENTRIES, INSERT FROM TABLE), buffered reads, and redesigned RFC interfaces that batch calls instead of sending them one at a time.
Monitoring and Alerting for Proactive Performance
Reactive performance tuning means someone calls to say the system is slow, and you start investigating. Proactive performance management means you see the problem developing and fix it before anyone notices. The difference is monitoring and alerting.
Key Metrics to Track
Memory utilization (indexserver): Alert at 80% of GAL. Act at 85%. At 90%, you are one large report away from column unloads and cascading performance degradation.
Delta merge queue length: If the merge queue consistently has more than 5 pending merges, the system cannot keep up with write volume. Tables are accumulating large delta stores, and query performance is degrading silently.
Long-running statements: Any statement running longer than 300 seconds should generate an alert. Any statement running longer than 600 seconds should trigger investigation. Long-running statements hold locks, consume memory, and block other operations.
CPU utilization by service: The indexserver should not sustain above 80% CPU for more than 5 minutes outside of known batch windows. Sustained high CPU often indicates queries stuck in the row engine, excessive compilation, or runaway parallelism.
Column unload count (hourly): More than 50 unloads per hour outside of a known maintenance window indicates memory pressure. Correlate with memory utilization to determine if you need more memory, fewer pinned tables, or query optimization.
Alert Thresholds That Prevent Incidents
The difference between useful alerting and noise is threshold calibration:
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Memory utilization | 80% GAL | 90% GAL | Identify memory consumers, unload non-critical tables |
| Delta merge queue | 5 pending merges | 10 pending merges | Investigate insert volume, adjust merge thresholds |
| Statement duration | 300 seconds | 600 seconds | Kill statement, investigate plan change |
| CPU sustained | 70% for 10 min | 85% for 10 min | Identify top CPU consumers, check for row engine fallback |
| Column unloads | 50/hour | 200/hour | Pin critical tables, increase memory, reduce workload |
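Two of these checks as plain SQL, suitable for wiring into whatever scheduler or exporter you already run (thresholds per the table above):

```sql
-- Memory utilization vs. the effective allocation limit (warning at 80%)
SELECT SERVICE_NAME,
       ROUND(100.0 * TOTAL_MEMORY_USED_SIZE / EFFECTIVE_ALLOCATION_LIMIT, 1)
         AS USED_PCT
FROM   M_SERVICE_MEMORY
WHERE  SERVICE_NAME = 'indexserver';

-- Column unloads in the last hour (warning at 50, critical at 200)
SELECT COUNT(*) AS UNLOADS_LAST_HOUR
FROM   M_CS_UNLOADS
WHERE  UNLOAD_TIME > ADD_SECONDS(CURRENT_TIMESTAMP, -3600);
```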
Tooling Stack
HANA Cockpit provides built-in alerting with configurable thresholds. It is sufficient for most environments and requires no additional infrastructure. Configure email notifications for critical alerts.
SAP Solution Manager (Focused Run) provides cross-system monitoring, trend analysis, and root cause investigation across the entire SAP landscape. If you are running multiple SAP systems, this is the appropriate tool for centralized monitoring.
Prometheus and Grafana are the standard for teams that want custom dashboards, long-term metric retention, and integration with non-SAP monitoring. The HANA exporter for Prometheus exposes HANA metrics as Prometheus endpoints. Grafana dashboards provide the visualization layer. This stack is particularly valuable for teams already running Kubernetes-based infrastructure alongside SAP.
A Performance Tuning Checklist
When you need to systematically assess HANA performance — whether for a proactive review or in response to a reported issue — follow this sequence. Each step builds on the previous one.
1. Check system-level memory utilization. Query M_SERVICE_MEMORY for the indexserver. If memory is above 85% of GAL, resolve memory pressure before doing anything else. Query-level tuning is pointless under memory pressure.
2. Review column unload history. Query M_CS_UNLOADS for the past 7 days. Identify tables that unload and reload frequently. Pin critical tables. Investigate why memory is insufficient.
3. Examine the expensive statements trace. Query M_EXPENSIVE_STATEMENTS for the past 24 hours. Sort by DURATION_MICROSEC and MEMORY_SIZE. Identify the top 10 most expensive statements by both metrics.
4. Analyze the SQL plan cache. Query M_SQL_PLAN_CACHE sorted by TOTAL_EXECUTION_TIME. Identify the top 20 statements by cumulative execution time. Calculate average execution time and execution count. High-frequency moderate-cost statements often matter more than low-frequency expensive ones.
5. Pull execution plans for the top offenders. Use the Plan Visualizer for each of the top 5 most expensive statements. Look for row engine operators, full partition scans, hash join spills, and unnecessary materializations.
6. Check delta store sizes. Query M_CS_ALL_COLUMNS aggregated by table. Any table where the delta store exceeds 20% of the main store needs investigation. Trigger a manual merge if needed, then investigate why auto-merge fell behind.
7. Audit CDS view nesting depth. For CDS-based queries in the top offenders list, trace the view stack. Count layers. If any view stack exceeds 3 layers, evaluate whether flattening is possible.
8. Verify partition pruning. For partitioned tables accessed by expensive queries, check the execution plan for partition pruning. If no pruning occurs, either adjust the WHERE clause, change the partition strategy, or remove partitioning.
9. Review analytical annotations. For CDS views used in reporting or analytics, verify that @Analytics.query: true and appropriate @Aggregation.default annotations are present. Missing annotations change the execution path entirely.
10. Establish baseline metrics and monitoring. Record current values for memory utilization, delta merge queue length, top 20 expensive statements, and column unload frequency. Set alert thresholds per the table above. Schedule a monthly review.
This checklist is the starting point, not the complete engagement. Every SAP landscape has unique characteristics — custom code, specific data volumes, industry-specific workloads — that require deeper investigation.
If you want a HANA performance audit with specific recommendations for your landscape, schedule a free assessment. We review your system metrics, identify the top optimization opportunities, and provide a prioritized action plan — typically within one week.