Indexing and Performance
Learn how Machbase indexes work and how to optimize query performance. This guide covers indexing strategies, automatic index management, and performance tuning.
Indexing Overview
Machbase uses different indexing strategies for each table type, all designed for time-series workloads:
Table Type | Index Type | Management | Purpose |
---|---|---|---|
Tag | 3-level Partitioned | Automatic | Fast sensor_id + time queries |
Log | LSM (optional) | Manual | Fast column lookups |
Volatile | Red-Black Tree | Automatic | Fast PRIMARY KEY access |
Lookup | LSM (optional) | Manual | Fast column lookups |
Key insight: Most users never need to manually create indexes!
Tag Table Indexing
Automatic 3-Level Partitioned Index
Tag tables automatically create a sophisticated indexing system:
Level 1: Tag Name Index
- Quickly find specific sensor
- O(log n) lookup by sensor_id
Level 2: Time Partitioning
- Data partitioned by time ranges
- Skips irrelevant time periods
Level 3: Value Index (for SUMMARIZED columns)
- Per-partition value indexes
- Fast range queries
How It Works
CREATE TAGDATA TABLE sensors (
sensor_id VARCHAR(20) PRIMARY KEY,
time DATETIME BASETIME,
temperature DOUBLE SUMMARIZED
);
-- Behind the scenes, Machbase creates:
-- 1. Index on sensor_id (tag name)
-- 2. Time-based partitions
-- 3. Indexes on temperature within each partition
Query Optimization
Optimal Queries (use all 3 index levels):
-- Fast: Uses sensor_id index + time partition + value index
SELECT * FROM sensors
WHERE sensor_id = 'sensor01'
AND time BETWEEN '2025-10-10 00:00:00' AND '2025-10-10 23:59:59'
AND temperature > 25.0;
Good Queries (use 2 index levels):
-- Uses sensor_id + time partition
SELECT * FROM sensors
WHERE sensor_id = 'sensor01'
DURATION 1 HOUR;
Slow Queries (full scan):
-- Scans all sensors (no sensor_id filter)
SELECT * FROM sensors
WHERE temperature > 30.0;
Rollup Table Indexes
Rollup tables are automatically indexed:
-- Query rollup (very fast, pre-aggregated data)
SELECT * FROM sensors
WHERE sensor_id = 'sensor01'
AND rollup = hour
DURATION 7 DAY;
-- Available rollup levels: sec, min, hour
Best Practices
DO:
- Always include sensor_id in WHERE clause
- Use time filters (DURATION or time range)
- Query rollup tables for statistics
- Let Machbase manage indexes automatically
DON’T:
- Try to create manual indexes (not supported)
- Query without sensor_id filter (slow full scan)
- Query raw data when rollup is sufficient
Log Table Indexing
LSM Index (Optional)
Log tables can optionally create LSM (Log-Structured Merge) indexes:
-- Create log table
CREATE TABLE app_logs (
level VARCHAR(10),
component VARCHAR(50),
message VARCHAR(2000)
);
-- Create LSM index on frequently queried column
CREATE INDEX idx_level ON app_logs(level);
CREATE INDEX idx_component ON app_logs(component);
When to Create Indexes
Create index if:
- Column frequently in WHERE clause
- Query performance is slow
- Column has moderate cardinality (not too many unique values)
Skip index if:
- Only querying by time
- Column has very high cardinality
- Write performance is critical
LSM Index Characteristics
Advantages:
- Optimized for write-heavy workloads
- No blocking on writes
- Automatically maintained
How LSM Works:
- New writes go to memory buffer
- Periodically flushed to disk segments
- Background merge process combines segments
- Reads search across segments
Index Building
-- Check index status
SHOW INDEXES;
-- View index building progress
SHOW INDEXGAP;
-- Index builds in background (non-blocking)
Query Optimization
With Index:
-- Fast: Uses idx_level
SELECT * FROM app_logs
WHERE level = 'ERROR'
DURATION 1 HOUR;
Without Index:
-- Slower: Full scan, but still uses time partitioning
SELECT * FROM app_logs
WHERE message SEARCH 'timeout'
DURATION 1 HOUR;
Volatile Table Indexing
Automatic Red-Black Tree
Volatile tables automatically create in-memory indexes on PRIMARY KEY:
CREATE VOLATILE TABLE device_status (
device_id INTEGER PRIMARY KEY, -- Automatically indexed
status VARCHAR(20),
last_updated DATETIME
);
Performance Characteristics
- Lookup: O(log n) by PRIMARY KEY
- Insert: O(log n)
- Update: O(log n)
- Delete: O(log n)
All operations are in-memory (very fast!).
Query Optimization
Fast:
-- Uses PRIMARY KEY index
SELECT * FROM device_status WHERE device_id = 101;
UPDATE device_status SET status = 'RUNNING' WHERE device_id = 101;
DELETE FROM device_status WHERE device_id = 101;
Slower:
-- Full scan (no index on status)
SELECT * FROM device_status WHERE status = 'ERROR';
Lookup Table Indexing
LSM Index (Same as Log Table)
CREATE LOOKUP TABLE devices (
device_id INTEGER,
device_name VARCHAR(100),
location VARCHAR(200)
);
-- Create indexes on frequently queried columns
CREATE INDEX idx_device_id ON devices(device_id);
CREATE INDEX idx_location ON devices(location);
Same principles as Log table indexing.
Time-Based Partitioning
Automatic Partitioning
All disk-based tables (Tag, Log, Lookup) use time-based partitioning:
Partition Structure:
┌──────────────────────────────────────┐
│ Partition 1: Week 1 (Oct 1-7) │
│ - Data for this week │
│ - Separate index │
│ - Optimized compression │
├──────────────────────────────────────┤
│ Partition 2: Week 2 (Oct 8-14) │
│ - Data for this week │
│ - Separate index │
│ - Optimized compression │
├──────────────────────────────────────┤
│ Partition 3: Week 3 (Oct 15-21) │
│ - Active partition │
│ - Less compressed (for writes) │
└──────────────────────────────────────┘
Benefits
Query Performance:
- Only scan relevant partitions
- Skip old/future partitions
- Parallel partition scanning
Data Management:
- Easy retention (drop old partitions)
- Per-partition compression
- Efficient backup/restore
Query Optimization
Good (scans 1 partition):
SELECT * FROM logs DURATION 1 DAY;
Slower (scans multiple partitions):
SELECT * FROM logs DURATION 30 DAY;
Very Slow (scans all partitions):
SELECT * FROM logs; -- No time filter!
Query Optimization Strategies
1. Always Use Time Filters
Bad:
SELECT * FROM sensors WHERE sensor_id = 'sensor01';
Good:
SELECT * FROM sensors
WHERE sensor_id = 'sensor01'
DURATION 1 HOUR;
2. Use DURATION Keyword
Good (optimized syntax):
SELECT * FROM logs DURATION 1 HOUR;
Less Optimal (manual time filter):
SELECT * FROM logs
WHERE _arrival_time >= NOW - INTERVAL '1' HOUR;
3. Query Rollup, Not Raw Data
Good (instant results):
SELECT * FROM sensors
WHERE sensor_id = 'sensor01'
AND rollup = hour
DURATION 7 DAY;
Slow (millions of rows):
SELECT sensor_id, AVG(temperature)
FROM sensors
WHERE sensor_id = 'sensor01'
DURATION 7 DAY
GROUP BY sensor_id;
4. Limit Result Sets
Good:
SELECT * FROM logs DURATION 1 HOUR LIMIT 1000;
Bad:
SELECT * FROM logs; -- Returns millions of rows!
5. Use Indexes on High-Selectivity Columns
Good Index (moderate cardinality):
-- level: ERROR, WARN, INFO (low cardinality - good!)
CREATE INDEX idx_level ON logs(level);
Bad Index (very high cardinality):
-- message: millions of unique values (don't index!)
CREATE INDEX idx_message ON logs(message); -- Don't do this!
Compression
Automatic Compression
Machbase automatically compresses data:
Logical Compression (columnar):
- Each column compressed separately
- Pattern-based compression
- 10-100x compression ratios
Physical Compression (block):
- Disk block compression
- Transparent to users
- Additional 2-5x compression
Compression Characteristics
Table Type | Compression Method | Typical Ratio |
---|---|---|
Tag | Columnar + block | 50-100x |
Log | Columnar + block | 10-50x |
Volatile | None (in-memory) | 1x |
Lookup | Block-level | 2-5x |
Impact on Performance
Reads:
- Compressed data read faster (less I/O)
- Decompression overhead minimal
- Net benefit for large scans
Writes:
- Buffer in memory first
- Compress during flush
- No write-time penalty
Performance Monitoring
Check Table Statistics
-- View table info
SHOW TABLE sensors;
-- View storage usage
SHOW STORAGE;
-- View tablespace info
SHOW TABLESPACES;
Monitor Queries
-- View active queries
SHOW STATEMENTS;
-- Check slow queries
-- (long-running queries appear here)
Index Health
-- Check indexes
SHOW INDEXES;
-- View index building progress
SHOW INDEXGAP;
Performance Tuning
Server Configuration
Key parameters (in machbase.conf):
# Memory settings
BUFFER_POOL_SIZE = 2G # Shared buffer cache
VOLATILE_TABLESPACE_SIZE = 1G # Volatile table memory
# Write performance
CHECKPOINT_INTERVAL_SEC = 600 # Checkpoint frequency
LOG_BUFFER_SIZE = 64M # Write buffer size
# Query performance
MAX_QPX_MEM = 512M # Per-query memory limit
QUERY_TIMEOUT = 60 # Query timeout (seconds)
Application Optimization
Batch Writes:
// Use APPEND protocol for bulk inserts
SQLAppendOpen(stmt, "sensors");
for (int i = 0; i < 10000; i++) {
SQLAppendDataV(stmt, sensor_id, time, value);
}
SQLAppendClose(stmt); // Commit batch
Connection Pooling:
- Reuse connections
- Avoid connection overhead
- 10-20 connections typical
Query Result Limits:
-- Always limit results for UI queries
SELECT * FROM logs DURATION 1 HOUR LIMIT 100;
Common Performance Issues
Issue 1: Slow Queries Without Time Filter
Problem:
SELECT * FROM sensors WHERE sensor_id = 'sensor01';
-- Slow: scans all partitions
Solution:
SELECT * FROM sensors
WHERE sensor_id = 'sensor01'
DURATION 1 HOUR; -- Fast: scans 1 partition
Issue 2: Querying Raw Data for Analytics
Problem:
SELECT AVG(temperature) FROM sensors DURATION 7 DAY;
-- Slow: aggregates millions of rows
Solution:
SELECT AVG(avg_temperature) FROM sensors
WHERE rollup = hour
DURATION 7 DAY; -- Fast: aggregates pre-computed data
Issue 3: Missing Indexes on Log Tables
Problem:
-- Slow without index
SELECT * FROM logs WHERE level = 'ERROR' DURATION 1 DAY;
Solution:
CREATE INDEX idx_level ON logs(level);
-- Now fast!
Issue 4: Large Result Sets
Problem:
SELECT * FROM logs DURATION 30 DAY;
-- Returns millions of rows!
Solution:
-- Aggregate instead
SELECT level, COUNT(*) FROM logs
DURATION 30 DAY
GROUP BY level;
-- Or limit results
SELECT * FROM logs DURATION 30 DAY LIMIT 1000;
Best Practices Summary
- Always use time filters (DURATION or time range)
- Query rollup for analytics (Tag tables)
- Create indexes on frequently queried columns (Log/Lookup tables)
- Limit result sets (use LIMIT clause)
- Use batch writes (APPEND protocol)
- Let Machbase manage indexes (Tag/Volatile tables)
- Monitor query performance (SHOW STATEMENTS)
- Implement data retention (DELETE old data)
Next Steps
- Apply Knowledge: Common Tasks - Query optimization
- Learn More: Table Types - Detailed table documentation
- Troubleshoot: Troubleshooting - Performance issues
Key Takeaways
- Tag tables use automatic 3-level partitioned indexes
- Log/Lookup tables can use optional LSM indexes
- Volatile tables use automatic in-memory indexes
- Time-based partitioning is automatic and essential
- Always filter by time for optimal performance
- Query rollup, not raw data for analytics
- Most users never create manual indexes - it’s automatic!
Master indexing and unlock Machbase’s full performance potential!