Understanding Time-Series Data

Understanding Time-Series Data

Learn what makes time-series data unique and why Machbase is specifically designed to handle it efficiently. This guide explains the fundamental concepts behind time-series databases.

What is Time-Series Data?

Time-series data is a sequence of data points indexed by time. Each record has a timestamp and one or more values:

(timestamp, value1, value2, ...)

Common Examples

IoT and Sensors:

Temperature readings every second
GPS coordinates from vehicles
Smart meter electricity usage
Manufacturing equipment telemetry

Application Monitoring:

Server CPU/memory metrics
Application logs
HTTP request logs
Database query performance

Business Data:

Stock market ticks
Sales transactions
Website clickstreams
Mobile app events

Characteristics of Time-Series Workloads

1. Write-Heavy

Time-series systems are dominated by writes:

Millions of writes per second (sensors, logs, events)
Few updates or deletes (historical data rarely changes)
Append-only pattern (always adding new data)

Traditional databases struggle because:

Row-level locking slows writes
Complex UPDATE logic not needed
Transaction overhead wasted

Machbase solution: Append-only architecture with no row locking.

2. Time-Based Queries

Most queries involve time ranges:

-- Last hour's data
SELECT * FROM logs DURATION 1 HOUR;

-- Yesterday's statistics
SELECT AVG(temperature) FROM sensors
WHERE time BETWEEN '2025-10-09 00:00:00' AND '2025-10-10 00:00:00';

Traditional databases struggle because:

Generic indexes not optimized for time
Full table scans for time ranges
No time-aware partitioning

Machbase solution: Time-based partitioning and indexing built-in.

3. Recent Data Focus

Users care most about recent data:

Last 24 hours for monitoring
Last week for trending
Older data for compliance/archival

Traditional databases struggle because:

All data treated equally
No automatic aging policies
Manual archival processes

Machbase solution: DURATION keyword, automatic partitioning, efficient time-based deletion.

4. High Compression Potential

Time-series data compresses extremely well:

Sequential timestamps
Repeated patterns
Similar values

Traditional databases struggle because:

Row-oriented storage
Generic compression
2-5x compression typical

Machbase solution: Columnar storage with specialized compression (10-100x ratios).

5. Bulk Aggregations

Common analytical queries:

MIN, MAX, AVG over time windows
Grouping by time intervals
Statistical summaries

Traditional databases struggle because:

Aggregations computed on-demand
No pre-computed statistics
Slow for large datasets

Machbase solution: Automatic rollup tables with pre-computed statistics.

Why Traditional Databases Fail

Problem 1: Row-Oriented Storage

Traditional databases store entire rows together:

Row 1: [timestamp1, sensor_id, temp, humidity]
Row 2: [timestamp2, sensor_id, temp, humidity]
Row 3: [timestamp3, sensor_id, temp, humidity]

Issue: When you query “AVG(temperature)”, database reads ALL columns, not just temperature.

Machbase approach: Columnar storage reads only needed columns.

Problem 2: B-Tree Indexes

Traditional databases use B-Tree indexes:

Good for random access
Expensive for sequential writes
Not optimized for time-ordered data

Machbase approach: LSM indexes and time-based partitioning.

Problem 3: ACID Overhead

Traditional databases provide full ACID guarantees:

Row-level locking
Transaction logs
Rollback support

Not needed for time-series:

Historical data never changes
No concurrent updates to same row
Wasted overhead

Machbase approach: Simplified architecture for append-only data.

Problem 4: No Time Awareness

Traditional databases don’t understand time:

No automatic time-based partitioning
No built-in retention policies
No time-optimized queries

Machbase approach: Time is a first-class concept.

Machbase Design Principles

1. Append-Only Architecture

Data is only added, never modified:

-- Allowed: Adding new data
INSERT INTO sensors VALUES ('sensor01', NOW, 25.3);

-- Not allowed on Tag/Log tables: Modifying old data
UPDATE sensors SET temperature = 26.0 WHERE id = 123;  -- ✗

Benefits:

No row locking
Ultra-fast writes (millions/sec)
Data integrity (can’t alter history)

2. Time-Based Partitioning

Data is automatically partitioned by time:

Partition 1: 2025-10-01 to 2025-10-07
Partition 2: 2025-10-08 to 2025-10-14
Partition 3: 2025-10-15 to 2025-10-21

Benefits:

Queries only scan relevant partitions
Easy data retention (drop old partitions)
Optimal compression per partition

3. Columnar Compression

Each column stored separately:

Timestamps:    [100, 101, 102, 103, 104, ...]
Sensor IDs:    [s01, s01, s01, s02, s02, ...]
Temperatures:  [22.5, 22.7, 22.6, 21.3, 21.5, ...]

Benefits:

High compression (similar values)
Read only needed columns
Faster analytical queries

4. Write-Optimized Indexes

LSM (Log-Structured Merge) indexes:

Optimized for sequential writes
Batch writes to memory
Periodic merging to disk

Benefits:

Millions of writes per second
No write amplification
Consistent performance

5. Automatic Statistics (Rollup)

Tag tables generate statistics automatically:

-- Raw data: millions of rows
INSERT INTO sensors VALUES ('sensor01', NOW, 25.3);

-- Automatic rollup: per-second, per-minute, per-hour
SELECT * FROM sensors WHERE rollup = hour;
-- Returns: MIN, MAX, AVG, SUM, COUNT, SUMSQ

Benefits:

Instant analytics
No manual aggregation
Reduced query time

Time-Series vs Traditional Databases

Feature	Traditional DB	Machbase
Primary Use	Transactions	Analytics
Write Pattern	Random	Sequential
UPDATE Support	Full	Limited
Indexing	B-Tree	LSM + Partitioned
Storage	Row-oriented	Columnar
Compression	2-5x	10-100x
Write Speed	1,000s/sec	Millions/sec
Time Queries	Generic	Optimized
Data Retention	Manual	Automatic

Time-Series Data Patterns

Pattern 1: High-Frequency Sensors

Sensor ID: sensor01
Frequency: 10 readings/second
Data Volume: 864,000 readings/day

Best for: Tag table with SUMMARIZED columns

CREATE TAGDATA TABLE sensors (
    sensor_id VARCHAR(20) PRIMARY KEY,
    time DATETIME BASETIME,
    value DOUBLE SUMMARIZED
);

Pattern 2: Event Streams

Type: Application logs
Frequency: Variable (bursty)
Data Volume: Millions/day
Schema: Flexible (many columns)

Best for: Log table

CREATE TABLE app_logs (
    level VARCHAR(10),
    message VARCHAR(2000),
    user_id INTEGER
);

Pattern 3: Real-Time State

Type: Live dashboard data
Frequency: Constant updates
Data Volume: Small (100s of rows)
Persistence: Not required

Best for: Volatile table

CREATE VOLATILE TABLE live_status (
    device_id INTEGER PRIMARY KEY,
    status VARCHAR(20),
    last_updated DATETIME
);

Pattern 4: Dimension Data

Type: Device metadata
Frequency: Rare updates
Data Volume: Small (1000s of rows)
Persistence: Required

Best for: Lookup table

CREATE LOOKUP TABLE devices (
    device_id INTEGER,
    name VARCHAR(100),
    location VARCHAR(200)
);

Common Time-Series Challenges

Challenge 1: Data Volume

Problem: Sensors generate millions of readings/day

Machbase solution:

High-speed ingestion (APPEND protocol)
Automatic compression (10-100x)
Time-based retention

-- Keep only 30 days
DELETE FROM sensors EXCEPT 30 DAYS;

Challenge 2: Query Performance

Problem: Analyzing millions of rows is slow

Machbase solution:

Automatic rollup statistics
Time-based partitioning
Columnar storage

-- Fast: Query pre-aggregated data
SELECT * FROM sensors WHERE rollup = hour;

Challenge 3: Late-Arriving Data

Problem: Data arrives out of order

Machbase solution:

LSM indexes handle out-of-order writes
Background merge processes
Consistent query results

Challenge 4: Multiple Time Zones

Problem: Global deployments with different time zones

Machbase solution:

Store in UTC, display in local time
Timezone conversion functions
Client-side timezone setting

machsql -z +0900  # Korea timezone

Best Practices

1. Use Appropriate Table Types

Match data characteristics to table type:

Regular sensor readings → Tag table
Variable event streams → Log table
Real-time updates → Volatile table
Reference data → Lookup table

2. Implement Data Retention

Don’t keep data forever:

-- Daily cleanup job
DELETE FROM logs EXCEPT 90 DAYS;

3. Use DURATION for Time Queries

Optimized syntax for time ranges:

-- Good
SELECT * FROM logs DURATION 1 HOUR;

-- Less optimal
SELECT * FROM logs WHERE _arrival_time >= NOW - INTERVAL '1' HOUR;

4. Batch Writes When Possible

Use APPEND protocol for bulk inserts:

Higher throughput
Better compression
Reduced overhead

5. Query Rollup, Not Raw Data

For analytics, use pre-aggregated data:

-- Fast: Rollup
SELECT AVG(avg_temperature) FROM sensors WHERE rollup = hour;

-- Slow: Raw data
SELECT AVG(temperature) FROM sensors;

Next Steps

Now that you understand time-series data:

Read: Table Types Overview - Choose the right table
Learn: Indexing and Performance - Optimize queries
Practice: Tutorials - Hands-on exercises

Key Takeaways

Time-series data is write-heavy and time-focused
Traditional databases not optimized for time-series
Machbase uses append-only architecture
Columnar storage enables high compression
Time-based partitioning optimizes queries
Automatic rollup provides instant analytics
Choose the right table type for your data pattern

Understanding time-series data fundamentals helps you design better Machbase solutions!

Last updated on Nov 4, 2025

Table Types: Complete Guide