Core Concepts

Deep dive into Machbase architecture, design principles, and key concepts. This section helps you understand how Machbase works under the hood and why it’s optimized for time-series data.

In This Section

Understanding Time-Series Data

Learn what makes time-series data special and why traditional databases struggle with it:

Characteristics of time-series workloads
Write-heavy vs read-heavy patterns
Why append-only architecture matters
Time-based partitioning and compression

Table Types Overview

Complete guide to choosing the right table type:

Detailed comparison of all 4 table types
Decision flowchart and selection guide
Performance characteristics
Common use cases and anti-patterns
When to use each type

Indexing and Performance

How Machbase achieves high performance:

Partitioned indexing for Tag tables
LSM (Log-Structured Merge) indexing
Automatic index management
Query optimization strategies
Understanding rollup statistics

Who Should Read This

This section is for:

Developers designing Machbase-based applications
Architects planning system architecture
DBAs optimizing performance
Data Engineers implementing data pipelines

Prerequisites

Before diving into Core Concepts:

Complete Getting Started section
Finish at least 2 of the Tutorials
Have hands-on experience with Machbase

Learning Path

We recommend reading in this order:

Time-Series Data - Understand the problem domain
Table Types Overview - Choose the right tools
Indexing - Optimize performance

Quick Reference

Table Type Decision Guide

Is it sensor data (ID, time, value)?
    YES → Tag Table

Is it log/event data?
    YES → Log Table

Need UPDATE/DELETE in memory?
    YES → Volatile Table

Is it reference/master data?
    YES → Lookup Table

Performance Characteristics

Table Type	Write Speed	Read Speed	UPDATE/DELETE	Storage
Tag	Millions/sec	Very Fast	No*	Disk
Log	Millions/sec	Fast	Time-based	Disk
Volatile	10,000s/sec	Very Fast	By Key	Memory
Lookup	100s/sec	Fast	By Key	Disk

*Tag table metadata can be updated

Key Concepts at a Glance

Write-Once Architecture

Machbase is optimized for append-only data:

No row-level locking
Ultra-fast sequential writes
Data integrity (logs can’t be altered)

Time-Based Partitioning

Data is automatically partitioned by time:

Efficient time-range queries
Easy data retention management
Optimized compression

Columnar Compression

Data is stored column-by-column:

10-100x compression ratios
Faster analytical queries
Reduced storage costs

Automatic Rollup (Tag Tables)

Statistics are generated automatically:

Per-second, per-minute, per-hour summaries
MIN, MAX, AVG, SUM, COUNT, SUMSQ
No manual aggregation needed

Common Misconceptions

“I need to create an index for every query”

False. Machbase automatically creates optimal indexes:

Tag tables: 3-level partitioned index
Log tables: Time-based partitioning (index optional)
Volatile tables: Red-black tree for PRIMARY KEY
Most queries work great without manual indexes

“I should create one table per sensor”

False. Use a single Tag table for all sensors:

Better performance
Easier management
Automatic optimization

“Lookup tables are slow”

Partially true. Lookup tables have:

Slower writes (hundreds/sec vs millions/sec)
Fast reads (optimized for SELECT)
Use for reference data, not high-volume inserts

“Volatile tables are just like regular tables”

False. Volatile tables are special:

100% in memory
Data lost on shutdown
Use for temporary/cache data only

Design Principles

1. Choose the Right Table Type

Don’t force a table type for the wrong use case:

Tag table for sensor data
Log table for event streams
Volatile table for real-time cache
Lookup table for reference data

2. Leverage Time-Based Features

Use Machbase’s time-aware features:

-- Good: Use DURATION
SELECT * FROM logs DURATION 1 HOUR;

-- Less optimal: Manual time filtering
SELECT * FROM logs
WHERE _arrival_time >= NOW - INTERVAL '1' HOUR;

3. Implement Data Retention

Don’t let data grow forever:

-- Daily cleanup
DELETE FROM logs EXCEPT 30 DAYS;

4. Use Rollup for Analytics

Query pre-aggregated data:

-- Fast: Use rollup
SELECT * FROM sensors WHERE rollup = hour;

-- Slow: Aggregate raw data
SELECT sensor_id, AVG(value) FROM sensors GROUP BY sensor_id;

Architecture Overview

Storage Layers

┌─────────────────────────────────────┐
│         Query Engine                │
├─────────────────────────────────────┤
│         Memory Manager              │
│  ┌──────────────┐  ┌──────────────┐│
│  │ Volatile     │  │ Query Cache  ││
│  │ Tables       │  │              ││
│  └──────────────┘  └──────────────┘│
├─────────────────────────────────────┤
│         Storage Engine              │
│  ┌──────────────┐  ┌──────────────┐│
│  │ Tag/Log      │  │ Lookup       ││
│  │ Tables       │  │ Tables       ││
│  └──────────────┘  └──────────────┘│
└─────────────────────────────────────┘

Data Flow

Sensors/Apps
     ↓
  APPEND API (bulk insert)
     ↓
  Write Buffer (memory)
     ↓
  Flush to Disk (compressed)
     ↓
  Automatic Indexing
     ↓
  Query Engine

Next Steps

Ready to dive deeper?

Start with: Understanding Time-Series Data
Then read: Table Types Overview
Finally: Indexing and Performance

Or jump to:

Common Tasks - Practical how-to guides
Table Types - Detailed reference for each type
SQL Reference - Complete SQL syntax