Rollup Rebuild Guide
Overview
If anomalous data has already been collected, you can delete the raw data and insert corrected rows. However, previously generated rollup statistics are not automatically rolled back. In that case, you must rebuild the affected rollup buckets.
In Machbase, the currently documented rebuild paths are:
- Python script
- built-in rollup only
- for standard rollup / rollup extension
- rebuilds by time range (
begintime ~ endtime) - uses the Machbase Neo REST API
- Built-in server procedure
EXEC ROLLUP_REBUILD(...)- supports built-in rollup, rollup extension, and custom rollup
- callable directly from SQL
- follows the custom rollup dependency tree for stop/rebuild/start
Limitations
The currently documented rebuild paths have the following limitations.
- The Python rebuild tool is only for built-in rollups.
- The Python rebuild tool only targets tables created with
WITH ROLLUPorWITH ROLLUP EXTENSION. - The Python rebuild tool assumes internal built-in rollup tables (
_<table>_ROLLUP_SEC,_<table>_ROLLUP_MIN,_<table>_ROLLUP_HOUR) and the fixed built-in aggregate schema. - Custom rollups cannot be recovered directly with the Python rebuild tool.
EXEC ROLLUP_REBUILD(...)is supported in Standard Edition and is not supported in Cluster Edition.EXEC ROLLUP_REBUILD(...)only supports single-tag rebuild bytable_name,tag_name,begin_time, andend_time.- The rebuild range must be handled as whole affected buckets, not partial timestamps, with delete followed by insert.
Python-Based Rollup Rebuild
Scope
The test/regress/issue-all/173/rollup_rebuild_timerange.py family of Python scripts is intended for built-in rollups only.
Assumptions:
- The source table uses
WITH ROLLUPorWITH ROLLUP EXTENSION - Internal rollup table names follow this pattern
_<table>_ROLLUP_SEC_<table>_ROLLUP_MIN_<table>_ROLLUP_HOUR
- The internal aggregate columns follow the fixed built-in rollup schema
- Delete and reinsert can be done based on
_ID
It cannot be applied directly to custom rollups.
Reasons:
- The destination table name is user-defined
- The destination schema is flexible
- Aggregate expressions vary by
AS (SELECT ...) - In rollup-on-rollup pipelines, rebuild order depends on dependencies
Example
python3 rollup_rebuild_timerange.py \
--server http://127.0.0.1:5654 \
--tablename TAG \
--tagname tag-0 \
--begintime '2000-01-01 00:00:00' \
--endtime '2000-01-01 00:00:11'Meaning:
- for
tag-0 - rebuild the second, minute, and hour rollup buckets affected by anomalous raw data
- in the range from
2000-01-01 00:00:00to2000-01-01 00:00:11
Parameters
--server- Machbase Neo REST API address
- default example:
http://127.0.0.1:5654
--tablename- TAG table name
- case-sensitive
--tagname- identifier of the anomalous data
- the first key-column value defined as
PRIMARY KEYwhen the TAG table was created
--begintime- start time of the error/deletion range
- safer to align it to the bucket boundary
--endtime- end time of the error/deletion range
- likewise safer to align it to the bucket boundary
Execution Model
The script prints the actual SQL it runs and restores built-in rollups in this order:
- source flush
- freeze if needed
- second rollup force / stop / delete / insert / start / flush
- minute rollup force / stop / delete / insert / start / flush
- hour rollup force / stop / delete / insert / start
- unfreeze if needed
If execution stops in the middle, rerunning is generally safe because it rebuilds the same buckets by delete-then-insert.
EXEC ROLLUP_REBUILD(...) Procedure
Syntax
EXEC ROLLUP_REBUILD(table_name, tag_name, begin_time, end_time);Example:
EXEC ROLLUP_REBUILD(tag,
'tag-00045',
TO_DATE('2025-09-02 01:00:00'),
TO_DATE('2025-09-02 01:00:00'));
EXEC ROLLUP_REBUILD(sys.tag,
'tag-00045',
TO_DATE('2025-09-02 01:00:00'),
TO_DATE('2025-09-02 01:00:00'));Parameters
table_name- source TAG table name
- use
schema.tableif needed
tag_name- target tag key value to rebuild
begin_time- start time of the corrected range
end_time- end time of the corrected range
Coverage
- built-in rollup
- rollup extension
- custom rollup
- rollup-on-rollup dependency pipelines
How Custom Rollup Rebuild Works
Why fixed built-in SQL is not enough
In a custom rollup, all of the following are user-defined:
- destination table name
- destination column count and types
- aggregate functions
- whether the source is a root table or another rollup destination
So a generic rebuild cannot rely on a fixed schema reinsertion pattern.
It must rerun the original custom SELECT while restricting the rebuild to the affected tag/time buckets.
Bucket Expansion
For example, suppose the anomalous source range for a 1-minute custom rollup is:
- source error time:
2026-01-27 09:30:12~2026-01-27 09:31:07
The actual rebuild target must cover the whole buckets:
- start bucket:
2026-01-27 09:30:00 - end bucket:
2026-01-27 09:31:59.999999999
Partial aggregate rows may already exist in the destination table. If you insert again without deleting first, duplicate aggregates will be created. Always delete the target buckets first, then insert again.
Manual Rebuild Procedure
If you rebuild manually without CLI support, follow this order:
- stop all affected custom rollups
- correct or reload the anomalous source data
- calculate bucket boundaries
- delete from the destination
- reinsert using the original aggregation logic from
CREATE ROLLUP ... AS (SELECT ...) - flush the destination
- if upper-level custom rollups exist, repeat from the lower level upward
- start the rollups
Manual Custom Rollup Rebuild Examples
1-minute custom rollup
The following example rebuilds the 09:30 ~ 09:31 buckets for stock_tick -> stock_rollup_1m.
STOP ROLLUP rollup_stock_1m;
DELETE FROM stock_rollup_1m
WHERE time BETWEEN TO_DATE('2026-01-27 09:30:00')
AND TO_DATE('2026-01-27 09:31:59');
INSERT INTO stock_rollup_1m
SELECT code,
DATE_TRUNC('minute', time) AS time,
SUM(price) AS sum_price,
SUM(volume) AS sum_volume,
COUNT(*) AS cnt
FROM stock_tick
WHERE time BETWEEN TO_DATE('2026-01-27 09:30:00')
AND TO_DATE('2026-01-27 09:31:59')
GROUP BY code, time;
EXEC TABLE_FLUSH('stock_rollup_1m');
START ROLLUP rollup_stock_1m;Custom rollup with FIRST/LAST
If a custom rollup uses FIRST/LAST, its helper time columns must also be recalculated.
STOP ROLLUP rollup_stock_candle_1m;
DELETE FROM stock_candle_1m
WHERE time = TO_DATE('2026-01-27 09:30:00');
INSERT INTO stock_candle_1m
SELECT code,
DATE_TRUNC('minute', time) AS time,
MIN(time) AS firsttime,
MAX(time) AS lasttime,
FIRST(time, price) AS open,
MAX(price) AS high,
MIN(price) AS low,
LAST(time, price) AS close,
SUM(volume) AS volume,
COUNT(*) AS cnt
FROM stock_tick
WHERE time BETWEEN TO_DATE('2026-01-27 09:30:00')
AND TO_DATE('2026-01-27 09:30:59')
GROUP BY code, time;
EXEC TABLE_FLUSH('stock_candle_1m');
START ROLLUP rollup_stock_candle_1m;Final reads should still merge with FIRST(firsttime, open) and LAST(lasttime, close).
rollup-on-rollup order
Example:
- stage 1:
stock_tick -> stock_rollup_1m - stage 2:
stock_rollup_1m -> stock_rollup_1h
The rebuild order must always start from the lower stage.
- stop
stock_rollup_1h - stop
stock_rollup_1m - rebuild
stock_rollup_1m - delete/rebuild
stock_rollup_1h - start
stock_rollup_1m - start
stock_rollup_1h
If you rebuild the upper stage first, it will read lower-stage results that have not yet been restored, and incorrect aggregates will be written again.
Operational Recommendations
- Confirm the affected bucket range before rebuilding custom rollups.
- Check dependencies with
v$rollupbefore and after operational changes. - If one error time range spans multiple buckets, rebuild every affected bucket.
- Custom rollup destination tables accumulate append-only results, so rebuild must use delete followed by insert.
- In rollup-on-rollup pipelines, always rebuild from lower stages first and then rebuild upper stages.
- For large built-in time-range recovery, the Python tool is more convenient. For custom or mixed extension/custom environments, use
EXEC ROLLUP_REBUILD(...).
Efficient Rollup Queries Including Recent Data
The same rule for reading recent data applies to both built-in and custom rollups.
Core pattern:
- use rollup tables for stable historical ranges
- aggregate directly from the source table for the recent range
- combine both with
UNION ALL - apply one more outer aggregation if needed
Standard 1-minute rollup example
SELECT ROLLUP('minute', 1, time) AS mtime, AVG(value)
FROM tag
WHERE name = 'TAG_0001'
AND time < DATE_TRUNC('minute', SYSDATE) - 2m
GROUP BY mtime
UNION ALL
SELECT DATE_TRUNC('minute', time) AS mtime, AVG(value)
FROM tag
WHERE name = 'TAG_0001'
AND time >= DATE_TRUNC('minute', SYSDATE) - 2m
GROUP BY mtime;Standard 20-minute aggregation example
SELECT ROLLUP('minute', 20, time) AS mtime, AVG(value)
FROM tag
WHERE name = 'TAG_0001'
AND time < DATE_BIN('minute', 20, SYSDATE, 0) - 20m
GROUP BY mtime
UNION ALL
SELECT DATE_BIN('minute', 20, time, 0) AS mtime, AVG(value)
FROM tag
WHERE name = 'TAG_0001'
AND time >= DATE_BIN('minute', 20, SYSDATE, 0) - 20m
GROUP BY mtime;Custom rollup example
The same approach applies to custom rollups. Read the destination table for stable data, and aggregate from the source for the recent range.
SELECT code, time,
SUM(sum_price) / SUM(cnt) AS avg_price
FROM (
SELECT code, time,
SUM(sum_price) AS sum_price,
SUM(cnt) AS cnt
FROM stock_rollup_1m
WHERE time < DATE_TRUNC('minute', SYSDATE) - 2m
GROUP BY code, time
UNION ALL
SELECT code,
DATE_TRUNC('minute', time) AS time,
SUM(price) AS sum_price,
COUNT(*) AS cnt
FROM stock_tick
WHERE time >= DATE_TRUNC('minute', SYSDATE) - 2m
GROUP BY code, time
)
GROUP BY code, time
ORDER BY code, time;