Published on December 09, 2024 by Stefanie Janine Stölting
PostgreSQL timetravel partitions
20 min read
Up to version 6.2, PostgreSQL had time travel implemented in its core. It was removed in PostgreSQL 6.3.
It is possible to implement time travel in PostgreSQL without any extension.
I gave a talk about it at PGDay/MED 2024; the idea for the talk started with writing this blog post.
The slides are available on GitLab.
There is a simple solution, but it might become slow with more data and extensively updated data. Hans-Jürgen Schönig wrote about it in a 2019 blog post about AS OF queries.
The only problem is that the trigger function throws errors on PostgreSQL 17; I have not tested it with older PostgreSQL versions. But this is solvable by changing the function.
The idea for this blog post started with a talk by Ryan Booz, Postgres Table Partitioning.
To start, we need a table that will hold the unique numeric identifiers, as they need to be unique across all partitions. PostgreSQL does not support that out of the box: a unique constraint on a partitioned table has to include the partition key.
Therefore we create a table which will only contain the identifiers. The table and its primary key will be small, and PostgreSQL will have no problems holding it in memory.
Creating the primary key table:
-- Creates the primary key table which will also keep the creation timestamp
CREATE TABLE timetravel_pk (
    timetravelid BIGINT GENERATED ALWAYS AS IDENTITY,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp(),
    CONSTRAINT timetravel_pk_pk PRIMARY KEY (timetravelid)
);
The table with the time travel data will later reference this table via a foreign key.
To store the time an id was created only once, the timestamp holding that data is located in this table.
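As a quick example, a default insert returns the generated identifier, and the creation timestamp is filled in automatically:
-- A default insert generates the identifier and the creation timestamp
INSERT INTO timetravel_pk DEFAULT VALUES
RETURNING timetravelid, created;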
The data table will be partitioned by range. For this example it will be by year.
One thing to keep in mind: partitions have to exist before data can be inserted. We will create a function below that takes care of creating missing partitions.
A trigger fired before insert will take care of generating the id before the data is inserted into the partitioned table.
Creating the data table:
-- Creates the btree GIST index which will be used for the column changed
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- Creates the partitioned table which will store all versions of records
CREATE TABLE timetravel (
    timetravelid BIGINT NOT NULL,
    changed TSTZRANGE NOT NULL DEFAULT tstzrange(clock_timestamp(), 'INFINITY', '[)'),
    data_text TEXT,
    data_json JSONB,
    deleted BOOLEAN NOT NULL DEFAULT FALSE,
    CONSTRAINT timetravelid_fk FOREIGN KEY (timetravelid) REFERENCES timetravel_pk(timetravelid)
) PARTITION BY RANGE (lower(changed))
;
-- Creating indexes
CREATE INDEX timetravel_changed_idx
ON timetravel
USING gist
(changed)
;
CREATE INDEX timetravel_timetravelid_fk_idx
ON timetravel
USING btree
(timetravelid)
;
CREATE INDEX timetravel_not_deleted_idx
ON timetravel
USING btree
(deleted)
WHERE NOT deleted
;
Keep in mind that PostgreSQL does not create indexes on foreign key columns automatically. That is a trap many people fall into. But it also has advantages: PostgreSQL supports a number of index types, and you can choose the appropriate index method yourself.
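To verify that the GiST index can serve point-in-time lookups, EXPLAIN can be used; with little data the planner may still choose a sequential scan:
-- Check the plan for a point-in-time lookup on the changed column
EXPLAIN
SELECT *
FROM timetravel
WHERE now() <@ changed;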
To partition the table automatically, we need to know for which years partitions already exist. When they do not exist yet, new partitions have to be created.
Such a trigger would be fired before insert and before update.
There is an alternative solution, but it involves an extension to create cron jobs inside the database; I prefer pg_cron.
In that case the function can be created as a simple procedure, as there is no need to return data, and a cron job inside the database executes the procedure before the current time period ends.
That is obviously much faster than the method with triggers, but it involves an extension.
Creating the table to store the periods:
CREATE TABLE timetravel_part_vals (
    part_year SMALLINT NOT NULL,
    start_value TIMESTAMP WITH TIME ZONE NOT NULL,
    end_value TIMESTAMP WITH TIME ZONE NOT NULL,
    CONSTRAINT timetravel_part_vals_pk PRIMARY KEY (part_year)
);
I would usually handle the automatic creation of partitions by time in a cron job, for example with pg_cron; a sketch of that follows after the function below.
That is much more efficient than handling it every time a record is inserted, updated, or deleted.
Function to handle partitions:
CREATE OR REPLACE FUNCTION timetravel_partition (in_changed TIMESTAMPTZ)
RETURNS void
LANGUAGE PLPGSQL
AS
$$
DECLARE
    in_range BOOLEAN = FALSE;
    year_p SMALLINT;
    start_v TIMESTAMP WITH TIME ZONE;
    end_v TIMESTAMP WITH TIME ZONE;
    part_name TEXT;
BEGIN
    -- Check whether the changed date falls into an existing partition
    EXECUTE 'SELECT count(*) > 0
        FROM timetravel_part_vals
        WHERE part_year = extract(year from $1)'
        INTO in_range
        USING in_changed;

    IF (NOT in_range) THEN
        -- Update the range data
        EXECUTE 'INSERT INTO timetravel_part_vals (part_year, start_value, end_value)
            SELECT extract(year from $1),
                date_trunc(''year'', $1),
                date_trunc(''year'', $1) + INTERVAL ''1 year''
            RETURNING *'
            INTO year_p, start_v, end_v
            USING in_changed;

        -- Create a new partition
        part_name := 'timetravel_' || year_p::TEXT;

        EXECUTE 'CREATE TABLE ' || part_name ||
            ' PARTITION OF timetravel FOR VALUES FROM (''' || start_v::text || ''') TO (''' || end_v::text || ''')';

        RAISE NOTICE 'Partition % created.', part_name;
    END IF;
END;
$$;
-- Create the partition for the current year
SELECT timetravel_partition (now());
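If you go the pg_cron route instead, a minimal sketch could look like the following; the job name, the schedule, and the two-hour offset are arbitrary examples. Passing a timestamp that already lies in the new year creates the next partition ahead of time.
-- Schedule the partition function for every December 31st at 23:00
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
    'timetravel-partitions',
    '0 23 31 12 *',
    $$SELECT timetravel_partition(now() + INTERVAL '2 hours')$$
);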
The trigger function for inserts and updates will always overwrite the incoming value of the change timestamp; that way the data cannot be manipulated.
Creating the trigger function for inserts and updates:
CREATE OR REPLACE FUNCTION trigger_timetravel_in_upd ()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
    -- Setting default values
    NEW.deleted = false;
    NEW.changed = tstzrange(clock_timestamp(), 'INFINITY', '[)');

    CASE
        -- On UPDATE a new record is inserted
        WHEN TG_OP = 'UPDATE' THEN
            IF NEW.timetravelid <> OLD.timetravelid THEN
                RAISE EXCEPTION 'The identity column timetravelid can not be changed!';
            END IF;

            IF NOT OLD.deleted THEN
                IF upper(OLD.changed) = 'infinity' THEN
                    -- Updating the latest version of a record
                    INSERT INTO timetravel (timetravelid, data_text, data_json, changed)
                    SELECT NEW.timetravelid
                        , NEW.data_text
                        , NEW.data_json
                        , NEW.changed
                    ;

                    -- Only the range of the old record is changed, it has an end now
                    NEW.data_text = OLD.data_text;
                    NEW.data_json = OLD.data_json;
                    NEW.changed = tstzrange(lower(OLD.changed), lower(NEW.changed));

                    RETURN NEW;
                ELSE
                    -- It is not the newest version, therefore there is nothing to do
                    RETURN NULL;
                END IF;
            ELSE
                -- An already deleted record cannot be changed
                RETURN NULL;
            END IF;

        -- The new record needs its id created by inserting into the pk table
        WHEN TG_OP = 'INSERT' THEN
            INSERT INTO timetravel_pk (created)
            VALUES (clock_timestamp())
            RETURNING timetravelid
            INTO NEW.timetravelid;

            RETURN NEW;
        ELSE
            RETURN NULL;
    END CASE;
END;
$$;
-- Attach the trigger function for inserts
CREATE TRIGGER timetravel_insert
BEFORE INSERT
ON timetravel
FOR EACH ROW
WHEN (pg_trigger_depth() < 1)
EXECUTE PROCEDURE trigger_timetravel_in_upd()
;
-- Attach the trigger function for updates
CREATE TRIGGER timetravel_update
BEFORE UPDATE
ON timetravel
FOR EACH ROW
EXECUTE PROCEDURE trigger_timetravel_in_upd()
;
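As a small example, assuming the partition for the current year exists (created above) and the first generated id is 1: one insert followed by one update leaves two versions of the record, the older one with a closed range.
-- Insert one record and update it once
INSERT INTO timetravel (data_text) VALUES ('first version');

UPDATE timetravel
SET data_text = 'second version'
WHERE timetravelid = 1
  AND upper(changed) = 'infinity';

-- Shows both versions; only the latest has the upper bound infinity
SELECT timetravelid, data_text, changed, deleted
FROM timetravel
WHERE timetravelid = 1
ORDER BY lower(changed);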
The last thing we have to take care of is deleting records. We can only delete the data logically, otherwise time travel would not be possible.
You might have noticed that the table timetravel already has a column deleted. It is false by default and cannot be changed on inserts or updates; the trigger function handles that for us. But now we need a trigger that is fired before a record gets deleted; its job is to close the latest version and to insert a new record where deleted is true.
You might also have noticed that the INSERT trigger has an additional condition after FOR EACH ROW: WHEN (pg_trigger_depth() < 1) makes sure that the trigger is only executed when it is not called from within another trigger. The function is documented with the session information functions in the PostgreSQL documentation.
Without that condition the trigger timetravel_insert would overwrite the deleted value and generate a new primary key for the rows inserted by the other trigger functions.
Delete trigger:
-- The trigger inserts two records, one with the old data but with an end
-- timestamp in column changed, one with deleted = true but end timestamp
-- is INFINITY
CREATE OR REPLACE FUNCTION trigger_timetravel_del ()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
DECLARE
    ts timestamp with time zone;
BEGIN
    -- When a record has already been deleted, an error is raised and no data is changed
    IF OLD.deleted THEN
        RAISE EXCEPTION 'Timetravel record with the timetravelid % is already deleted.', OLD.timetravelid;
    END IF;

    IF upper(OLD.changed) = 'infinity' THEN
        -- Only the latest version has to be taken care of
        ts = clock_timestamp();

        INSERT INTO timetravel (timetravelid, changed, data_text, data_json, deleted)
        VALUES (
            OLD.timetravelid,
            tstzrange(ts, 'INFINITY'),
            OLD.data_text,
            OLD.data_json,
            true
        ),
        (
            OLD.timetravelid,
            tstzrange(lower(OLD.changed), ts),
            OLD.data_text,
            OLD.data_json,
            false
        );

        -- Returning OLD lets the original row be deleted, the two rows above replace it
        RETURN OLD;
    ELSE
        -- All other records stay as they are
        RETURN NULL;
    END IF;
END;
$$;
-- Attach the trigger function for deletions
CREATE TRIGGER timetravel_delete
BEFORE DELETE
ON timetravel
FOR EACH ROW
EXECUTE PROCEDURE trigger_timetravel_del()
;
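Continuing the small example from above: deleting the record closes the current version and adds a row with deleted set to true; a second delete then raises the exception from the trigger function.
-- Logically delete the record
DELETE FROM timetravel WHERE timetravelid = 1;

-- Shows the closed versions plus the tombstone row
SELECT timetravelid, changed, deleted
FROM timetravel
WHERE timetravelid = 1
ORDER BY lower(changed);

-- Fails: Timetravel record with the timetravelid 1 is already deleted.
DELETE FROM timetravel WHERE timetravelid = 1;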
Now we have everything in place: the data is automatically partitioned, records cannot really be deleted, and we can create some views to access the data at certain points in time.
To play with the time traveling, we need some data in the timetravel table. The statement will insert 100,000 records.
Inserting data:
-- Insert records
INSERT INTO timetravel (data_text, data_json)
SELECT substr(md5(random()::text), 1, 25) AS data_text
, to_jsonb(substr(md5(random()::text), 1, 25)) AS data_json
FROM generate_series(1, 100000) s(i)
;
The update will change records selected by random ids. You may run it several times to generate several versions of the records.
Updating data:
-- Update existing records
WITH t1 AS
(
SELECT floor(random() * (100000-1+1) + 1)::bigint AS timetravelid
, substr(md5(random()::text), 1, 25) AS data_text
, to_jsonb(substr(md5(random()::text), 1, 25)) AS data_json
FROM generate_series(1, 100000) s(i)
)
, t2 AS
(
SELECT DISTINCT ON (timetravelid) timetravelid
, data_text
, data_json
FROM t1
)
UPDATE timetravel SET
data_text = t2.data_text,
data_json = t2.data_json
FROM t2
WHERE timetravel.timetravelid = t2.timetravelid
;
Deleting data:
-- Delete some records
DELETE FROM timetravel WHERE timetravelid IN (99, 654, 5698);
Checking the existing amount of data:
-- Table statistics
WITH rec_v AS
(
SELECT t.timetravelid
, ROW_NUMBER() OVER (PARTITION BY t.timetravelid ORDER BY lower(t.changed)) AS rec_version
FROM timetravel AS t
)
SELECT count(t.timetravelid) AS recordcount
, min(t.timetravelid) AS min_primary_key
, max(t.timetravelid) AS max_primary_key
, min(rec_v.rec_version) AS min_version_number
, max(rec_v.rec_version) AS max_version_number
, count(t.timetravelid) FILTER (WHERE t.deleted) AS deleted_records
FROM timetravel AS t
INNER JOIN rec_v
ON t.timetravelid = rec_v.timetravelid
;
The results after having updated four times:
recordcount | min_primary_key | max_primary_key | min_version_number | max_version_number
---|---|---|---|---
4347004 | 1 | 100000 | 1 | 4
The recordcount might differ, as the timetravelid values to update are generated randomly.
Getting data valid at a certain point in time:
-- Result for a certain point in time
SELECT t.timetravelid
, ROW_NUMBER() OVER (PARTITION BY t.timetravelid ORDER BY lower(t.changed)) AS rec_version
, t.data_text
, t.data_json
, pk.created
, t.changed
, lower(t.changed) AS valid_from
, upper(t.changed) AS valid_until
, t.deleted
FROM timetravel_pk AS pk
INNER JOIN timetravel AS t
ON pk.timetravelid = t.timetravelid
WHERE '2024-11-16 09:45:06.821 +0100'::timestamptz <@ t.changed
ORDER BY t.timetravelid
, lower(t.changed)
;
You will likely have to adjust the WHERE clause to a timestamp that exists in your table.
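If you do not have a suitable timestamp at hand, any lower bound taken from the table will do, for example:
-- Pick an existing lower bound for the point-in-time query above
SELECT lower(changed) AS point_in_time
FROM timetravel
ORDER BY random()
LIMIT 1;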
We can use the previous statement to create a view. The point-in-time WHERE clause and the ORDER BY clause are removed; instead the view returns only the current, not deleted version of each record.
Create a view:
CREATE OR REPLACE VIEW timetravel_v AS
WITH rec_v AS
(
SELECT t.timetravelid
, ROW_NUMBER() OVER (PARTITION BY t.timetravelid ORDER BY lower(t.changed)) AS rec_version
FROM timetravel AS t
)
SELECT DISTINCT ON (t.timetravelid)
t.timetravelid
, rec_v.rec_version
, t.data_text
, t.data_json
, pk.created
, t.changed
, lower(t.changed) AS valid_from
, upper(t.changed) AS valid_until
FROM timetravel_pk AS pk
INNER JOIN timetravel AS t
ON pk.timetravelid = t.timetravelid
INNER JOIN rec_v
ON pk.timetravelid = rec_v.timetravelid
WHERE NOT deleted
AND upper(t.changed) = 'infinity'::TIMESTAMPTZ
;
There is still one problem with that view: due to the calculated columns in it, it would not be possible to insert new data or update existing data. To achieve this we can create another trigger function, which will be fired by INSTEAD OF INSERT, INSTEAD OF UPDATE, and INSTEAD OF DELETE triggers attached to the view.
-- One trigger to bind them all
CREATE OR REPLACE FUNCTION trigger_timetravel_view_in_upd_del ()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
    CASE
        WHEN TG_OP = 'INSERT' THEN
            -- The BEFORE INSERT trigger on timetravel does not fire here,
            -- because pg_trigger_depth() is already 1, so the identifier
            -- has to be generated explicitly
            INSERT INTO timetravel_pk (created)
            VALUES (clock_timestamp())
            RETURNING timetravelid
            INTO NEW.timetravelid;

            EXECUTE 'INSERT INTO timetravel (timetravelid, data_text, data_json) VALUES ($1, $2, $3)'
                USING NEW.timetravelid, NEW.data_text, NEW.data_json;
        WHEN TG_OP = 'UPDATE' THEN
            EXECUTE 'UPDATE timetravel SET data_text = $1, data_json = $2 WHERE timetravelid = $3'
                USING NEW.data_text, NEW.data_json, OLD.timetravelid;
        WHEN TG_OP = 'DELETE' THEN
            EXECUTE 'DELETE FROM timetravel WHERE timetravelid = $1'
                USING OLD.timetravelid;

            -- For DELETE, NEW is NULL, therefore OLD has to be returned
            RETURN OLD;
        ELSE
            RAISE EXCEPTION 'Operation not supported.';
    END CASE;

    RETURN NEW;
END;
$$;
-- Attach the trigger function to the view
CREATE OR REPLACE TRIGGER timetravel_v_trigger
INSTEAD OF INSERT OR UPDATE OR DELETE
ON timetravel_v
FOR EACH ROW
EXECUTE PROCEDURE trigger_timetravel_view_in_upd_del ()
;
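As an example, with arbitrarily chosen ids: the view now behaves like a plain table, while the versioning happens behind the scenes.
-- Insert, update, and delete through the view
INSERT INTO timetravel_v (data_text, data_json)
VALUES ('via the view', to_jsonb('via the view'::TEXT));

UPDATE timetravel_v
SET data_text = 'changed via the view'
WHERE timetravelid = 2;

DELETE FROM timetravel_v
WHERE timetravelid = 3;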
The SQL standard of 2011 added historical records with the key words AS OF. This is not implemented in PostgreSQL up to and including version 17. It is covered under the standard's temporal features; the link goes to a publication about them.
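Just for illustration, a query following the standard would look like this; it does not run on PostgreSQL as of version 17:
-- SQL:2011 system-versioned query, not valid in PostgreSQL
SELECT *
FROM timetravel
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-11-16 09:45:06';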
In the commitfests for PostgreSQL 17 a patch was introduced to extend PostgreSQL towards the standard. That patch did not make it into PostgreSQL 17, but it may land in a future major version.
The SQL script is available for download to spare you copying the code.
I hope you enjoyed this long blog post.