The wal2json plugin
Convert Postgres Write-Ahead Log (WAL) changes to JSON format
The wal2json
plugin is a logical replication decoding output plugin for Postgres. It lets you convert the Write-Ahead Log (WAL) changes into JSON format, making it easier to consume and process database changes in various applications, such as data replication, auditing, event-driven services, and real-time analytics.
This guide describes the wal2json
plugin — how to enable it in Neon, configure its output, and use it to capture and process database changes in JSON format. WAL decoding is crucial for building robust data pipelines, implementing Change Data Capture (CDC) systems, and maintaining data consistency across distributed systems.
note
The wal2json
plugin is included in your Neon project and doesn't require a separate installation.
Version availability:
The wal2json
plugin is available in all Postgres versions supported by Neon. For the most up-to-date information on supported versions, please refer to the list of all extensions available in Neon.
Enable logical replication
Before using the wal2json
plugin, you need to enable logical replication for your Neon project. Navigate to the Settings page in your Neon Project Dashaboard, and select Beta from the list of options. Click Enable to enable logical replication.
note
Once enabled for a project, logical replication cannot be reverted. This action triggers a restart of all active compute endpoints in your Neon project. Any active connections will be dropped and have to reconnect.
To verify that logical replication is enabled, navigate to the SQL Editor
and verify the output of the following query:
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql
client with Neon, see Connect with psql.
Create a replication slot
To start using wal2json
, you first need to create a replication slot that explcitly specifies wal2json
as the decoder plugin. You can do this by running the following query:
This creates a replication slot named test_slot
using the wal2json
plugin. Now, we can query this slot to listen for changes to any tables in the database.
wal2json
to capture changes to a table
Example - use Suppose we have a table named inventory
that stores information about products for a retail store. We want to capture changes to this table in real-time and process them using wal2json
.
Run the following query to create the inventory
table, and insert some sample data:
With logical decoding enabled, Postgres streams changes to the inventory
table to the test_slot
replication slot. Run the following query to observe the messages that have been published to it:
This query returns the changes in JSON format. Each change will be represented as a separate JSON object.
There are two rows in the query output above. The first row corresponds to the CREATE TABLE
statement that we ran earlier. Logical decoding only captures information about DML (data manipulation) events — INSERT
, UPDATE
, and DELETE
statements, hence this row is empty. The second row corresponds to the INSERT
statement that added rows to the inventory
table.
Next, we update an existing row in the inventory
table:
We can now query the test_slot
replication slot again to see the new information published as a result of the update:
This query returns a single row in JSON format, corresponding to the row updated.
Format versions: 1 vs 2
The wal2json
plugin supports two different output format versions.
The default format version is 1, which produces a JSON object per transaction. All new and old tuples are available within this single JSON object. This format is useful when you need to process entire transactions as atomic units.
Format version 2 produces a JSON object per tuple (row), with optional JSON objects for the beginning and end of each transaction. This format is more granular and can be useful when you need to process changes on a row-by-row basis. Both formats support various options to include additional properties such as transaction timestamps, schema-qualified names, data types, and transaction IDs.
To use format version 2, you need to specify it explicitly:
To illustrate, we add a couple more product entries to the inventory
table:
Now, we can query the test_slot
replication slot again to see the new information published as a result of the update:
The output of this query appears as follows. You can see that there is a separate JSON object for each row inserted.
wal2json
with tables without a primary key
Use REPLICA IDENTITY
is a table property that determines what information is written to the WAL when a row is updated or deleted. This property is crucial for wal2json
when working with tables that don't have a primary key.
REPLICA IDENTITY
has four possible settings:
DEFAULT
: Only primary key columns are logged forUPDATE
andDELETE
operations.USING INDEX
: A specified index's columns are logged forUPDATE
andDELETE
operations.FULL
: All columns are logged forUPDATE
andDELETE
operations.NOTHING
: No information is logged forUPDATE
andDELETE
operations.
Tables use the DEFAULT
setting by default. For tables without a primary key, this means no information is logged for updates and deletes. Let's create a table without a primary key and see how wal2json
behaves:
The wal2json
output for this update operation will not contain any information about the updated row due to the lack of a primary key and the DEFAULT REPLICA IDENTITY
setting.
To capture changes for tables without a primary key, we can change the REPLICA IDENTITY
to FULL
:
Now, the wal2json
output will include both the old and new values for all columns, which can be used to identify the changed row. To verify, we can query the test_slot
replication slot again:
The output of this query appears as follows:
Performance considerations
When working with wal2json
, keep the following performance considerations in mind:
- Replication slot management: Unused replication slots can prevent WAL segments from being removed, potentially causing disk space issues. Regularly monitor and clean up unused slots.
- Batch processing: Instead of processing each change individually, consider batching changes for more efficient processing.
- Resource usage: Be mindful of network bandwidth usage, especially when dealing with high-volume changes or when replicating over a wide area network. Additionally, decoding WAL to JSON can be CPU-intensive. Monitor your system's CPU usage and consider scaling your resources if needed.
Conclusion
The wal2json
plugin is a powerful tool for capturing and processing database changes in JSON format. We've seen how to enable it, configure its output, and use it in various scenarios. Whether you're implementing a data replication system, building an audit trail, or creating an event-driven architecture, wal2json
provides a flexible and efficient way to work with the Postgres Write-Ahead Log (WAL).
Resources
- wal2json GitHub Repository
- PostgreSQL Logical Decoding
- Manage logical replication in Neon - Decoder plugins
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.