PostgreSQL JSON versus JSONB


What are the differences between the two JSON implementations in PostgreSQL

Published on September 12, 2022 by Stefanie Janine Stölting

PostgreSQL postgres JSON JSONB

5 min READ

History

PostgreSQL supports two types of JSON implementations: JSON and JSONB. The first implementation of JSON has been released in PostgreSQL 9.2 in 2012. JSONB has been added two years later in PostgreSQL 9.4.

There are still two JSON standards, that differ. One is RFC7159 by Douglas Crockford. The other one the ECMA 404.

PostgreSQL is using the RFC7159. Compared to the ANSI SQL standard, which is often not really human readable, this standard is short and easy to understand, it has only 15 pages.

In March 2017 the ISO published a new standard, ISO/IEC TR 19075-6:2017, that defines how to implement JSON in SQL databases. This document is only available for money.
A while later, the ISO published an additional document, which explains the SQL/JSON standard. The document ISO/IEC TR 19075-6 is available for free. It is much easier to understand the ISO/IEC TR 19075-6:2017 standard with this additional documentation.

Another part of the ISO/IEC TR 19075-6:2017 standard is the SQL/JSON Path Language, which has been implemented in PostgreSQL 12 in 2019.
It offers more possibilities to query JSON objects.

JSON

JSON is implemented very simple. Every valid JSON object is accepted. The maximum amount of data for each column is 1 GB by using TOAST.

JSONB

JSONB is a more complex implementation. It is ordering the paths and values inside. Also duplicates will be removed, as according to RFC7159 it is only possible to get the value of the last appearance of a key. Though JSONB only handles the last one and removes the previous keys with the same name. Also all line breaks and all unnecessary spaces will be removed in JSONB.

-- JSON
SELECT '{
  "bar": "baz",
  "num": "one",
  "num":"two",
  "active":false
 }'::json;

RESULT: {“bar”: “baz”, “num”: “one”, “num”: “two”, “active”:false}

-- JSONB
SELECT '{
 "bar": "baz",
 "num": "one",
 "num":"two",
 "active":false
}'::jsonb;

RESULT: {“bar”: “baz”, “num”: “two”, “active”: false}

As shown in the example above, there is only one occurrence of the key “num” in JSONB, while both are returned with JSON.
This behavior enables JSONB to be indexed, as every path is unique in its structure. The index type that can be used here is GIN.

The maximum amount of data for each column is 1 GB by using TOAST.

Hints

Why JSON?

PostgreSQL also supports a XML datatype and including several functions, but JSON has the advantage, that it does take care of the used code page. That is already part of the standard and it is UTF. PostgreSQL has implemented both JSON datatypes as UTF-8.

When Not To Use JSON

Whenever you are writing the same structure of data into a JSON/JSONB column, think of storing the data in relational tables. It is much easier to check for fields, that you always want to have. It is also faster, than parsing a JSON object to ensure, it has a certain key with a valid value.
For example if you expect a value of being of type integer, or in a range. That is more easily checked with a dedicated datatype, than a JSON object.

Choosing The Right JSON Type

A lot of functions can deal with both datatypes. Some are implemented with a trailing b for the usage with JSONB.

If you plan to do nothing inside the database except inserting and reading the whole JSON object, use JSON. It is also a good solution, if you need to archive JSON data, as it is stored as is, as long as it is valid JSON.

But if you want to do some SQL magic with JSON objects, then JSONB is the datatype of choice. It offers a lot more possibilities and is much faster when you index the JSONB column with a GIN index.

Human Readable JSONB

There are no line breaks and spaces for indentation stored in JSONB datatype, while the JSON datatype returns the data as stored, For us humans it is a bit hard to read the results from JSONB as it is just one line.
In PostgreSQL 9.5 the function jsonb_pretty has been implemented. This function enables us humans now, to read and understand the content of JSON objects coming from a JSONB column.

SELECT
  jsonb_pretty (
    '[{"f1":1,"f2":null}, 2]'::jsonb
  );
/*
RESULT:
[
    {
        "f1": 1,
        "f2": null
    },
    2
]
*/

The function just adds line breaks and indentation.

Lesser Known Stuff

Time / Date / Datetime

JSON does not support any date or time datatypes. Only the following types are supported:

  • string
  • number
  • object
  • array
  • true
  • false
  • null

PostgreSQL is dynamically associating which datatype it will use here. The standard in JSON is here, to handle date/time/datetime manually as it is a string, hence several JSON libraries implemented a handler for this datatypes.

Null-Terminated Strings

Recently in a chat on Telegram about which datatype to use to store JSON data, a guy mentioned, that both PostgreSQL JSON types don’t store null-terminated strings, they are returning an error message, if you try to.

But in my humble opinion, that does not matter, as the kind of characters that JSON can handle is very well defined in the JSON Standard. As these types are not mentioned, they are not standard compliant. Some languages have implemented using null-terminated_strings nevertheless. I still would avoid using them because of the standard.

SELECT '{
  "a_json_object": "with_a_\u0000_byte"
 }'::jsonb;

RESULT: ERROR: unsupported Unicode escape sequence

SELECT E'{"
  a_json_object": "with_a_\u0000_byte"
 }'::json;

RESULT: invalid Unicode escape value at or near “\u0000”

Though, PostgreSQL does everything correct and has implemented the JSON standard as defined. Returning an error message pointing exactly to the problem is the correct and best solution.



Author: Stefanie Janine Stölting