.. Automatically generated by code2rst.py
   Edit src/jsonb.c not this file!

.. currentmodule:: apsw

JSON (Javascript Object Notation)
*********************************

SQLite provides extensive functionality for working with JSON.  APSW
complements that.

What is JSON?
=============

`Javascript Object Notation <https://www.json.org/>`__ (:rfc:`8259`)
is a **TEXT** based format for representing data, encoded in `UTF-8
<https://en.wikipedia.org/wiki/UTF-8>`__.  It is deliberately
constrained in what can be represented, and as a result is very widely
supported across languages and platforms.

Types
-----

.. list-table::
    :header-rows: 1
    :widths: auto

    * - Javascript
      - Python
      - Example
    * - null
      - :class:`None`
      - .. code-block:: json

          null
    * - boolean
      - :class:`bool`
      - .. code-block:: json

          true
    * - string - surrounding double quotes with some characters backslash escaped
        such as double quote and forward slash, while Unicode codepoints can be
        backslash-u escaped allowing for the JSON text to be all ASCII.  Codepoint
        U+2603 is snowman ☃
      - :class:`str`
      - .. code-block:: json

          "hello \" world \/ \u2603"

    * - number
      - :class:`int` and :class:`float`
      - .. code-block:: json

          72
          -3.14E7
    * - array - array contents can be any JSON types
      - :class:`list`
      - .. code-block:: json

          [null, 3.2, "hello"]
    * - object - keys must be strings, but values can be any JSON types
      - :class:`dict`
      - .. code-block:: json

          {
            "description": "Orange",
            "price": 3.07,
            "varieties": ["fall",
                            {"long": "...",
                             "short": 3443}
                         ]
          }

What is not in JSON
-------------------

Versioning and extension mechanisms

  There is no version number, nor is there a way to extend the format
  to add additional functionality within the standard.

Comments

  There is deliberately no syntax for comments.

Whitespace sensitivity

  Although the examples above use indents and spacing for readability,
  JSON ignores all whitespace.  Machine generated JSON usually omits
  whitespace, while JSON intended for human consumption has whitespace
  added.

Dates and times

  There is no native date or time representation.  The most common
  approach is to use `ISO8601
  <https://en.wikipedia.org/wiki/ISO_8601>`__ formatted strings which
  look like ``2025-09-30T09:45:00Z`` for UTC and
  ``2025-09-30T01:15:00-07:00`` with a timezone.

Binary data

  Binary data can't be included in a text format.  The most common approach
  is to use `base 64 <https://en.wikipedia.org/wiki/Base64>`__ strings which
  look like ``bGlnaHQgd29yaw==``

Infinity and NaN

  There is no explicit representation for infinity or for `Not A Number
  <https://en.wikipedia.org/wiki/NaN>`__ which arise from floating point
  calculations.  For example ``1e500`` (1 with 500 zeroes) multiplied by
  itself is too large to be  represented in the most common `64 bit
  floating point <https://en.wikipedia.org/wiki/IEEE_754>`__, while
  subtracting infinity from itself produces ``NaN``.

  Because infinity and NaN can occur in calculations, many JSON libraries
  will produce and accept them despite not being in the JSON standard.
  See below for how SQLite deals with infinity and NaN.

String normalization

  Unicode allows different representations of what will appear as the same
  text.  For example an ``e`` with an accent can be represented as a single
  codepoint, or as two with the ``e`` and the combining accent separately.
  JSON makes no requirements on strings, with all implementations usually
  producing and accepting the strings as is.  You can read about the
  subject `here <https://en.wikipedia.org/wiki/Unicode_equivalence>`__.

Trailing commas

  Trailing commas will not be accepted or produced by JSON libraries.  For example
  a list like ``[1, 2, 3,]`` or an object like ``{"one": 1, "two": 2,}`` are not
  valid.

Explicit limits

  The standard does not say how long strings can be, or how many items
  can be in an array or object.  There is no limit on how many digits
  can be used in numbers nor a minimum or maximum precision.  It is
  common for implementations to have limits especially 64 bits for
  numbers.  String limits may be 1 or 2 billion characters, and arrays /
  objects be limited to a similar number of members.

  Python has a 64  bit limit on floating point numbers when using
  :class:`float` but :mod:`decimal` is unlimited, and has no limit
  on integers, strings, arrays, or objects other than available memory.

  SQLite has an upper limit of 2GB for strings, uses signed 64 bit
  integers, and standard 64 bit floating point.

  If data is large, then other representations are more appropriate.

Object (dict) key order or duplicates

  While arrays (list) are ordered, there is no specification for what
  order object keys are in, or that duplicates are not allowed.  This
  usually doesn't matter, but there are security attacks where one
  component may use the first occurrence of a duplicate key, while
  another component uses the last occurrence.  For example the SQLite
  function extracting values will use the first occurrence of a key,
  while a dict created from the object will use the last occurrence.

  The Python :mod:`json` module, and APSW both let you see objects
  as lists of keys and values so you can do your own validation or
  other processing.

JSON5
=====

`JSON5 <https://json5.org/>`__ is a superset of JSON intended to be
more human readable and writable.  SQLite will accept JSON5 encoded
text, but will never produce it.  While SQLite parses JSON5, you
can't get back JSON5 output from a JSON5 input.

For example JSON5 allows comments, hexadecimal numbers, trailing
commas, infinity and NaN, and omitting some quoting,

Using JSON
==========

Python
------

The standard library :mod:`json` module provides all necessary
functionality, including turning Python objects into JSON text, and
JSON text into Python objects.  You can read and write JSON text, or a
:term:`file object`.

It deviates from the standard:

* ``Infinity`` and ``NaN`` are produced and consumed by default,
  although there is a  keyword argument to turn it off
* When producing JSON objects, keys that are numbers, None, or
  boolean are turned into their corresponding JSON text representation.
  When reading an object back, the reverse transformation is not
  done since there is no way to know if that is intended,
* Various corner cases in Unicode / UTF8 are accepted such as
  unpaired surrogates and UTF8 encoded surrogates.  This was done
  because other implementations at the time could produce this
  kind of encoding.  Attempting to encode the resulting strings as UTF-8
  again will result in exceptions.

You can see a `full list of JSON issues
<https://github.com/orgs/python/projects/6>`__.

SQLite
------

SQLite has over `30 functions <https://sqlite.org/json1.html>`__ for
consuming, extracting, iterating, and producing JSON.  You will need
to ensure that what you get back is what is intended.  You can usually
get back the JSON text representation of values, or the SQLite
value.  For example a SQLite string is the same as a Python
string, while the JSON text representation includes double quotes
around it and various quoting inside. (:ref:`Example
<example_json_functions>`)

You can store JSON text directly in the database, but there is no way
to differentiate it from any other text value.  For example the number
``2`` in JSON is text ``2``.  The `json_valid
<https://sqlite.org/json1.html#jvalid>`__ function may help - for
example as a `CHECK constraint
<https://sqlite.org/lang_createtable.html#check_constraints>`__ on a
column.

Infinity and NaN
----------------

SQLite accepts infinity but represents it as the floating point value
``9e999`` and accepts NaN representing it as ``null`` (None).  Unfortunately
``9e999`` is a valid value for :class:`decimal.Decimal` as well as
``numpy.float128``, so you won't be able to tell if infinity was the
original value

.. _jsonb:

JSONB
=====

SQLite has a binary format for representing JSON - named JSONB.  It is
`specified here <https://sqlite.org/jsonb.html>`__.  It is
significantly faster to use because JSON text requires finding
matching quotes around strings, square brackets around arrays, curly
braces around objects, and ensuring numeric values are valid.  JSONB
has already done all that processing so accessing and extracting
members is a lot quicker.  It also saves some space.

In most cases using SQLite JSON text functions results in SQLite doing an
internal conversion to JSONB (which is cached) and then operating on that.
JSONB internally stores values as a binary tag and length, then the UTF8 text
so producing JSON text again is quick.

You can store JSONB to the database, and again can use the `json_valid
<https://sqlite.org/json1.html#jvalid>`__ function as `CHECK constraint
<https://sqlite.org/lang_createtable.html#check_constraints>`__ with the
value ``8``.

.. note::

  JSONB does not have a version number or any header explicitly
  identifying binary data as JSONB.  There is no checksum or similar
  validation.  As an example a single byte whose value is 0 through 12
  is valid JSONB.

  As byte sequences get longer, the likelihood they are valid JSONB
  decreases.  This table lists what proportion of all byte sequences
  of each length are also valid JSONB.

  .. list-table:: Valid JSONB sequences
    :header-rows: 1
    :widths: auto

    * - Sequence length
      - Proportion valid JSONB
    * - 1 byte
      - 3.52%
    * - 2 bytes
      - 0.71%
    * - 3 bytes
      - 0.35%
    * - 4 bytes
      - 0.18%
    * - 5 bytes
      - 0.10%
    * - 6 bytes
      - 0.05%
    * - 7 bytes
      - 0.03%
    * - 8 bytes
      - 0.02%
    * - 9 bytes
      - 0.01%
    * - 10+ bytes
      - very low

.. _apsw_jsonb:

APSW
----

APSW provides 2 functions for working directly with JSONB, and a
validation function.  This is for performance reasons so that there
is no need for an intermediate step representing objects as JSON text.
The validation function is stricter than SQLite's equivalent to avoid
false positives.

Performance testing was done using SQLite's randomjson code to create
a large object with many nested values- your objects will be
different.

:func:`~apsw.jsonb_encode`

  Converts a Python object directly to JSONB.  The alternative is
  two steps using :mod:`json` to convert to JSON text and then
  SQLite's internal JSON text to JSONB.

  .. list-table:: Test results (CPU time)
    :widths: auto

    * - 0.13 seconds
      - APSW Python object direct to JSONB
    * - 1.20 seconds
      - :mod:`json` same Python object to JSON text
    * - 0.80 seconds
      - SQLite that JSON text to JSONB

  The same parameters as :func:`json.dumps` are available, with more
  providing control over how non-string object keys are converted,
  type matching, and direct conversion to JSONB types can be done.

:func:`~apsw.jsonb_decode`

  Converts JSONB directly back to a Python object.  The alternative
  is two steps using SQLite's internal JSONB to JSON text and then
  :mod:`json` to convert the JSON text to a Python object.

  .. list-table:: Test results (CPU time)
    :widths: auto

    * - 0.48 seconds
      - APSW JSONB direct to Python object
    * - 0.22 seconds
      - SQLite same JSONB to JSON text
    * - 1.35 seconds
      - :mod:`json` that JSON text to Python object

  The same parameters as :func:`json.loads` are available, with an additional
  hook for arrays (lists).

:func:`~apsw.jsonb_detect`

  Returns a boolean if some binary data is valid JSONB.
  If this returns ``True`` then SQLite will always produce valid JSON from the
  JSONB.

  SQLite's `json_valid <https://sqlite.org/json1.html#jvalid>`__  only
  checks the various internal type and length fields are consistent
  and items seem reasonable.  It does not check all corner cases, or
  the UTF8 encoding, and so can produce invalid JSON even if
  json_valid said it was valid JSONB.

Notes
=====

Because SQLite has a 2GB limit on text or blobs (binary data), it
can't work with individual JSON text or JSONB data over that size.

.. _jsontype:

JSON as a SQLite value type
===========================

You can make SQLite automatically support JSON as though it was a
natively supported type.  :ref:`Example code <example_json_quick>`
that does these steps.

:ref:`Store JSONB <jsonb>`

  SQLite's binary JSON representation is stored as a binary blob in the
  database.  This is necessary because JSON text can't easily be
  distinguished from other text, while a blob is far less
  ambiguous.

  SQLite often operates on JSONB internally when using
  the `JSON functions <https://sqlite.org/json1.html>`__, and its
  ``json`` function can turn JSONB into JSON text format if needed.

  SQLite's functions provide full access to all the values inside JSON
  and JSONB for reading, iterating, qnd modifying.

Convert bindings

  The :attr:`~Cursor.convert_binding` callback for SQLite unknown types
  can :func:`encode them <jsonb_encode>` as JSONB.

Convert JSONB

  The :attr:`~Cursor.convert_jsonb` callback is called when a blob would
  be returned and is also valid JSONB.  You can :func:`decode it <jsonb_decode>`
  or return the blob.  The cursor is provided so you can examine
  the :attr:`~Cursor.description` to help decide.

JSONB API
=========

.. method:: jsonb_decode(data: collections.abc.Buffer, *,  object_pairs_hook: typing.Callable[[list[tuple[str, JSONBTypes | typing.Any]]], typing.Any] | None = None,  object_hook: typing.Callable[[dict[str, JSONBTypes | typing.Any]], typing.Any] | None = None,    array_hook: typing.Callable[[list[JSONBTypes | typing.Any]], typing.Any] | None = None,    parse_int: typing.Callable[[str], typing.Any] | None = None,    parse_float: typing.Callable[[str], typing.Any] | None = None,) -> typing.Any

    Decodes JSONB binary data into a Python object.  It is like :func:`json.loads`
    but operating on JSONB binary source instead of a JSON text source.

    :param data: Binary data to decode
    :param object_pairs_hook: Called after a JSON object has been
        decoded with a list of tuples, each consisting of a
        :class:`str` and corresponding value, and should return a
        replacement value to use instead.
    :param object_hook: Called after a JSON object has been decoded
        into a Python :class:`dict` and should return a replacement
        value to use instead.
    :param array_hook: Called after a JSON array has been decoded into
        a list, and should return a replacement value to use instead.
    :param parse_int: Called with a :class:`str` of the integer, and
        should return a value to use.  The default is :class:`int`.
        If the integer is hexadecimal then it will be called with a
        second parameter of 16.
    :param parse_float: Called with a :class:`str` of the float, and
        should return a value to use.  The default is :class:`float`.

    Only one of ``object_hook`` or ``object_pairs_hook`` can be
    provided.  ``object_pairs_hook`` is useful when you want something
    other than a dict, care about the order of keys, want to convert
    them first (eg case, numbers, normalization), want to handle duplicate
    keys etc.

    The array, int, and float hooks let you use alternate implementations.
    For example if you are using `numpy
    <https://numpy.org/doc/stable/user/basics.types.html>`__ then you
    could use numpy arrays instead of lists, or numpy's float128 to get
    higher precision floating numbers with greater exponent range than the
    builtin float type.

    If you use :class:`types.MappingProxyType` as ``object_hook`` and
    :class:`tuple` as ``array_hook`` then the overall returned value
    will be immutable (read only).

    .. note::

      The data is always validated during decode.  There is no need to
      separately call :func:`~apsw.jsonb_detect`.

.. method:: jsonb_detect(data: collections.abc.Buffer) -> bool

    Returns ``True`` if data is valid JSONB, otherwise ``False``.  If this returns
    ``True`` then SQLite will produce valid JSON from it.

    SQLite's json_valid only checks the various internal type and length fields are consistent
    and items seem reasonable.  It does not check all corner cases, or the UTF8
    encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB.

    .. note::

      :func:`~apsw.jsonb_decode` always validates the data as it decodes, so there is no
      need to call this function separately.  This function is useful for determining if
      some data is valid, and not some other binary format such as an image.

.. method:: jsonb_encode(obj: typing.Any, *, skipkeys: bool = False, sort_keys: bool = False, check_circular: bool = True, exact_types: bool = False, default: typing.Callable[[typing.Any], JSONBTypes | collections.abc.Buffer] | None = None, default_key: typing.Callable[[typing.Any], str] | None = None, allow_nan:bool = True) -> bytes

    Encodes a Python object as JSONB.  It is like :func:`json.dumps` except it produces
    JSONB bytes instead of JSON text.

    :param obj: Object to encode
    :param skipkeys: If ``True`` and a non-string dict key is
       encountered then it is skipped.  Otherwise :exc:`ValueError`
       is raised.  Default ``False``.  Like :func:`json.dumps` keys
       that are bool, int, float, and None are always converted to
       string.
    :param sort_keys: If ``True`` then objects (dict) will be output
       with the keys sorted.  This produces deterministic output.
       Default ``False``.
    :param check_circular: Detects if containers contain themselves
       (even indirectly) and raises :exc:`ValueError`.  If ``False``
       and there is a circular reference, you eventually get
       :exc:`RecursionError` (or run out of memory or similar).
    :param default: Called if an object can't be encoded, and should
       return an object that can be encoded.  If not provided a
       :exc:`TypeError` is raised.

       It can also return binary data in JSONB format.  For example
       :mod:`decimal` values can be encoded as a full precision JSONB
       float.  :func:`apsw.ext.make_jsonb` can be used.
    :param default_key: Objects (dict) must have string keys.  If a
       non-string key is encountered, it is skipped if ``skipkeys``
       is ``True``.  Otherwise this is called.  If not supplied the
       default matches the standard library :mod:`json` which
       converts None, bool, int and float to their string JSON
       equivalents and uses those.  This callback is useful if
       you want to raise an exception, or use a different way
       of generating the key string.
    :param allow_nan: If ``True`` (default) then following SQLite practise,
        infinity is converted to float ``9e999`` and NaN is converted
        to ``None``.  If ``False`` a :exc:`ValueError` is raised.
    :param exact_types: By default subclasses of int, float, list (including
        tuple), dict (including :class:`collections.abc.Mapping`), and
        :class:`str` are converted the same as the parent class.  This
        is usually what you want.  However sometimes you are using a
        subclass and want them converted by the ``default`` function
        with an example being :class:`enum.IntEnum`.  If this parameter
        is ``True`` then only the exact types are directly converted
        and subclasses will be passed to ``default`` or ``default_key``.

    You will get a :exc:`~apsw.TooBigError` if the resulting JSONB
    will exceed 2GB because SQLite can't handle it.

