What is the true structure of a "data dictionary"?

When I first began researching this term, I thought I'd just be looking up a data structure similar to one I already knew: Python's built-in datatype called a "dictionary". Then I discovered that around the time Guido van Rossum was building Python and its dictionary type, IBM was publishing its "Dictionary of Computing" (the tenth edition of which came out in 1993), complete with its own definition of a data dictionary. In its early years, Python was used mostly for system administration and, before long, rudimentary web applications. My first guess, and it is only a guess, was that Python had been designed with IBM's data dictionary structure in mind, and that the folks at the Stichting Mathematisch Centrum in the Netherlands were eager to build a programming language that would help end users interact with IBM data dictionaries.

What I eventually realized is that the Python dictionary datatype can be seen as an improvement over the data dictionary structures of the early 1990s, at least when it comes to looking things up by key. Here's why: Python's dictionary object (in CPython, the reference implementation) is stored as a hash table.

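Each dictionary key is used to calculate a hash value, and that hash value determines the spot in the table where the key-value pair is stored. This only works if a key's hash never changes, which is why Python requires dictionary keys to be hashable (in practice, immutable). If the key for a spot were allowed to change, its hash would point somewhere else, and suddenly there would be more than one possibility for where the pair lives. A lookup would have to check other places before the correct key-value pair could be obtained.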
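Here is a minimal sketch of that rule in action. The names `point` and `grid` are made up for illustration; the only behavior it relies on is that tuples are hashable and lists are not.

```python
# A tuple is immutable, so its hash is stable and it can serve as a key.
point = (3, 4)
grid = {point: "treasure"}

# An equal tuple hashes to the same value, so it finds the same entry.
same_hash = hash(point) == hash((3, 4))
found = grid[(3, 4)]

# A list is mutable, so Python refuses to hash it or use it as a key.
rejected = False
try:
    grid[[3, 4]] = "trap"
except TypeError:
    rejected = True

print(same_hash, found, rejected)
```

The rejected assignment leaves `grid` untouched, so the dictionary still holds only the tuple key.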

A structure in which you have to follow several references before reaching your data starts to resemble a B-tree (widely used by databases since the 1970s). A B-tree keeps its keys sorted in pages, and each page holds references to pages on the next level down. As the tree fills up, pages split, and every so often the tree gains a new level. In order to find the value stored at a particular key, you start at the root and follow one reference per level before finally reaching the leaf containing your key-value pair. How far down you have to go depends on how many keys the tree holds, so a lookup typically takes several steps rather than one.
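To make the descent concrete, here is a rough sketch of a lookup in a small, hand-built tree in which only the leaves hold values. `Node` and `search` are hypothetical names invented for this post, the tree never splits or rebalances, and no real database is implemented this way line for line.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # sorted keys (separators in inner pages)
    values: list = field(default_factory=list)     # filled in for leaves only
    children: list = field(default_factory=list)   # filled in for inner pages only


def search(node, key):
    """Return (value or None, number of pages visited)."""
    pages = 1
    while node.children:
        # Keys below keys[i] live under children[i]; the rest go further right.
        node = node.children[bisect_right(node.keys, key)]
        pages += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], pages
    return None, pages


leaves = [
    Node(keys=[1, 5], values=["a", "b"]),
    Node(keys=[10, 15], values=["c", "d"]),
    Node(keys=[20, 30], values=["e", "f"]),
]
root = Node(keys=[10, 20], children=leaves)

print(search(root, 15))
print(search(root, 7))
```

Even in this two-level toy, every lookup touches the root page and then a leaf page, whether or not the key exists.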

A hash-table implementation, by contrast, usually finds the entry at the first spot it checks. The key's hash points directly at a slot, and unless another key has collided with it, you've found the definitive answer for what's stored "there". This is why hash-table implementations offer improved performance for look-ups: in the common case there's only one answer for what's stored in a given spot, and you'll find it when you look there.
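A toy hash table shows both the common case and the collision case. `ToyHashTable` is a hypothetical class written for this post. It uses linear probing purely for simplicity, which is not the probing scheme CPython's dictionary uses, and it never resizes or deletes. The example also assumes CPython's behavior that a small non-negative integer hashes to itself.

```python
class ToyHashTable:
    def __init__(self, size=8):
        self.slots = [None] * size  # each slot is None or a (key, value) pair

    def _probe(self, key):
        # Start at the slot the hash points to, then try the following slots.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        """Return (value, number of slots checked)."""
        for probes, index in enumerate(self._probe(key), start=1):
            slot = self.slots[index]
            if slot is None:
                raise KeyError(key)
            if slot[0] == key:
                return slot[1], probes
        raise KeyError(key)


table = ToyHashTable()
table.put(1, "one")
table.put(9, "nine")  # 9 % 8 == 1, so this collides with key 1

print(table.get(1))
print(table.get(9))
```

Key `1` is found in the first slot checked, while key `9` needs a second probe because it collided with `1`. With a sparsely filled table, most lookups look like the first case.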

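On average, a hash-table lookup takes constant time no matter how many entries the dictionary holds, whereas a lookup in the B-tree structure used by most databases takes time proportional to the height of the tree, which grows logarithmically with the number of keys.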

It kind of makes you wonder: if the dictionary datatype used by a programming language works so well, why are databases still using B-trees?

Christine Nicole, December 4, 2022