rb.db

Download: rb.db.xz (18.64 MiB, 2024-09-17 07:55) [rb.db.sha256] (previous versions)

The main goal of rb.db is to provide original, unmodified tables from Rebrickable Downloads in a form of ready-to-use SQLite database file, and build it on schedule, so the latest release provides always up-to-date version of the database.

Releases are created automatically once a day, but only if there were actual changes since the last release.

Retention policy:

Database Schema

For Rebrickable tables the main rule is to import them as-is, without adding/removing/modifying any table/column names or data (except for the purpose of data types conversion). Schema only enforces several constraints to ensure the database integrity and the relevance of this documentation:

CSV format, in which original Rebrickable tables are provided, cannot include types information for the stored data. Therefore column data types, used by the schema, are determined basing on the column content and SQLite3 specifics:

Schema of the Rebrickable tables is described in Rebrickable Tables section. rb.db also includes few custom tables, non-trivially generated from them, and some handy views. They are described in Custom Tables section.

Almost all columns in the tables cannot be NULL. Thus this is not mentioned in the columns description, and only for nullable columns there will be explicit note about this.

Note about CSV import in SQLite3

Original Rebrickable tables are provided in CSV format. SQLite can import tables from CSV files directly. However it unconditionally treats empty values as empty strings (details) whereas in context of Rebrickable tables these values have to become NULL in database.

For example, themes.parent_id foreign key constraint would fail at all with an empty string, because it expects either an existing themes.id value or NULL.

This is why the import scripts import tables directly instead of relying on .import SQLite3 command.

Changelog

Current schema version is 7. List of changes for each schema version:

  1. added table rb_db_lov
  2. renamed column color_properties.color_id to color_properties.id as this is complementary table
  3. added table part_rels_extra
  4. changed column types to integer (0/1) for color.is_trans and inventory_parts.is_spare
  5. added column color_properties.is_grayscale
  6. added views part_color_stats, part_stats, part_color_images, part_images
  7. added view color_stats and merged views part_[color_]images with part_[color_]stats

Diagram

Database diagram

Rebrickable Tables

colors

This table contains the part colors.

Columns: id (integer, primary key), name (text), rgb (text), is_trans (integer).

id is a number, unique for each color. Referenced by inventory_parts.color_id, elements.color_id, color_properties.id, similar_color_ids.ref_id, similar_color_ids.id.

name is the color name on Rebrickable.

rgb is RGB color in a form of HEX triplet, 6 hexadecimal digits without prefix.

is_trans is a 0/1 flag indicating if color is transparent.

Example:

$ sqlite3 rb.db "select * from colors group by is_trans"
-1|[Unknown]|0033B2|0
32|Trans-Black IR Lens|635F52|1

themes

This table contains the set themes.

Columns: id (integer, primary key), name (text), parent_id (integer, nullable).

id is a number, unique for each theme. Referenced by sets.theme_id and even by this table in parent_id column.

name is the theme name on Rebrickable.

parent_id is the parent theme id for sub-themes and NULL otherwise.

As for now, the maximum length of themes chain is 3 (A→B→C).

Example:

$ sqlite3 -table -nullvalue NULL rb.db "select * from themes where 52 in (id, parent_id) limit 2"
+----+---------+-----------+
| id |  name   | parent_id |
+----+---------+-----------+
| 52 | City    | NULL      |
| 53 | Airport | 52        |
+----+---------+-----------+

part_categories

Columns: id (integer, primary key), name (text).

id is a number, unique for each category. Referenced by parts.part_cat_id.

name is the category name on Rebrickable.

parts

Columns: part_num (text, primary key), name (text), part_cat_id (integer), part_material (text).

part_num is alpha-numeric part number uniquely identifying each part on Rebrickable. Referenced by part_relationships.child_part_num, part_relationships.parent_part_num, elements.part_num, inventory_parts.part_num, part_rels_resolved.child_part_num, part_rels_resolved.parent_part_num.

Although uncommon, part numbers may also contain a dot (75c23.75) and a hyphen (134916-740).

name is the part name on Rebrickable.

part_cat_id is a reference (foreign key) to part_categories.id column.

part_material is the material from which this part is made. Possible values:

$ sqlite3 rb.db "select distinct(part_material) from parts"
Cardboard/Paper
Cloth
Flexible Plastic
Foam
Metal
Plastic
Rubber

part_relationships

Columns: rel_type (text), child_part_num (text), parent_part_num (text).

Each row defines single relationship between two parts child_part_num and parent_part_num, which both are references (foreign keys) to parts.part_num column.

rel_type is a relationship type, defined by a single character, one of: ABMPRT. They all are described in the following sections.

Neither rel_type+child_part_num nor rel_type+parent_part_num are unique across the table.

A - Alternate

Example: A,11954,62531

For 11954 Rebrickable will say it is usable as alternate for the 62531. And vice versa.

Rebrickable uses this relationship in the build matching option “Consider alternate parts that can usually be used as replacements, but are not always functionally compatible.”

There will be no corresponding row A,62531,11954 so this relationship should be considered bidirectional.

B - Sub-Part

Example: B,6051,6051c04

6051 is a sub-part of 6051c04.

M - Mold

Example: M,92950,3455

92950 and 3455 are essentially the same parts where 92950 is a newer mold. For 3455 Rebrickable says it is superseded by 92950.

Rebrickable uses this relationship in the build matching option “Ignore mold variations in parts.”

The successor part is not necessarily listed as child_part_num. And an older part is not necessarily listed as parent_part_num. Here are two examples in the form child_part_num (year_from, year_to) -> parent_part_num (year_from, year_to):

60608 (2007, <present>) -> 3854 (1978, 2008)
3002a (1954, 1990) -> 3002 (1979, <present>)

In case of multiple molds not all combinations are listed. For example, for parts 67695, 93571, 32174 there are two rows:

M,93571,32174
M,67695,32174

But there are no row M,93571,67695 (for the info, 67695 is the latest mold).

Also, alternates not necessarily point to the latest molds, and they may have molds too (as mentioned above, 32174 is an older mold of 67695):

A,60176,32174
M,89652,60176

P - Print

Example: P,4740pr0014,4740

4740pr0014 is a print of 4740.

Rebrickable uses this relationship along with relationship T in the build matching option “Ignore printed and patterned part differences.”

R - Pair

Example: R,18947,35188

18947 pairs with 35188. And vice versa.

There will be no corresponding row R,35188,18947 so this relationship should be considered bidirectional.

T - Pattern

Example: T,19858pat0002,19858

19858pat0002 is a pattern of 19858.

Rebrickable uses this relationship along with relationship P in the build matching option “Ignore printed and patterned part differences.”

elements

Columns: element_id (integer, primary key), part_num (text), color_id (integer), design_id (integer, nullable).

element_id is the most unique characteristic of a part.

The same sets of part_num+color_id+design_id may have multiple element_id:

$ sqlite3 -table rb.db "select * from elements where part_num = '75c06'"
+------------+----------+----------+-----------+
| element_id | part_num | color_id | design_id |
+------------+----------+----------+-----------+
| 4118741    | 75c06    | 0        | 76279     |
| 4270745    | 75c06    | 0        | 76279     |
| 4495367    | 75c06    | 0        | 76279     |
| 4505063    | 75c06    | 0        |           |
| 4546459    | 75c06    | 0        | 76279     |
| 4640742    | 75c06    | 0        | 76279     |
| 6439553    | 75c06    | 10       |           |
| 6451143    | 75c06    | 10       | 100754    |
| 4226277    | 75c06    | 134      |           |
| 4268282    | 75c06    | 134      |           |
| 4285897    | 75c06    | 134      |           |
+------------+----------+----------+-----------+

For most of the part image URLs Rebrickable uses element_id (URL ends then with /parts/elements/<element_id>.jpg). However, not every element has an image. Also some parts do not have element images at all and instead use LDraw images or photos. So element_id is not reliable way to get the part image URL for a given part_num+color_id. See inventory_parts.img_url for a better solution.

This table is not referenced by other tables in the schema.

minifigs

This table lists minifigs. Unlike it may seem, minifig is not necessarily a derivative of torso+legs. Some minifigs are made of regular parts, for example, fig-014490.

Columns: fig_num (text, primary key), name (text), num_parts (integer), img_url (text).

fig_num is an id unique for each minifig. Referenced by inventory_minifigs.fig_num, and by inventories.set_num trough set_nums table.

All fig_num values follow this format: fig-NNNNNN, i.e. 6 decimal digits prefixed with “fig-“. This is an internal id, assigned and used exclusively by Rebrickable.

name is the minifig name on Rebrickable.

num_parts is the number of parts in the minifig inventory. For the info, some minifigs have 100+ parts, for example, 141 in fig-014675.

img_url is the minifig image URL. As for now, every img_url follows this format: https://cdn.rebrickable.com/media/sets/<fig_num>.jpg. So, for example, when embedding a subset of the database, img_url can be omitted to reduce data size.

sets

Columns: set_num (text, primary key), name (text), year (integer), theme_id (integer), num_parts (integer), img_url (text).

set_num is an id unique for each set. Referenced by inventory_sets.set_num, and by inventories.set_num trough set_nums table.

name is the set name on Rebrickable.

year is the year when the set was released. Fairly important column, as the usage years for the parts and colors are made using it. For example, a year, from which the part is used, is calculated as the year of a set where this part was used first. So, basically, if part+color combination is not available in any sets then it “does not exist”. Because of this Rebrickable uses Database Sets to avoid errors for parts which were not (yet) released within regular sets.

theme_id is the set theme as a reference (foreign key) to themes.id column.

num_parts is total number of parts in the set, including parts from minifigs if it has any, but not including spare parts. For example, set 60428-1 has 114 standard parts, two minifigs with 6 and 20 parts, and 8 spare parts. num_parts is 140=114+6+20.

If the set includes other sets, for example K4515-1, then parts from them are not counted in num_parts of the main set.

img_url is the image URL of the set. As for now, every img_url follows this format: https://cdn.rebrickable.com/media/sets/<set_num_LOWERCASE>.jpg (note, set_num must be in lowercase otherwise URL results in HTTP 404). So, for example, when embedding a subset of the database, img_url can be omitted to reduce data size.

inventories

Columns: id (integer, primary key), version (integer), set_num (text).

id is a number, unique for each inventory. Referenced by inventory_minifigs.inventory_id, inventory_parts.inventory_id, inventory_sets.inventory_id.

Being referenced by these three tables means inventory may include standard parts, minifigs, and even other sets.

version is the inventory version on Rebrickable, starting from 1.

set_num references either minifigs.fig_num or sets.set_num. So this table contains inventories for both sets and minifigs.

As for now, minifigs do not have multiple inventories, i.e. for minifig inventories version is always equal to 1.

On practice minifig inventories include only standard parts, i.e. they link only to inventory_parts table. As for the sets, they may include all three types of content, for example, COMCON002-1. Nevertheless Rebrickable counts only standard parts and parts from minifigs in combined inventory of the main set. So does rb.db in part_stats.num_parts and elsewhere.

inventory_minifigs

Columns: inventory_id (integer), fig_num (text), quantity (integer).

inventory_id is a reference (foreign key) to inventories.id column.

It represents inventory, which includes this minifig, not the inventory of minifig itself. To get inventory of minifig use SELECT id from inventories WHERE set_num = '<fig_num_you_need>'.

fig_num is a reference (foreign key) to minifigs.fig_num.

quantity is a number of these minifigs in the inventory.

inventory_parts

Columns: inventory_id (integer), part_num (text), color_id (integer), quantity (integer), is_spare (integer), img_url (text, nullable).

inventory_id is a reference (foreign key) to inventories.id column.

part_num is a reference (foreign key) to parts.part_num column.

color_id is a reference (foreign key) to colors.id column.

quantity is a number of combinations part_num+color_id+is_spare in this inventory. Note that for spare parts there will be separate rows in inventory.

is_spare is a 0/1 flag indicating if this is a spare part.

img_url is the part image URL. When not NULL it always starts with 'https://cdn.rebrickable.com/media/parts/'.

As for now, this img_url is the most reliable way to get an image URL for a given part_num+color_id, so img_url in part_color_stats and part_stats is based on it.

However note that if part does not have image, Rebrickable uses part images in other colors or, if there are none, it may use images of similar parts (e.g. molds or plain parts for prints). There are no way to know in Rebrickable tables if image is canonical, or it is from other part color, or from a similar part.

On Rebrickable similar part images in inventories are marked with “Similar Image” overlay and a note in image title saying “Exact image not available, using similar image from part <similar_part_num>. Part images in other colors are not marked.

For almost all parts their image URLs are the same across all inventories.

inventory_sets

Columns: inventory_id (integer), set_num (text), quantity (integer).

inventory_id is a reference (foreign key) to inventories.id column.

It represents inventory, which includes this set, not the inventory of the set itself. To get inventory of the set use SELECT id from inventories WHERE set_num = '<set_num_you_need>'.

set_num is a reference (foreign key) to sets.set_num.

quantity is a number of these sets in the inventory.

set_nums

Columns: set_num (text, primary key).

This is “technical” table whose sole purpose is to satisfy foreign key constraint for inventories.set_num column.

inventories.set_num column may contain either sets.set_num or minifigs.fig_num but foreign key cannot reference two columns. So both these columns are combined in set_nums table using triggers and inventories.set_num references only set_nums.set_num.

This table is included in the database even when building Rebrickable tables alone, without custom tables, using build.sh -rbonly from the source repository.

It would be hard to decide if this table should be part of Rebrickable tables, or it should be implemented in custom tables via ADD CONSTRAINT variant of the ALTER TABLE. Fortunately SQLite leaves no choice here by not supporting this variant of ALTER TABLE.

Custom Tables

These tables are non-trivially generated, i.e. their data cannot be obtained using, for example, some simple query statement.

color_properties

This is complementary 1-to-1 table to colors table and is separated only because Rebrickable tables are never modified in rb.db.

Columns: id (integer, primary key), sort_pos (integer), is_grayscale (integer, nullable).

id is a reference (foreign key) to colors.id column.

sort_pos is a color position in a sorted list of colors. It is designed to help sorting parts by color.

is_grayscale is a 0/1 flag indicating if color is considered as grayscale. In the following list it is set to 1 for points #3, #4, #5, to 0 for point #6, and to NULL for points #1, #2.

With the sort_pos colors are ordered the following way:

  1. [Unknown]
  2. [No Color/Any Color]
  3. White
  4. Black
  5. Grayscale colors from darker to lighter
  6. Remaining colors, ordered by hue

It is based on the colors order used in “Your Colors” section on the part pages on Rebrickable.

Example:

$ sqlite3 -csv rb.db "select id, name from colors natural join color_properties order by sort_pos limit 10"
-1,[Unknown]
9999,"[No Color/Any Color]"
15,White
0,Black
1103,"Pearl Titanium"
1018,"Modulex Black"
148,"Pearl Dark Gray"
1016,"Modulex Charcoal Gray"
1040,"Modulex Foil Dark Gray"
1126,"HO Metallic Dark Gray"

similar_colors

Columns: ref_id (integer, primary key), ref_name (text), id (integer), name (text), rgb (text), is_trans (integer).

This table lists similar colors for every color. It is inspired by Rebrickable build matching option “Part color sensitivity” → “Parts that have similar colors will be matched.” though results may be different.

ref_id and id are references (foreign keys) to colors.id column. Technically similar_colors is a view to similar_color_ids, which contains just these two columns and joined colors table on both of them.

Column ref_id is indexed, so it is better to search by it instead of id. For every pair of similar colors X→Y table also contains pair Y→X so it is really enough to search only by ref_id or ref_name.

Additional rules apply:

Whether two colors are similar is determined using Delta E metric. Here is great reading about it. Specifically is used “dE00” algorithm and the maximum Delta E value 20. For those curious, Delta E chart for Rebrickable colors.

Example:

$ sqlite3 rb.db "select name from similar_colors where ref_name = 'Red'"
Red
Trans-Red
Light Brown
Rust
Dark Red
Dark Orange
Vintage Red
Modulex Red
Modulex Pink Red
Modulex Foil Red
Dark Nougat
Bright Reddish Orange
Pearl Red
Rust Orange
Two-tone Copper
Two-tone Gold
Metallic Copper
Trans-Neon Red
HO Light Brown
HO Medium Red
HO Rose
Reddish Orange
Sienna Brown
[No Color/Any Color]

part_rels_resolved

Columns: rel_type (text), child_part_num (text), parent_part_num (text).

This is a processed part_relationships table with the same set of columns (see columns description there).

As a result of processing it lists so-called “resolved” relationships, which are calculated this way:

This way to resolve any A/M relationship it is enough to perform single lookup in this table. I.e. for any relationship X and part Y there will be either zero or one row X,Y,Z and no rows starting with X,Z, where X is either A or M.

part_rels_extra

Columns: rel_type (text), child_part_num (text), parent_part_num (text).

This table defines extra relationships, not available on Rebrickable and maintained within rb.db.

Rebrickable does not use fictive parts as “common denominators” for other parts. For example, 35074pr0003 and 35074pr0009 are clearly prints of the same part but unprinted part 35074 does not exist and thus is not listed in Rebrickable tables.

In fact such parts actually exist on Rebrickable to some extent. For example, although 35074 results in “404 Page Not Found”, this part is listed as print of 35074pr0003 with “INACTIVE” word, appended to its title, and with a note that “This part is disabled and cannot be used.”.

There are exceptions though. For example, part 973c00 does not seem to really exist nevertheless its details page is available on the site.

So basically part_rels_extra table contains relationships, made using “common denominator” parts described above, and several extra alternates.

Content of this table is generated using the rules defined in part_rels_extra_rules.txt. See description in this file for details.

Relationships involving “common denominator” parts there can be summarized the following way:

When building this table, relationship is not added if it already exists in part_rels_resolved (for rel_type values A, M) or in part_relationships (for the rest of rel_type values).

So part_rels_extra table complements both these tables. In other words, this union does not have duplicate rows:

SELECT *
  FROM part_relationships
 WHERE rel_type NOT IN ('A', 'M')
 UNION ALL
SELECT *
  FROM part_rels_resolved
 WHERE rel_type IN ('A', 'M')
 UNION ALL
SELECT *
  FROM part_rels_extra

part_color_stats

Columns: part_num (text), color_id (integer), num_sets (integer), min_year (integer), max_year (integer), num_parts (integer), img_url (text, nullable).

This is a view based on the set inventories. It includes only parts which appear in the sets or in the set minifigs.

part_num is a reference (foreign key) to parts.part_num column.

color_id is a reference (foreign key) to colors.id column.

num_sets is a number of sets which include this part/color either as standard part or as minifig part.

min_year/max_year are minimum/maximum years of these sets.

num_parts is total number of these part/color in the set inventory and all its minifigs, but not including spare parts. This is how Rebrickable counts “Num Set Parts” stat on the part detail pages.

img_url is an image URL for the part/color. It is based on inventory_parts table (read notes about img_url there). part_color_stats has only one row per part/color, so when choosing which image to use it follows this priority: elementldrawphotoNULL (but there are actually almost no parts with multiple image URLs).

part_stats

Columns: part_num (text), num_sets (integer), min_year (integer), max_year (integer), num_parts (integer), img_url (text, nullable).

This is basically the same view as part_color_stats except that the stats for all part colors are combined together. It is not a derivative of part_color_stats as you cannot calculate, for example, part_stats.num_sets using part_color_stats.num_sets.

A note about img_url here. Which image to choose when describing a part in general, not a part in specific color? Rebrickable chooses the part which has the largest number of the set parts, even if it is referenced not in the most sets. So does this view.

color_stats

Columns: color_id (integer), num_sets (integer), min_year (integer), max_year (integer), num_parts (integer).

This is basically the same view as part_color_stats except that the stats for all part numbers are combined together. So for detailed description read part_color_stats section.

rb_db_lov

Columns: key (text), value (text).

This table contains list of values, which are described in the following sections.

schema_version

Version of the database schema. Just a number without dots or other characters.

It is incremented with each schema modification, regardless of whether this modification is back compatible or not, or whether it is caused by a change in original Rebrickable tables or by some internal change in rb.db.

In this case release from the latest tag may not always be preferable, as it may include breaking schema changes without prior notice.

To deliver updates with guarantee against unexpected schema changes rb.db uses tags with the schema version. They use format latest-v<N> where <N> is the schema version, for example latest-v5.

The idea is that you start using release with the most recent latest-v<N> tag and get updates until schema changes. After it changes nothing breaks on your side, so you just calmly check what changed and switch to the new schema version.

data_timestamp

UNIX timestamp (in seconds) when the database was generated.

New rb.db is released only when there is new data since the last release, so it is safe to assume that the databases with different data_timestamp values have different data, and the one with greater data_timestamp contains more relevant data.

Examples

Description Source Output
Table like in “Available Colors” section on Rebrickable part detail pages. Example shows colors for part 12939. [sql] [html]
Use “color popularity” coefficient as total number of parts in this color per year normalized by color. Print it for all colors in 2018. Print it for White and Red for all years. [sql] [txt]
Colors as JSON array of pairs <id>,{"name":"<name>","sortPos":<sort_pos>} suitable for JS Map constructor. [sql] [json]
Parts, which have multiple non-NULL image URLs, and which have both NULL and non-NULL image URLs. [sql] [txt]
Printed parts, which do not have image, while their unprinted counterparts have them. [sql] [txt]
Colors ordered by the total number of parts in this color across all sets, with numbers and histogram. [sql] [txt]
Sets per year in numbers and histogram. [sql] [txt]
Canonical URLs for the part detail pages. [sql] [txt]
Part lifetime in numbers and histogram. [sql] [txt]
part_rels_resolved.csv in the same format as part_relationships.csv from Rebrickable. [sql] [csv]
For prints, having ‘pr’ in part_num, all suffix formats in a form where every letter and digit are replaced with x and N. For example, both 3009pr0027e and 4555c02pr0001a will have suffix NNNNx. [sql] [txt]
Different parts (i.e. parts with different part_num regardless of color) having the same image. [sql] [txt]
Multiple parts having the same image. So these will be “similar parts” along with their origin part. [sql] [txt]
Same parts in different colors having the same image. [sql] [txt]
Colors, similar to the given one (Pastel Blue in this example), ordered naturally, so the given one usually will be somewhere in the middle. [sql] [html]
Various statistics. [sql] [txt]
Themes along with all their ancestors composed in a chain. [sql] [txt]