Download: rb.db.xz (18.64 MiB, 2024-09-17 07:55) [rb.db.sha256] (previous versions)
The main goal of rb.db
is to provide original, unmodified tables from Rebrickable Downloads in a form of ready-to-use SQLite database file, and build it on schedule, so the latest release provides always up-to-date version of the database.
Releases are created automatically once a day, but only if there were actual changes since the last release.
Retention policy:
latest
is always recreated when releasing new version, so the latest version link is permanentlatest-v<N>
, where <N>
is the latest schema version, is also always recreated, and similar tags for older schemas are retained. The rationale is described in schema_version
sectionFor Rebrickable tables the main rule is to import them as-is, without adding/removing/modifying any table/column names or data (except for the purpose of data types conversion). Schema only enforces several constraints to ensure the database integrity and the relevance of this documentation:
NOT NULL
to more specific whenever possible)set_nums
table to satisfy foreign key constraint for inventories.set_num
CSV format, in which original Rebrickable tables are provided, cannot include types information for the stored data. Therefore column data types, used by the schema, are determined basing on the column content and SQLite3 specifics:
INTEGER
and TEXT
to avoid possible confusion, as the data types like VARCHAR(N)
do not really imply any constraints in SQLite (docs). Rigid typing allows only a few data types, so this was (fortunately) not much of a choiceINTEGER
values 0
and 1
for boolean columns. Original tables store single t
/f
characters (“true”/“false”) but in context of the schema 0
/1
are more appropriate as they allow to use natural conditions like WHERE is_trans
/WHERE NOT is_trans
INTEGER
for columns containing id, year, quantity. The rest of columns are clearly text so it was not a hard guessNULL
values whereas all missing values in the Rebrickable tables semantically mean NULL
and thus are imported this way in rb.db
Schema of the Rebrickable tables is described in Rebrickable Tables section. rb.db
also includes few custom tables, non-trivially generated from them, and some handy views. They are described in Custom Tables section.
Almost all columns in the tables cannot be NULL
. Thus this is not mentioned in the columns description, and only for nullable columns there will be explicit note about this.
Original Rebrickable tables are provided in CSV format. SQLite can import tables from CSV files directly. However it unconditionally treats empty values as empty strings (details) whereas in context of Rebrickable tables these values have to become NULL
in database.
For example, themes.parent_id
foreign key constraint would fail at all with an empty string, because it expects either an existing themes.id
value or NULL
.
This is why the import scripts import tables directly instead of relying on .import
SQLite3 command.
Current schema version is 7. List of changes for each schema version:
rb_db_lov
color_properties.color_id
to color_properties.id
as this is complementary tablepart_rels_extra
integer (0/1)
for color.is_trans
and inventory_parts.is_spare
color_properties.is_grayscale
part_color_stats
, part_stats
, part_color_images
, part_images
color_stats
and merged views part_[color_]images
with part_[color_]stats
This table contains the part colors.
Columns: id
(integer, primary key), name
(text), rgb
(text), is_trans
(integer).
id
is a number, unique for each color. Referenced by inventory_parts.color_id
, elements.color_id
, color_properties.id
, similar_color_ids.ref_id
, similar_color_ids.id
.
name
is the color name on Rebrickable.
rgb
is RGB color in a form of HEX triplet, 6 hexadecimal digits without prefix.
is_trans
is a 0
/1
flag indicating if color is transparent.
Example:
$ sqlite3 rb.db "select * from colors group by is_trans"
-1|[Unknown]|0033B2|0
32|Trans-Black IR Lens|635F52|1
This table contains the set themes.
Columns: id
(integer, primary key), name
(text), parent_id
(integer, nullable).
id
is a number, unique for each theme. Referenced by sets.theme_id
and even by this table in parent_id
column.
name
is the theme name on Rebrickable.
parent_id
is the parent theme id for sub-themes and NULL
otherwise.
As for now, the maximum length of themes chain is 3 (A→B→C).
Example:
$ sqlite3 -table -nullvalue NULL rb.db "select * from themes where 52 in (id, parent_id) limit 2"
+----+---------+-----------+
| id | name | parent_id |
+----+---------+-----------+
| 52 | City | NULL |
| 53 | Airport | 52 |
+----+---------+-----------+
Columns: id
(integer, primary key), name
(text).
id
is a number, unique for each category. Referenced by parts.part_cat_id
.
name
is the category name on Rebrickable.
Columns: part_num
(text, primary key), name
(text), part_cat_id
(integer), part_material
(text).
part_num
is alpha-numeric part number uniquely identifying each part on Rebrickable. Referenced by part_relationships.child_part_num
, part_relationships.parent_part_num
, elements.part_num
, inventory_parts.part_num
, part_rels_resolved.child_part_num
, part_rels_resolved.parent_part_num
.
Although uncommon, part numbers may also contain a dot (75c23.75
) and a hyphen (134916-740
).
name
is the part name on Rebrickable.
part_cat_id
is a reference (foreign key) to part_categories.id
column.
part_material
is the material from which this part is made. Possible values:
$ sqlite3 rb.db "select distinct(part_material) from parts"
Cardboard/Paper
Cloth
Flexible Plastic
Foam
Metal
Plastic
Rubber
Columns: rel_type
(text), child_part_num
(text), parent_part_num
(text).
Each row defines single relationship between two parts child_part_num
and parent_part_num
, which both are references (foreign keys) to parts.part_num
column.
rel_type
is a relationship type, defined by a single character, one of: ABMPRT
. They all are described in the following sections.
Neither rel_type+child_part_num
nor rel_type+parent_part_num
are unique across the table.
A
- AlternateExample: A,11954,62531
For 11954
Rebrickable will say it is usable as alternate for the 62531
. And vice versa.
Rebrickable uses this relationship in the build matching option “Consider alternate parts that can usually be used as replacements, but are not always functionally compatible.”
There will be no corresponding row A,62531,11954
so this relationship should be considered bidirectional.
B
- Sub-PartExample: B,6051,6051c04
6051
is a sub-part of 6051c04
.
M
- MoldExample: M,92950,3455
92950
and 3455
are essentially the same parts where 92950 is a newer mold. For 3455 Rebrickable says it is superseded by 92950.
Rebrickable uses this relationship in the build matching option “Ignore mold variations in parts.”
The successor part is not necessarily listed as child_part_num
. And an older part is not necessarily listed as parent_part_num
. Here are two examples in the form child_part_num (year_from, year_to) -> parent_part_num (year_from, year_to)
:
60608 (2007, <present>) -> 3854 (1978, 2008)
3002a (1954, 1990) -> 3002 (1979, <present>)
In case of multiple molds not all combinations are listed. For example, for parts 67695
, 93571
, 32174
there are two rows:
M,93571,32174
M,67695,32174
But there are no row M,93571,67695
(for the info, 67695
is the latest mold).
Also, alternates not necessarily point to the latest molds, and they may have molds too (as mentioned above, 32174 is an older mold of 67695):
A,60176,32174
M,89652,60176
P
- PrintExample: P,4740pr0014,4740
4740pr0014
is a print of 4740
.
Rebrickable uses this relationship along with relationship T
in the build matching option “Ignore printed and patterned part differences.”
R
- PairExample: R,18947,35188
18947
pairs with 35188
. And vice versa.
There will be no corresponding row R,35188,18947
so this relationship should be considered bidirectional.
T
- PatternExample: T,19858pat0002,19858
19858pat0002
is a pattern of 19858
.
Rebrickable uses this relationship along with relationship P
in the build matching option “Ignore printed and patterned part differences.”
Columns: element_id
(integer, primary key), part_num
(text), color_id
(integer), design_id
(integer, nullable).
element_id
is the most unique characteristic of a part.
The same sets of part_num
+color_id
+design_id
may have multiple element_id
:
$ sqlite3 -table rb.db "select * from elements where part_num = '75c06'"
+------------+----------+----------+-----------+
| element_id | part_num | color_id | design_id |
+------------+----------+----------+-----------+
| 4118741 | 75c06 | 0 | 76279 |
| 4270745 | 75c06 | 0 | 76279 |
| 4495367 | 75c06 | 0 | 76279 |
| 4505063 | 75c06 | 0 | |
| 4546459 | 75c06 | 0 | 76279 |
| 4640742 | 75c06 | 0 | 76279 |
| 6439553 | 75c06 | 10 | |
| 6451143 | 75c06 | 10 | 100754 |
| 4226277 | 75c06 | 134 | |
| 4268282 | 75c06 | 134 | |
| 4285897 | 75c06 | 134 | |
+------------+----------+----------+-----------+
For most of the part image URLs Rebrickable uses element_id
(URL ends then with /parts/elements/<element_id>.jpg
). However, not every element has an image. Also some parts do not have element images at all and instead use LDraw images or photos. So element_id
is not reliable way to get the part image URL for a given part_num
+color_id
. See inventory_parts.img_url
for a better solution.
This table is not referenced by other tables in the schema.
This table lists minifigs. Unlike it may seem, minifig is not necessarily a derivative of torso+legs. Some minifigs are made of regular parts, for example, fig-014490.
Columns: fig_num
(text, primary key), name
(text), num_parts
(integer), img_url
(text).
fig_num
is an id unique for each minifig. Referenced by inventory_minifigs.fig_num
, and by inventories.set_num
trough set_nums
table.
All fig_num
values follow this format: fig-NNNNNN
, i.e. 6 decimal digits prefixed with “fig-“. This is an internal id, assigned and used exclusively by Rebrickable.
name
is the minifig name on Rebrickable.
num_parts
is the number of parts in the minifig inventory. For the info, some minifigs have 100+ parts, for example, 141 in fig-014675.
img_url
is the minifig image URL. As for now, every img_url
follows this format: https://cdn.rebrickable.com/media/sets/<fig_num>.jpg
. So, for example, when embedding a subset of the database, img_url
can be omitted to reduce data size.
Columns: set_num
(text, primary key), name
(text), year
(integer), theme_id
(integer), num_parts
(integer), img_url
(text).
set_num
is an id unique for each set. Referenced by inventory_sets.set_num
, and by inventories.set_num
trough set_nums
table.
name
is the set name on Rebrickable.
year
is the year when the set was released. Fairly important column, as the usage years for the parts and colors are made using it. For example, a year, from which the part is used, is calculated as the year of a set where this part was used first. So, basically, if part+color combination is not available in any sets then it “does not exist”. Because of this Rebrickable uses Database Sets to avoid errors for parts which were not (yet) released within regular sets.
theme_id
is the set theme as a reference (foreign key) to themes.id
column.
num_parts
is total number of parts in the set, including parts from minifigs if it has any, but not including spare parts. For example, set 60428-1 has 114 standard parts, two minifigs with 6 and 20 parts, and 8 spare parts. num_parts
is 140=114+6+20
.
If the set includes other sets, for example K4515-1, then parts from them are not counted in num_parts
of the main set.
img_url
is the image URL of the set. As for now, every img_url
follows this format: https://cdn.rebrickable.com/media/sets/<set_num_LOWERCASE>.jpg
(note, set_num
must be in lowercase otherwise URL results in HTTP 404). So, for example, when embedding a subset of the database, img_url
can be omitted to reduce data size.
Columns: id
(integer, primary key), version
(integer), set_num
(text).
id
is a number, unique for each inventory. Referenced by inventory_minifigs.inventory_id
, inventory_parts.inventory_id
, inventory_sets.inventory_id
.
Being referenced by these three tables means inventory may include standard parts, minifigs, and even other sets.
version
is the inventory version on Rebrickable, starting from 1
.
set_num
references either minifigs.fig_num
or sets.set_num
. So this table contains inventories for both sets and minifigs.
As for now, minifigs
do not have multiple inventories, i.e. for minifig inventories version
is always equal to 1
.
On practice minifig inventories include only standard parts, i.e. they link only to inventory_parts
table. As for the sets, they may include all three types of content, for example, COMCON002-1
. Nevertheless Rebrickable counts only standard parts and parts from minifigs in combined inventory of the main set. So does rb.db
in part_stats.num_parts
and elsewhere.
Columns: inventory_id
(integer), fig_num
(text), quantity
(integer).
inventory_id
is a reference (foreign key) to inventories.id
column.
It represents inventory, which includes this minifig, not the inventory of minifig itself. To get inventory of minifig use SELECT id from inventories WHERE set_num = '<fig_num_you_need>'
.
fig_num
is a reference (foreign key) to minifigs.fig_num
.
quantity
is a number of these minifigs in the inventory.
Columns: inventory_id
(integer), part_num
(text), color_id
(integer), quantity
(integer), is_spare
(integer), img_url
(text, nullable).
inventory_id
is a reference (foreign key) to inventories.id
column.
part_num
is a reference (foreign key) to parts.part_num
column.
color_id
is a reference (foreign key) to colors.id
column.
quantity
is a number of combinations part_num
+color_id
+is_spare
in this inventory. Note that for spare parts there will be separate rows in inventory.
is_spare
is a 0
/1
flag indicating if this is a spare part.
img_url
is the part image URL. When not NULL
it always starts with 'https://cdn.rebrickable.com/media/parts/'
.
As for now, this img_url
is the most reliable way to get an image URL for a given part_num
+color_id
, so img_url
in part_color_stats
and part_stats
is based on it.
However note that if part does not have image, Rebrickable uses part images in other colors or, if there are none, it may use images of similar parts (e.g. molds or plain parts for prints). There are no way to know in Rebrickable tables if image is canonical, or it is from other part color, or from a similar part.
On Rebrickable similar part images in inventories are marked with “Similar Image” overlay and a note in image title saying “Exact image not available, using similar image from part <similar_part_num>
“. Part images in other colors are not marked.
For almost all parts their image URLs are the same across all inventories.
Columns: inventory_id
(integer), set_num
(text), quantity
(integer).
inventory_id
is a reference (foreign key) to inventories.id
column.
It represents inventory, which includes this set, not the inventory of the set itself. To get inventory of the set use SELECT id from inventories WHERE set_num = '<set_num_you_need>'
.
set_num
is a reference (foreign key) to sets.set_num
.
quantity
is a number of these sets in the inventory.
Columns: set_num
(text, primary key).
This is “technical” table whose sole purpose is to satisfy foreign key constraint for inventories.set_num
column.
inventories.set_num
column may contain either sets.set_num
or minifigs.fig_num
but foreign key cannot reference two columns. So both these columns are combined in set_nums
table using triggers and inventories.set_num
references only set_nums.set_num
.
This table is included in the database even when building Rebrickable tables alone, without custom tables, using build.sh -rbonly
from the source repository.
It would be hard to decide if this table should be part of Rebrickable tables, or it should be implemented in custom tables via ADD CONSTRAINT
variant of the ALTER TABLE
. Fortunately SQLite leaves no choice here by not supporting this variant of ALTER TABLE
.
These tables are non-trivially generated, i.e. their data cannot be obtained using, for example, some simple query statement.
This is complementary 1-to-1 table to colors
table and is separated only because Rebrickable tables are never modified in rb.db
.
Columns: id
(integer, primary key), sort_pos
(integer), is_grayscale
(integer, nullable).
id
is a reference (foreign key) to colors.id
column.
sort_pos
is a color position in a sorted list of colors. It is designed to help sorting parts by color.
is_grayscale
is a 0
/1
flag indicating if color is considered as grayscale. In the following list it is set to 1
for points #3, #4, #5, to 0
for point #6, and to NULL
for points #1, #2.
With the sort_pos
colors are ordered the following way:
[Unknown]
[No Color/Any Color]
White
Black
It is based on the colors order used in “Your Colors” section on the part pages on Rebrickable.
Example:
$ sqlite3 -csv rb.db "select id, name from colors natural join color_properties order by sort_pos limit 10"
-1,[Unknown]
9999,"[No Color/Any Color]"
15,White
0,Black
1103,"Pearl Titanium"
1018,"Modulex Black"
148,"Pearl Dark Gray"
1016,"Modulex Charcoal Gray"
1040,"Modulex Foil Dark Gray"
1126,"HO Metallic Dark Gray"
Columns: ref_id
(integer, primary key), ref_name
(text), id
(integer), name
(text), rgb
(text), is_trans
(integer).
This table lists similar colors for every color. It is inspired by Rebrickable build matching option “Part color sensitivity” → “Parts that have similar colors will be matched.” though results may be different.
ref_id
and id
are references (foreign keys) to colors.id
column. Technically similar_colors
is a view to similar_color_ids
, which contains just these two columns and joined colors
table on both of them.
Column ref_id
is indexed, so it is better to search by it instead of id
. For every pair of similar colors X→Y
table also contains pair Y→X
so it is really enough to search only by ref_id
or ref_name
.
Additional rules apply:
[Unknown]
color is never similar to any color[No Color/Any Color]
color is similar to all colorsref_id = id
Whether two colors are similar is determined using Delta E metric. Here is great reading about it. Specifically is used “dE00” algorithm and the maximum Delta E value 20
. For those curious, Delta E chart for Rebrickable colors.
Example:
$ sqlite3 rb.db "select name from similar_colors where ref_name = 'Red'"
Red
Trans-Red
Light Brown
Rust
Dark Red
Dark Orange
Vintage Red
Modulex Red
Modulex Pink Red
Modulex Foil Red
Dark Nougat
Bright Reddish Orange
Pearl Red
Rust Orange
Two-tone Copper
Two-tone Gold
Metallic Copper
Trans-Neon Red
HO Light Brown
HO Medium Red
HO Rose
Reddish Orange
Sienna Brown
[No Color/Any Color]
Columns: rel_type
(text), child_part_num
(text), parent_part_num
(text).
This is a processed part_relationships
table with the same set of columns (see columns description there).
As a result of processing it lists so-called “resolved” relationships, which are calculated this way:
A
and M
, as only these are subject of resolving (i.e. other rows would be the same as in part_relationships
)M
- Mold section):
parent_part_num
child_part_num
and parent_part_num
parent_part_num
use the part, which is referenced in a newer set, or, if the newest sets have the same year, the part that is referenced in more sets.This way to resolve any A
/M
relationship it is enough to perform single lookup in this table. I.e. for any relationship X
and part Y
there will be either zero or one row X,Y,Z
and no rows starting with X,Z,
where X
is either A
or M
.
Columns: rel_type
(text), child_part_num
(text), parent_part_num
(text).
This table defines extra relationships, not available on Rebrickable and maintained within rb.db
.
Rebrickable does not use fictive parts as “common denominators” for other parts. For example, 35074pr0003
and 35074pr0009
are clearly prints of the same part but unprinted part 35074
does not exist and thus is not listed in Rebrickable tables.
In fact such parts actually exist on Rebrickable to some extent. For example, although 35074
results in “404 Page Not Found”, this part is listed as print of 35074pr0003
with “INACTIVE” word, appended to its title, and with a note that “This part is disabled and cannot be used.”.
There are exceptions though. For example, part 973c00
does not seem to really exist nevertheless its details page is available on the site.
So basically part_rels_extra
table contains relationships, made using “common denominator” parts described above, and several extra alternates.
Content of this table is generated using the rules defined in part_rels_extra_rules.txt
. See description in this file for details.
Relationships involving “common denominator” parts there can be summarized the following way:
35074pr0003
results in a row P,35074pr0003,35074
even if part 35074
does not exist100662pat0001pr0002
there will be rows P,100662pat0001pr0002,100662pat0001
and T,100662pat0001,100662
but not P,100662pat0001pr0002,100662pr0002
. Also note that in the pattern row both parts do not actually exist973c00
and 970c00
when reasonable.When building this table, relationship is not added if it already exists in part_rels_resolved
(for rel_type
values A
, M
) or in part_relationships
(for the rest of rel_type
values).
So part_rels_extra
table complements both these tables. In other words, this union does not have duplicate rows:
SELECT *
FROM part_relationships
WHERE rel_type NOT IN ('A', 'M')
UNION ALL
SELECT *
FROM part_rels_resolved
WHERE rel_type IN ('A', 'M')
UNION ALL
SELECT *
FROM part_rels_extra
Columns: part_num
(text), color_id
(integer), num_sets
(integer), min_year
(integer), max_year
(integer), num_parts
(integer), img_url
(text, nullable).
This is a view based on the set inventories. It includes only parts which appear in the sets or in the set minifigs.
part_num
is a reference (foreign key) to parts.part_num
column.
color_id
is a reference (foreign key) to colors.id
column.
num_sets
is a number of sets which include this part/color either as standard part or as minifig part.
min_year
/max_year
are minimum/maximum years of these sets.
num_parts
is total number of these part/color in the set inventory and all its minifigs, but not including spare parts. This is how Rebrickable counts “Num Set Parts” stat on the part detail pages.
img_url
is an image URL for the part/color. It is based on inventory_parts
table (read notes about img_url
there). part_color_stats
has only one row per part/color, so when choosing which image to use it follows this priority: element
→ ldraw
→ photo
→ NULL
(but there are actually almost no parts with multiple image URLs).
Columns: part_num
(text), num_sets
(integer), min_year
(integer), max_year
(integer), num_parts
(integer), img_url
(text, nullable).
This is basically the same view as part_color_stats
except that the stats for all part colors are combined together. It is not a derivative of part_color_stats
as you cannot calculate, for example, part_stats.num_sets
using part_color_stats.num_sets
.
A note about img_url
here. Which image to choose when describing a part in general, not a part in specific color? Rebrickable chooses the part which has the largest number of the set parts, even if it is referenced not in the most sets. So does this view.
Columns: color_id
(integer), num_sets
(integer), min_year
(integer), max_year
(integer), num_parts
(integer).
This is basically the same view as part_color_stats
except that the stats for all part numbers are combined together. So for detailed description read part_color_stats
section.
Columns: key
(text), value
(text).
This table contains list of values, which are described in the following sections.
schema_version
Version of the database schema. Just a number without dots or other characters.
It is incremented with each schema modification, regardless of whether this modification is back compatible or not, or whether it is caused by a change in original Rebrickable tables or by some internal change in rb.db
.
In this case release from the latest
tag may not always be preferable, as it may include breaking schema changes without prior notice.
To deliver updates with guarantee against unexpected schema changes rb.db
uses tags with the schema version. They use format latest-v<N>
where <N>
is the schema version, for example latest-v5
.
The idea is that you start using release with the most recent latest-v<N>
tag and get updates until schema changes. After it changes nothing breaks on your side, so you just calmly check what changed and switch to the new schema version.
data_timestamp
UNIX timestamp (in seconds) when the database was generated.
New rb.db
is released only when there is new data since the last release, so it is safe to assume that the databases with different data_timestamp
values have different data, and the one with greater data_timestamp
contains more relevant data.
Description | Source | Output |
---|---|---|
Table like in “Available Colors” section on Rebrickable part detail pages. Example shows colors for part 12939 . |
[sql ] |
[html ] |
Use “color popularity” coefficient as total number of parts in this color per year normalized by color. Print it for all colors in 2018. Print it for White and Red for all years. | [sql ] |
[txt ] |
Colors as JSON array of pairs <id>,{"name":"<name>","sortPos":<sort_pos>} suitable for JS Map constructor. |
[sql ] |
[json ] |
Parts, which have multiple non-NULL image URLs, and which have both NULL and non-NULL image URLs. |
[sql ] |
[txt ] |
Printed parts, which do not have image, while their unprinted counterparts have them. | [sql ] |
[txt ] |
Colors ordered by the total number of parts in this color across all sets, with numbers and histogram. | [sql ] |
[txt ] |
Sets per year in numbers and histogram. | [sql ] |
[txt ] |
Canonical URLs for the part detail pages. | [sql ] |
[txt ] |
Part lifetime in numbers and histogram. | [sql ] |
[txt ] |
part_rels_resolved.csv in the same format as part_relationships.csv from Rebrickable. |
[sql ] |
[csv ] |
For prints, having ‘pr’ in part_num , all suffix formats in a form where every letter and digit are replaced with x and N . For example, both 3009pr0027e and 4555c02pr0001a will have suffix NNNNx . |
[sql ] |
[txt ] |
Different parts (i.e. parts with different part_num regardless of color) having the same image. |
[sql ] |
[txt ] |
Multiple parts having the same image. So these will be “similar parts” along with their origin part. | [sql ] |
[txt ] |
Same parts in different colors having the same image. | [sql ] |
[txt ] |
Colors, similar to the given one (Pastel Blue in this example), ordered naturally, so the given one usually will be somewhere in the middle. |
[sql ] |
[html ] |
Various statistics. | [sql ] |
[txt ] |
Themes along with all their ancestors composed in a chain. | [sql ] |
[txt ] |