Difference between revisions of "Database Table Overview"

From Dreamwidth Notes
Jump to: navigation, search
m (Extra Information)
(no need to link to the cat Database Tables pulls in all the tables.)
 
(14 intermediate revisions by 4 users not shown)
Line 3: Line 3:
 
==Introduction==
 
==Introduction==
  
All the information that is entered into Dreamwidth (like with most web apps), gets stored in a database.  So understanding the table structure can help you understand how Dreamwidth is storing things.  That's the theory, anyway.  It also helps in debugging, extending Dreamwidth and so on.  So it's good to know, even if you're not planning on altering the tables or the interfacing code directly.
+
All the information that gets entered into Dreamwidth is, (like with most web apps) stored in a database.  It's the database that stores your journal, remembers all the comments and keeps track of all users, while the rest of Dreamwidth (the code and web based part) occupies itself with fetching things from the database to generate web pages for you to view or taking information which you've entered and getting it ready to store into the database.  Using techno-speak, the database proves the persistent data storage for the web/perl front-end.  If you want to understand what the code is doing (by understanding what data is being sent or received from the database when the code is generating web pages or storing data), or if you want to extend any part of Dreamwidth which needs some way of storing data, then you need to know about the database structure.
  
This section provides an overview on the tables, what they do and how they map to what you seeEnough information so that you get up and running, anyway.  That's the plan.   
+
Pretty much all databases store their information in tables, and Dreamwidth's database is no exceptionTo understand Dreamwidth's database structure, you need to understand all the tables that Dreamwidth has and also (quite importantly) how they link together.   
  
==Database Relationships==
+
Dreamwidth has a lot of tables.  The full list of tables can be viewed through [[Database Tables]] and all the relationships between the tables can be viewed through [[Database Relationships]].  Because these lists are complete and exhaustive but otherwise unhelpful, this page attempts to describe all the database tables that Dreamwidth has as well as their relationships by linking them to front end concepts you will already know from using the site.  The idea is to make sure that you can quickly understand what tables correspond to the bit of code (or the DW feature) you are interested in, and whether any table is important to what you are doing.
  
Databases only store things in tables.  However, most data as stored and handled (for example, as Dreamwidth does in Perl) comes in more complicated forms.  These can often be broken down into simple tables with simple relationships mimicking the data structures used in code. 
+
===Understanding This Page===
  
This guide lists some of the relationships between databases.  Because this guide doesn't assume any knowledge of databases, I'm including a short description here as I use these terms further down.
+
This page should be usable by a people who don't know much about databases (or can't quite remember what they did when they last used one).  However, depending on your skill levels, you may need to read up on some things first.  If you are a complete newbie, you should read all of these links first.
  
===One to One===
+
Dreamwidth uses a slightly amped up version of the popular MySQL database.  The code talks to the database using SQL queries.  If you have used MySQL before, you should be ok with Dreamwidth.  MySQL also tends to be very good about implementing SQL, so the odds are that if you've used another database before, most (but not all) of the SQL you already know will be transferable.  There will probably be problems with small things, and these are best resolved by consulting the MySQL manual for any command you are trying to use that is giving you grief. 
  
(Also: HasOne)
+
If you are completely new to databases and SQL (or you want to know how to connect using a dreamhack account), please go see [[Exploring the database]], which has a lovely tutorial for guiding you through connecting to your Dreamwidth implementation and viewing some of the information in the database.  There is nothing like entering something into a web-page and then seeing the changes reflected in the database tables to teach you what's happening and also to help debug your code, if needed. 
  
Table A has a relationship to table B in that each entry in A has one and only one corresponding entry in B.
+
Database tables should also be defined, for complete newbies.  If you're thinking of the kind of tables you drew in school to represent data, then you are not far wrong.  Tables in databases have some additional properties to worry about.  Each column has a type (character string, integer, text, date etc.) which it can be, and columns also need to know their defaults, or whether they are allowed to have entries with no data.  Finally, columns can have extra special features like being a key, or auto incrementing.
  
You can tell a one to one relationship because either table A has a field which references the primary key of table B, or table B's primary key is the same as table A (e.g. [[Database Table: userbio]]).
+
This guide sometimes references the concept of primary and foreign keys.  A primary key in a database table is a column (although sometimes it is more than one column) which uniquely describes that entry in the database.  Primary keys are particularly useful in relational databases (of which MySQL is one) because it allows entries in other tables (or even other entries in the same table) to refer to one unique entry.  Also, the database tends to be happier if it can identify entries uniquely as well.  A column in a table B which references a primary key in table A is called a foreign key.  The presence of primary/foreign key pairs implies a relationship between entries in the two separate tables, which is why they are important.  If you understand programming, you should view the foreign key as a pointer or reference to a particular entry in a database, and the primary key as being equivalent to the location in memory.
  
===One to Many===
+
This overview also talks a lot about the relationships between tables.  This overview uses terms that are common in relational database design, and should be familiar if you're already familiar with relational databases. As these terms are not completely standard, you might need to check out [[Database Relationships Explained]] to make sure you understand the terminology being used.  If you are a newbie, you should definitely check out [[Database Relationships Explained]], as it explains the different types of relational database used in Dreamwidth (and commonly on the internet as well).
 
+
(Also: BelongsToMany)
+
 
+
Table A has a relationship to table B such that many  entries in B can refer to one entry in A.
+
 
+
You can tell a one to many relationship because table B has a field which references the primary key in table A which isn't a primary key in B.
+
 
+
===Many to Many or Link Tables===
+
 
+
(Also: HasAndBelongsToMany)
+
 
+
Table A has a relationship to table B such that many entries to A can map to many entries in B.
+
 
+
Unlike the other relationships, a many to many relationship requires a third table, C, to define the relationship.  C is a link table and has a field containing a reference to a primary key in table A and a reference to a primary key in B. 
+
 
+
Things are further complicated by the fact that link tables can also contain data about the relationship.  For example, user_schools is a link table defining the many to many relationship between schools and users.  It also has two extra fields, year started and year ended.
+
 
+
===Tree===
+
 
+
Trees, threaded tables or Dynamic Acyclic Graphs (DAGs) are when an entry in table A has a relationship to another entry in table A which is considered its parentMany entries can have the same parent and the topmost parent, the root of the tree, has no parent. 
+
 
+
You can tell a table is a tree because they have a primary key field, another field called some variant of parent id which stores the primary key of the parent and may have a further field which stores the order of the siblings (if it is important).
+
 
+
Comments in Dreamwidth are stored as trees.
+
 
+
===Graph===
+
 
+
A graph refers to the mathematical concept of a graph, and not to the kind of graph that is a chart, which is what most people commonly think of when they hear the word graph.
+
 
+
A graph has vertices and edges, and is the perfect concept for storing networks.  An example of a network would be a railway network, where the train stations are the vertices and the train services operating between the stations are the edges. 
+
 
+
The friends system in Dreamwidth is perfectly represented by a graph in which users are the vertices and the subscriptions/trusted access relations between users are the edges.  In this case, the table users holds the vertices while wt_edges holds the edges.
+
 
+
A graph (in databases) is a special case of a many to many mapping where instead of a table C mapping together an entry in table A and an entry in table B, C maps an entry in table A onto another entry in table A.
+
 
+
===Property Lists===
+
 
+
Property lists are a form of many to many link used to solve a particular problem with databases.
+
 
+
Modern databases are required (by the first normal form) to have the same number of fields and the same field names for all entries in a table.  This is good for storing data where you can know the size and structure of the table beforehand.  However, if you can't know the structure of the table beforehand (because of the nature of the data you will be storing), then this requirement is a sincere pain.  This is particularly true of properties.
+
 
+
For example, we have a user called user1.  We want to add the ability to store twitter accounts into Dreamwidth.  So the logical course of action would be to add a twitter field into the user table.  If we have a lot of users, this adds a lot of bloat into our database.  We can lessen that a bit by creating a new one to one mapping table called usertwitter.  However, imagine we have to manipulate the user tables every time we wanted to add a property.  This is very invasive and costly to do for large databases.  And what would happen if we needed to delete a property?  We could solve that problem by having changing fields per table entry, by the first normal form forbids that and all modern databases use the first normal form.  What do we do?
+
 
+
Enter property lists.  These consist of two tables, a property list, where we define the property name, it's default value and a primary key to identify it.  We can also store more data (e.g. Dreamwidth typically stores long descriptions in its property lists).  The second table maps a property list key to a value.
+
 
+
The arrangement above is a one to one mapping (each property list entry has one value).  This is good for global properties (which Dreamwidth doesn't use).  However, the relationship becomes many to many because Dreamwidth does use them defined per user or per journal entry.  So, the user property table maps a user to a property, and each user can have many properties set, and each property type has a different value for different users.
+
 
+
All property lists in Dreamwidth follow a broad convention.  The table *proplist contains the property list while *prop contains the property mapping and value.  E.g. journal entries are stored in logprop and logproplist.
+
  
 
==Tables by Concept==
 
==Tables by Concept==
Line 79: Line 31:
 
Details about a user are held in the table [[Database Table: user]] and this forms the centre of how user information is stored about users.
 
Details about a user are held in the table [[Database Table: user]] and this forms the centre of how user information is stored about users.
  
There are many other tables prefixed by the phrase user, most of which (but not all) deal additionally with user information.  We divide these into three types:
+
There are many other tables which help in storing information about users, many (but not all) of them prefixed by the phrase user.  We divide these other tables into three groups for ease of description:
  
* One to one and one to many tables, which basically hold extra information about each user in the user table.
+
* One to one and one to many tables, which basically hold extra information about each user.
* Link tables linking users in the user table to other concepts (like schools in the schools table, for example).
+
* Link tables which link users to other concepts (like schools in the schools table, for example).
* User Property Lists (see [[#Property Lists]] further down for a description of what a proplist tables are)
+
* User Property Lists (see [[#Property Lists]] above).
  
Also, there is a table dealing with linking users to users called wt_edges.  This is described further under the friends concept.
+
Also, there is a table dealing with linking users to users (in a graph structure) called wt_edges.  This is described further under the friends concept.
  
 
====Extra Information====
 
====Extra Information====
Line 121: Line 73:
 
===Tags===
 
===Tags===
  
Thanks to the history of how Dreamwidth developed, the name for a tag is in fact a keyword inside the databases (see [[#BradLog]] for further details).
+
Thanks to the history of how Dreamwidth developed, inside the databases tags are called keywords (see [[#BradLog]] for further details).
  
 
The table keyword itself is legacy and does not appear to be used anymore.  Tags are instead held in userkeywords and usertags.   
 
The table keyword itself is legacy and does not appear to be used anymore.  Tags are instead held in userkeywords and usertags.   
  
 
The table usertags appears to be a tree but holds no tag names, while userkeywords holds tag names but no structure.
 
The table usertags appears to be a tree but holds no tag names, while userkeywords holds tag names but no structure.
 +
 +
Tags are mapped to individual posts in the [[Database Table: logtags|logtags]] table, and also in the [[Database Table: logtagsrecent|logtagsrecent]].
  
 
==Appendix==
 
==Appendix==
Line 131: Line 85:
 
===BradLog===
 
===BradLog===
  
[[Category: Development]]
+
Livejournal, the predecessor to Dreamwidth, dates from an era where the language used to describe blogging hadn't yet settled down to its current standardised form.  Although most of the code used in LJ has been updated over time to follow the new terminology, the database (especially), is rife with the old terminology (with just enough of the new to throw you off), making it very confusing. 
[[Category: Database Tables]]
+
 
 +
LJ grew out of a program written by founder Brad, called BradLog.  In this, each entry was a '''log''' entry in a web log (blog).  Users could '''talk''' about each log entry (comments) and each log entry could be described by '''keywords''' (tags). 
 +
 
 +
[[Category:Development]]
 +
[[Category:Database Tables]]

Latest revision as of 08:30, 5 February 2010

Note: This page has been started by a complete newbie, who is trying to make sense of the database table structure as she goes and is writing the information down. As a result, the details may be inaccurate and so on. Hopefully, it's still better than nothing to the equally clueless! ;-) Aphenine 22:36, 28 July 2009 (UTC)

Introduction

All the information that gets entered into Dreamwidth is, (like with most web apps) stored in a database. It's the database that stores your journal, remembers all the comments and keeps track of all users, while the rest of Dreamwidth (the code and web based part) occupies itself with fetching things from the database to generate web pages for you to view or taking information which you've entered and getting it ready to store into the database. Using techno-speak, the database proves the persistent data storage for the web/perl front-end. If you want to understand what the code is doing (by understanding what data is being sent or received from the database when the code is generating web pages or storing data), or if you want to extend any part of Dreamwidth which needs some way of storing data, then you need to know about the database structure.

Pretty much all databases store their information in tables, and Dreamwidth's database is no exception. To understand Dreamwidth's database structure, you need to understand all the tables that Dreamwidth has and also (quite importantly) how they link together.

Dreamwidth has a lot of tables. The full list of tables can be viewed through Database Tables and all the relationships between the tables can be viewed through Database Relationships. Because these lists are complete and exhaustive but otherwise unhelpful, this page attempts to describe all the database tables that Dreamwidth has as well as their relationships by linking them to front end concepts you will already know from using the site. The idea is to make sure that you can quickly understand what tables correspond to the bit of code (or the DW feature) you are interested in, and whether any table is important to what you are doing.

Understanding This Page

This page should be usable by a people who don't know much about databases (or can't quite remember what they did when they last used one). However, depending on your skill levels, you may need to read up on some things first. If you are a complete newbie, you should read all of these links first.

Dreamwidth uses a slightly amped up version of the popular MySQL database. The code talks to the database using SQL queries. If you have used MySQL before, you should be ok with Dreamwidth. MySQL also tends to be very good about implementing SQL, so the odds are that if you've used another database before, most (but not all) of the SQL you already know will be transferable. There will probably be problems with small things, and these are best resolved by consulting the MySQL manual for any command you are trying to use that is giving you grief.

If you are completely new to databases and SQL (or you want to know how to connect using a dreamhack account), please go see Exploring the database, which has a lovely tutorial for guiding you through connecting to your Dreamwidth implementation and viewing some of the information in the database. There is nothing like entering something into a web-page and then seeing the changes reflected in the database tables to teach you what's happening and also to help debug your code, if needed.

Database tables should also be defined, for complete newbies. If you're thinking of the kind of tables you drew in school to represent data, then you are not far wrong. Tables in databases have some additional properties to worry about. Each column has a type (character string, integer, text, date etc.) which it can be, and columns also need to know their defaults, or whether they are allowed to have entries with no data. Finally, columns can have extra special features like being a key, or auto incrementing.

This guide sometimes references the concept of primary and foreign keys. A primary key in a database table is a column (although sometimes it is more than one column) which uniquely describes that entry in the database. Primary keys are particularly useful in relational databases (of which MySQL is one) because it allows entries in other tables (or even other entries in the same table) to refer to one unique entry. Also, the database tends to be happier if it can identify entries uniquely as well. A column in a table B which references a primary key in table A is called a foreign key. The presence of primary/foreign key pairs implies a relationship between entries in the two separate tables, which is why they are important. If you understand programming, you should view the foreign key as a pointer or reference to a particular entry in a database, and the primary key as being equivalent to the location in memory.

This overview also talks a lot about the relationships between tables. This overview uses terms that are common in relational database design, and should be familiar if you're already familiar with relational databases. As these terms are not completely standard, you might need to check out Database Relationships Explained to make sure you understand the terminology being used. If you are a newbie, you should definitely check out Database Relationships Explained, as it explains the different types of relational database used in Dreamwidth (and commonly on the internet as well).

Tables by Concept

This section attempts to map the concepts you know (for example, from having used the site) with the database tables that represent them, to make finding things easy.

Users

Details about a user are held in the table Database Table: user and this forms the centre of how user information is stored about users.

There are many other tables which help in storing information about users, many (but not all) of them prefixed by the phrase user. We divide these other tables into three groups for ease of description:

  • One to one and one to many tables, which basically hold extra information about each user.
  • Link tables which link users to other concepts (like schools in the schools table, for example).
  • User Property Lists (see #Property Lists above).

Also, there is a table dealing with linking users to users (in a graph structure) called wt_edges. This is described further under the friends concept.

Extra Information

The tables holding extra information on a user are:

  • Database Table: birthdays: One to one mapping of the user's next birthday. Original birthday information is held in the user table.
  • Database Table: userbio: One to one mapping of a user biography onto a user.
  • Database Table: userlog: One to many mapping of user events (e.g. account create) onto a user, showing the time the event happened. This table logs user activity.

Link Tables to Other Concepts

These tables link the concept of a user with other concepts.

User Proplists

Database Table: userproplist and Database Table: userproplite2 form a proplist pair for defining user properties (such as the name of a twitter account).

Posts

Thanks to the history of how Dreamwidth developed, the name for a journal (or journal posts) is in fact a log inside the databases (see #BradLog for further details). Therefore, the tables starting with log hold the journal posts, and any other tables with log in the refer to journals, unless they refer to the process of logging data that DW generates (e.g. the table userlog). As you can see, there's a little room for confusion.

Posts are split across two tables: log2 and logtext2. The former defines the post and all its details (e.g. security settings), while the second one actually has the text that makes up a journal post.

There is also a proplist for individual journal entries in the pair of tables logproplist and logprop. These allow the setting of properties for individual journal entries.

Comments

Thanks to the history of how Dreamwidth developed, the name for comments is in fact talk inside the databases (and also in the name of the comment bml pages, see #BradLog for further details).

Comments are held in two tables, Database Table: talk2 and Database Table: talktext2. The actual text is held in talktext2, and talk2 holds the structure of the comment, along with who posted it.

Tags

Thanks to the history of how Dreamwidth developed, inside the databases tags are called keywords (see #BradLog for further details).

The table keyword itself is legacy and does not appear to be used anymore. Tags are instead held in userkeywords and usertags.

The table usertags appears to be a tree but holds no tag names, while userkeywords holds tag names but no structure.

Tags are mapped to individual posts in the logtags table, and also in the logtagsrecent.

Appendix

BradLog

Livejournal, the predecessor to Dreamwidth, dates from an era where the language used to describe blogging hadn't yet settled down to its current standardised form. Although most of the code used in LJ has been updated over time to follow the new terminology, the database (especially), is rife with the old terminology (with just enough of the new to throw you off), making it very confusing.

LJ grew out of a program written by founder Brad, called BradLog. In this, each entry was a log entry in a web log (blog). Users could talk about each log entry (comments) and each log entry could be described by keywords (tags).