Directed Edge's recommendation engine is built on top of our in-house database. The database works somewhat differently from traditional databases, and its interface is our REST API, so before getting started it's useful to have a feel for how to translate the way your site models data into the way our database works.
Everything is an item, or A brief introduction to schema-less databases
Users, products, places, web pages — they're all items for us.
Really? Yep. Above are a couple of examples of things that could be items in our database.
Items have an identifier (user_1 and product_1 in the example above). It can't be changed once the item is imported to the database. It's the handle that we use to grab hold of things later on. Identifiers must be unique; otherwise the database doesn't know how to tell two items apart. You can use any scheme that you like to encode them. Most people just use something like the type of the object (product, for example) and then the numeric ID that's stored in their SQL database, but you could also use the URL to a page here or anything else that makes sense for your data.
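To make the "type plus numeric ID" scheme concrete, here's a minimal Python sketch. It only models things locally: the function name make_item_id and the items dictionary are made up for illustration and are not part of the Directed Edge API.

```python
def make_item_id(kind: str, db_id: int) -> str:
    """Build an identifier such as 'product_1' from an object type and its SQL row ID."""
    return f"{kind}_{db_id}"


# Hypothetical local model: a dict keyed by identifier, so duplicates are easy to catch.
items = {}
for kind, db_id in [("user", 1), ("product", 1)]:
    item_id = make_item_id(kind, db_id)
    if item_id in items:
        raise ValueError(f"duplicate identifier: {item_id}")
    items[item_id] = {"tags": set(), "properties": {}}
```

Prefixing with the type is what keeps user 1 and product 1 from colliding, even though both have the same numeric ID in their own SQL tables.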
Items can also have a collection of text tags. These are useful in filtering results later on. If you, for instance, wanted to find all of the books that are recommended for a certain user, it'd be useful to have some items tagged as book. You can also give an item as many tags as you want, so it's no problem to have product, book and sci-fi all associated with the same item.
Note, however, that tags are just ways that we can label items; they're not actively used in finding similar items.
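As a rough picture of what tag filtering means, the sketch below narrows a list of recommended items down to the ones tagged book. The data and the filter_by_tags helper are invented for illustration; this is a local model of the idea, not how the REST API is called.

```python
# Made-up items; each one can carry as many tags as you like.
items = {
    "product_1": {"tags": {"product", "book", "sci-fi"}},
    "product_2": {"tags": {"product", "dvd"}},
    "product_3": {"tags": {"product", "book"}},
}


def filter_by_tags(candidate_ids, required_tags, items):
    """Keep only the candidates that carry every required tag, preserving order."""
    required = set(required_tags)
    return [item_id for item_id in candidate_ids if required <= items[item_id]["tags"]]


# Pretend this ranked list came back as recommendations for a user.
recommended = ["product_2", "product_1", "product_3"]
books = filter_by_tags(recommended, ["book"], items)
```

Here the tags only decide which results survive the filter; they play no part in how the ranked list was produced.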
Properties are almost as simple. They're just a bunch of key-value pairs that are associated with a given item. Unlike in a traditional database where you'd have a schema for your users table, another one for your products table and another one for ... well, whatever else you wanted to store, properties in the Directed Edge database are completely ad-hoc.
Just to get the gears turning in your head, examples of things that can be stored in properties are: price, address, first name, last name, author, etc. Pretty much all of the standard fare that would go into a database column.
If you've got some products that have a property named price and some that don't, that's just fine with us.
At the moment properties aren't used at all by the recommendation algorithms, but we may add support for doing that later. For now they're just a convenient way of keeping tabs on the information that is being recommended.
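If it helps, you can think of properties as a plain dictionary per item, with no shared schema. The sketch below is a local illustration with made-up values and a hypothetical describe helper; it isn't the API itself.

```python
# Two made-up products: one has a price property, the other doesn't.
products = {
    "product_1": {"price": "9.99", "author": "Example Author"},
    "product_2": {"author": "Another Author"},
}


def describe(item_id, properties):
    """Format an item for display, tolerating properties that are missing."""
    price = properties.get("price", "n/a")
    return f"{item_id}: price={price}"


lines = [describe(item_id, props) for item_id, props in products.items()]
```

Since nothing forces every item to have the same keys, code that reads properties should expect a key to be absent, as the .get call does here.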
Links, or the way you connect stuff
So, we're huge graph theory wonks. No, no, not like Excel. Like webs of information.
In computer science lingo, a graph is just a set of connections between items. Those connections are called edges and the items are called nodes. All sorts of things can be modeled in graphs, like the way that web pages are connected to each other (there the links between pages are edges), or as above, a set of friends in a social network.
Let's make this really easy just to be clear: a graph is just a collection of connected stuff.
Here's an example of the two items from above connected via an edge. This would indicate, for instance, that the user had bought that product.
An edge is just the math term for what we call a link. They're the connections.
Every object in the Directed Edge database is an item; the relationships between them are links.
- A user (item) purchases a product (item), so we make a link from the user to the product.
- A user (item) gives a rating of 5 to a product (item), so we create a link between the user and the product with a weight of 5.
- A user (item) clicks on a product category (item), so we make a link from the user to the category page.
- A user (item) is friends with another user (item), so we make a link from the user to the other user.
- A user (item) is a fan of a band (item), so we make a link from the user to the band.
- A web page (item) is connected to another web page (item), so we make a link from the first page to the second page (just mimicking the HTML link structure).
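A few of the cases above can be sketched as a small directed graph in Python. This is just a local model: the add_link helper is hypothetical, and using None to mean "no explicit weight" is an assumption made for this example.

```python
# source identifier -> {target identifier: weight}
links = {}


def add_link(source, target, weight=None):
    """Record a directed link from source to target, optionally with a weight."""
    links.setdefault(source, {})[target] = weight


add_link("user_1", "product_1")            # user_1 purchased product_1
add_link("user_1", "product_2", weight=5)  # user_1 rated product_2 with a 5
add_link("user_1", "user_2")               # user_1 is friends with user_2
add_link("page_1", "page_2")               # page_1 has an HTML link to page_2
```

Every entry goes from one item to another, so nothing is stored under product_1 or page_2 unless a link is added starting from them.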
So when getting up and going with the Directed Edge system, you have to decide which bits of information you want to base the recommendations on.
You notice up there that those arrows aren't bi-directional? They don't have a pointy thing on both sides? Well, guess where the name of our company comes from: those are directed edges. In graph theory you can have undirected edges and directed edges. All that directed means here is that they have a direction. A connection from A to B is not the same as a connection from B to A. One says that Bob is a friend of Sarah; the other says that Sarah is a friend of Bob.
Web page A can link to web page B without implying that web page B also links back to web page A. That link is a directed edge.
It's the same kind of deal when a user buys a product. We usually want to represent that a user bought a product rather than implying that the product has some sort of fundamental connection with the user (though it might, for instance, if the user were also the author of it).
The typical case where this matters when using the API is friendships in a social network: if the friendships are reciprocal (like Facebook) rather than possibly uni-directional (like Twitter), you have to put two links in there to convey the friendship.
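Here's a minimal sketch of the difference, again as a local Python model rather than API calls. The helper names add_friendship and add_follow are made up for this example.

```python
# source identifier -> set of target identifiers
links = {}


def add_link(source, target):
    """Record a single directed link from source to target."""
    links.setdefault(source, set()).add(target)


def add_friendship(a, b):
    """Reciprocal friendship: one directed link in each direction."""
    add_link(a, b)
    add_link(b, a)


def add_follow(follower, followed):
    """Uni-directional relationship: a single directed link."""
    add_link(follower, followed)


add_friendship("user_1", "user_2")  # both users point at each other
add_follow("user_3", "user_1")      # user_1 does not point back at user_3
```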
We've also introduced the notion of typed links. Typed links specify something like:
- The relationship between this customer and this product is purchase.
Link types can be mixed and matched at query time. So if you're asking for related products you can ask for a mix of recommendations based 80% on purchases and 20% on common tags by saying that you'd like 0.8 of the purchase link type and 0.2 of the tag link type.
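To get a feel for what an 80/20 mix means, here's a toy Python sketch that blends per-link-type scores with those weights. Everything in it is an assumption for illustration: the scores are made up, blend_scores is a hypothetical helper, and a weighted sum is just one simple way to combine link types, not a description of how our engine computes results.

```python
def blend_scores(scores_by_type, weights):
    """Combine per-link-type scores into one ranking using the given weights."""
    combined = {}
    for link_type, weight in weights.items():
        for item_id, score in scores_by_type.get(link_type, {}).items():
            combined[item_id] = combined.get(item_id, 0.0) + weight * score
    # Highest combined score first.
    return sorted(combined, key=lambda item_id: combined[item_id], reverse=True)


# Made-up relatedness scores for each link type.
scores_by_type = {
    "purchase": {"product_2": 1.0, "product_3": 0.5},
    "tag": {"product_3": 1.0, "product_4": 1.0},
}
weights = {"purchase": 0.8, "tag": 0.2}

ranked = blend_scores(scores_by_type, weights)
```

With these numbers product_2 ranks first on the strength of purchases alone, product_3 follows because it shows up under both link types, and product_4 comes last since it only appears via the lightly weighted tag type.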
You can create whatever link types seem appropriate for your site's data.