Difference between revisions of "XML Format"
Latest revision as of 11:16, 7 March 2014
So, keeping in mind all of the basic structures from API Concepts, let's see what they look like in XML.
There are three major cases in which we'll get or send XML to the Directed Edge REST API:
- Import / Export - this is just a list of all items with all of their links, properties and tags included.
- Updates - you can update single items in the database or add or remove links, properties and tags.
- Queries - for related or recommended items (there's a difference).
The first two basically have the same form, with the notable difference that imports and exports contain multiple items, whereas updates only contain one item. Query results contain the queried item along with a list of identifiers of the items related to it.
Import / export example
<?xml version="1.0" encoding="UTF-8"?>
<directededge version="0.1">
  <item id='user_1'>
    <link>product_1</link>
    <tag>customer</tag>
    <property name='last name'>Schmidt</property>
    <property name='first name'>Bob</property>
  </item>
  <item id='product_1'>
    <tag>product</tag>
    <property name='artist'>Beatles</property>
    <property name='name'>White Album</property>
  </item>
</directededge>
Here we've got a customer named Bob Schmidt who's connected to an album called White Album by the Beatles. So we've got two items and one link. Let's start breaking this down a little.
If you imported this to your Directed Edge account using the REST API, you'd have a database with two items. Similarly, if you requested a database export right after that, this is what you'd get back.
Before you start sending XML to our server, we recommend passing it through xmllint or similar to make sure that the XML is valid. The most common problem that we see with users that are new to the API is sending invalid XML and then our servers choking on it.
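If xmllint isn't handy, a quick well-formedness check can also be scripted. The following is a minimal sketch using Python's standard library; the function name is_well_formed is just an illustration, and the check only tells you whether the XML parses, not whether our servers will accept its contents.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<directededge version='0.1'><item id='user_1'><tag>customer</tag></item></directededge>"
# The <tag> element below is closed with </item>, so parsing fails.
bad = "<directededge version='0.1'><item id='user_1'><tag>customer</item></directededge>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```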
Related / recommended query example
- Results for things related to "Web 2.0" in our Wikipedia database
<?xml version="1.0" encoding="UTF-8"?>
<directededge version="0.1">
  <item id="Web 2.0">
    <related>Ajax (programming)</related>
    <related>Delicious (website)</related>
    <related>Flickr</related>
    <related>Web service</related>
    <related>RSS</related>
  </item>
</directededge>
The result structure here is pretty self-explanatory — the item is the one which was queried for and the list of results contains the identifiers of the related items. Here, notably, the order is important. Items near the top of the list rank higher than items further down.
The only difference in the XML between a related and a recommended query is the name of the element that the results are wrapped in.
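Since order carries the ranking, a client should read the results in document order. Here is a small sketch of that in Python with the standard library; the function name ranked_results and the shortened sample response are made up for illustration. The element name is a parameter, so the same code covers both related and recommended results.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the query result shown above.
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<directededge version="0.1">
  <item id="Web 2.0">
    <related>Ajax (programming)</related>
    <related>Delicious (website)</related>
    <related>Flickr</related>
  </item>
</directededge>"""


def ranked_results(xml_text, element="related"):
    """Return result identifiers in document order (best match first)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    item = root.find("item")
    if item is None:
        return []
    # findall() yields children in document order, which preserves the ranking.
    return [node.text or "" for node in item.findall(element)]


print(ranked_results(RESPONSE))
```

Passing element="recommended" reads the results of a recommended query in the same way.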
Update examples
- Do a complete update on the item user_1
<?xml version="1.0" encoding="UTF-8"?>
<directededge version="0.1">
  <item id='user_1'>
    <link>product_1</link>
    <tag>customer</tag>
    <property name='last name'>Schmidt</property>
    <property name='first name'>Bob</property>
  </item>
</directededge>
This is the same format as we used for an import, except that there's only one item. As noted in the REST API, if this is PUT to an item, it will overwrite all existing data.
- Add or remove the tag "user" from an item
<?xml version="1.0" encoding="UTF-8"?>
<directededge version="0.1">
  <item>
    <tag>user</tag>
  </item>
</directededge>
This is basically just a snippet from a usual item. One notable difference is that the item id may be omitted (but doesn't have to be) when updating an item using the add or remove methods noted in the API docs.
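If you'd rather generate these snippets than write them by hand, something like the following sketch would do it. It uses Python's standard library; the helper name tag_update is hypothetical, and the output omits the XML header, so you would prepend that yourself before sending.

```python
import xml.etree.ElementTree as ET


def tag_update(tag, item_id=None):
    """Build the body for adding or removing a single tag.

    The id attribute is only written when item_id is given,
    since it is optional for this kind of update.
    """
    root = ET.Element("directededge", version="0.1")
    item = ET.SubElement(root, "item")
    if item_id is not None:
        item.set("id", item_id)
    ET.SubElement(item, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


print(tag_update("user"))
# <directededge version="0.1"><item><tag>user</tag></item></directededge>
```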
- Remove the property "last name" from an item
<?xml version="1.0" encoding="UTF-8"?>
<directededge version="0.1">
  <item>
    <property name="last name"></property>
  </item>
</directededge>
Similar to adding a tag, when removing a property it's worth noting that the value of the property is completely ignored.
XML header
<?xml version="1.0" encoding="UTF-8"?>
Good old garden variety XML header. We prefer UTF-8, but we shouldn't blow up if you send us other stuff.
directededge element
<directededge version="0.1">
Nothing too exotic here. This wraps up all of the stuff that you'll send or receive. Version is always 0.1 for the moment because, well, we're too lazy to bump the version. (Though we'll do better versioning once we're out of beta.)
item element
<item id='user_1'>
An item is the container for all of the stuff in our database. Exactly what items are is explained in API Concepts.
id attribute
Every item must have an id attribute. That's the handle that we use to refer to the item everywhere. If you want to update or delete the item, you refer to it by its id. You can use any scheme that you want to for creating item identifiers, so long as they're unique. Often something like type_number is convenient.
tag element
Tags are always a child element of an item. Tags can be free form as well. Usually they're things like user, product, page, but they can also be much more specific, like artist, album, sci-fi, etc. You can associate as many tags as you would like with a particular item.
property element
<property name='last name'>Schmidt</property>
Properties are a set of key-value pairs associated with an item. Again, they are free form. You don't need to insert all of the values that you store locally in your database; in fact, most of the time you don't need to associate any at all. These are only useful if you would later like to grab data from the Directed Edge REST API about the items that you're storing there. You can associate as many properties with an item as you like, so long as their names are unique.
name attribute
Every property must have a name attribute. This can be anything that you like, but there may only be one property with a given name per item.
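Because property names have to be unique per item, it can be handy to check for repeats before uploading. This is only a sketch using Python's standard library; the function name duplicate_property_names is invented here, and what the server does with a repeated name isn't covered on this page.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_property_names(item):
    """Return the property names that occur more than once in an item element."""
    counts = Counter(prop.get("name", "") for prop in item.findall("property"))
    return sorted(name for name, n in counts.items() if n > 1)


item = ET.fromstring(
    "<item id='user_1'>"
    "<property name='last name'>Schmidt</property>"
    "<property name='last name'>Smith</property>"
    "<property name='first name'>Bob</property>"
    "</item>"
)
print(duplicate_property_names(item))  # ['last name']
```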
link element
<link>product_1</link>
<link weight="3">product_2</link>
<link type="purchase">product_3</link>
The link element, surprise, surprise, indicates a link between two items. The identifiers used in the text (here product_1, product_2, and product_3) are just the IDs of items in the database.
Links to items that do not exist will be ignored unless those items are defined in the same import. In other words, if you have a link to an item that's defined lower in the same batch of XML, no problemo. But if you create a dead link, that just gets ignored.
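Since dead links are dropped silently, you may want to spot them before importing. The sketch below, in Python with the standard library, lists links whose targets are neither defined in the same batch nor in a set of IDs you already know to exist. The names dead_links and known_ids are illustrative assumptions, not part of our API.

```python
import xml.etree.ElementTree as ET

BATCH = """<directededge version="0.1">
  <item id="user_1">
    <link>product_1</link>
    <link>product_9</link>
  </item>
  <item id="product_1">
    <tag>product</tag>
  </item>
</directededge>"""


def dead_links(xml_text, known_ids=frozenset()):
    """Return (source id, target id) pairs whose target is not defined anywhere."""
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    # Items defined anywhere in the batch count, even if they come after the link.
    defined = {item.get("id") for item in items} | set(known_ids)
    dead = []
    for item in items:
        for link in item.findall("link"):
            target = (link.text or "").strip()
            if target not in defined:
                dead.append((item.get("id"), target))
    return dead


print(dead_links(BATCH))  # [('user_1', 'product_9')]
```

Here product_1 is fine because it's defined lower in the same batch, while product_9 would be ignored on import unless it already exists in the database.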
weight attribute
The weight attribute is what we use for stuff like rankings. Weights can be in the range of 1 to 10 (or zero if you want to explicitly say there's no weight).
So, if we want to say user_1 gave product_1 a rating of 5 that looks like this:
<item id="user_1">
  <link weight="5">product_1</link>
</item>
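The weight range can be checked on your side before uploading. The following is a sketch in Python using the standard library; the function name link_weight is made up, and it assumes weights are whole numbers, which this page doesn't state explicitly.

```python
import xml.etree.ElementTree as ET


def link_weight(link):
    """Return the weight of a link element, or None if it has no weight attribute.

    Assumes integer weights; raises ValueError for anything outside 0 to 10.
    """
    raw = link.get("weight")
    if raw is None:
        return None
    weight = int(raw)
    if not 0 <= weight <= 10:
        raise ValueError("weight %d is outside the range 0 to 10" % weight)
    return weight


print(link_weight(ET.fromstring('<link weight="5">product_1</link>')))  # 5
print(link_weight(ET.fromstring('<link>product_1</link>')))             # None
```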
type attribute
This specifies a type for the link. Link types can be imagined as a set of graphs superimposed upon each other. For example, in an e-commerce setting, these could be purchased, rated, category, etc.
You can then mix and match these types in recommendations. For example, if you wanted to show related products based mostly on purchase, but also considering category information, you could specify that at query time. See our REST API for details on how to query link types.
So, if we want to say user_1 purchased product_1 and product_2, and gave product_1 a rating of 5, that looks like this:
<item id="user_1">
  <link type="purchase">product_1</link>
  <link type="purchase">product_2</link>
  <link type="rating" weight="5">product_1</link>
</item>
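To see the "superimposed graphs" idea concretely, you can split an item's links up by type on the client side. This is just a sketch with Python's standard library; the function name links_by_type is invented for illustration, and untyped links end up under the key None.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ITEM = """<item id="user_1">
  <link type="purchase">product_1</link>
  <link type="purchase">product_2</link>
  <link type="rating" weight="5">product_1</link>
</item>"""


def links_by_type(item):
    """Group an item's links by type as (target id, weight or None) pairs."""
    grouped = defaultdict(list)
    for link in item.findall("link"):
        target = (link.text or "").strip()
        grouped[link.get("type")].append((target, link.get("weight")))
    return dict(grouped)


print(links_by_type(ET.fromstring(ITEM)))
```

Each key in the result corresponds to one of the graphs: here user_1 has two edges in the purchase graph and one weighted edge in the rating graph.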
weights vs. types
There is both a semantic and a practical difference between weights and types.
- Weights are used to represent things that are intrinsic to the data, but where the links being related are categorically similar. Ratings would be one notable example, repeat purchases would be another.
- Types are used to specify different categories of relationships. For things that are categorically different, types have the advantage that they may be mixed and matched at query time rather than needing to be encoded, as with weights, at the time the data is uploaded to our service.
preselected element
This is a simple way to specify that another item should always appear in the recommendations returned for this item.
<preselected>product_1</preselected>
blacklisted element
This is a simple way to specify that another item should never appear in the recommendations returned for this item.
<blacklisted>product_1</blacklisted>
related element
<related>user_1</related>
Totally no frills. Results from a related items query. Order matters.
recommended element
<recommended>product_1</recommended>
Same deal as the related element, but for the results of recommended queries.