Proper RFC 4122 UUIDs as GUIDs in WordPress

A UUID (Universally Unique IDentifier), also known as a GUID (Globally Unique IDentifier), is a string that identifies a piece of information in computer systems. WordPress uses GUIDs to identify each individual post, but uses URLs (kind of) as GUIDs, and thus does not follow the standard definition (RFC 4122) of a UUID (or GUID).

A WordPress GUID: https://www.bjornjohansen.com/?p=1901
A proper RFC 4122 UUID as a URN: urn:uuid:65396530-3934-5930-a563-343736343835

As you can see, the WordPress GUID isn’t even the regular permalink to a post (if you have pretty permalinks enabled, and most people do). Slugs in permalinks may change, but an identifier must be immutable, so we need a GUID that doesn’t change. That is why it makes sense for WordPress to use that URL. It also makes it easy to mistake the GUIDs in WordPress for URLs you can use for something other than an ID. But the GUIDs in WordPress should never be treated as URLs. They simply are not. They are IDs. They just happen to also be URLs in the default WordPress implementation. They are not the URLs we want to expose anywhere, though.

Both URLs and URNs are URIs, but a GUID should be a URN, as its job is to identify something, not to say where it is located. The difference between URI, URL and URN is well explained here.
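To make the difference concrete, here is a minimal sketch that tells a UUID URN apart from a URL-style GUID and reads the version digit out of it. The constant and function names are my own for illustration and are not part of WordPress; the pattern only checks the textual layout used by RFC 4122 versions 1 to 5.

```php
<?php
// Hypothetical names for illustration; nothing here is part of WordPress.
// Layout: urn:uuid: + 8-4-4-4-12 hex digits, a version digit (1-5) and a variant digit (8, 9, a or b).
const EXAMPLE_UUID_URN_PATTERN = '/^urn:uuid:[0-9a-f]{8}-[0-9a-f]{4}-([1-5])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i';

/**
 * Return the UUID version if $guid is a urn:uuid: URN, or null otherwise.
 */
function example_uuid_urn_version( string $guid ): ?int {
	return preg_match( EXAMPLE_UUID_URN_PATTERN, $guid, $matches ) ? (int) $matches[1] : null;
}

var_dump( example_uuid_urn_version( 'urn:uuid:65396530-3934-5930-a563-343736343835' ) ); // int(5)
var_dump( example_uuid_urn_version( 'https://www.bjornjohansen.com/?p=1901' ) );         // NULL
```

A check like this could be handy when migrating a site where old posts still carry URL-style GUIDs and new ones carry URNs.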

To avoid any possible confusion about whether WordPress GUIDs are URLs, and to make them compatible with the UUID format that the rest of the world uses, we can use the wonderful Plugin API and hook into WordPress to use proper RFC 4122 UUIDs.

About UUIDs and versions (subtypes)

A UUID is 128 bits long, and requires no central registration process.

Adoption of UUIDs and GUIDs is widespread, with many computing platforms providing support for generating them, and for parsing their textual representation.

– Wikipedia (on RFC 4122 UUIDs)

RFC 4122 defines different versions, or subtypes, of UUIDs. Version 4 is the easiest to use, as it is based entirely on cryptographically random (or pseudo-random) bits. UUID v3 and v5 specify how a name, such as a URL in the URL namespace, can be used as the basis for a UUID. The difference between v3 and v5 is that v3 uses MD5 whereas v5 uses SHA-1. SHA-1 (v5) should be used where backwards compatibility with MD5-based UUIDs isn’t necessary.

UUID versions (subtypes) that are interesting to us:
UUID Version 4: Based on random bits. Gives us 2^122 different combinations, which should never be an issue. Really. That is about 7.38e26 possible UUIDs for each human being on the planet.
UUID Version 5: Based on a SHA-1 hash generated from the URL namespace UUID and a URL. The same URL always gives the same UUID, and the chance that two different URLs end up with the same UUID is roughly 1 in 2^122.
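If you want to check the numbers yourself, the arithmetic can be sketched in a few lines. The population figure of 7.2 billion is my assumption (it is what the per-person number above appears to be based on), and the collision estimate uses the usual birthday approximation rather than an exact calculation.

```php
<?php
// 122 of the 128 bits are free; the other 6 are fixed by the version and variant fields.
$combinations = 2 ** 122; // Overflows to a float, about 5.3e36.
$population   = 7.2e9;    // Assumption: world population behind the per-person figure.
$per_person   = $combinations / $population;

/**
 * Birthday approximation: p is roughly n^2 / (2 * N) for n random picks out of N possibilities.
 * Only meaningful while the result is far below 1.
 */
function example_collision_probability( float $n, float $combinations ): float {
	return ( $n * $n ) / ( 2 * $combinations );
}

printf( "Possible combinations: %.2e\n", $combinations );
printf( "Per person:            %.2e\n", $per_person );
printf( "Collision chance among one billion UUIDs: %.1e\n", example_collision_probability( 1e9, $combinations ) );
```

Even with a billion UUIDs in play, the estimated chance of any two colliding stays many orders of magnitude below anything worth worrying about.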

Hooking into WordPress

Because of how WordPress saves new posts, the most efficient option is UUID v4, as the UUID can be included when the SQL insert for a new post is performed.

By default, WordPress sets the GUID with an SQL update after the first insert, as the new post ID is required to create the “permalink”. If we want to use UUID v5, based on a “permalink”, there is unfortunately no filter for the GUID update, so we have to hook in a little later, once the post has been saved, check that it is a newly inserted post, and then run yet another SQL update to set the GUID field to a proper UUID v5 string.

However, IMHO, since we are in fact dealing with articles that are assigned unique URLs (permalinks), we shouldn’t have to resort to using random UUIDs (v4). I think I’ll settle on using version 5, but it is up to you to make your own decision.

Using UUID version 4

This is the computationally most efficient option, as we filter the UUID into the post data before it is inserted into the database.

To use UUID version 4 for your GUIDs in WordPress, you can add this snippet, e.g. as an mu-plugin:

<?php
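add_filter( 'wp_insert_post_data', function ( $data, $postarr ) {
	// New posts arrive with an empty GUID; existing GUIDs are left untouched.
	if ( '' === $data['guid'] ) {
		$data['guid'] = wp_slash( 'urn:uuid:' . wp_generate_uuid4() );
	}
	return $data;
}, 10, 2 ); // Priority 10; the callback accepts 2 arguments.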

And that’s really everything that’s needed!

(Thanks to Dominik Schilling for pointing out to me that WordPress introduced wp_generate_uuid4() in version 4.7, so you don’t need to bring your own implementation.)
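If you are curious what a version 4 generator has to do, here is a minimal standalone sketch. To be clear, this is not WordPress’ implementation of wp_generate_uuid4(); the function name is made up for illustration, and I am assuming PHP 7 or later for random_bytes().

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: build a version 4 UUID from 16 random bytes.
 */
function example_uuid_v4(): string {
	$bytes = random_bytes( 16 );

	// Set the four most significant bits of octet 6 to 0100 (version 4).
	$bytes[6] = chr( ( ord( $bytes[6] ) & 0x0f ) | 0x40 );

	// Set the two most significant bits of octet 8 to 10 (RFC 4122 variant).
	$bytes[8] = chr( ( ord( $bytes[8] ) & 0x3f ) | 0x80 );

	// 32 hex digits, split into 8 groups of 4 and joined as 8-4-4-4-12.
	return vsprintf( '%s%s-%s-%s-%s-%s%s%s', str_split( bin2hex( $bytes ), 4 ) );
}

echo 'urn:uuid:' . example_uuid_v4(), PHP_EOL;
```

Everything except those six fixed bits is random, which is where the 2^122 combinations come from.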

Using UUID version 5

This is not based on (pseudo-)randomness. The UUID is derived deterministically from the URL, but it requires two SQL update queries to be run after the initial insert: the one WordPress performs by default, and ours. This should not really be an issue in most (any?) cases.

The UUIDs are based on the URLs that WordPress uses for GUIDs by default, but follow a standardized format for UUIDs as URNs, and will not be mistaken for URLs.

To use UUID version 5 for your GUIDs in WordPress, you can add this snippet, e.g. as an mu-plugin:

<?php
add_action( 'save_post', function( $post_ID, $post = null, $update = false ) {

	/*
	 * We’ll only update the GUIDs when inserting new posts.
	 * A GUID should never be changed for an existing post.
	 */
	if ( ! $update ) {
		global $wpdb;

		$where = array(
			'ID' => $post_ID,
		);

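		$wpdb->update( $wpdb->posts, array(
			'guid' => 'urn:uuid:' . uuid_v5( get_permalink( $post_ID ) ),
		), $where );

		// The row was changed directly, so drop the cached copy of the post.
		clean_post_cache( $post_ID );
	}
}, 10, 3 ); // Accept 3 arguments; otherwise $update is never passed and stays false.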

Unlike UUID version 4, you need to bring your own UUID implementation (a uuid_v5() function in the example above).

UUID version 5 implementation

Here’s a ready RFC 4122 compliant implementation for UUID version 5 (name based with SHA-1 hashing). Save it as an mu-plugin, e.g. uuid.php:

<?php
/**
 * RFC 4122 compliant UUIDs.
 *
 * The RFC 4122 specification defines a Uniform Resource Name namespace for
 * UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
 * Unique IDentifier).  A UUID is 128 bits long, and requires no central
 * registration process.
 *
 * @package UUID
 * @license https://www.gnu.org/licenses/gpl-2.0.txt GPLv2
 * @author bjornjohansen
 */

if ( ! function_exists( 'uuid_v5' ) ) {
	/**
	 * RFC 4122 compliant UUID version 5.
	 *
	 * @param  string $name    The name to generate the UUID from.
	 * @param  string $ns_uuid Namespace UUID. Default is for the NS when name string is a URL.
	 * @return string          The UUID string.
	 */
	function uuid_v5( $name, $ns_uuid = '6ba7b811-9dad-11d1-80b4-00c04fd430c8' ) {

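		// Convert the name space ID to its 16 octets in network byte order.
		$ns_octets = hex2bin( str_replace( '-', '', $ns_uuid ) );

		// Compute the raw (binary) hash of the name space ID concatenated with the name.
		$hash = sha1( $ns_octets . $name, true );

		// Initialize the octets with the 16 first octets of the hash, and adjust specific bits later.
		$octets = str_split( substr( $hash, 0, 16 ), 1 );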

		/*
		 * Set version to 0101 (UUID version 5).
		 *
		 * Set the four most significant bits (bits 12 through 15) of the
		 * time_hi_and_version field to the appropriate 4-bit version number
		 * from Section 4.1.3.
		 *
		 * That is 0101 for version 5.
		 * time_hi_and_version is octets 6–7
		 */
		$octets[6] = chr( ord( $octets[6] ) & 0x0f | 0x50 );

		/*
		 * Set the UUID variant to the one defined by RFC 4122, according to RFC 4122 section 4.1.1.
		 *
		 * Set the two most significant bits (bits 6 and 7) of the
		 * clock_seq_hi_and_reserved to zero and one, respectively.
		 *
		 * clock_seq_hi_and_reserved is octet 8
		 */
		$octets[8] = chr( ord( $octets[8] ) & 0x3f | 0x80 );

		// Hex encode the octets for string representation.
		$octets = array_map( 'bin2hex', $octets );

		// Return the octets in the format specified by the ABNF in RFC 4122 section 3.
		return vsprintf( '%s%s-%s-%s-%s-%s%s%s', str_split( implode( '', $octets ), 4 ) );
	}
}// End if().
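A version 5 function is easy to get subtly wrong, so it is worth cross-checking against a known value. Below is a compact, independent sketch (the function name is hypothetical, and it works on the hex digest instead of raw octets) together with a test vector from the Python uuid documentation: the name python.org in the DNS namespace. Any RFC 4122 compliant version 5 function should reproduce that value.

```php
<?php
/**
 * Hypothetical cross-check helper; the name is not from WordPress.
 */
function example_uuid_v5( string $name, string $ns_uuid ): string {
	// Hash the 16 raw namespace octets followed by the name; keep the hex digest.
	$hash = sha1( hex2bin( str_replace( '-', '', $ns_uuid ) ) . $name );

	return sprintf(
		'%s-%s-%04x-%04x-%s',
		substr( $hash, 0, 8 ),
		substr( $hash, 8, 4 ),
		( hexdec( substr( $hash, 12, 4 ) ) & 0x0fff ) | 0x5000, // Version 5.
		( hexdec( substr( $hash, 16, 4 ) ) & 0x3fff ) | 0x8000, // RFC 4122 variant.
		substr( $hash, 20, 12 )
	);
}

// DNS namespace UUID from RFC 4122, Appendix C.
$dns_namespace = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Test vector from the Python uuid module documentation.
var_dump( example_uuid_v5( 'python.org', $dns_namespace ) === '886313e1-3b8a-5372-9b90-0c9aee199e5d' );
```

If your own function disagrees with this value, the usual culprits are hashing the namespace as a text string instead of its 16 octets, or reading the hex digest as if it were raw bytes.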

One last word of caution

Please do not think that UUIDs have anything at all to do with security. Do not use them as such.

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example.

– RFC 4122

7 Comments

  1. You’ve made a mistake with your use of UUIDs as replacements for the ID. The IDs in WordPress are URIs for a reason: The Atom syndication standard (and some versions of RSS) requires the ID to be a URI (so either a URL or URN). WordPress uses the GUID as-is in the RSS and Atom templates, so you must store the GUID in WordPress’ database as a URN. Other metadata formats also expect IDs to be URIs.

    Luckily, there is a standard URN that uses UUIDs: “urn:uuid:<an-uuid>”.

    I’ve made a plugin that generates UUIDv4 URNs for use as GUIDs in WordPress. It’s creatively named urn:uiuid as the_guid.

    1. Thanks for bringing this to my attention. I’ve now updated the post to include a filter to output proper URNs when outputting GUIDs that are RFC 4122 compliant UUIDs.

      I’m not sure if get_the_guid is the right filter to use though. Perhaps it is more correct to filter in clean_url which is applied in esc_url() (which is applied in the the_guid filter).

      1. I’ve updated the post again, to insert the UUIDs with the protocol prefix into the DB, and to allow the urn protocol in WordPress.

        1. Yet another update now, since Dominik Schilling pointed out that since WordPress 4.7 wp_generate_uuid4() has been available and urn an allowed protocol.

          1. I was aware of urn being added to allowed protocols (my patch) but I wasn’t aware there was a new function to generate UUIDs. Thanks for discovering it and sharing!

  2. I believe there is a mistake in the implementation at the following line:

    // Compute the hash of the name space ID concatenated with the name.
    $hash = sha1( $ns_uuid . $name );

    This concatenates the namespace UUID represented as a string ($ns_uuid) with the supplied name ($name). However looking at the sample implementation in RFC4122 (the routine uuid_create_sha1_from_name()) , the namespace UUID is to be represented as a packed sequence of integers with the bytes within the fields of the namespace UUID in network byte order.

    Unfortunately I’m not very well versed in PHP or I would provide corrected code. But it amounts to splitting up each of the first three chunks of the UUID string (maybe with explode('-', …)?), converting those strings to binary with hex2bin(), converting those binary values to network byte order with pack(), and then converting the remaining portions of the hex string to binary with hex2bin(), then concatenating the whole mess into one long set of bytes with pack(). Finally that set of bytes can be run through sha1().

    Hope that’s helpful.

    1. Oops… I forgot to mention with this method you’ll have to concatenate the bytes of $name (as-is, no packing/etc), before passing the whole thing through sha1().
