segunda-feira, 16 de dezembro de 2013

To UUID or not to UUID?

in http://www.jejik.com/articles/2007/04/automagically_convert_python_objects_to_mysql/

To UUID or not to UUID?

April 16, 2011 · 10 Comments

A discussion that crops up from time to time is whether primary keys should be UUIDs or simple auto-incrementing integers. There are pros and cons on both sides. UUIDs provide universal uniqueness which allows data to be stored across multiple databases, merged and so on. Auto-increment primary keys are easy to work with and efficient for indexing - but cause problems when you need to deal with data created across multiple databases, such as data migration and merging. UUIDs, as 36 character strings, take up a lot more space and can be much less efficient as an index.
But what about storing UUIDs in a more efficient format? A DBA recently recommended using BINARY(16) since that holds 128-bit data and a UUID is usually decomposable as two 64-bit long integers. The question is how to do this? And, I guess, another question is: is it really that much more efficient than a 36 character string?
When I first looked at implementing the DBA's advice, I couldn't find a type incfqueryparam that would support BINARY(16). There's BLOB... and I guess that would work if I could provide a byte array? Then the question is how to get a UUID - a 36 character string of hex with '-' separators - into a byte array in the first place? And then you have to ask how efficient that would be (under the hood, the UUID generator will create two 64-bit longs and then have to convert that to a string before CFML gets it's hands on it!).
Then another issue is the raw speed of UUID generation in the first place. Adobe made some big improvements between CF8 and CF9 here (100x faster) but generating UUIDs is traditionally still quite slow and in CFML they get converted to a delimited hex string so you can't even get at the underlying binary data. When I did some research, I found that the UUID generator built into Java (since Java 5) can only produce type 3 (name-based) and type 4 (random) UUIDs - it can't create time-based (type 1) or DCE security (type 2) UUIDs. There are third-party generators that can produce time-based UUIDs and which perform better than the built-in Java version.
Overall tho' this sort of fiddling around is something better suited to a low-level language like Java. We want a raw UUID. We want to get the two 64-bit longs and write them to a 16-byte array and use that as our BINARY key. As an experiment, I tried Johann Burkard's lightweight time-based UUID generator - see below - and wrapped it up in Clojure to create the value I wanted:
(defn uuid [] (com.eaio.uuid.UUID.))

(defn uuid-as-byte-array []
  (let [u (uuid)
        ^long lo (.getClockSeqAndNode u)
        ^long hi (.getTime u)]
    (-> (java.nio.ByteBuffer/allocate 16)
      (.putLong hi)
      (.putLong lo)
      (.array))))
There may well be more efficient ways of constructing the byte array, such as direct write operations and bit manipulation, but this seems clean and simple enough.
I'd be very interested to hear what others have done around UUID and/or binary indexes on large tables and, in particular, how you've deal with that from CFML?
See Johann Burkard's comparison of UUID generators for feature / performance analysis. See also Mark Kruger's recent post about leveraging Java UUID from ColdFusion - and the performance improvements possible (and read the comments).

Sem comentários: