So you think HTML is hard? Try DICOM!

I’ve recently been reading about how hard HTML is to understand and the ‘fine line’ involved in producing HTML output that all (enough?) web browsers understand.  To all web developers I say quit your whining and try following the DICOM standard.  Although I haven’t read enough standards to say that DICOM is the worst standard ever, I can say that it can be a tad confusing at times.

Some problems:

  • The standard is long.  The PDF version of HTML4 is nice and light at a single 389-page document; the 2008 DICOM standard has 18 parts, and part 3 alone is 1097 pages. (OK, this is a bit of an apples-to-oranges comparison, since DICOM covers communication protocols in addition to the syntax…)
  • The standard is old.  The ACR/NEMA standard dates back to the mid-eighties.  There’s nothing wrong with this per se, but it does carry around a lot of baggage (like transfer protocols that, so far as I can tell, no one uses anymore).  Life would be so much easier if implicit little endian would just go away *sigh*.
  • The standard is ambiguous.  There’s lots that’s unclear or unspecified.  What’s the correct window/level (brightness/contrast if you’re not familiar with DICOM)?  You’d better look outside the standard to figure it out.
  • The standard is confusing.  What’s the pixel spacing in an x-ray image?  Try reading this and figuring out what it means.
  • Major vendors don’t comply with the standard.  It’s not enough to comply with the standard yourself; like web browsers, you’ve got to be bug-compatible with common vendor formats.  Unlike web browsers, there isn’t just a handful of vendors, there are dozens of them, and you may not have their data.  This is what David Clunie calls the “we know what the standard says but we are going to ignore it and do what we have been doing for almost a decade regardless” CR vendor bug.
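
To make the implicit little endian gripe concrete, here is a minimal sketch (not a real parser) of reading one data element in each encoding. The function names, the sample bytes, and the two-entry dictionary are made up for illustration; a real reader needs the full data dictionary, and the explicit reader below only handles VRs that use the short 2-byte length field.

```python
import struct

# Hypothetical two-entry stand-in for the DICOM data dictionary.
# A real implicit VR reader needs the complete dictionary.
TINY_DICTIONARY = {
    (0x0010, 0x0010): "PN",  # Patient's Name
    (0x0028, 0x0010): "US",  # Rows
}


def read_implicit_le(buf, offset=0):
    """Read one element: 2-byte group, 2-byte element, 4-byte length, value."""
    group, element, length = struct.unpack_from("<HHI", buf, offset)
    start = offset + 8
    value = buf[start:start + length]
    # The VR is not in the byte stream, so it has to be looked up.
    # Unknown (e.g. private) tags fall back to "UN".
    vr = TINY_DICTIONARY.get((group, element), "UN")
    return (group, element), vr, value, start + length


def read_explicit_le(buf, offset=0):
    """Read one element: group, element, 2-char VR, 2-byte length, value.

    Only valid for VRs with the short length form (not OB, OW, SQ, UN, ...).
    """
    group, element, vr, length = struct.unpack_from("<HH2sH", buf, offset)
    start = offset + 8
    value = buf[start:start + length]
    return (group, element), vr.decode("ascii"), value, start + length


# Rows (0028,0010) = 512, encoded both ways (sample bytes built by hand).
ROWS_IMPLICIT = bytes.fromhex("28001000" "02000000" "0002")
ROWS_EXPLICIT = bytes.fromhex("28001000" "5553" "0200" "0002")
```

Both readers recover the same tag and value from the sample bytes, but only the explicit one can tell you the value is a `US` without outside help. Feed the implicit reader a private tag it has never heard of and all it can do is hand back opaque bytes.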

That said, at least there is a standard.  When I first started working at Merge (formerly Cedara Software) there was a SIF (Scanner InterFace) team whose job it was to read in the proprietary formats from older scanners.  We still sell boxes that connect to old ultrasound machines to allow them to produce DICOM outputs.  Despite being old, the standard is still extensible (though it’s going a little too far IMHO).  Finally, DICOM also has a pretty good domain model for radiology.  I don’t know what came first, the domain model or DICOM, but structuring your application around the DICOM model has worked very well for me.


One thought on “So you think HTML is hard? Try DICOM!”

  1. Peter, thanks for posting this – you saved me a post :).
    I wrote (and meant) “the worst standard *I came across*” but there’s actually a fair chance it’s indeed the worst one ever standardized.

    I once came across the opinion that what makes it so hard is the same thing that made it so successful: it started out as a conglomeration of very different protocols used by quite a few giant vendors. The only way to get such a mess under the umbrella of a single standard was for the standard to *allow* many divergent implementation choices at various junctions.

    While this thought does have some ground (e.g., in transfer syntax negotiation), I don’t think it’s the *main* difficulty. I now think it is just very, very poorly written. For one, it is packed with legally-phrased introductions that serve no purpose other than to wear out the reader. Random example:
    “In order to serve as an SCP of the Query/Retrieve Service Class, a DICOM AE possesses information about the Attributes of a number of stored composite object SOP Instances. This information is organized into a well defined Query/Retrieve Information Model. The Query/Retrieve Information Model shall be a standard Query/Retrieve Information Model, as defined in this Annex of the DICOM Standard.
    Queries and Retrievals are implemented against well defined Information Models. A specific SOP Class of the Query/Retrieve Service Class consists of an Information Model Definition and DIMSE-C Service Group. In this Service Class, the Information Model plays a role similar to an Information Object Definition (IOD) of most other DICOM Service Classes… ” etc. etc. etc. etc.

    For two, the standard designers apparently had an extreme affection for classifying stuff. Whenever the standard acknowledges 4 types of widgets, they are grouped into 2 ‘widget object entity definitions’, each definition containing exactly two widgets, and separately characterized by 2 ‘information widget model factors’. These two atrocities are tediously defined over 1.5 pages, and add up to a neat table like –
    Widget | WODT | IWMF
    1 | 1 | B
    2 | 1 | L
    3 | 2 | L
    4 | 2 | B*

    * this is in fact a different IWMF, should have really been labeled Q.

    Almost nowhere in the standard can you just find the section about the widget you desire and read what it does. You typically need many, many pages of useless terminology to be able to read the bit of info you seek. Pop quiz: can you understand the distinction between normal and composite service classes? (I sure can’t). If you can, do you see any value in this distinction?

    Seems to me that the task undertaken by, say, the C++ standard designers is far more complex, and they produced an infinitely more readable document, at about 25% of the pages. On the brighter side, this quality of a standard keeps the jobs of many a DICOM specialist tightly secured :).
