Serpent Addiction

Wednesday, November 09, 2005


I have some information that doesn't quite fit into a relational database model, though it would be convenient to store it as such. Here's an example:

>>> data = {'therapies': ['4'], 'inpatient_diagnoses': [('751.1', 'ATRESIA SMALL INTESTINE')], 'prior_conditions': ['3'], 'diagnoses_and_severity': [('879.2', 'OPN WND ANTERIOR ABDOMEN', '2'), ('751.1', 'ATRESIA SMALL INTESTINE', '4'), ('751.2', 'ATRESIA LARGE INTESTINE', '2')], 'race': ['6'], 'regimen_changed': '1', 'current_payment_sources': ['1'], 'overall_prognosis': '1', 'high_risk_factors': ['2', '5'], 'inpatient_discharge_date': '2005-02-19', 'financial_factors': ['0'], 'rehab_prognosis': '1', 'inpatient_facilities': ['1'], 'changed_diagnoses': [('751.1', 'ATRESIA SMALL INTESTINE')], 'life_expectancy': '0'}

It's a combination of lists, tuples, strings, and associative arrays (dictionaries, in Python), all structures that most modern languages support. One goal is to keep the data human-readable, which precludes some kinds of binary serialization. The data could be represented as XML, but XML's just barely readable.

Matt Goodall offered the following suggestions:
I recall playing with the Python version of YAML over a year ago. It was pretty cool, though slow, and not a complete implementation. There's a certain appeal to a Python programmer in representing data with whitespace. For example, an invoice:

invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

I couldn't google up an emacs mode for .yml files, but I'm sure someone must have written one by now. Ultimately though, I'd prefer to keep embedded newlines out of the field data, so I next looked at JSON.

>>> import json
>>> s = json.write(data) # data is returned as a string.
>>> d = json.read(s) # the string is converted back into a dictionary.

Pretty simple.
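The snippet above appears to use the API of the python-json package available at the time. With the json module that later joined the standard library, the same round trip might look like the sketch below. The trimmed-down record is an assumption made for brevity, and the date is stored as a string because JSON has no date type.

```python
import json  # standard-library module, not the 2005 python-json package

# A shortened version of the example record (an assumption, for brevity).
record = {
    'therapies': ['4'],
    'inpatient_diagnoses': [('751.1', 'ATRESIA SMALL INTESTINE')],
    'high_risk_factors': ['2', '5'],
    'inpatient_discharge_date': '2005-02-19',
    'life_expectancy': '0',
}

s = json.dumps(record, sort_keys=True)  # serialize to a human-readable string
d = json.loads(s)                       # parse the string back into a dictionary

# JSON has no tuple type, so the tuple comes back as a list.
print(d['inpatient_diagnoses'])
```

One caveat worth knowing: because tuples are written as JSON arrays, the restored dictionary holds lists where the original held tuples, so it does not compare equal to the original unless the tuples are rebuilt.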

JSON wins out over YAML for this project. It was dirt simple to round-trip the example data. Speed isn't much of an issue for this particular use of JSON and I like storing the data as TEXT rather than BLOB.
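As a rough illustration of the TEXT-column idea, here is a minimal sketch using an in-memory SQLite database. The table name assessments and the column name payload are hypothetical, and the real project may well use a different database entirely.

```python
import json
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(':memory:')

# Hypothetical table: one TEXT column holds the serialized record.
conn.execute('CREATE TABLE assessments (id INTEGER PRIMARY KEY, payload TEXT)')

record = {'overall_prognosis': '1', 'high_risk_factors': ['2', '5']}
conn.execute('INSERT INTO assessments (payload) VALUES (?)',
             (json.dumps(record),))

# Read the text back and decode it into a dictionary again.
row = conn.execute('SELECT payload FROM assessments WHERE id = 1').fetchone()
stored_text = row[0]
restored = json.loads(stored_text)
conn.close()
```

Since the column is plain text, the stored value can be inspected directly from a database shell, which is the main attraction over a BLOB.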

There is also JSON-RPC, a remote procedure call protocol similar to XML-RPC. Interesting.
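For a feel of what travels over the wire, the sketch below builds a JSON-RPC 1.0 style request and decodes a matching response. In that version of the protocol a request is a JSON object with method, params, and id members, and a response carries result, error, and id. The method name echo and the response text are made up for illustration; no real server is contacted.

```python
import json

# Build a request; 'echo' is a hypothetical method name.
request = json.dumps({'method': 'echo', 'params': ['Hello JSON-RPC'], 'id': 1})

# A response such a server might send back (hand-written here, not fetched).
response_text = '{"result": "Hello JSON-RPC", "error": null, "id": 1}'
response = json.loads(response_text)

# The id ties the response to the request; error is null on success.
if response['error'] is None and response['id'] == 1:
    print(response['result'])
```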

