Taxonomy rails app to-do list

Now that I’m familiar enough with Ruby on Rails to tweak various things, it’s time to get down to business and actually code the part that will make this thing useful. My goal is to be able to dump in a bunch of taxonomy data and spit out a tidy overview of the total hits in each category, with summaries for the top-level categories.

Sample input:
HH-0500.0500 HH-0500.2500-250 HH-0500.2500-250 HH-0500.8000-150 HH-0500.2500-250 HH-0500.8000-150 HH-0500.8000-150 HH-0500.0500 ND-6500.9800 FT-3000.1700 HH-0500.2500-250 HH-4500 HH-4500 HH-4500.0500 HH-0500.2500-250 ND-3500.3600 HH-4500.0500 HH-0500.2500-250 HH-0500.8000-150 HH-0500.2500-270 HH-0500.8000-150 HH-0500.8000-150 HH-0500.2500-250 YF-4500 HH-0500.0500 HH-4500 HH-0500.2500-270 HH-4500 HH-0500.2500-250 BD-1800.2000 HH-0500.0500 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-250 LV-1600 BM-6000.1500 HH-0500.2500-250 HH-4500 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-250 HH-0500.2500-270 HH-0500.2500-250 HH-0500.0500 HH-4500 HH-0500.2500-250 BD-1800.2000 HH-0500.0500 HH-0500.2500-250 HH-0500.2500-250 HD-6000.6200 HH-4500 BM-3000.2000 TI-1800.3000-200 HH-0500.2500-250 HD-6000.6200 HH-0500.8000-150 HH-0500.2500-250

Sample output (based on the taxonomy category names that correspond to the codes – this is the AIRS/211 Taxonomy of Human Services):
[Image: sorted and labeled taxonomy data table]

So far this is what I have:
[Image: app screenshot]

The model is backed by a super basic database with fields for titling the raw data or taxonomy to be parsed, dumping the raw data in there, and capturing a timestamp for when it was uploaded. I added a field for where the pretty parsed data can live, but I still have to figure out how to:
1. actually pass the raw data off and chomp it
2. write methods that can reliably handle any code in the taxonomy and, based on the first letter of the code, generate a total for a top-level category
3. write methods that can match any code in the taxonomy to its category name and display that name and its total
4. pass all of this chomped data back to the database and update the appropriate database record with the parsed data
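To make the steps in the list above more concrete, here is a minimal plain-Ruby sketch of the parsing side. It is not the app's actual code: the method names (count_codes, count_top_level, label_counts, build_report) and the names lookup hash are all made up for illustration, and it assumes the codes are separated by whitespace, as in the sample input.

```ruby
# Hypothetical helpers; none of these names come from the app itself.

# Split the raw dump on whitespace and count how often each code appears.
def count_codes(raw_data)
  raw_data.split.each_with_object(Hash.new(0)) { |code, counts| counts[code] += 1 }
end

# Roll the per-code counts up by the first letter of each code.
def count_top_level(code_counts)
  code_counts.each_with_object(Hash.new(0)) do |(code, count), totals|
    totals[code[0]] += count
  end
end

# Pair each code with its category name, falling back to the code itself
# when the (assumed) names hash has no entry for it.
def label_counts(code_counts, names)
  code_counts.sort.map { |code, count| [names.fetch(code, code), count] }
end

# Build a plain-text report: top-level totals first, then each code.
def build_report(raw_data, names = {})
  counts = count_codes(raw_data)
  lines = count_top_level(counts).sort.map do |letter, total|
    "#{names.fetch(letter, letter)}: #{total}"
  end
  lines += label_counts(counts, names).map { |label, count| "  #{label}: #{count}" }
  lines.join("\n")
end

sample = "HH-0500.0500 HH-0500.2500-250 HH-0500.2500-250 ND-6500.9800 HH-4500"
puts build_report(sample)
```

With no names supplied, the report just shows the raw letters and codes. Passing a hash that maps codes and top-level letters to their taxonomy names would swap those in. In the Rails app, the resulting string could then be saved to the parsed-data field with something like record.update(parsed_data: report), where parsed_data is only a guess at the column name.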



Interesting readings from around the web:

“Because wide-spread full text indexing abounds, the problem of find is not as acute as it used to be. In my opinion, it is time to move away from the problem of find and towards the problem of use. What does a person do with the information once they find and acquire it? Does it make sense? Is it valid? Does it have a relationship to other things, and if so, then what is that relationship and how does it compare? If these relationships are explored, then what new knowledge might one uncover, or what existing problem might be solved? These are the questions of use. Find is a means to an end, not the end itself. Find is a library problem. Use is the problem everybody else wants to solve.”
Eric Lease Morgan, “Next-generation library catalogs, or ‘Are we there yet?’”

“My favorite worlds have always been natively game-like. In their basic world rules you immediately want to interact with them. When you know that Anne McCaffrey’s Pern has five types of colored dragons, you immediately want to match yourself to one. When you know that in Piers Anthony’s Xanth every person has a unique magical talent, you want to pick out a talent for yourself. These rule structures are very game-like and enhance the poetry of a world. In addition to making it accessible, they give you a framework that exposes the theme and meaning in a world much more clearly than worlds that do not have these structures. Character classes are extremely powerful things.”
author and game designer Erin Hoffman in an interview with Clarkesworld Magazine

“It’s strange, but start talking to hard-bitten, seasoned executives about information in the enterprise and they automatically switch off their critical faculties. They’ll believe anything. Really. Like, information and how it is used in your organisation can be understood by a piece of software, out of the box. Like, you don’t need to actually understand your information environment in order to manage it. Like, the best people to ask about making your information generally accessible, are narrow subject matter specialists. Like, you can fix your information environment once, and it’ll stay fixed forever without paying any more attention to it. In this article we explore three fairy tales about taxonomies that executives seem particularly prone to believing:

1. That you don’t need taxonomies if you get a good search engine;
2. That taxonomies can look after themselves or can be delegated piecemeal to non-taxonomists;
3. That the best people to advise on taxonomy development are subject matter experts.”

-from Innotecture, citing Taxonomy Times No. 6, April 2011

Weekend update

I finally uploaded the annotated bibliography I wrote for one of my classes. It’s on ancient (mostly archaic and classical) Greek art & archaeology. The assignment required us to find a certain number of resources in specific formats, so there are a lot fewer web resources than I would include in a bibliography I was doing on my own. In fact, I’ve been thinking it would be fun to do a webliography of all the fun and creative online projects I’ve come across in this field. There’s the vast world of 3-D archaeological site modeling, and then there are all sorts of online exhibitions, image collections, and digital libraries. I have many such sites bookmarked, but I’m sure they’re just the tip of the iceberg. Before I do this I’ll have to see if someone else has already done it.

I’m also working on posts about personal finance resources and an epic overview of my favorite sci-fi books from the past 10 years (meaning ones I’ve read since 2000, not ones that have been published since then). I’m taking an online workshop on taxonomies and controlled vocabularies through Simmons College, so things might be dull around here until that’s over. I do plan to keep doing the visual LCSH roundup, though, because it’s entertaining.

I’m also ruminating on how to possibly create some simple yet helpful document on entrepreneurship that could be mailed to prisoners requesting information on the topic. A zine would be great, but the postage might overwhelm. It seems there is definitely a need for some easily distributable resource on this topic, at least in Pennsylvania.


“Popularity is not a semantic structure”

I just read this great article by Tom Reamy in KMWorld. It’s about popular (and widespread) misconceptions about taxonomies and folksonomies. I loved the attitude in this piece, and it expressed frustrations I’ve had with people’s blind love of folksonomies, and with the misconception that hierarchical classification systems maintained by experts are an outdated effort that only librarians still care about. Reamy emphasizes how a hybrid approach, using a taxonomy and user-generated terms, is where the real value lies. He also points out the myth that folksonomies allow us to break free from the authority of “those dictatorial librarians”:

“…folksonomy sites do have a central authority, and it is the most oppressive and most dangerous type of central authority there is — the authority of the majority. Against the will of the people, there is no recourse, no way of insuring the rights of the minority[…] It seems to me that having a system in which there is a central group of authorities or librarians that you as a minority can appeal to might work better than letting the collaboratively emergent dictatorial majority unconsciously ride roughshod over the minorities.”

The other thing I wish people would shout from the mountaintops is that the LCSH is (are?) not a thesaurus. Mary Dykstra says it best in her 1988 rant in Library Journal: just because LC decided to use the terminology of thesauri (RT, BT, NT, UF) doesn’t mean the semantic relationships between the headings are on par with those in real thesauri. Citing the 1974 ISO standard on what constitutes hierarchical relationships between terms, Dykstra uses the example of the heading:
NT Cookery (Oysters)

In LC’s defense, this subheading doesn’t appear to exist anymore. I checked some of Dykstra’s other examples:

Proposal writing in business
BT Contracts, letting of
(Contracts, letting of is now an RT, and the BTs are Business and Business writing)

NT Television and children
(still in there)

Here’s one I found:
Fortune-telling by Chinese characters
BT Chinese characters

Fortune-telling and Chinese characters are different types of entities. Fortune-telling by Chinese characters is not a type of Chinese character.

A lot of these issues stem from the insane degree of pre-coordination in the LCSH. Headings often represent multiple concepts, while in thesauri, terms represent only one concept. “With the use of a thesaurus, several terms (analyzed) may be strung together (synthesized) according to syntactic rules to form a subject” (Dykstra, 1988). I’m not saying LCSH isn’t useful and that it’s not currently serving many of us (relatively) well. It’s just frustrating that many people seem to think the LCSH is representative of thesauri in general. Reamy makes a similar point at the beginning of his article, but his complaint is with the use of the term “taxonomy”, not “thesaurus”:

A fundamental flaw in the vast majority of articles on folksonomies and taxonomies is the almost universal use of the Dewey Decimal System (or Library of Congress Subject Headings) as the example taxonomy. Using the Dewey Decimal System as your example taxonomy shows that you have no understanding of taxonomy creation and use in today’s world.

This brings me to a question that keeps bothering me. What really is the difference between a classification system, a taxonomy, a thesaurus, and an ontology? A nice set of definitions is available on this Hedden Information Management site (the creator teaches at Simmons College). These are things I need to have burned into my brain, especially if I’m going to avoid being led astray by the many instances of the terms being erroneously used interchangeably. These tools are too important to be so confused with one another, especially by professionals.


Dykstra, M. (1988). LCSH Disguised as Thesaurus. Library Journal 113 (4): 42-46.
Reamy, T. (2009). Folksonomy Folktales. KMWorld 18 (9): 6-8.