Roger Hyam

DIY Book Scanner: Learn By Doing

Simple Scanner
Almost Free Scanner

In the last weekend of the Christmas break I was sat in Starbucks in Waterstones in Edinburgh considering which of a stack of potential books I was going to spend my Christmas book tokens on. I had just been playing with a Sony eBook reader and so was thinking maybe I should take the plunge and go digital with books as well as the rest of my life.

I wondered what I would do with my existing books. It would be nice to be able to search through these and have them all with me when I travel. There would be issues with copyright if I were to copy them but there would also be technical problems. How would I get them in EPUB or PDF format? I did some Googling and came across a great site, diybookscanner.org. There are some really innovative designs on this site and it got my obsessive thoughts going. There were two problems.

  • I only had 48 hours to play before going back to work and my wife and kids wanted some of that time.
  • I didn’t have a workshop. Just a desk and some simple tools.

Could I produce a scanner in that time? Would it work?

When will eBooks stop being a rip off?

Amazon are selling an ebook of Siddhartha by Hermann Hesse for the Kindle for $3.51 and it appears in different versions for even more. Siddhartha is out of copyright so it costs them nothing for the rights on this book. The $3.51 is all for them.

Does this mean that $3.51 is the cost of distributing an eBook through the Amazon system? That would imply that the publisher (nee the author) would get the value of any eBook that retailed for over this sum. With Zen and the Art of Motorcycle Maintenance by Robert M. Pirsig (which retails for $9.58 on Kindle) for example the author would get $6.07? Somehow I doubt it!

That price tag of $9.58 doesn’t compare very well with $10.19 for the paperback version of Pirsig’s book.  The Kindle version can be yours in 60 seconds or less but it is controlled by Digital Rights Management (DRM) so really all you are buying is the right to have a permanent relationship with Amazon who will supply you with a copy to read on an authorised device. For 61c more you could have one made out of real paper that you could hand on to a friend or loved one, sell, donate to charity or even burn to keep warm. Sure it won’t last forever but it still has a residual value. My paper copy is yellowing but perfectly readable. It was printed in 1978 (that is 32 years ago!). It has a price tag of £1 and I bought it from a second hand shop for £1.50 ($2 ish) about 10 years ago.

The Present Moment Does Not Exist

It is just past Christmas and the turning of the decade so I thought it would be worth capturing a train of thought on time and space.

  • The future doesn’t exist yet.
  • The past no longer exists.
  • The present moment is vanishingly small.

Consider the sounds you hear in a piece of music. Sound is the change in air pressure that moves our ear drums backwards and forwards. To hear Middle C we need to listen to a sound for a long enough period to judge that the air pressure is changing around 261 times per second. At any one moment our ear drums are stationary. There is no sound in the now.
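The point about needing to listen for long enough can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a pure sine wave stands in for Middle C, the sample rate of 44100 is arbitrary, and `count_cycles` is a made-up helper that counts upward zero crossings as a crude stand-in for what the ear does.

```python
import math

def count_cycles(freq_hz, window_s, sample_rate=44100):
    """Count upward zero crossings of a sine wave inside a listening window."""
    n = int(window_s * sample_rate)
    samples = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # An upward crossing marks the start of a new cycle.
        if prev < 0 <= cur:
            crossings += 1
    return crossings

# A full second gives roughly 261 cycles to count (give or take one at the edges).
print(count_cycles(261, 1.0))

# A one millisecond window contains no complete cycle, so there is nothing to count.
print(count_cycles(261, 0.001))
```

The shorter the window, the less there is to judge the pitch by, and in the limit of a single instant there is nothing at all.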

Biodiversity Informatics – A ‘sackable offence’

Tremendous energy is required to re-animate the dead.

At last month’s TDWG2009 conference I was on a panel for a brief discussion at the end of a session. There were around 200 people in the audience and a handful of us up front as lambs for the slaughter.

One of the questions from the floor concerned the automation of the taxonomic process. I don’t recall the precise question but it triggered one of my (probably boring) canned responses.

I pointed out that the usual practice in software engineering, when asked to automate a system, is to produce a Domain Model based on an analysis of some Use Cases that then leads on to some Object Model or implementation model that is actually created in software. The assumption behind this is that whatever was being done was good but needs to be done faster – with computers!

In biodiversity informatics, and particularly in biological taxonomy, this is not such a good idea. Current working practice was developed in the light of the prevailing technology of the time. If computers and the internet had been available from the start things would probably have been done differently. The worst thing we can do now is automate a paper based system.

Synonyms Are SubClasses And Higher Taxa Are Just Tags

I have been wrestling for some time with how to handle taxonomic hierarchies when combining multiple classifications. This is partly motivated by a pressure to produce consensus hierarchies for navigation (a task that I think is probably not worth doing but which is beyond the scope of this post) and partly from a need to carry out inference over multiple classifications using OWL (something that I think is an important research topic if we are to overcome the ‘taxonomic impediment’).

Take the simplest scenario where we have classification C1 that contains family Z with two genera X and Y that contain a total of three species Xa, Xb and Yc. Now let there be another classification C2 that is identical but for the species Xb being moved to the genus Y as Yb.
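The scenario is small enough to write down as data. The sketch below is an illustration only: the dictionary layout (species epithet mapped to genus) and the helper names `binomials` and `moved` are my own assumptions, not part of any standard.

```python
# Each classification maps a species epithet to the genus it is placed in.
# Both genera sit in family Z in both classifications, so the family is omitted.
C1 = {"a": "X", "b": "X", "c": "Y"}
C2 = {"a": "X", "b": "Y", "c": "Y"}

def binomials(classification):
    """Return the set of binomial names a classification gives rise to."""
    return {genus + epithet for epithet, genus in classification.items()}

def moved(first, second):
    """Return epithets whose genus placement differs between two classifications."""
    return {
        epithet: (first[epithet], second[epithet])
        for epithet in first
        if epithet in second and first[epithet] != second[epithet]
    }

print(sorted(binomials(C1)))  # ['Xa', 'Xb', 'Yc']
print(sorted(binomials(C2)))  # ['Xa', 'Yb', 'Yc']
print(moved(C1, C2))          # {'b': ('X', 'Y')}
```

Even in this toy form the difficulty shows: the two classifications share only two of their three names, although they are about the same three species.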

Mindfulness and Mental Health – a glimpse of the madness?

I have just come to the end of “Mindfulness and Mental Health: Therapy, Theory and Science” by Chris Mace. My motivation for tackling such a book is to learn more about the link between mindfulness meditation and the mental health/psychotherapy field.

The book has been an interesting but challenging read. I have a scientific training and I am a regular Buddhist meditator but I have little experience of the world of psychoanalysis/psychotherapy and other talking therapies. I saw the book as a way to glimpse into that world. Having persevered to the bitter end I feel I do have a clearer understanding of the field but that it is not a positive one.

Episode 987: Cabinets – A Taxonomic Soap Opera.

In this episode of our longest running soap opera Terry & Tina confuse Eric who takes off with Malcolm.

“Cabinets” is a public service broadcast with the aim of promoting  community understanding of complex taxonomic issues.

— Cue opening credits —

The story so far: Terry is a taxonomist and he works very hard to produce a classification of the family Z. It includes two genera, X and Y and three species A, B and C. Here is a picture of his classification.

Managing The Managing Of The TDWG Ontology

Several years ago I was involved in developing the “TDWG Ontology”. Quite what the TDWG Ontology was/is remains an enigma for many. Around 2005/6 we tried to move away from modeling things in XML Schema and into some form of frame based modeling with well defined classes and properties – as opposed to the document structures implied by XML Schema. With the help of Jessie Kennedy’s team at Napier and people around the world we started building an OWL ontology of the whole domain – then ran out of money.

We still needed basic terms for use in LSID RDF metadata. This led to the development of the LSID Vocabularies. These were very lightweight “ontologies” but were still an attempt at defining terms using OWL.

In all our efforts there was a problem. There was no continuity of resourcing. For two years no one has been paid to manage the TDWG Ontology even though there is an increasing need for the disparate biodiversity informatics projects to have a formal mechanism for defining shared terms. Because the resource is seen as common no one feels responsible to commit resources to manage it.

In the last few days I have been doing some work with Kehan Harman on establishing a technical fix for this.

Nomenclature is Dead! Long Live Barcode Taxa!

Over the past few months I have been working on how to represent biological taxonomy and nomenclature using Description Logics. Here I combine these thoughts with a rather naive view of DNA Barcoding to suggest a new approach to taxonomy.

Description Logic (DL) is an extension of frame based languages (such as those used in object orientated programming paradigms) and semantic networks (e.g. WordNet) to link them to first-order predicate logic thus enabling the representation of application domains in formal, well understood ways that can be reasoned over by machines. DL has come to the fore in recent years with the advent of the Web Ontology Language (OWL) by the World Wide Web Consortium (W3C), two subsets of which, OWL-DL and OWL-Lite, are based on DL. Notably these two sub-languages guarantee decidability within finite time. From now on I’ll use the terminology of OWL-DL and OWL-Lite rather than generic DL terms. The OWL terms are more likely to be understood by a general reader who can read the OWL documentation as background. A concept in DL is referred to as a class in OWL. A role in DL is a property in OWL.

There are three principal features within OWL:

  • Classes are groups of individuals that belong together typically because they share some properties or property values.
  • Individuals are instances of classes.
  • Properties are statements of relationships between individuals or from individuals to data values.

There are other features within the language that allow the expression of things such as equivalence, cardinality and the domains and ranges of properties. Using OWL principally involves asserting specialization hierarchies of classes and inferring unknown subclass relationships and class membership using an inference engine such as FaCT++. A set of OWL assertions is frequently referred to as an ontology.
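To give a feel for what "inferring unknown subclass relationships and class membership" means, here is a toy sketch in plain Python. It is emphatically not a DL reasoner: it only follows asserted subclass links transitively, which is a tiny fraction of what an engine like FaCT++ does. The class names, the individual `specimen1` and both helper functions are hypothetical.

```python
def inferred_superclasses(asserted, cls):
    """Follow asserted subclass links transitively and collect every superclass."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in asserted.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def is_instance(asserted, types, individual, cls):
    """True if the individual is asserted or inferred to be a member of cls."""
    direct = types.get(individual, set())
    return any(t == cls or cls in inferred_superclasses(asserted, t) for t in direct)

# Asserted subclass links (hypothetical): Xa is a subclass of X, X and Y of Z.
asserted = {"Xa": {"X"}, "X": {"Z"}, "Y": {"Z"}}
# Asserted class membership (hypothetical): one individual typed as Xa.
types = {"specimen1": {"Xa"}}

print(inferred_superclasses(asserted, "Xa"))           # X and Z
print(is_instance(asserted, types, "specimen1", "Z"))  # True, though never asserted
print(is_instance(asserted, types, "specimen1", "Y"))  # False
```

Nobody asserted that `specimen1` belongs to Z. It falls out of the hierarchy, and that is the kind of work an inference engine takes off our hands, at a far greater level of sophistication.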

My part in GBIF’s Role in Persistent Resolvable Identifiers

Last week I took part in a meeting at GBIF in Copenhagen to discuss the role GBIF could play in Persistent Resolvable Identifiers (the technology formerly known as GUIDs and often confused with UUIDs. Perhaps they should be called PRIs – pronounced ‘prize’ – just kidding.) This is the culmination of the LGTG (a.k.a. the Less Than Greater Than group). Thanks are due to Éamonn O Tauma and the team at the GBIF Secretariat for being wonderful hosts and to my fellow participants for being such good company.

This was a two and a half day meeting that involved a group of us working on a document full of recommendations (to be published in the next month or so). As part of my contribution I came up with a slightly more detailed plan for how GBIF would interact with data suppliers and consumers. For a brief time this formed part of the final document but was then cut because it was too detailed. It may still make it back into the appendix but may also drop out completely so I thought I would present it here for posterity.

These are more or less just a series of notes and diagrams but they should be understandable to anyone involved in the field. I use the term GUID as this was before we changed to calling them persistent identifiers.

Note that what I present here is what I presented to the group and does not necessarily reflect the views of the group which will officially be published later.

KAP – Another photographic box ticked

Cwm yr Eglwys - Wind too strong.

You may not have heard of it but Kite Aerial Photography is quite a widespread hobby. It involves strapping a camera to a kite and flying it over something interesting. The camera can be fired remotely or just on a timer. Serious people build complex radio control rigs to move the camera around and point it in different directions.

Doing silly things with cameras appeals to me so, when I realized that my older compact digital camera (a Nikon Coolpix S1) had a feature to fire a shot every 30 seconds, I just had to give it a go. I built a rig using the Picavet suspension system, bought a large kite for £30 and took it on holiday to West Wales. The result was terrifying!

Calling Time on Biological Nomenclature

Gathering Storm

I was writing a report on the role of nomenclators in PESI when I realized that (with a little tweaking and injection of dangerous opinions) one section would make a good blog post.

In order to facilitate the accurate exchange of taxonomic information, both within the taxonomic community and more widely in the biological and environmental sciences, the e-infrastructure needs to provide  two dictionary functions for scientific names of organisms i.e.

  1. A recognized list of the names used. To establish that any two studies are actually using the same names whilst accounting for spelling variants and homonyms as well as to facilitate consistency in spelling and presentation.
  2. A mapping between the names and descriptions of the taxa they are used for. To establish that any two studies are using the names in the same sense or compatible senses.

If the ICBN and ICZN codes required all names to be registered in a single or limited number of places then this would effectively fulfil the first function. Unfortunately neither the ICBN nor the ICZN code requires names to be registered. Neither do they require names to be published in a particular list of journals. They merely set out the conditions for effective publication. The publications in which new names appear could be published anywhere and deposited in any library. There is no requirement for them to be peer reviewed.

GUID Persistence as Zen kōan

Most people are familiar with a few Zen kōans – the ‘nonsense’ sayings of the great Zen masters that are designed to make us think or rather not think. Their aim is to point more directly to what can’t be said in words. Examples include: “What is the sound of one hand clapping?” and “Does a dog have Buddha nature?”. Sitting silently and bearing a kōan in mind can be a powerful means of expanding our understanding. A kōan that would be useful for those of us involved in the discussions on Globally Unique Identifiers (GUIDs)  at the moment is: What is it that persists when a GUID is persistent? I have been dwelling on this for a while now and I’d like to share some of my thoughts.

A Position on LSIDs

Clover

I recently took part in a very long discussion on LSIDs on the TDWG-TAG mailing list. This seems to have been a perpetual discussion over the past four years. On reflection I realised that over two posts I had produced a kind of personal position paper on LSIDs and that it would be worth capturing the text in a blog post so it didn’t disappear into the mailing list archives. People often ask about LSIDs and it would be useful to have somewhere to point them to. Note that this text is off a technical discussion list and not newbie friendly. It assumes you know about LSIDs as a technology.

One issue that repeatedly comes up with LSIDs is that they may be more permanent than URIs. They offer a sociological advantage in that they are separate from ephemeral HTTP URLs that are used for everything on the web. The act of minting an LSID indicates that you intend to try to make it permanent or at least never re-use it for another resource.

The barrier to everyone hosting LSIDs is that they don’t all have access to DNS servers and can’t host the relevant SRV records. There are other barriers to do with binding LSIDs to particular institutional domains that may change. A solution to this may be to have a central service that hosts DNS records and it is implied that this would help with persistence but just hosting SRV records or supplying a redirect service does not actually provide any persistence at all to the data/metadata. A GUID that persistently resolves to a 500 error rather than a not found is not helpful.
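For readers who want to see where the SRV records come in, here is a rough sketch of the first step of LSID resolution. It assumes the usual `urn:lsid:authority:namespace:object[:revision]` layout and the `_lsid._tcp` SRV naming convention. The `Lsid` type, both function names and the example identifier are made up for illustration, and no actual DNS lookup is performed.

```python
from typing import NamedTuple, Optional

class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(text):
    """Split an LSID into its parts, rejecting anything that is not urn:lsid:..."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)

def srv_record_name(lsid):
    """Name of the DNS SRV record a client would look up for this LSID's authority."""
    return f"_lsid._tcp.{lsid.authority}"

# Hypothetical identifier; example.org is not a real LSID authority.
lsid = parse_lsid("urn:lsid:example.org:names:12345")
print(lsid)
print(srv_record_name(lsid))  # _lsid._tcp.example.org
```

The authority part is a domain name baked into the identifier, which is exactly why a change of institutional domain, or the lack of control over DNS, becomes a barrier.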

Identifiers, Identity and Me

The nice thing about blogging is that you get to mix-n-match your thoughts together in a way that you couldn’t do in the constituent parts of your life. This post brings together the notion of Globally Unique Identifiers (GUIDs) from my world of work and Buddhist notions of identity. It isn’t really acceptable to talk Buddhist spirituality in biodiversity informatics meetings and bringing up techie stuff when talking to Buddhist friends doesn’t help communication much either but here I can bravely attempt to mash the two together and, I hope, shed light on both.

Buddhism is widely and erroneously believed to propose the notion of anatman meaning ‘no soul’. Atman figures big in Hinduism and in Abrahamic faiths as ‘soul’. Buddhism has a different spin on the soul and this is where the error often comes in. Generally different-from-having-something is considered to be not having it. Therefore it is concluded that there are no souls in Buddhism – but this is confused thinking.

“Do you have a soul?” is a loaded question. It assumes firstly that the world can be split into things, secondly that these things can have possessive type relationships and thirdly there are two things ‘you’ and ‘soul’ that may have this relationship. If you have a problem with any of these assumptions it is difficult to say anything in response to the question. Any notion of a self or even a thing is totally contingent on everything else in space time. Buddhism finds it difficult to locate ‘you’ and ‘soul’ and so impossible to express an opinion on their relationship.

This is exactly where we arrive at biodiversity informatics and the problems we have with GUIDs.

SpeciesIndex: A waste of midnight oil?

Back last year at TDWG2008 in Fremantle there was a Wild Ideas session where people could propose crazy things that might not be serious or urgent. I gave a presentation called SpeciesIndex?: A practical alternative to fantasy mashups. This was meant to be a bit of fun but actually went down quite well with a few people coming up to me afterward who were interested in it. A wiki page called SpeciesPages was created to flesh out the ideas.

The idea presented in the paper to the conference and on the wiki is that each publisher of species pages (i.e. anyone with a web site that has a page per species approach to taxonomy) should produce a SiteMap file that contains a list of just those pages and submit the location of the SiteMap to a register so that the pages could be indexed and other services built around them.
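As a rough idea of how little a publisher would have to do, here is a minimal sketch that writes such a file in the sitemaps.org XML format. The function name and the example URLs are invented for illustration, and this is not the code I was actually playing with in the evenings.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(page_urls):
    """Return a sitemap XML string listing one <url><loc> entry per species page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical species pages on a hypothetical site.
species_pages = [
    "http://example.org/species/Xa",
    "http://example.org/species/Yc",
]
print(build_sitemap(species_pages))
```

The publisher would then only need to tell the register where this one file lives. Everything else (indexing, search, mashups) could be built by others on top of it.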

Over the intervening months I got to thinking about the idea some more  and playing around in the evenings with some code.