I recently read a moving post on memory by Dawn Foster that set me thinking.

Dawn has epilepsy, which means that 20-40 times each day she misses a few seconds of what is going on – yet nobody notices. She finds it disturbing, especially when compared to the experiences she has had with her grandmother suffering from Alzheimer’s. The thing that struck me was the phrase “I worry constantly about how much we remember.” This made me think of the suffering that even the thought that one might not remember something can create. Continue reading »

My pamphlet “A Brief Introduction to Mindfulness Meditation” is now available on the Kindle store worldwide. There has to be a nominal charge for it, but this does make it available to a wide audience. Visit the UK or US Amazon stores to purchase. In other territories you will have to search for it, I am afraid, but if you can buy Kindle books where you are, you should be able to buy this one.

The pamphlet is released under a Creative Commons license and you can download it from the downloads page as a PDF, as a PDF designed for booklet printing, and as an eBook in ePub format.

Today I launch the first version of my “A Brief Introduction To Mindfulness Meditation” pamphlet with a massive print run of fourteen! Plus availability on-line and the possibility of printing as many as needed of course.

Download a digital copy to read and freely distribute from the downloads page.

Near the end of 2010 I became frustrated with what I thought of as the “barriers to entry” for people who were interested in developing a meditation practice. It was as if the very bottom rung of an otherwise excellent learning ladder was missing. Continue reading »

Whilst I have been working on digitizing the Rhododendron monographs I have also been providing some technical help for Stuart Lindsay who is producing a series of fact sheets for the Ferns of Thailand. This has helped crystallize my thoughts regarding monographs and how we migrate them into the digital age.

This post is a follow-on from a previous one where I discussed mapping the Rhododendron monographs to EoL. It is an opinionated rant but I offer it in the hope that it will be of some use.

When monographs/floras/faunas are mentioned in the context of digitization people usually chirp up with PDF or, if they are more clued up on biodiversity informatics, TaXMLit and INOTAXA (Hi to Anna if you are reading) or TaxonX and Plazi.org (Hi to Donat). The point I am going to make in this blog post is not against these ways of marking up taxonomic literature but about the nature of the monographic/floristic/faunistic taxonomic product itself. I am far more familiar with the botanical side of things so apologies to zoologists in advance. Continue reading »

I’ve had my head down work-wise for the past few weeks trying to get the Rhododendron monograph markup finished. I now have a little database with some 821 species accounts in it plus a few hundred images – mainly of herbarium specimens. The workflow has been quite simple but very time consuming.

  1. Text is obtained from the source monograph either via OCR or access to the original word processor documents.
  2. The text is topped-and-tailed to remove the introduction and any appendices and indexes.
  3. Text is converted to UTF-8 if it isn’t already.
  4. An XML header and footer are put in place and any non-XML characters are escaped – this actually came down to just replacing & with &amp;
  5. The text is now in a well formed XML document.
  6. A series of custom regular-expression-based replacements are carried out to put XML tags at the start of each of the recognizable ‘fields’ in the species accounts. These have to be fine-tuned to each document as the styles of the monographs are subtly different. Even the monographs published in the same journal had some differences. It is not possible to identify the start and end of each document element automatically. This is for three reasons:
    1. OCR errors mean the punctuation, some letters and line breaks are inconsistent.
    2. Original documents have typos in them. A classic is a period appearing inside, outside, or both inside and outside a closing parenthesis.
    3. There are no consistent markers in the source documents’ structure for some fields. For example, the final sentence of the description may contain a description of the habitat, frequency and altitude, but the order and style may vary, presumably to make the text more pleasant to read. The only way to resolve this is by human intervention.
  7. The text is no longer in a well formed XML document!
  8. The text is manually edited whilst consulting the published hard copy to insert missing XML tags and correct really obvious OCR errors. In some places actual editing of the text is needed to get it to fit a uniform document structure as in the habitat example above.
  9. The text is now back to being a well formed XML document.
  10. An XSL transformation is carried out on the XML to turn it into ‘clean’ species accounts and alter the structure slightly.
  11. An XSL transformation is carried out to convert the clean species accounts into SQL insert statements for a simple MySQL database. The structure of this database is very like an RDF triple store (actually a quad store as there is a column for source). A canonical, simplified taxon name (without authority or rank) is used as the equivalent of the URI to identify each ‘object’ in the database. Putting the data in a database makes it much easier to clean up and to extract some additional data. An alternative would be to have a single large XML document and write XPath queries. Continue reading »
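
As a rough illustration of steps 4 to 7 above, here is a minimal Python sketch. It is not the code used for the Rhododendron work: the function names, the `monograph` root element, the field labels (`Type`, `Distribution`, `Habitat`) and the sample text are all assumptions made up for the example. It shows why the document is well formed after wrapping and escaping, and stops being well formed once the regular expressions have inserted opening tags that nothing has closed yet.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical field labels; each real monograph would need its own patterns.
FIELD_START = re.compile(r"^(Type|Distribution|Habitat):", re.MULTILINE)


def wrap_as_xml(raw_text: str) -> str:
    """Escape ampersands and add an XML header and footer (step 4)."""
    escaped = raw_text.replace("&", "&amp;")
    header = '<?xml version="1.0" encoding="UTF-8"?>\n<monograph>\n'
    footer = "\n</monograph>\n"
    return header + escaped + footer


def is_well_formed(document: str) -> bool:
    """Return True if the document parses as XML (steps 5, 7 and 9)."""
    try:
        ET.fromstring(document.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def tag_field_starts(document: str) -> str:
    """Put an opening tag at the start of each recognized field (step 6)."""
    return FIELD_START.sub(
        lambda m: f"<{m.group(1).lower()}>{m.group(0)}", document
    )


if __name__ == "__main__":
    # Placeholder text standing in for one OCR'd species account.
    sample = (
        "Genus species Author & Author\n"
        "Type: placeholder specimen citation\n"
        "Distribution: placeholder region\n"
    )
    wrapped = wrap_as_xml(sample)
    print(is_well_formed(wrapped))  # True: the state described in step 5

    tagged = tag_field_starts(wrapped)
    print(is_well_formed(tagged))   # False: the state described in step 7
```

The opening tags are deliberately left unclosed here, because finding where each field ends is exactly the part that needed manual editing in step 8.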