Yesterday (30-11-2017) the first International Digital Preservation Day (#IDPD17) took place. This initiative of the British Digital Preservation Coalition (DPC), born at a session of IPRES 2016, was celebrated at venues all over the globe. In that sense it was a truly international affair and, even in its infancy, already quite a success.
One of the Dutch IDPD events was held on the top floor of the building of the Ministry of Education, Culture and Science (nice views!). It was organized by the NCDD (Dutch Coalition for Digital Preservation) and NDE Houdbaar (the preservation department of Dutch Digital Heritage), and chaired by the able MC Marcel Ras. The IISH, in the person of yours truly, gave two talks that day which were, not surprisingly, in line with its themes: email archiving and significant properties (of digital objects). One of the things that struck me beforehand was the number of visitors: more than 80 people. For me this is a sign that the Dutch digital preservation community is really growing. If we had organized a meeting on the same themes five years ago, we would have been lucky to attract 20 people.
Being a speaker, I had neither the energy nor the time to take notes on all the talks. And as I am sure more thorough accounts will follow, I will simply share my highly subjective impressions and remarks with you.
Email archiving is, to put it positively, a challenge. Or, to be honest, a headache. The intimidating amounts of files and data, the complex privacy issues, the concept of an email box as personal information, all the different file attachments, the diverse (and sometimes obsolete) email formats and the different email clients are, to name just a few, rather overwhelming. Most of the talks came from the perspective of government email archiving. James Lappin, a researcher at Loughborough University, an experienced records manager and a blogger on email archiving, pointed to the dilemma of government email archiving: can you realistically expect civil servants to archive email within a records management system, or is this simply and ultimately an illusion?
Marens Engelhard, director of the Dutch National Archives, and later Robbert Jan Hageman (National Archives) and Jesper Harmes (Ministry of the Interior and Kingdom Relations) presented a rather radical alternative approach to this dilemma: harvest all the email of all civil servants of the Dutch government (approximately 1 billion messages a year) and rely on smart algorithms/artificial intelligence/big data analysis to make sense of this (more or less) unstructured data. The idea is that all these messages will be retained for 10 years (do the math: 10 billion emails) and only the email accounts of so-called capstone functionaries will be kept permanently. Civil servants will be given 10 weeks to filter out their personal email. Mette van Essen (National Archives) gave an impressive demo of how a big data filtering system could work, separating "noise" from "functional information". This work, carried out under the banner of the “National Archive Machine Learning Experiment”, is a joint effort of the National Archives and ICTU and is shared as open data. As this is a radically new approach, with all kinds of legal bears on the road (sorry, a Dutch idiom for obstacles), I guess it will take a long time (or forever) before it is realized.
Email archiving @ the IISH
I was asked to give an alternative perspective on the challenge of email archiving, since the IISH is a private archive. Of course, some of the challenges are the same (apart from the scale). The technical issues in particular are more or less the same, as are the privacy obstacles (even though scientific research is to some extent exempted from the new, more stringent EU privacy legislation). Unlike governmental records, our email archives come from very diverse backgrounds (from highly professional, well-ordered organizations to messy personal archives) and in many shapes and forms. Another big difference is that access to the (email) archives is determined by the individual agreements with the archival donors, not by archival law. See more in the (Dutch) slides of my talk below this blog post.
Significant properties or the “DNA of an object”
In the digital preservation community the concept of significant properties is a somewhat tricky subject. As Remco van Veenendaal (National Archives) showed in his talk, consensus is hard to find even on the definition of what exactly is meant by the concept. But OK, let’s take the plunge into shark-ridden semantic waters, borrow the title of this IDPD session, “the DNA of an object”, and follow that analogy: the idea behind significant properties is that a digital object indeed has a core or essence, and that this core can be derived from an organization’s mission and (archival) policies. A good example of this came from the talk by Annemieke de Jong of the Institute of Sound & Vision. The institute’s main holdings are the AV collections of Dutch broadcasting organisations. As they have been preserving these AV collections since the 1960s, and as preserving AV collections is all about migrating to new carriers and formats, the idea of a set of significant properties that have to be preserved during migration comes naturally. The cases of the Dutch National Library/KB (talk by Jeffrey van der Hoeven) and the National Archives (Remco) gave, from my perspective, a somewhat less clear and convincing picture of what significant properties mean for their respective organisations.
Your significant properties are not mine
Once again I took on the role of offering an alternative, slightly provocative view. As I have been doubting the usefulness of the concept of significant properties for the IISH for some time now, mainly due to Trevor Owens’ recent book (The Art and Craft of Digital Preservation) and discussions with Johan van der Knijff of the KB, I chose to share my doubts in my talk. To put it briefly: I think significant properties depend so much on the context in which objects were created, kept and accessed that in our case they will be very hard to pin down. To quote myself (sorry again): “your significant properties are not mine”. See more in the (Dutch) slides of my talk below this blog post.
Bit List of Digitally Endangered Species
An almost festive part of the IDPD, if only the motive weren’t so sad and serious, was the launch of DPC’s Bit List, with the (I must say) slightly preposterous subtitle “of Digitally Endangered Species”. DPC’s director William Kilbride introduced and launched the list in an (almost) live session from Glasgow. The list was compiled by an international jury and ranks digital materials from low risk to practically extinct. So it seems I wasn’t suffering from conference fatigue when I tweeted about a digital dodo at PASIG 2017.
Digital preservation should just work
The day closed with an excellent keynote by Preservica’s director Jon Tilbury, who gave his view on significant properties and made the more general observation that “digital preservation should just work” for users who simply don’t care how exactly it works. Put otherwise: it should be like a modern car, where the driver just has to drive and only the highly schooled mechanic peers under the bonnet. Now, as much as I agree with and like this analogy, who should then be the mechanic is, I think, still an open question. In my opinion, this should ideally be digital preservation specialists (or, in Trevor Owens’ terms, artisans!) and developers from cultural heritage and research institutes, working together with the specialists from digital preservation companies such as Preservica, Artefactual (Archivematica) and Ex Libris (Rosetta).
In short, it was quite a day! I, for one, am already looking forward to IDPD 2018. Thanks to all those involved!