Vellum

Following Vint Cerf’s talk at AAAS, the “Digital Dark Age” is in the news again (see DSHR’s blog for a good summary, or one of the ~200 other news articles about it!). The coverage spun me into a Twitter rant (documented here), but after reflecting on my reaction, I feel it’s worth exploring the issues in a bit more detail…

Honestly, if Cerf brings the news that our digital crap is in jeopardy, that’s good enough. Different people listen to him.

@textfiles

Yes, I agree that it’s great that big news stories like this raise awareness about these issues. But that’s just the first step. Awareness is only an engine of anxiety unless you know how to act upon that knowledge, and it’s this part of Vint’s message that I find problematic. By pitching “Digital Vellum” in almost the same breath as posing the problem, Vint makes it sound like digital preservation is just another technical issue to be solved. Hey! Don’t worry, it’s just a bug in the system! The boffins are working on it! It’ll be fine!

Well, the boffins have been working on this for a while:

There’s the late 90’s Universal Preservation Format…
…the early 2000’s Universal Virtual Computer…
…the mid-2000’s XCL Project (which doesn’t think of itself as a universal preservation format, but it is)…
…the late 2000’s Self-contained Information Retention Format…
…the early 2010’s KEEP Virtual Machine (see also here, here and the patent, here)…
…and of course the currently available bwFLA Emulation as a Service system…
…and now Vint’s weapon of choice, Olive Archives

My initial response to the news story was frustration that so much of this work appears to be unknown to Vint¹. But having thought about it, this is really a symptom of a much deeper problem. We’ve been researching this for nearly twenty years, with some very clever people coming up with some very clever ideas, but these ideas remain lost in a niche. We’ve been using ‘Digital Dark Age’ to scare up funding for a long time, but for all the academic papers they produced, those big, expensive EU projects have led to remarkably few concrete changes in how content is managed over time.

Clever people with fancy technical solutions are not enough. Yes, digital media are more brittle than most analogue media, and yes, changes to software can make individual resources inaccessible. There are a range of solutions and approaches at our disposal, but to take advantage of them, there must be individuals and organizations who have access to the skills and technical resources they need in order to fully realize their role as custodians of that material².

This is why digital preservation is both a social and technological problem, in equal measure. Or, to put that another way…

“Digital Vellum” is made of people. @anjacks0n

It appears that this aspect of the preservation problem did come up in the Q&A after Vint’s AAAS presentation³:

Google is not directly involved in the digital preservation effort, Cerf said, “although we have worked really hard at preserving the digital information of the day. We aren’t planning to become the archive of the future – although I think it would be cool.”

Instead, he envisions libraries and governments investing in the technology needed to carry today’s information into the distant future.

“Internet future blackout: No way to preserve our data”, Contra Costa Times

Ah, so it’s down to us. Of course.

And this, I think, illustrates one of the main underlying problems that have prevented us from getting those ‘universal preservation formats’ off the ground. Governments and libraries have the remit to drive investment in digital preservation, but the broader market is not interested in solving our problems. The historical record is a niche concern, and it’s not a lucrative niche.

Furthermore, we are all competing in the same market as the big players when it comes to staff and skills, and in the current climate of dwindling budgets, building up real and sustainable expertise in these areas is extremely difficult. Collaborating on open source tools provides one way of pooling our resources, but while some progress has been made, it’s still rare to find enough technical expertise to make these projects sustainable.

I guess the next step, then, is to try to find ways of using the recent publicity to drive appropriate investment. Not in academic research, or closed solutions riddled with opportunities for vendor lock-in, but in transferring the existing research-quality tools into sustainable technologies. I’m not sure what the answer is, but I suspect that there is a skills profile waiting to be invented somewhere between archivist, librarian and programmer, and we need more people who can work along that boundary in order to help us find a way forward.

Anj

EDIT: I slightly modified the final paragraph based on feedback from @CassPF, who pointed out we need archivist/coder skills too.

Particularly the KEEP and bwFLA work which is really very similar to the Olive Archive approach – maybe he should have Googled it ;-) ↩︎
Out of all the recent news coverage, I think Seamus Ross’s interview on BBC Radio 4 (from about 00:30:30, here) was the most helpful. Seamus emphasized that you need to take care of your digital resources, and outlined some of the actions you can take to look after the files you care about. ↩︎
Assuming the quote is accurate – his response to what is presumably the same question comes across a bit differently here. ↩︎