Archive this! UK Web Archive Mini Conference – November 4, 2019

I recently attended this event at the British Library (BL) in London, in which the small team of BL employees who work on the UK Web Archive (UKWA) spoke about their work to staff from the UK’s other legal deposit libraries. The UKWA is a legal deposit programme and much of the content it archives is only accessible from legal deposit library reading rooms. The purpose of the mini-conference was to better inform reading room staff at these institutions – which include my employer, the Bodleian Libraries – so that they can guide their readers to the UKWA as a resource for research.

The BL promoted the event with the following messages:

The UKWA gives researchers access to a new and largely untapped resource from which to conduct research. It will be demonstrated that the UKWA provides a unique window into all aspects of our lives, whether they be cultural, economic, political or social. The UKWA can also show the changing ways in which information is disseminated, and how websites have transformed over time by providing access to this important archive in perpetuity. Staff will learn how to contribute to the UKWA themselves to ensure websites are captured for posterity as well as be able to support researchers who are interested in adding to the UKWA.

The day was structured into presentations from guest researchers and BL staff working on UKWA, including its web archive engagement manager and curators. Professor Jane Winters (@jfwinters), professor of digital humanities at the School of Advanced Study, University of London, opened the conference with a welcome talk emphasising how much historians will need archives of digital information if we are not to move into a new dark age as relatively long-lived printed documents give way to more ephemeral digital ones. This gives a stark impression of the scale of the task.

Jane was followed by a brief introduction to the UKWA and its web interface by Jason Webber (@jasonmarkwebber) of the BL, whose job it is to promote the BL’s web archiving programme. There are two ways of accessing archived data: https://data.bl.uk/UKWA/ provides five datasets that are optimized for quantitative analysis; for a qualitative approach, it’s possible to search and browse the archive at https://www.webarchive.org.uk/.

Amongst Jason’s key messages were that the UKWA team can only collect a “representative sample of the UK web space,” and that only major news websites are collected daily, with all other sites archived considerably less frequently. The ‘Topics and Themes’ displayed on the UKWA homepage are the easiest way into focused, curated collections of material on specific topics. General searching is possible but, rather like a Google search, is likely to return far too many results for anyone to sift through manually.

Next, Nicola Bingham (@NicolaJBingham) and Helena Byrne (@HBee2015), both web archive curators at the BL, talked about the wider team working on the UKWA and the tasks they carry out to make it all work. The various legal deposit libraries curate collections on particular topics, but this is balanced against a generally undiscriminating approach to archiving public sites, meaning that they are not filtered for quality. Anyone can nominate any UK website to be added, using a link on the UKWA homepage: https://webarchive.org.uk/en/ukwa/info/nominate. The site doesn’t have to have a UK domain or be hosted in the UK as long as it is demonstrably UK content.

During the lunch break there was an opportunity to submit a webpage for inclusion using the ‘nominate’ form.

Jason then introduced two PhD students who are making use of the UKWA as an information source in their research. Liam Markey (@Liam_Markey94) is based at the University of Liverpool and is researching the concept of militarism, particularly with respect to how the First World War is commemorated. He has found the UKWA to be a useful source of informally written and published material that nevertheless gives insights into how particular words are used in association with the remembrance of war. Following Liam, Hannah Connell (@HannahfConnell), of King’s College London, described how the UKWA has helped in her study of Russian-language émigré publishing in the 20th century. Blogging, or otherwise writing on the web, is a form of publishing, so researchers like Hannah would miss relevant material if they did not search the web, including archived content that is no longer available online.

After these case studies, Jason took the floor again to discuss the challenges and opportunities of web archiving. Are we really catching a representative sample? Personal sites such as WordPress blogs are underrepresented in the archive. And much online content is moving away from the open website format, eg, social media accounts behind logins and the general movement towards apps on mobile devices rather than browsers. These are all beyond reach. Even where content is captured, opportunities for ‘big data’ analysis are limited both by staffing levels and legal deposit restrictions (eg, access on a legal deposit library site only). Researchers wanting to cite archived web content also hit obstacles because of these restrictions (ie, they cannot provide a valid link to content that is only accessible in a handful of reading rooms). Overall, very few people are working on web archiving globally, and people with technical skills are in especially short supply.

“We’re not quite history yet,” Jason said, referring to the still relatively recent date of the earliest archived web content. As Professor Winters suggested in her introduction, interest in and use of web archives is expected to rise. “There is a great future ahead of us.”

To wrap up the day, the BL’s Dr Richard Price thanked all contributors and put the importance of the project into context for society as a whole: “Public witness is fundamental.” We will always need to know who said what and when in the public sphere.

To keep up to date, follow the @UKWebArchive blog: https://blogs.bl.uk/webarchive/