In 2021, almost all information is produced, disseminated and consumed in digital format. This is the information that governs every second of our societies. However, when we look at the various existing digital preservation initiatives, they focus on digitizing collections that were not originally digital, for example, digitizing printed books. Most digital information published online is lost.

Recent events have reinforced this trend. About a year and a half after the start of the Covid-19 pandemic, we can see that it generated an immense amount of information (and misinformation) that was produced, replicated, adulterated and disseminated, and which, whether we agree with it or not, influenced the social, political and economic measures that were taken.

It should be noted that many of the large torrents of information that currently influence political decision-making in democracies are informal, conveyed through online channels such as social networks. This information appears and exerts its influence as quickly as it disappears and becomes inaccessible (except to the multinational companies that own the platforms, which are well aware of the value of preserving it). Without the memory of the initial information (and misinformation) that proliferated online about SARS-CoV-2, what lessons can governments and citizens learn? What history can be written about other contemporary events without the memory of the online?

There is official digital information that is carefully preserved. Examples include publications in the Electronic Official Gazette or the Authentic Digital Objects preserved in the RODA of the Directorate-General for Books, Archives and Libraries. However, these official communications document the effects of events and are unlikely to be sufficient, on their own, to analyze the causes of a phenomenon and to learn lessons that help us react better to similar situations in the future.

Online archives vs. archives of online

What is meant by an "online archive"? Examples like the ones above are excellent "online archives" that can continue to evolve through the conventional adaptation of legislation and technology. After all, the importance of preserving the Decree-Laws of a Republic is undeniable. My concern is mainly with the “archives of online”, since there is not yet an established awareness of the need for them, whether at an academic, governmental or individual level.

That "Information is power" is an accepted truth. Modern organizations communicate strategically, sharing information through their online channels such as websites or social media. But how many organizations are aware of the value of preserving their information online? How many are aware of the risk of losing that information? How many professors in various scientific fields alert their students to the importance of preserving information online or the impacts of losing it? If information is power, then losing information is losing power.

It's technologically impossible to preserve all online information. But it's absurd not to be aware that we need to preserve some online information for short-, medium-, and long-term access (and to act accordingly). After the arrival of the Information Age, which solved the problem of access to information, archives must contribute to combating the current Age of Disinformation. The role of archives of online is crucial in this fight because analyzing information from multiple sources over time helps identify inconsistencies and assign credibility. The greater the volume of information, the more possibilities there are for assessing the veracity of a piece of information.

The advantage of archives of online is that the information, being born digital and quickly available, can be processed automatically and in multiple ways. But a new type of institution needs to be created to archive the online, because it is a task with very specific challenges that requires specialists and adequate resources.

The cost of not preserving born-digital online information will be enormous for future generations because it will be impossible for them to learn from past mistakes. In this sense, the main challenge for archives of online is making the world feel that they are needed today.

Archives of online: difficult but not impossible

Technically, most of the content we consume online is served via the HTTP (or HTTPS) protocol, meaning it is Web content. However, about 80% of the content available on the Web changes or disappears after just one year.

The Internet Archive is a US non-profit organization that archives web content worldwide. However, it is difficult for a single organization to create an exhaustive archive of all published content because the Web is constantly changing and much information disappears before it can be archived.

Image: Internet Archive

Furthermore, documenting historical events of national significance to a given country is not a priority for the Internet Archive, and much of the information published, for example, on the Portuguese web is irretrievably lost. This problem is also felt by other national communities, so there are already at least 93 web archiving initiatives around the world.

In Portugal, Arquivo.pt is an example of an archive of online that lets you search and access web pages archived since 1996. It is a public service managed by the Foundation for Science and Technology and accessible to any citizen. Arquivo.pt stands out for offering a search service for pages and images from the past. It's a kind of Google, but for the past of the web.

The system that supports Arquivo.pt periodically collects and stores information published on the web. It then processes this information to make it searchable and accessible. This preservation process is performed automatically through a large-scale distributed computer system. The search and access service can be used automatically through Application Programming Interfaces (APIs) to develop innovative applications that take advantage of archived information.
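As a rough illustration of what using the service through an API can look like, here is a minimal Python sketch. It assumes a full-text search endpoint at `https://arquivo.pt/textsearch` that accepts the parameters `q`, `from`, `to` and `maxItems`, and that returns JSON with a `response_items` list whose entries carry `tstamp`, `originalURL` and `linkToArchive` fields. These names reflect my understanding of the public API and should be checked against the official Arquivo.pt documentation. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the official Arquivo.pt API documentation.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, start=None, end=None, max_items=10):
    """Build a search URL. start/end are timestamps such as 20200101000000."""
    params = {"q": query, "maxItems": max_items}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def summarize_results(payload):
    """Extract (timestamp, original URL, archived URL) from a parsed response.

    The field names are assumptions about the response format.
    """
    return [
        (item.get("tstamp"), item.get("originalURL"), item.get("linkToArchive"))
        for item in payload.get("response_items", [])
    ]


def search(query, **kwargs):
    """Run the search over the network and return the summarized results."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as response:
        return summarize_results(json.load(response))
```

A call such as `search("covid-19", start="20200101000000", end="20201231235959")` would then list pages about the pandemic as they were archived during 2020, provided the assumptions above hold.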

Arquivo.pt provides a free preservation service to web authors and, at the same time, a valuable research resource that has already been used by researchers, for example, to automatically measure the accessibility of the Portuguese web for people with disabilities. The Arquivo.pt Award annually recognizes works that use information preserved by Arquivo.pt. The ten award-winning works to date are real examples showing that the social and scientific potential of archives of online is huge and has only just begun to be explored.

Arquivo.pt holds more than 10 billion archived files (700 TB). However, the biggest challenge isn't disk space to store this information. The challenge is keeping this information searchable and accessible in a timely manner, which means, these days, providing users with answers within seconds and suitable for any device. The second challenge is recruiting and training specialized human resources. How to archive online data isn't yet taught in universities, so ongoing training for new team members is required.

The third challenge, and the most unexpected for me, is the difficulty in spreading the word about the service's existence. I hope I have been able to argue so far that archives of online are essential. Arquivo.pt has been publicly available since 2010. How long have you known about it? We live in an attention economy. Human attention has become a scarce commodity, for which the world's most powerful companies compete fiercely with each other using nearly unlimited resources and ethically questionable strategies. In the online world, which is Arquivo.pt's home, this will be the major short-term challenge: managing to capture attention so that this public service can be useful to more people.
