Read the views of António Branco, Professor at the Faculty of Sciences of the University of Lisbon and General Director of the PORTULAN CLARIN Research Infrastructure for Language Science and Technology, on the Albertina PT-* project.

Advances in Artificial Intelligence have been impressive, especially in its application to Language Technology. This progress is based on machine learning with so-called Large Language Models, such as GPT-3 or ChatGPT, which have attracted so much attention recently.

These models are gigantic neural networks: GPT-3, for example, has 175 billion connections between neurons. They pick up linguistic regularities when trained, in massive computational processes, on colossal volumes of linguistic data, whether text or audio. In the case of GPT-3, 500 billion words were used for training.

Once trained, these models can be applied to a wide range of language tasks at an unprecedented level of quality, such as translation, conversation, speech transcription and subtitling, text and speech generation, content analysis and information extraction. When integrated into wider systems, they are transforming diagnostics and healthcare, financial and legal services, gaming and entertainment, education, creativity and culture, among other areas.

Because of the size of the models, these processing tasks are available remotely as online services, as search engines are, rather than as locally installed software like the spell-checkers on our devices. Because of the scale of the resources needed for training, these services are readily available from an oligopoly of big tech companies, few enough to be counted on the fingers of one hand, which are able to access the colossal amounts of computing and data that training requires.

As a result, in the digital age, language use - with other humans, organizations, services or artificial devices - will never again take place without this pervasive and deep technological intermediation, which processes acts of communication and accesses their meaning.

We have enough experience with search engines, for example, and with their assumptions and impacts, to anticipate the consequences of this technological intermediation for the everyday use of language itself. Technological intermediation, in general, generates a digital trail of personal data beyond our control. Incessant technological intermediation of human language and communication in particular, funneled into a small global oligopoly, creates alarming risks for individual and collective sovereignty.

Undesirable impacts of emerging technologies are mitigated with more and better technology, not less. Dispersing the supply of these services is crucial to counter the threat posed by their concentration. The answer thus lies in fostering an alternative innovation ecosystem, one that enables timely and widespread access to the resources needed for the appropriation and exploitation of Language Technology by as many individuals and organizations as possible, private and public, small and large, national and international.

In this respect, the RNCA is already playing a major role, particularly through the Call for Advanced Computing Projects: Artificial Intelligence in the Cloud.

I coordinate one of the projects funded by the first edition of this call, in which we seek to contribute to open AI and to the technological preparation of the Portuguese language. One of the results of this project, which I am reporting on here, is Albertina PT-*. This is a foundation model developed specifically for the Portuguese language, both for the European variant spoken in Portugal and for the American variant spoken in Brazil.

As far as we know, with its 900 million parameters and its level of performance, it represents the current state of the art among large foundation language models of the encoder class for this language that are publicly available as open source, free of charge and under an unrestricted license. A comprehensive presentation of its features and implementation can be found in the paper accepted for publication in the proceedings of EPIA 2023, the annual conference of the Portuguese Association for Artificial Intelligence.
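To put these sizes in perspective, here is a rough back-of-the-envelope sketch comparing the storage needed for the weights of a model with 175 billion parameters, the figure given above for GPT-3, and one with 900 million, as in Albertina PT-*. It rests on assumptions that are not in the article: that each "connection" corresponds to one stored parameter, and that each parameter takes 2 bytes (16-bit floating point). The function name `weights_size_gb` is purely illustrative.

```python
# Back-of-the-envelope estimate of the storage needed for model weights alone.
# Assumptions (not from the article): one stored parameter per connection,
# 2 bytes per parameter (16-bit floats); activations and overheads are ignored.

BYTES_PER_PARAM = 2  # assumed 16-bit storage


def weights_size_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted in the article.
models = {
    "GPT-3 (175 billion parameters)": 175_000_000_000,
    "Albertina PT-* (900 million parameters)": 900_000_000,
}

for name, num_params in models.items():
    print(f"{name}: about {weights_size_gb(num_params):.1f} GB of weights")
```

Under these assumptions, the larger model needs hundreds of gigabytes for its weights alone, which is consistent with the point that such models are offered as remote services, while a model of Albertina's size is within reach of a single well-equipped machine.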

This is just a first step towards the democratization of this technology, which is key for the future, and towards the promotion of open generative AI, to which RNCA, I am sure, will continue to make an invaluable contribution.
