Università degli Studi di Pavia

Dipartimento di Studi Umanistici


Pavia Corpus of Film Dialogue

The Pavia Corpus of Film Dialogue (PCFD) is a parallel and comparable corpus made up of original English films with their dubbed Italian translations, together with original Italian films. The corpus was created at the University of Pavia, where it has been developed since 2005 to investigate translated and non-translated audiovisual dialogue independently, in parallel and contrastively.
The PCFD has been conceived as a flexible tool for analysing and comparing film language and audiovisual translation, with a focus on the English-Italian language pair. The corpus serves several objectives. Taking a target-language-oriented approach to dubbing, researchers can systematically study linguistic, sociolinguistic, pragmatic and translational phenomena and ultimately delineate a profile of contemporary dubbed Italian. The original English dialogues can also be examined independently of their dubbed counterparts to identify conversational features and uncover their specific functions. The comparable component, comprising original Italian productions, makes it possible to compare dubbed Italian with original Italian, and original English films with original Italian films. Finally, the corpus can be exploited for language learning purposes.
The PCFD currently includes a unidirectional parallel component made up of the dialogues of 24 American and British films and their dubbed Italian translations, totalling approximately 500,000 words. This is the second version of the corpus, which initially contained 12 films. A comparable component comprising 24 original Italian films, totalling approximately 220,000 words, was subsequently added to the parallel component.
All films were transcribed manually from the soundtrack, using the dialogue turn as the unit of transcription and alignment. For ease of reading and computer searching, the second version of the corpus contains only orthographic transcriptions. The PCFD has been converted into a relational database which, through turn-by-turn alignment, allows more thorough translational and cross-linguistic analyses of individual items and discourse sequences. The database also lets users start queries from either the original or the translated component of the corpus, which gives easier access to translation operations and to instantiated cross-linguistic correspondences.
The database records several parameters: textual and contextual variables (the speaking character, the scene type and the linguistic event, e.g. phone calls), together with individual variables such as accents, accompanying paralinguistic behaviour (e.g. whispering) and salient non-linguistic behaviour (e.g. waving). The corpus also stores metadata such as year of production, screenwriter and translator-dialogue writers, all of which are relevant to the study of characterisation, individual stylistic variation and short-term diachrony.
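The turn-by-turn alignment and the possibility of querying from either side of the parallel corpus can be illustrated with a minimal sketch. It assumes a hypothetical two-table schema (a `turn` table holding one row per dialogue turn with a few of the contextual variables listed, and an `alignment` table pairing each original turn with its dubbed counterpart), and it uses invented example turns. The table names, column names and the `query_from` helper are illustrative only; they are not the actual structure or content of the PCFD database.

```python
import sqlite3

# Hypothetical schema (not the real PCFD one): one row per dialogue turn,
# plus a table pairing each original English turn with its dubbed Italian turn.
SCHEMA = """
CREATE TABLE turn (
    id INTEGER PRIMARY KEY,
    film TEXT NOT NULL,
    version TEXT NOT NULL CHECK (version IN ('original', 'dubbed')),
    speaker TEXT,
    scene_type TEXT,
    utterance TEXT NOT NULL
);
CREATE TABLE alignment (
    original_id INTEGER REFERENCES turn(id),
    dubbed_id INTEGER REFERENCES turn(id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented example turns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    turns = [
        (1, "Film A", "original", "Anna", "phone call", "Hey, you know what I mean?"),
        (2, "Film A", "dubbed", "Anna", "phone call", "Ehi, capisci cosa intendo?"),
        (3, "Film A", "original", "Ben", "phone call", "Sure, whatever."),
        (4, "Film A", "dubbed", "Ben", "phone call", "Certo, come vuoi."),
    ]
    conn.executemany("INSERT INTO turn VALUES (?, ?, ?, ?, ?, ?)", turns)
    # Turn-by-turn alignment: original turn id -> dubbed turn id.
    conn.executemany("INSERT INTO alignment VALUES (?, ?)", [(1, 2), (3, 4)])
    return conn


def query_from(conn, side, pattern):
    """Return (matching turn, aligned turn) pairs.

    side='original' searches the English turns and returns their dubbed
    renderings; side='dubbed' searches the Italian turns and returns the
    English originals they translate.
    """
    if side == "original":
        src, tgt = "original_id", "dubbed_id"
    elif side == "dubbed":
        src, tgt = "dubbed_id", "original_id"
    else:
        raise ValueError("side must be 'original' or 'dubbed'")
    # src/tgt come only from the fixed values above, so interpolation is safe.
    sql = f"""
        SELECT s.utterance, t.utterance
        FROM alignment AS a
        JOIN turn AS s ON s.id = a.{src}
        JOIN turn AS t ON t.id = a.{tgt}
        WHERE s.utterance LIKE ?
        ORDER BY s.id
    """
    return conn.execute(sql, (f"%{pattern}%",)).fetchall()
```

For example, `query_from(build_demo_db(), "dubbed", "Certo")` starts from the Italian side and returns the dubbed turn paired with the English line it translates. Keeping alignment in a separate table, rather than storing both versions in one row, is one way of making the query direction symmetric.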
The films in the corpus were chosen to be representative of ‘conversational’ audiovisual products, both translated and non-translated; that is, only films likely to stage naturalistic interactions were included.
The films whose dialogues were to compose the corpus had to:
The construction of the PCFD is the result of teamwork directed and coordinated by Maria Pavesi, the principal investigator of the project and the originator of the corpus.
Several researchers contributed to different stages and tasks in the development of the PCFD: Maria Freddi (co-direction of versions 1 and 2 of the PCFD and of the relational database); Francesco Lunghi (engineering support in creating the database); Silvia Monti, Elisa Ghia, Maicol Formentelli, Silvia Bruti, Veronica Bonsignori, Elisa Perego and Valentina Coletto (transcription and input of the dialogues, and copyright clearance requests); Raffaele Zago (development of the comparable component).
The PCFD was conceived and developed within two research projects. The project of national relevance “Ecolingua: E-Corpora in Linguistic and Multimodal Studies, in Translation, and in On-Line Language Learning and Testing” was funded by the Italian Ministry of Education, University and Research (2005) and coordinated nationally by Christopher Taylor (University of Trieste) and locally by Maria Pavesi (University of Pavia). The international project “English and Italian Audiovisual Language: Translation and Language Learning” was funded by the Alma Mater Ticinensis Foundation (2010–2012) and directed by Maria Pavesi (University of Pavia).
Main publications


Administrative office: Piazza Botta, 6 - 27100 Pavia
Teaching office: Corso Strada Nuova, 65 - 27100 Pavia
Email: webmaster.lettere (at) unipv.it