Corpus of written narratives

Description

The corpus of written narratives consists of texts produced by primary school children (second and third graders). The texts were collected within PIPALE in response to a written composition item that is part of the diagnosis instrument developed by the project’s team.

The diagnosis instrument PIPALE contains four Booklets. Each Booklet assesses different domains of language awareness and written language skills. The item that assesses compositional writing is part of Booklet 4 and elicits the production of a short narrative text by means of a sequence of three images, as exemplified below. Each image may be associated with a key moment of a narrative sequence: initial situation, problematic situation, resolution.

There are several versions of the diagnosis instrument PIPALE. Therefore, the texts may have been elicited by different sequences of images. However, the instruction given to the children is always the same: “Observe the images. Write a story about the sequence of images. Give a title to your story”. After the instruction and the sequence of images, there are 10 lines available for writing the text.

As for the methodological procedures, the written task is administered in class by the main class teacher, with the support of the PIPALE research team. The task is administered at four different moments: the beginning and end of second grade and the beginning and end of third grade.

The corpus is continuously updated, with new texts added each academic year. Currently, it comprises narratives written between 2019 and 2024 by students from different schools in the municipalities of Sesimbra and Seixal.

Each text was anonymized, digitized and transcribed, and is preceded by metadata on the child's sociolinguistic profile and on the context of data collection.

Written narratives corpus
Version of the diagnosis instrument PIPALE	Number of texts (2nd grade)	Number of texts (3rd grade)	Number of texts (total)	Number of words (2nd grade)	Number of words (3rd grade)	Number of words (total)
Instrument 1	158	–	158	5324	–	5324
Instrument 2	145	82	227	7170	6194	13364
Instrument 3	72	–	72	4883	–	4883
Instrument 4	80	84	164	3809	4705	8514
Instrument 5	68	73	141	5352	5622	10974
Instrument 6	202	87	289	7663	6676	14339
Instrument 7	131	64	195	9599	6098	15697
TOTAL	856	390	1246	43800	29295	73095
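
The figures in the table can be cross-checked and summarized with a few lines of Python. The sketch below is only an illustration: the dictionary layout and the function name `grade_totals` are assumptions made for this example, and the numbers are copied from the table above.

```python
# Per-instrument counts copied from the table above, as (texts, words) per grade.
# None marks a grade for which that instrument version has no texts.
ROWS = {
    "Instrument 1": {"2nd": (158, 5324), "3rd": None},
    "Instrument 2": {"2nd": (145, 7170), "3rd": (82, 6194)},
    "Instrument 3": {"2nd": (72, 4883), "3rd": None},
    "Instrument 4": {"2nd": (80, 3809), "3rd": (84, 4705)},
    "Instrument 5": {"2nd": (68, 5352), "3rd": (73, 5622)},
    "Instrument 6": {"2nd": (202, 7663), "3rd": (87, 6676)},
    "Instrument 7": {"2nd": (131, 9599), "3rd": (64, 6098)},
}


def grade_totals(rows, grade):
    """Sum texts and words over all instrument versions for one grade."""
    cells = [row[grade] for row in rows.values() if row[grade] is not None]
    texts = sum(t for t, _ in cells)
    words = sum(w for _, w in cells)
    return texts, words


for grade in ("2nd", "3rd"):
    texts, words = grade_totals(ROWS, grade)
    # Mean text length is derived from the table, not reported by the project.
    print(f"{grade} grade: {texts} texts, {words} words, "
          f"{words / texts:.1f} words per text on average")
```

The per-grade sums computed this way agree with the TOTAL row, and the derived averages give a rough idea of how text length differs between second and third grade.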

Access

The corpus will be available in .txt and .cha formats, in accordance with the conventions of CHILDES (Child Language Data Exchange System) and using the CLAN software (MacWhinney 2000). The texts in .cha format will include an orthographically standardized version of each text with syntactic annotation.
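
For readers who prefer to process the .cha files outside CLAN, the following Python sketch shows one way to split a file into its basic line types. It relies only on general CHAT conventions: header lines start with `@`, main lines with `*`, dependent tiers with `%`, and continuation lines with a tab. The function name `parse_chat`, the speaker code `CHI`, the tier name `%com` and the sample content are hypothetical. The actual layout of the released files may differ.

```python
def parse_chat(text):
    """Split CHAT-style text into header lines and main lines with their tiers."""
    # First join continuation lines (which begin with a tab) to the line above.
    logical = []
    for line in text.splitlines():
        if line.startswith("\t") and logical:
            logical[-1] += " " + line.strip()
        elif line.strip():
            logical.append(line)

    headers, utterances = [], []
    for line in logical:
        if line.startswith("@"):
            headers.append(line)
        elif line.startswith("*"):
            speaker, _, content = line[1:].partition(":")
            utterances.append(
                {"speaker": speaker, "text": content.strip(), "tiers": {}}
            )
        elif line.startswith("%") and utterances:
            # Dependent tiers attach to the most recent main line.
            name, _, content = line[1:].partition(":")
            utterances[-1]["tiers"][name] = content.strip()
    return headers, utterances


# Hypothetical sample for illustration only; this is not corpus data.
SAMPLE = (
    "@Begin\n"
    "@Comment:\thypothetical sample, not corpus data\n"
    "*CHI:\tfirst part of a sentence\n"
    "\tsecond part of the sentence .\n"
    "%com:\tplaceholder dependent tier\n"
    "@End\n"
)

headers, utterances = parse_chat(SAMPLE)
print(headers)
print(utterances)
```

Keeping the dependent tiers in a dictionary keyed by tier name means that any annotation tier present in a file is preserved without the sketch having to know its name in advance.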