A typology of classifiers and gender: From description to computation

  • Datum:
  • Plats: Humanistiska teatern, Thunbergsvägen 3H, Uppsala
  • Doktorand: Tang, Marc
  • Om avhandlingen
  • Arrangör: Institutionen för lingvistik och filologi
  • Kontaktperson: Tang, Marc
  • Disputation


Categorization is one the most relevant tasks realized by humans during their life, as we consistently need to categorize the things and experience that we encounter. Such need is reflected in language via various mechanisms, the most prominent being nominal classification systems (e.g., grammatical gender such as the masculine/feminine distinction in French). Typological methods are used to investigate the underlying functions and structures of such systems, using a wide variety of cross-linguistic data to examine universality and variability. This analysis is itself a classification task, as languages are categorized and clustered according to their grammatical features. This thesis provides a cross-linguistic typological analysis of nominal classification systems and in parallel compares a number of quantitative methods that can be applied at different scales.

First, this thesis provides an analysis of nominal classification systems (i.e., gender and classifiers) via the description of three languages with respectively gender, classifiers, and both. While the analysis of the first two languages are more of a descriptive nature and aligns with findings in the existing literature, the third language provides novel insights to the typology of nominal classification systems by demonstrating how classifiers and gender may co-occur in one language in terms of distribution of functions. Second, the underlying logic of nominal classification systems is commonly considered difficult to investigate, e.g., is there a consistent logic behind gender assignment in language? is it possible to explain the distribution of classifier languages of the world while taking into account geographical and genealogical effects? This thesis addresses the lack of arbitrariness of nominal classification systems at three different scales: The distribution of classifiers at the worldwide level, the presence of gender within a language family, and gender assignment at the language-internal level. The methods of random forests, phylogenetics, and word embeddings with neural networks are selected since they are respectively applicable at three different scales of research questions (worldwide, family-internal, language-internal).