Ordinal (53A) and distributive (54A) numerals, and numeral classifiers (55A).

Nominal Syntax (Chapters 58–64)

Noun phrase conjunction (63A) and nominal versus verbal conjunction (64A).

Verbal Categories (Chapters 65–70)

Position of tense-aspect affixes (69A) and the morphological imperative (70A).

Use Cases for the Dataset

This specific set, WALS roberta sets 37-70.zip, is often used for the following purposes:

Testing whether models such as RoBERTa or XLM-RoBERTa have "learned" the typological rules of specific languages during pre-training.

Using WALS database features as labels to check whether a model's internal representations (embeddings) cluster according to known linguistic traits, such as whether a language uses definite articles.

Leveraging the broad cross-linguistic data in WALS to improve how models handle the hundreds of languages that lack large amounts of training text.
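
The probing and clustering use cases can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the method of any particular study: it assumes one embedding vector per language has already been extracted from the model (for example by mean-pooling hidden states) and that one WALS value per language has been looked up for a single feature. The language codes, vectors, and label names below are made-up placeholders, and a simple nearest-centroid classifier stands in for whatever probe a real experiment would use.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centroid_label(vector, centroids):
    """Return the label whose centroid is closest (Euclidean) to `vector`."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def leave_one_out_accuracy(embeddings, labels):
    """Hold out each language in turn, fit centroids on the rest, and score.

    `embeddings` maps a language code to its vector; `labels` maps the same
    codes to a WALS feature value. At least two languages are required.
    """
    correct = 0
    for held_out in embeddings:
        by_label = defaultdict(list)
        for lang, vec in embeddings.items():
            if lang != held_out:
                by_label[labels[lang]].append(vec)
        centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
        predicted = nearest_centroid_label(embeddings[held_out], centroids)
        if predicted == labels[held_out]:
            correct += 1
    return correct / len(embeddings)


# Hypothetical placeholder data: NOT real WALS values or real model embeddings.
toy_embeddings = {
    "lang_a": [1.0, 0.1],
    "lang_b": [0.9, 0.2],
    "lang_c": [1.1, 0.0],
    "lang_d": [0.0, 1.0],
    "lang_e": [0.1, 0.9],
    "lang_f": [-0.1, 1.1],
}
toy_labels = {
    "lang_a": "value_1",
    "lang_b": "value_1",
    "lang_c": "value_1",
    "lang_d": "value_2",
    "lang_e": "value_2",
    "lang_f": "value_2",
}

accuracy = leave_one_out_accuracy(toy_embeddings, toy_labels)
print(f"leave-one-out probe accuracy: {accuracy:.2f}")
```

The toy vectors are separable by construction, so the score here says nothing about any real model. With real embeddings, accuracy well above a majority-class baseline would suggest that the representations cluster by the chosen feature; holding out each language in turn keeps a language from contributing to the centroid it is scored against.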