HOBS - syntactic layer

The Croatian Dependency Treebank is a corpus of approx. 4,500 sentences annotated according to a modified version of the Prague Dependency Treebank specification for the analytical level. The SynSem visualizer supports queries across the 3,500 sentences annotated with semantic roles.

HOBS - semantic layer

The semantic layer of the Croatian Dependency Treebank consists of 3,500 sentences labeled with semantic roles according to the SRL specification for Croatian.

HR4EU corpus

The HR4EU corpus consists of approx. 500 sentences from the Croatian language courses available on the HR4EU web portal, developed at the Institute of Linguistics and financially supported by the European Union (European Social Fund). The sentences are annotated at both the syntactic and the semantic level according to the model used for HOBS.

Resources

Two corpora annotated at the morphosyntactic, dependency, and semantic role levels are available on this site.

The first corpus, the Croatian Dependency Treebank, was developed at the Institute of Linguistics (Faculty of Humanities and Social Sciences, University of Zagreb) as part of the project "Development of Croatian Language Resources", supported by the Ministry of Science, Education and Sports of the Republic of Croatia. It is drawn from the Croatian National Corpus, specifically from its newspaper subcorpus (the weekly newspaper Croatia Weekly, CW2000). The CW2000 subcorpus is lemmatized and morphosyntactically tagged in accordance with the MULTEXT-East recommendations for Croatian (using the Croatian Lemmatization Server) and manually disambiguated. It was then manually annotated at the analytical level according to a modified version of the Prague Dependency Treebank specification, and labeled with semantic roles according to the specification for Croatian developed at the Institute of Linguistics.

The second corpus, annotated at all of the above-mentioned levels, consists of approx. 500 sentences from the Croatian language courses available on the HR4EU web portal, developed at the Institute of Linguistics and financially supported by the European Union (European Social Fund).

Tools

The SynSem visualizer supports queries over the Croatian Dependency Treebank at the syntactic and semantic levels. It also supports queries by word form, lemma, and morphosyntactic tag.

The SynSem visualizer was developed under project HR.3.2.01-0037 "Mrežni portal za online učenje hrvatskoga jezika" ("Web Portal for Online Learning of Croatian", HR4EU), financially supported by the European Union, European Social Fund (OP Human Resources Development).

Investing in the future

Specifications

The morphosyntactic specification, which follows the MULTEXT-East recommendations for Croatian, is available here.

The syntactic specification is available here.

The specification for semantic role labeling is available here.

Institutions

Institute of Linguistics
Faculty of Humanities and Social Sciences
University of Zagreb
Ivana Lučića 3
10000 Zagreb
Croatia
tel. +385 1 6120-142, 6120-063
fax. +385 1 6156-879
e-mail: zzl@ffzg.hr
web: http://www.ffzg.hr/zzl

Staff

Researchers