Annotation

In PropBank, we identify the arguments of predicates (e.g. verbs, eventive nouns) and label them with semantic roles that show their relationship to the predicate. The semantic arguments are labeled on a verb-by-verb basis: each verb gets its own frame file, which defines verb-specific semantic roles for each of the verb's subcategorization frames. Supervised systems trained on PropBank’s semantic roles have yielded good results in shallow semantic analysis (see CoNLL 2005 and 2008). PropBank currently includes four language projects: English, Chinese, Hindi/Urdu, and Arabic.
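To make the idea of verb-specific roles concrete, here is a minimal sketch of how one labeled predicate instance might be represented. The class names (`Roleset`, `Annotation`), the role descriptions, and the example sentence are assumptions made for illustration; they are not the actual frame file format or the data model of any PropBank tool.

```python
from dataclasses import dataclass, field


@dataclass
class Roleset:
    """One sense of a predicate and its numbered roles (illustrative)."""
    roleset_id: str
    roles: dict  # numbered label -> verb-specific description


@dataclass
class Annotation:
    """Arguments labeled for one predicate instance in a sentence."""
    roleset: Roleset
    arguments: dict = field(default_factory=dict)  # label -> text span

    def add_argument(self, label: str, span: str) -> None:
        # Numbered arguments must be defined by the roleset; modifier
        # labels (ARGM-*) are general and accepted for any predicate.
        if label not in self.roleset.roles and not label.startswith("ARGM-"):
            raise ValueError(f"{label} is not defined for {self.roleset.roleset_id}")
        self.arguments[label] = span

    def describe(self) -> list:
        # Pair each label with its verb-specific meaning where one exists.
        return [
            (label, self.roleset.roles.get(label, "modifier"), span)
            for label, span in self.arguments.items()
        ]


# Hypothetical roleset for one sense of "give"; not copied from a frame file.
give_01 = Roleset(
    "give.01",
    {"ARG0": "giver", "ARG1": "thing given", "ARG2": "entity given to"},
)

# "Mary gave the book to John yesterday."
ann = Annotation(give_01)
ann.add_argument("ARG0", "Mary")
ann.add_argument("ARG1", "the book")
ann.add_argument("ARG2", "to John")
ann.add_argument("ARGM-TMP", "yesterday")

for label, meaning, span in ann.describe():
    print(f"{label:9} {meaning:16} {span}")
```

The point of the sketch is that the same label (for example `ARG2`) only acquires a meaning through the roleset of the particular verb sense, which is why the roles are defined per verb rather than globally.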

We currently have two annotation tools that have been used at several universities: a PropBank annotation tool, Jubilee, and a PropBank frame file editor, Cornerstone. Both tools are available as open source projects.

Funded by GALE, NIH, and HHS
Funded by GALE
Funded by the NSF
Arabic PropBank Project: Funded by GALE

Funded by GALE and NSF

Word sense ambiguity is a continuing major obstacle to accurate information extraction, summarization and machine translation. While WordNet has been an important resource in this area, the subtle fine-grained sense distinctions in it have not lent themselves to high agreement between human annotators or high automatic tagging performance. Building on results in grouping fine-grained WordNet senses into more coarse-grained senses that led to improved inter-annotator agreement (ITA) and system performance (Palmer et al., 2004; Palmer et al., 2006), we have developed a process for rapid sense inventory creation and annotation that also provides critical links between the grouped word senses and the Omega ontology.
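The effect of sense grouping on agreement can be illustrated with a small sketch. It computes simple pairwise agreement between two annotators, first on fine-grained tags and then after mapping each tag to a coarser group. The sense labels, the grouping, and the annotator choices below are invented toy data, and raw percentage agreement is assumed as the measure; none of this reproduces the figures or procedure of the cited studies.

```python
def agreement(tags_a, tags_b):
    """Fraction of instances on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("annotators must tag the same instances")
    if not tags_a:
        raise ValueError("no instances to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def coarsen(tags, grouping):
    """Replace each fine-grained sense with the group it belongs to."""
    return [grouping[tag] for tag in tags]


# Hypothetical grouping of four fine-grained senses into two coarse groups.
fine_to_group = {
    "call.1": "group_1",
    "call.2": "group_1",
    "call.3": "group_2",
    "call.4": "group_2",
}

# Toy annotations of the same four instances by two annotators.
annotator_a = ["call.1", "call.3", "call.2", "call.4"]
annotator_b = ["call.2", "call.3", "call.1", "call.3"]

fine_ita = agreement(annotator_a, annotator_b)
coarse_ita = agreement(
    coarsen(annotator_a, fine_to_group),
    coarsen(annotator_b, fine_to_group),
)

print(f"fine-grained agreement:   {fine_ita:.2f}")
print(f"coarse-grained agreement: {coarse_ita:.2f}")
```

In this toy data the annotators disagree mostly between senses that fall into the same group, so the disagreements disappear once the senses are grouped. Real data would show a smaller gain, and a chance-corrected measure may be preferable to raw agreement.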

Funded by GALE

The first level of OntoNotes analysis will capture the syntactic structure of the text, following the approach taken in the Penn Treebank. The Penn Treebank project, which began in 1989, has produced over three million words of skeletally parsed text from various genres. Among many other uses, the one million word corpus of English Wall Street Journal text included in Treebank-2 has fueled widespread and productive research efforts to improve the performance of statistical parsing engines. Treebanking efforts following the same general approach have also more recently been applied to other languages, including Chinese and Arabic.

The Penn treebanking approach has also been adopted here, where we are currently treebanking clinical notes for the Medical Informatics projects.

Clinical annotation

Incorporating the findings of the above efforts, the clinical annotation projects are developing semantic annotations in the clinical domain for materials such as radiology and pathology notes. Annotation guidelines are being developed in these projects.
