Definitions of Information Extraction Terms

From the Natural Language Processing Encyclopedia (自然语言处理百科)

  • Attribute: a property of an entity such as its name, alias, descriptor, or type
  • Annotation: markup of a text span in a specific format that indicates one or more features of the text within the span
  • Benchmark: assessment of performance according to standard measures
  • Data: textual input for an information extraction system
  • Dataset: a set of newswire texts chosen according to pre-specified conditions and meant to represent a rich text stream
  • Database: data in tabular format stored with the assistance of a relational database management system
  • Developer: a researcher who implements a system
  • Dry Run: an end-to-end practice run of an evaluation
  • Entity: an object of interest such as a person or organization
  • Evaluation: assessment of performance according to agreed-upon measures
  • Event: an activity or occurrence of interest such as a terrorist act or an airline crash
  • Fact: a relationship that holds between two or more entities
  • Formal Test Material: a blind dataset, task definitions, test procedure, answer keys, and scoring software
  • Formal Run: the "official" evaluation
  • Information Extraction: the extraction of pertinent information from large volumes of text
  • Information Extraction System: an automated system that extracts pertinent information from large volumes of text
  • Information Extraction Technologies: techniques used to automatically extract specified information from text
  • Metrics: pre-defined measures of performance calculable by comparison of system output with human-generated answer keys
  • MUC: Message Understanding Conference, held at the end of the evaluation and attended only by participants and invited potential customers
  • Named Entity: a named object of interest such as a person, organization, or location
  • SAIC: Science Applications International Corporation
  • Scoring Software: fully automated software for the comparison of system performance against answer keys that tallies and reports metrics and error types for developers and evaluators
  • Search Engine: software that ranks the documents in a collection by relevance to a user query
  • Sources of News: edited electronic feeds from established news organizations such as the Wall Street Journal and the New York Times News Service
  • Statistical Algorithm: algorithm to determine the statistical significance of evaluation results
  • Systems Integration: building a system from off-the-shelf components to accomplish a job previously not automated
  • Systems Integrator: builder of a system from off-the-shelf components
  • Task Definition: a document that defines the format and criteria for annotating or extracting text and placing it into a database or template. For example, task definitions give general guidelines and examples for the extraction of named entities, attributes, facts, and events from texts.
  • Text: electronically encoded alphabetic material from some human language
  • Training: process by which a system learns about a dataset
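The Metrics and Scoring Software entries describe comparing system output with human-generated answer keys. The following is a minimal sketch of that comparison, not the actual MUC scoring software. It assumes that the metrics are precision, recall, and F1 (the glossary does not name specific measures), that each extracted named entity is a (text, type) pair, and that a system answer counts as correct only when it matches a key entry exactly. The function name `score` and the sample data are hypothetical.

```python
from collections import Counter


def score(system_output, answer_key):
    """Compare system output with an answer key.

    Both arguments are iterables of (text, type) pairs. This is an
    illustrative assumption about the data format, not a MUC specification.
    """
    system_counts = Counter(system_output)
    key_counts = Counter(answer_key)

    # Multiset intersection: answers that appear in both system output and key.
    correct = sum((system_counts & key_counts).values())
    # Error types: answers the system produced that are not in the key,
    # and key answers the system failed to produce.
    spurious = sum(system_counts.values()) - correct
    missing = sum(key_counts.values()) - correct

    produced = correct + spurious
    expected = correct + missing
    precision = correct / produced if produced else 0.0
    recall = correct / expected if expected else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {
        "correct": correct,
        "spurious": spurious,
        "missing": missing,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical answer key and system output for one text.
answer_key = [
    ("Wall Street Journal", "ORGANIZATION"),
    ("New York", "LOCATION"),
    ("John Smith", "PERSON"),
]
system_output = [
    ("Wall Street Journal", "ORGANIZATION"),
    ("New York", "ORGANIZATION"),  # wrong type, so it does not match the key
    ("John Smith", "PERSON"),
]

result = score(system_output, answer_key)
print(result)
```

In this sample the system gets two of three answers right, so the sketch reports one spurious answer and one missing answer. Exact matching is the simplest choice; a real scorer would also need rules for partial matches, which this sketch leaves out.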


Reposted from: http://www.dmresearch.net/research/xinxichouqu/200412/2573.html
