Definitions of Information Extraction Terms
From the Natural Language Processing Encyclopedia
- Attribute: a property of an entity such as its name, alias, descriptor, or type
- Annotation: markup of a text span in a specific format that indicates one or more features of the text within the span
- Benchmark: assessment of performance according to standard measures
- Data: textual input for an information extraction system
- Dataset: a set of newswire texts chosen according to pre-specified conditions and meant to represent a rich text stream
- Database: data in tabular format stored with the assistance of a relational database management system
- Developer: a researcher who implements a system
- Dry Run: an end-to-end practice run of an evaluation
- Entity: an object of interest such as a person or organization
- Evaluation: assessment of performance according to agreed-upon measures
- Event: an activity or occurrence of interest such as a terrorist act or an airline crash
- Fact: a relationship held between two or more entities
- Formal Test Material: a blind dataset, task definitions, test procedure, answer keys, and scoring software
- Formal Run: the "official" evaluation
- Information Extraction: the extraction of pertinent information from large volumes of text
- Information Extraction Systems: automated systems that extract pertinent information from large volumes of text
- Information Extraction Technologies: techniques used to automatically extract specified information from text
- Metrics: pre-defined measures of performance calculable by comparison of system output with human-generated answer keys
- MUC: Message Understanding Conference, held at the end of the evaluation and attended only by participants and invited potential customers
- Named Entity: a named object of interest such as a person, organization, or location
- SAIC: Science Applications International Corporation
- Scoring Software: fully automated software for the comparison of system performance against answer keys that tallies and reports metrics and error types for developers and evaluators
- Search Engine: software which gives relevance rankings to documents in a collection based on a user query
- Sources of News: edited electronic feeds from established news organizations such as the Wall Street Journal and the New York Times News Service
- Statistical Algorithm: algorithm to determine the statistical significance of evaluation results
- Systems Integration: building a system from off-the-shelf components to accomplish a job previously not automated
- Systems Integrator: builder of a system from off-the-shelf components
- Task Definition: document which defines the format and criteria for annotation or extraction of text and placement into a database or template. For example, task definitions give general guidelines and examples for the extraction of named entities, attributes, facts, and events from texts.
- Text: electronically encoded alphabetic material from some human language
- Training: process by which a system learns about a dataset
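
The Metrics and Scoring Software entries above describe comparing system output with human-generated answer keys. As a rough illustration only, the sketch below scores named-entity spans by exact match and reports precision, recall, and F1 together with counts of spurious and missing items. The `Span` class, the `score` function, and the exact-match rule are all assumptions made for this example; they are not taken from the glossary or from any actual evaluation scorer, which may define matching and error types differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def score(system_output, answer_key):
    """Compare system spans with answer-key spans using exact matching.

    Assumption: a system span is correct only if its offsets and label
    are identical to a span in the answer key.
    """
    system_set = set(system_output)
    key_set = set(answer_key)

    correct = len(system_set & key_set)
    spurious = len(system_set - key_set)   # produced by the system, not in the key
    missing = len(key_set - system_set)    # in the key, not produced by the system

    precision = correct / len(system_set) if system_set else 0.0
    recall = correct / len(key_set) if key_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {
        "correct": correct,
        "spurious": spurious,
        "missing": missing,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: the answer key has two entities; the system finds one of them
# and adds one span that is not in the key.
answer_key = [Span(0, 5, "PERSON"), Span(10, 14, "ORGANIZATION")]
system_output = [Span(0, 5, "PERSON"), Span(20, 25, "LOCATION")]

report = score(system_output, answer_key)
print(report)
```

Exact matching keeps the example short; a scorer that also reports partially correct spans would need an additional matching rule.
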
Reposted from: http://www.dmresearch.net/research/xinxichouqu/200412/2573.html

