General Architecture for Text Engineering

(Translated and adapted from the English-language version of Wikipedia)

GATE (General Architecture for Text Engineering) is a software toolkit written in Java, developed at the University of Sheffield (UK) since 1995 and used worldwide by many communities (scientists, companies, teachers, students) for natural language processing in various languages. The community of developers and researchers around GATE is involved in several European research projects, such as TAO (Transitioning Applications to Ontologies) and SEKT (Semantically Enabled Knowledge Technology).

GATE provides an architecture, an application programming interface (API), and a graphical development environment.

GATE includes an information extraction system, ANNIE (A Nearly-New Information Extraction System), which is itself made up of modules: a tokenizer, a gazetteer, a sentence splitter (with disambiguation), a part-of-speech tagger, a named-entity transducer, and a coreference tagger. The languages for which GATE is already implemented are English, Spanish, Chinese, Arabic, French, German, Hindi, Cebuano, Romanian, and Russian. Many plugins exist: for machine learning (Weka, RASP, MAXENT, SVM Light), for building ontologies (WordNet), for querying search engines such as Google and Yahoo, for part-of-speech tagging (Brill, TreeTagger), and so on.
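The way the ANNIE modules hand annotations from one stage to the next can be sketched roughly as follows. This is an illustration only and not the toolkit's own Java API: the Annotation class, the GAZETTEER word list, and the run_pipeline function are all hypothetical names, Python is used purely for brevity, and only three of the stages (tokenizer, gazetteer, sentence splitter) are shown.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed span with features."""
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def tokenize(text, annotations):
    # One Token per run of word characters or per single punctuation mark.
    for m in re.finditer(r"\w+|[^\w\s]", text):
        annotations.append(
            Annotation("Token", m.start(), m.end(), {"string": m.group()}))


# Hypothetical word list; a real gazetteer is loaded from list files.
GAZETTEER = {"Sheffield": "location", "London": "location"}


def gazetteer_lookup(text, annotations):
    # Add a Lookup annotation over every Token whose string is in the list.
    for tok in [a for a in annotations if a.type == "Token"]:
        major_type = GAZETTEER.get(tok.features["string"])
        if major_type is not None:
            annotations.append(
                Annotation("Lookup", tok.start, tok.end,
                           {"majorType": major_type}))


def split_sentences(text, annotations):
    # Naive splitter: a sentence ends at each '.', '!' or '?' Token.
    tokens = [a for a in annotations if a.type == "Token"]
    start = None
    for tok in tokens:
        if start is None:
            start = tok.start
        if tok.features["string"] in {".", "!", "?"}:
            annotations.append(Annotation("Sentence", start, tok.end))
            start = None
    if start is not None:
        # Trailing text without final punctuation still forms a sentence.
        annotations.append(Annotation("Sentence", start, tokens[-1].end))


# Each module reads the annotations made so far and adds its own.
PIPELINE = [tokenize, gazetteer_lookup, split_sentences]


def run_pipeline(text):
    annotations = []
    for module in PIPELINE:
        module(text, annotations)
    return annotations


if __name__ == "__main__":
    sample = "GATE was written in Sheffield. It is used widely."
    for ann in run_pipeline(sample):
        print(ann.type, repr(sample[ann.start:ann.end]), ann.features)
```

The point of the design is that every module works on the same shared set of annotations, so later stages (such as a named-entity stage) can build on the Token and Lookup annotations produced earlier.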

GATE accepts input in various text formats, including plain text, HTML, XML, Microsoft Word (.doc), and PDF, as well as various datastore formats such as Java serialization, PostgreSQL, Lucene, and Oracle, via RDBMS storage over JDBC.

GATE also uses the JAPE (Java Annotation Patterns Engine) language to build rules for annotating documents. It also provides a debugger and tools for comparing corpora and annotations.
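The idea behind a JAPE-style annotation rule, a pattern over existing annotations on the left and a new annotation created on the right, can be imitated in a few lines. The sketch below is not JAPE syntax and does not use the JAPE engine: the Annotation class, the make_tokens helper, the apply_title_rule function, and the feature names and values ("orth", "upperInitial", "majorType", "title") are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed span with features."""
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def make_tokens(text):
    # Helper for the example: one Token per word, with a capitalisation feature.
    tokens = []
    for m in re.finditer(r"\w+", text):
        orth = "upperInitial" if m.group()[0].isupper() else "lowercase"
        tokens.append(Annotation("Token", m.start(), m.end(), {"orth": orth}))
    return tokens


def apply_title_rule(annotations):
    """Hypothetical rule: a Lookup with majorType 'title' followed by one or
    more capitalised Tokens yields a Person annotation over the whole span."""
    tokens = sorted((a for a in annotations if a.type == "Token"),
                    key=lambda a: a.start)
    titles = [a for a in annotations
              if a.type == "Lookup" and a.features.get("majorType") == "title"]
    created = []
    for title in titles:
        end = None
        # Walk the Tokens that come after the title, in document order.
        for tok in (t for t in tokens if t.start >= title.end):
            if tok.features.get("orth") == "upperInitial":
                end = tok.end      # extend the match
            else:
                break              # pattern stops at the first other Token
        if end is not None:
            created.append(
                Annotation("Person", title.start, end, {"rule": "TitleName"}))
    return created


if __name__ == "__main__":
    text = "Professor Ada Smith spoke"
    annotations = make_tokens(text)
    annotations.append(Annotation("Lookup", 0, 9, {"majorType": "title"}))
    for person in apply_title_rule(annotations):
        print(person.type, repr(text[person.start:person.end]))
```

A real rule language expresses the same matching declaratively, which keeps the pattern separate from the code that walks the document.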

References

  • Official site: Natural Language Processing group of the University of Sheffield
