The electronic document management systems on offer differ both in functionality and in their underlying technology. However, to assess the value of any such system, it is enough to answer three questions:
- How is information delivered to the system?
- How is the information indexed and stored?
- And, most importantly, how is relevant information searched for and retrieved?
Depending on the answers to these questions, all existing document management systems can be divided into three categories that reflect the evolution of such systems.
Systems in the first category (the first generation) appeared in the mid-1980s. The technology behind them relies on keywords for both indexing and retrieving documents. In other words, after a document is scanned and its graphic image is obtained, a set of keywords must be assigned to each document image; these keywords are then indexed and used for information retrieval.
Keyword indexing (or attribute indexing) is the simplest technology and the most economical in terms of storage space. Its essence is that, for each document entered or stored, the corresponding fields in the index file are filled in. This is done either manually or by a program that assigns key values/attributes to a document of a given type. The technology can index both text documents (in manual and automatic modes) and images (in manual mode only). In the simplest case, the keywords are the title and/or the author's name. In more complex cases, an independent expert must be hired to read the document and pick out the keywords.
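The attribute-indexing idea can be sketched as an index file in which each record holds a document ID and a few filled-in fields. This is a minimal illustration, not any particular product's design: the class names, the field names (`title`, `author`) and the exact-match lookup are all assumptions. Note that the document body itself is never read; only the assigned attributes are searchable.

```python
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One row of a hypothetical attribute index file: a document ID plus its attribute fields."""
    doc_id: str
    attributes: dict[str, str] = field(default_factory=dict)


class AttributeIndex:
    """Minimal sketch of attribute (keyword) indexing: only the filled-in fields are searchable."""

    def __init__(self) -> None:
        self.records: list[IndexRecord] = []

    def add(self, doc_id: str, **attributes: str) -> None:
        # Fields are filled in manually or by a program; the document image itself is never read.
        self.records.append(IndexRecord(doc_id, dict(attributes)))

    def find(self, **query: str) -> list[str]:
        # A document matches only if every queried field holds exactly the queried value.
        return [
            r.doc_id
            for r in self.records
            if all(r.attributes.get(k) == v for k, v in query.items())
        ]


index = AttributeIndex()
index.add("scan-001.tif", title="Annual Report", author="Smith")
index.add("scan-002.tif", title="Supply Contract", author="Jones")
print(index.find(author="Smith"))  # ['scan-001.tif']
```

Each record stores only a handful of short strings, which is why this approach is so cheap in storage compared with indexing the full text.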
The use of these systems is severely limited by the following circumstances:
- Choosing keywords is a subjective process: even with the help of an independent expert, it is hard to avoid bias in the choice.
- Choosing keywords is an expensive procedure (by the estimate of AIIM, the most influential organization in the document management field, it costs $5 to $20 per document), because indexing cannot be automated and manual keyword assignment is slow.
- It is assumed that users will search for information in a predictable way, using predefined keywords.
- Keyword search, although a very useful feature of a document management system, is an exact-match search: the user must know precisely what they are looking for. If a keyword is misspelled in the search query, the system will never find the right information.
- Keywords can change over time (concepts that were "key" yesterday will not necessarily be as important a year from now).
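The exact-match limitation can be illustrated with a tiny keyword index. This is a hypothetical sketch: the keywords, document IDs and the `keyword_search` helper are invented for illustration, and the only normalization assumed is trimming whitespace and lowercasing.

```python
# Hypothetical keyword index: keywords assigned by hand to each scanned document.
keyword_index: dict[str, set[str]] = {
    "invoice": {"doc-17", "doc-42"},
    "contract": {"doc-08"},
}


def keyword_search(query: str) -> set[str]:
    """Exact-match lookup: returns documents only if the query equals an assigned keyword."""
    return keyword_index.get(query.strip().lower(), set())


print(sorted(keyword_search("invoice")))  # ['doc-17', 'doc-42']
print(sorted(keyword_search("invoce")))   # []  (one typo, nothing is found)
print(sorted(keyword_search("bill")))     # []  (a synonym the indexer did not choose)
```

The second query shows the misspelling problem, and the third shows why users are assumed to search "in a predictable way": a reasonable synonym finds nothing because it was never assigned as a keyword.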
Fig. 4 shows the technology used in second-generation electronic document management systems. If a document is entered into the system with a scanner, its graphic image is converted into a text file.
Information in these systems is found using full-text search (Full Text Retrieval), implemented with an inverted-index technique. The essence of the approach is that, when the index file (indexed array) is created, all significant words (excluding conjunctions, prepositions, etc.) from all documents are entered into it in alphabetical order. Each word is then paired with pointers to the documents that contain it.
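An inverted index of this kind can be sketched in a few lines. The stop-word list here is a small, hypothetical stand-in for a real language-specific list, and combining multi-word queries with AND (intersecting the document lists) is an assumption; the text does not say how queries are combined.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a much larger, language-specific one.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "in", "on", "to", "for", "with"}


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each significant word to the sorted IDs of the documents that contain it."""
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                postings[word].add(doc_id)
    # Store the words in alphabetical order, each paired with its list of document pointers.
    return {word: sorted(postings[word]) for word in sorted(postings)}


def full_text_search(index: dict[str, list[str]], query: str) -> list[str]:
    """Return documents containing every significant query word (AND semantics)."""
    words = [w for w in re.findall(r"[a-z0-9]+", query.lower()) if w not in STOP_WORDS]
    if not words:
        return []
    result = set(index.get(words[0], []))
    for w in words[1:]:
        result &= set(index.get(w, []))
    return sorted(result)


docs = {
    "doc1": "Supply contract for office equipment",
    "doc2": "Invoice for the supply of paper",
    "doc3": "Annual report on equipment costs",
}
index = build_inverted_index(docs)
print(index["supply"])                              # ['doc1', 'doc2']
print(full_text_search(index, "equipment supply"))  # ['doc1']
```

Unlike keyword indexing, every significant word of every document becomes searchable, at the cost of a much larger index.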
The principal technological innovation in second-generation document management systems was the use of optical character recognition (Optical Character Recognition, OCR).
OCR is one of the main components of most modern document management systems, especially those in which text input plays a major role. Although modern OCR technology reliably recognizes high-quality paper documents, it cannot guarantee complete accuracy. Therefore, the OCR process includes a manual editing step in which the source document is compared with the resulting text file. A whole industry has grown up around proofreading, correcting, and re-entering recognized text.
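The comparison step can be partly supported by software once a corrected version exists. The sketch below assumes an operator has already produced a proofread text and uses Python's standard `difflib` to list which recognized words were changed, for example to gauge how much editing a batch needed. The function names and sample strings are hypothetical; this does not replace the manual check against the paper original.

```python
import difflib


def ocr_corrections(ocr_text: str, proofread_text: str) -> list[tuple[str, str]]:
    """List (recognized, corrected) word runs that differ between OCR output and proofread text."""
    ocr_words = ocr_text.split()
    ref_words = proofread_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=ref_words, autojunk=False)
    fixes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' or 'insert': something the operator had to change.
            fixes.append((" ".join(ocr_words[i1:i2]), " ".join(ref_words[j1:j2])))
    return fixes


def text_similarity(ocr_text: str, proofread_text: str) -> float:
    """Character-level similarity ratio in [0, 1], as computed by difflib."""
    return difflib.SequenceMatcher(a=ocr_text, b=proofread_text, autojunk=False).ratio()


ocr = "Tbe supply contract is va1id until 2025"
fixed = "The supply contract is valid until 2025"
print(ocr_corrections(ocr, fixed))  # [('Tbe', 'The'), ('va1id', 'valid')]
print(round(text_similarity(ocr, fixed), 3))
```

Word-level comparison is used for the correction list because recognition errors are usually reviewed word by word, while the character-level ratio gives a rough overall measure.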