The electronic document management systems on offer differ both in
functionality and in technological approach. However, to determine the value of
any such system, it is enough to answer three questions:
- How is information delivered to the system?
- How is the information indexed and stored?
- And, most importantly, how is relevant information searched for and
retrieved?
Depending on the answers to these questions, all existing document management
systems can be divided into three categories that reflect the evolution of such
systems.
Systems belonging to the first category (first generation) appeared in the
mid-1980s. The technology for working with them is based on the use of keywords
for indexing and retrieval of documents. In other words, after a document is
scanned and its graphic image is obtained, a set of keywords must be assigned
to each document image; these keywords are then indexed and used for
information retrieval.
Keyword indexing (or attribute indexing) is the simplest technology and the
most economical in terms of storage space. Its essence is that, for each
document entered or stored, the appropriate fields in the index file are filled
in. The fields are filled either manually or by a program that extracts key
values / attributes from a document of a given type. This technology makes it
possible to index both text documents (in manual and automatic modes) and
images (in manual mode). In the simplest case, the keywords are the title
and / or the name of the author of the document. In more complex situations it
is necessary to hire an independent expert to read the document and pick out
the keywords.
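The idea of an index file with fields filled in per document can be sketched
roughly as follows. This is only an illustration under assumptions: the record
layout, the names IndexRecord and find_by_keyword, and the sample data are
hypothetical and do not describe any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One entry in the index file: attribute fields filled in for a document."""
    doc_id: int
    title: str
    author: str
    keywords: set[str] = field(default_factory=set)


def find_by_keyword(index: list[IndexRecord], keyword: str) -> list[int]:
    """Return ids of documents whose keyword field contains the exact keyword."""
    return [rec.doc_id for rec in index if keyword in rec.keywords]


# Fields as an operator might fill them in by hand (invented sample data).
index = [
    IndexRecord(1, "Supply contract", "Ivanov", {"contract", "supply"}),
    IndexRecord(2, "Annual report", "Petrova", {"report", "finance"}),
]

print(find_by_keyword(index, "contract"))  # [1]
```

Note that the document body itself is never consulted: only what was typed
into the fields can be found, which is why the same scheme works for scanned
images.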
The use of these systems is severely restricted by the following
circumstances:
- Defining keywords is a subjective process; even with the help of an
independent expert, it is hard to avoid one-sidedness when choosing keywords.
- Defining keywords is an expensive procedure (AIIM, the most influential
organization in the document management field, estimates it at $5 to $20 per
document) because indexing cannot be automated and keywords are selected
manually with low productivity.
- It is assumed that users will search for information in a
predictable way, using pre-defined keywords.
- Keyword search is an exact-match search: the user must know exactly what
they are looking for. If the keyword is misspelled in the search query, the
system will never find the right information.
- Keywords can change over time (concepts that were "key" yesterday will not
necessarily be as important in a year).
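The exact-match limitation from the list above can be illustrated with a small
sketch. The dictionary keyword_index, the function exact_search and the sample
keywords are assumptions made for the example only.

```python
from __future__ import annotations

# Keyword -> document ids, as an operator might have assigned them
# (invented sample data).
keyword_index = {
    "invoice": [101, 102],
    "contract": [103],
}


def exact_search(query: str) -> list[int]:
    """Exact-match lookup: the query must equal a stored keyword exactly."""
    return keyword_index.get(query, [])


print(exact_search("invoice"))  # [101, 102]
print(exact_search("invioce"))  # [] - a single typo means nothing is found
print(exact_search("bill"))     # [] - an unanticipated synonym also misses
```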
The technology for working with second-generation electronic document
management systems is shown in Fig. 4. If a document is entered into the system
using a scanner, the graphical image is converted into a text file.
Searching for information in these systems is done using full-text search
(Full Text Retrieval), which is implemented with a technology based on an
inverted index. The essence of the approach is that, when the index file
(indexed array) is created, all significant words (excluding conjunctions,
prepositions, etc.) from all the documents are entered into it in alphabetical
order. Each of these words is then paired with pointers to the documents that
contain it.
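A minimal sketch of such an inverted index is given here. It rests on
assumptions: the stop-word list is a tiny stand-in for "conjunctions,
prepositions, etc.", documents are identified by integer ids, and the names
build_inverted_index and search are hypothetical.

```python
from __future__ import annotations

import re

# A tiny stop-word list standing in for conjunctions, prepositions, etc.
# (assumption: real systems use much larger, language-specific lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for", "or"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep only the significant words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_inverted_index(documents: dict[int, str]) -> dict[str, list[int]]:
    """Map each significant word to the sorted ids of documents containing it."""
    postings: dict[str, set[int]] = {}
    for doc_id, text in documents.items():
        for word in tokenize(text):
            postings.setdefault(word, set()).add(doc_id)
    # Keep the words in alphabetical order, as the text describes.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


def search(index: dict[str, list[int]], query: str) -> list[int]:
    """Return ids of documents that contain every significant query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


documents = {
    1: "The contract for the supply of paper",
    2: "Annual report on the supply chain",
    3: "Scanning and recognition of the contract",
}
index = build_inverted_index(documents)
print(index["supply"])                   # [1, 2]
print(search(index, "supply contract"))  # [1]
```

Unlike attribute indexing, nothing here depends on an operator choosing
keywords in advance: every significant word of the text becomes searchable.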
The principal technological innovation in second-generation document
management systems was the use of optical character recognition (OCR).
OCR is one of the main components of most modern document management systems,
especially those in which text input plays a major role. Although modern OCR
technology can reliably recognize high-quality paper documents, it cannot
guarantee absolute accuracy. Therefore, the OCR process includes a manual
editing step, in which the source document is compared with the resulting
file. A whole industry has grown up around the fine-tuning, correction, and
re-entry of text that has passed through recognition.
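The comparison step can be partly illustrated in code. The sketch below
assumes that a trusted reference transcription is available (for example, one
typed by an operator), which the text above does not state; in practice the
comparison is often made by eye against the paper original. The function name
flag_ocr_differences and the sample strings are hypothetical. It uses
difflib.SequenceMatcher from the Python standard library.

```python
from __future__ import annotations

import difflib


def flag_ocr_differences(reference: str, recognized: str) -> list[tuple[str, str]]:
    """Return (reference_words, recognized_words) pairs where the texts disagree."""
    ref_words = reference.split()
    ocr_words = recognized.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Collect the disagreeing stretch from both sides for a human editor.
            differences.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return differences


# Typical OCR confusions: digit 0 read as letter O, letter t read as l.
reference = "Payment is due within 30 days of the invoice date"
recognized = "Payment is due within 3O days of the invoice dale"

for expected, got in flag_ocr_differences(reference, recognized):
    print(f"expected {expected!r}, OCR produced {got!r}")
```

A list of this kind only narrows down where an editor should look; the
decision about which reading is correct remains manual.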