By Gerard Salton
Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Best probability books
Facing Uncertainties is an innovative monograph that lays special emphasis on the deductive approach to uncertainties and on the shape of uncertainty distributions. This perspective has the potential for dealing with the uncertainty of a single data point and with sets of data that have different weights.
Inefficient, overstaffed and indifferent to the public's needs, the Soviet economic bureaucracy operates today much as it did in the 1930s. In Restructuring the Soviet Economic Bureaucracy, Paul R. Gregory takes an inside look at how the system works and why it has traditionally been so resistant to change.
Additional info for A Theory of Indexing
Except where otherwise noted, the experimental results are based on the use of three collections of about 450 documents each in aerodynamics, biomedicine, and world affairs, respectively, denoted as CRAN, MED, and Time; twenty-four queries are used with each collection. While different subject areas are covered in each case, the relevance properties are identical for the three collections; in particular, the probability that a given document is relevant to a query is the same throughout the test base.
The curve closest to the upper-right-hand corner of the graph (where recall and precision are highest) reflects the best performance. It may be seen in Fig. 9 that the deletion of frequency-one terms and of terms with large document frequencies produces substantial increases in the average recall and precision values. Additional reductions in the indexing vocabulary may be effected by further deletion of terms in increasing term value order.

FIG. 9. Performance of term deletion algorithm of Fig. 8; averages over 1033 documents and 35 queries (adapted from ).
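The deletion step described in this excerpt can be illustrated with a minimal Python sketch. It is not the algorithm of Fig. 8, which is not reproduced here. It assumes that "frequency-one terms" means terms occurring exactly once in the whole collection, and that a "large" document frequency means one above a cutoff expressed as a fraction of the collection size. The function name `prune_vocabulary` and the parameter `max_df_ratio` are hypothetical.

```python
from collections import Counter


def prune_vocabulary(docs, max_df_ratio=0.5):
    """Remove frequency-one terms and terms with large document frequency.

    docs is a list of token lists. max_df_ratio is an assumed cutoff:
    a term is dropped when it occurs in more than that fraction of the
    documents. Returns the pruned documents and the retained vocabulary.
    """
    total_freq = Counter()  # occurrences of each term in the collection
    doc_freq = Counter()    # number of documents containing each term
    for doc in docs:
        total_freq.update(doc)
        doc_freq.update(set(doc))

    limit = max_df_ratio * len(docs)
    keep = {
        term
        for term in total_freq
        if total_freq[term] > 1 and doc_freq[term] <= limit
    }
    pruned = [[term for term in doc if term in keep] for doc in docs]
    return pruned, keep


if __name__ == "__main__":
    sample = [
        ["wing", "flow", "the"],
        ["flow", "the", "shock"],
        ["the", "wing", "flow"],
        ["the", "boundary"],
    ]
    pruned_docs, vocabulary = prune_vocabulary(sample, max_df_ratio=0.75)
    print(sorted(vocabulary))
    print(pruned_docs)
```

In the sample, "the" appears in every document and "shock" and "boundary" appear only once, so all three are dropped. The cutoff value is illustrative only; the excerpt does not state the thresholds actually used.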
The transformation illustrated in Fig. 12 may be generalized by using larger term groups (phrases with more than two components), obtained for example through an automatic term clustering process. These phrases can then be assigned to documents and queries whenever the corresponding components are present in addition to, or instead of, the original high-frequency terms. ... for certain cooccurring terms.

FIG. 12. Illustration for generation of low frequency term combinations.
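As a rough illustration of the phrase assignment described here, the following Python sketch forms two-component phrases from high-frequency terms that cooccur in a document and assigns them either in addition to, or instead of, the original terms. It is only a sketch under stated assumptions: the function name `add_phrases`, the `replace` flag, and the underscore-joined phrase labels are hypothetical, and the set of high-frequency terms is taken as given rather than derived by the procedure of Fig. 12.

```python
from itertools import combinations


def add_phrases(doc_terms, high_freq_terms, replace=False):
    """Assign pair phrases built from cooccurring high-frequency terms.

    doc_terms is the list of terms assigned to one document or query.
    high_freq_terms is an assumed, precomputed set of high-frequency terms.
    With replace=False the phrases are added to the original terms;
    with replace=True they are used instead of their components.
    """
    # High-frequency terms actually present in this document.
    present = sorted(set(doc_terms) & set(high_freq_terms))
    # Every pair of cooccurring high-frequency terms becomes one phrase.
    phrases = [f"{a}_{b}" for a, b in combinations(present, 2)]

    if replace and phrases:
        # Drop the components, since each now appears inside a phrase.
        base = [term for term in doc_terms if term not in present]
    else:
        base = list(doc_terms)
    return base + phrases


if __name__ == "__main__":
    terms = ["computer", "program", "retrieval"]
    frequent = {"computer", "program"}
    print(add_phrases(terms, frequent))
    print(add_phrases(terms, frequent, replace=True))
```

Because a phrase occurs only where all of its components occur together, its document frequency cannot exceed that of any single component, which is the sense in which the combinations are lower in frequency. Larger term groups could be handled by passing a group size other than 2 to `combinations`.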
A Theory of Indexing by Gerard Salton