Study Lemur

To try the sample data

Go to the data directory and run "test_key_index.sh", a self-explanatory shell script that builds an index, runs several retrieval algorithms with some sample parameter files, and then evaluates the retrieval performance.

 

clipped from: www.lemurproject.org   

1. Browsing an index

This example prints out all entries in the term-to-document index and
document-to-term index of a collection indexed as a KeyfileIncIndex, whose
table-of-contents (toc) file is named "index-toc-file.key". The excerpt
below covers the term-to-document part.

#include <iostream>
#include "IndexManager.hpp"  // Lemur header; assumed to also declare Index and DocInfoList
using namespace std;
// With Lemur 4.x and later you will likely also need: using namespace lemur::api;

int main() {
  Index *ind;

  ind = IndexManager::openIndex("index-toc-file.key");
  // open the index specified by the table-of-content (toc) file "index-toc-file.key"
  // IndexManager recognizes the suffix of the toc file (in this case ".key") and uses it to
  // infer the actual type of the index to be opened (KeyfileIncIndex).

  // first, browse through the term->document index (i.e., the inverted index)

  int termID;

  // iterate over all possible termIDs; the termCountUnique() function
  // gives the total count of unique terms, i.e., the vocabulary size.
  // Note that the term index 0 is reserved for out-of-vocabulary
  // terms, so we start from 1.

  for (termID = 1; termID <= ind->termCountUnique(); termID++) {

    cout << "term->document index entries for term : "
         << ind->term(termID) << endl;
    // The function call term(termID) returns the string form of the term.

    // now fetch the doc info list for each term; this creates an
    // instance of DocInfoList, which needs to be deleted later!
    DocInfoList *docList = ind->docInfoList(termID);

    // iterate over entries in docList
    docList->startIteration();
    DocInfo *dEntry;
    while (docList->hasMore()) {
      dEntry = docList->nextEntry();
      // note that nextEntry() does *not* return a new instance;
      // instead, it passes out a pointer to a local static variable,
      // so no "delete" is needed.

      // print out this entry
      cout << "-> " << dEntry->termCount() << " times in doc "
           << ind->document(dEntry->docID()) << endl;
    }
    delete docList;  // note that you MUST delete docList!
  }

  delete ind;  // free the index opened above
  return 0;
}

You can also browse the index and document sets from other programming languages; see:
http://www.lemurproject.org/docs/index.php/Main_Page#Programming_with_the_Toolkit
