I have computed tag-name statistics on huge Wikipedia data files and found that some tags are extremely useful while others are useless.
My next step: look at the text inside those important tags to see what text lies there!
My objective: find the places inside a huge file where the text is most meaningful, which is a prerequisite for calculating P(w2|w) = P(w,w2)/P(w).
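As a rough illustration of that formula, P(w2|w) is often estimated from raw counts as count(w w2) / count(w), the maximum-likelihood approximation of P(w,w2)/P(w). The sketch below assumes a hypothetical file sample.txt of whitespace-separated words and the hypothetical words w="the" and w2="cat". It only counts bigrams within a line and is not the exact procedure used in this project.

```shell
#!/bin/sh
# Hypothetical sample text standing in for text pulled from the useful tags.
printf 'the cat sat on the mat\nthe cat ran\n' > sample.txt

w="the"    # hypothetical conditioning word
w2="cat"   # hypothetical following word

# Maximum-likelihood estimate: P(w2|w) ~ count(w w2) / count(w).
# Bigrams are counted only within a line (none spans a line break).
p=$(awk -v w="$w" -v w2="$w2" '
{
  for (i = 1; i <= NF; i++) {
    if ($i == w) {
      uni++                                # count(w)
      if (i < NF && $(i + 1) == w2) bi++   # count(w w2)
    }
  }
}
END {
  # bi is treated as 0 by awk if it was never incremented
  if (uni > 0) { printf "%.4f\n", bi / uni } else { print "NA" }
}' sample.txt)

echo "P($w2|$w) = $p"
```

With this toy text, "the" occurs 3 times and "the cat" twice, so p is 0.6667. For a whole vocabulary, a single awk pass that stores every count in an array would be cheaper than one run per word pair.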
So the problem is: search for a tag name in a very large file and look at its context (the surrounding text) with the naked eye, which will help me decide, by my own judgment, which tags are important.
- My professor, Prof. Chevallet, gave a demo one or two weeks ago, and I saw him use 'grep' to search for text in a file.
- Googling "unix grep command" leads to the "Linux and UNIX grep command help" page.
So, to search for the text, run:
grep -n -m <# of matching lines to show> -C <# of context lines> --color <search text> <file path>
- -n
shows the line number of each output line
- -m #
stops after # matching lines
- --color
highlights the matched terms
- -C #
shows # lines of context above and below each match
One example:
grep -n -m 11 -C 1 --color mySeeking myFile
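To try the command without a full dump, here is a small sketch that writes a hypothetical sample.xml (standing in for a Wikipedia XML file) and looks up the <title> tag. It assumes GNU grep; the file name and contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for a Wikipedia XML dump.
cat > sample.xml <<'EOF'
<page>
  <title>Grenoble</title>
  <text>Grenoble is a city in France.</text>
</page>
<page>
  <title>Lyon</title>
  <text>Lyon is a city in France.</text>
</page>
EOF

# -n            number each output line (":" after matches, "-" after context lines)
# -m 1          stop after the first matching line
# -C 1          print 1 line of context above and below each match
# --color=auto  highlight the match, but only when writing to a terminal
out=$(grep -n -m 1 -C 1 --color=auto '<title>' sample.xml)
printf '%s\n' "$out"
```

With GNU grep, this prints lines 1 to 3 of sample.xml: line 2 is the match (its number is followed by ":") and lines 1 and 3 are context (followed by "-"). The second <title> on line 6 is never reached because of -m 1.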