Unix – How to search for a word in a big file?

The problem.

I ran tag-name statistics on some huge Wikipedia data files and found that some tags are extremely useful while others are useless.
My next step: look at the text inside those important tags to see what is actually there!
My objective: I need to find the places inside the huge files where the text is the most meaningful, which is a prerequisite for calculating P(w2|w) = P(w,w2)/P(w).

So, the problem is: search for a tag name in a very large file and inspect its context (the surrounding text) by eye, which helps you judge for yourself which tags are important.

The solution.

Sources:

  1. My professor, Prof. Chevallet, gave a demo one or two weeks ago, and I saw him use ‘grep’ to search for text in a file.
  2. A Google search for "unix grep command", which led to "Linux and UNIX grep command help".

So, to search for the text, run:
grep <search text> <file path> -n -m <max number of matching lines> -C 1 --color
where:

  • -n
    shows the line number of each output line
  • -m #
    stops reading the file after # matching lines
  • --color
    highlights the matched text
  • -C #
    shows # lines of context above and below each matching line

An example:
grep mySeeking myFile -n -m 11 -C 1 --color
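
If you want to try the options before running them on a huge dump, here is a minimal sketch on a tiny stand-in file. The file contents and the variable names (sample, result) are made up for illustration and are not from my real data. I put the options before the pattern here, which is the conventional order, and I use --color=never so the captured text contains no color codes.

```shell
# Build a small stand-in file (hypothetical content; a real Wikipedia dump is far larger).
sample=$(mktemp)
printf '%s\n' \
  '<page>' \
  '<title>Unix</title>' \
  '<text>Unix is an operating system.</text>' \
  '</page>' \
  '<page>' \
  '<title>Grep</title>' \
  '<text>grep searches text.</text>' \
  '</page>' > "$sample"

# -n prefixes line numbers, -m 1 stops after the first matching line,
# -C 1 adds one line of context on each side of the match.
result=$(grep -n -m 1 -C 1 --color=never '<title>' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

In the output, the matching line is marked with ':' after its line number and the context lines with '-'. Only the first <title> line is reported because of -m 1; raise that number to see more matches.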

Enjoy!
