Regular Expression in Perl – Extract tag name in tagged content

The script below is what I wrote to extract tag names from tagged content.
You need to know how to do the following with regular expressions in Perl:

  • turn off greedy matching by putting ? after a quantifier (e.g. .+? instead of .+)
  • capture part of the match by wrapping it in parentheses ( and ), then read it from $1
  • get the whole text that matched the regex, with $&
  • get the remaining text that follows the match, with $'
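A small standalone illustration of the four points above. The sample string and variable names are made up for this example, and only core Perl is used:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ grabs as much as it can, so the match runs to the LAST ">".
my ($greedy) = $text =~ /<(.+)>/;
# Non-greedy: .+? stops at the FIRST ">" that lets the match succeed.
my ($lazy) = $text =~ /<(.+?)>/;

print "greedy: $greedy\n";    # greedy: b>bold</b> and <i>italic</i
print "lazy:   $lazy\n";      # lazy:   b

# $1 = text inside the parentheses, $& = whole match, $' = text after the match.
my ($captured, $matched, $after);
if ($text =~ /<(i)>/) {
    ($captured, $matched, $after) = ($1, $&, $');
}
print "captured: $captured\n";   # captured: i
print "matched:  $matched\n";    # matched:  <i>
print "after:    $after\n";      # after:    italic</i>
```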

The console-print version:

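```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s =
"<tag>
    <tagA>aaa</tagA>
    <tagB>bbb</tagB>
</tag>";

# An opening tag: "<", a first character that is neither "/" (that would be
# a closing tag) nor ">", then as few characters as possible (.*?) up to ">".
my $regex = qr{<([^/>].*?)>};

while (1) {
    if ($s =~ /$regex/) {
        print "$1\n";              # the captured tag name, e.g. tagA
        print "$&\n";              # the whole match, e.g. <tagA>
        print "..... remaining: ........\n";
        $s = $';                   # keep only the text after the match
        print "$s\n";
        print "-------------------\n";
    } else {
        last;                      # no opening tags left
    }
}

print "\n";
```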
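A variant of the same loop, offered as a sketch rather than the author's code. On Perl versions before 5.20, using $& or $' anywhere in a program slowed down every regex match; the perlvar documentation suggests the /p modifier with ${^MATCH} and ${^POSTMATCH} instead (these need Perl 5.10 or later). The @names array is just an illustrative name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "<tag>
    <tagA>aaa</tagA>
    <tagB>bbb</tagB>
</tag>";

my @names;
# Same loop shape as above, but with the /p modifier and
# ${^MATCH} / ${^POSTMATCH} in place of $& and $'.
while ($s =~ /<([^\/>].*?)>/p) {
    push @names, $1;
    my $whole = ${^MATCH};    # the whole matched tag, e.g. <tagA>
    print "$whole\n";
    $s = ${^POSTMATCH};       # keep only the text after the match
}
print join(", ", @names), "\n";   # tag, tagA, tagB
```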

The subroutine-wrapped version:

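```perl
#!/usr/bin/perl
use strict;
use warnings;

## main program
my $s =
"<tag>
    <tagA>aaa</tagA>
    <tagB>bbb</tagB>
</tag>";

my %arTags = extractTag($s);

# Hash keys come back in no particular order, so sort them for stable output.
foreach my $tag (sort keys %arTags) {
    print "$tag\n";
}

## sub definition
# Returns a hash mapping each opening-tag name to how many times it occurs.
sub extractTag {
    my $s = shift;
    my %arTags;

    my $regex = qr{<([^/>].*?)>};
    while (1) {
        if ($s =~ /$regex/) {
            $arTags{$1}++;    # store the extracted tag, counting repeats
            $s = $';          # continue with the remaining string
        } else {
            last;
        }
    }
    return %arTags;
}
```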
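For comparison, a more compact sketch of the same idea. extract_tags is a hypothetical name, not the author's subroutine: a match with /g in list context returns every captured tag name at once, so the loop and the $' bookkeeping disappear:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical compact variant of extractTag: in list context, m//g returns
# the captured group from every match, and the hash counts each tag name.
sub extract_tags {
    my ($s) = @_;
    my %count;
    $count{$_}++ for $s =~ /<([^\/>].*?)>/g;
    return %count;
}

my %tags = extract_tags("<tag><tagA>aaa</tagA><tagA>x</tagA></tag>");

foreach my $tag (sort keys %tags) {
    print "$tag: $tags{$tag}\n";
}
# tag: 1
# tagA: 2
```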

Enjoy the tags!
