[ILUG-BOM] Re: Duplicates in a txt file using perl

Philip S Tellis philip.tellis@[EMAIL-PROTECTED]
Tue Oct 9 00:44:03 IST 2001


Sometime on Oct 8, Vikram Ojha assembled some asciibets to say:

> I attended the regex seminar with Dinesh Shah and other colleagues,
> where you spoke about finding duplicates in a text file

Ok, before I answer your question, I must ask you to please not include
irrelevant mails in your posts.  You included my mail about X crashing,
which is totally irrelevant here.

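Now, to your question.  This should work, using grep to find a word
repeated on one line.  In a basic regex GNU grep wants the repetition
written as \+ (a bare + is a literal plus sign), and the closing \>
keeps "the theory" from counting as a duplicate:

grep -e "\<\([[:alpha:]]\+\)\>[^[:alpha:]]\+\1\>"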
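
A quick way to try this on some sample text.  This is only a sketch:
dup_on_line is a made-up helper name, the sample lines are invented, and
the \+ and trailing \> in the pattern assume GNU grep.

```shell
# Hypothetical helper: list lines (with line numbers) that contain a
# word immediately repeated on the same line. Reads the named files,
# or stdin when no file is given.
dup_on_line() {
    grep -n -e '\<\([[:alpha:]]\+\)\>[^[:alpha:]]\+\1\>' "$@"
}

# Invented sample input: only the first line has a doubled word.
printf 'this is is a test\nnothing here\nthe theory holds\n' | dup_on_line
# prints: 1:this is is a test
```

The third sample line is not reported because the closing \> requires
the repeated text to end at a word boundary.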

If you need to check for words across lines, then use sed:

<untested>
sed -ne "h;n;x;G;/regex/p;x;"
</untested>

This may require some additions, but I'll have to actually try it to
know for sure.
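
One possible shape for those additions, as far as I can tell: the
h;n;x;G command above compares lines in non-overlapping pairs (1 with 2,
3 with 4), so a word repeated across lines 2 and 3 would slip through.
A sliding two-line window built with N and D avoids that.  The sketch
below assumes GNU sed; dup_across_lines is a hypothetical helper name
and the pattern is just one guess at what /regex/ could be.

```shell
# Hypothetical helper: print each pair of adjacent lines where the last
# word of one line is repeated as the first word of the next.
dup_across_lines() {
    # $!N  append the next input line to the pattern space (not on the last line)
    # p    print the two-line window if the same word ends line 1 and starts line 2
    # D    delete the first line of the window and restart without reading input
    sed -n -e '$!N' \
        -e '/\<\([[:alpha:]]\+\)[^[:alpha:]]*\n[^[:alpha:]]*\1\>/p' \
        -e 'D' "$@"
}

# Invented sample input: "the" ends line 2 and begins line 3.
printf 'one two\nthis is the\nthe end\n' | dup_across_lines
```

On the sample this should print the two lines "this is the" and
"the end".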

In Perl, just slurp the entire file ($/ = undef) and do a single-line
match (m//s).
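
A minimal sketch of the Perl route, run as a one-liner from the shell.
The -0777 switch slurps each file whole, which has the same effect as
setting $/ to undef.  dup_words is a hypothetical helper name and the
pattern is my assumption of what the match could look like; the /s flag
only changes how . behaves, so it is harmless here.

```shell
# Hypothetical helper: print every word that is immediately repeated,
# whether the repeat is on the same line or on the next one.
dup_words() {
    # -0777 : read the whole input as a single string (slurp mode)
    # /g    : keep scanning after each match so all repeats are reported
    perl -0777 -ne 'print "$1\n" while /\b([[:alpha:]]+)\b[^[:alpha:]]+\1\b/gs' "$@"
}

# Invented sample input: "the" repeats across a line break, "of" on one line.
printf 'this is the\nthe end of of story\n' | dup_words
```

Because the newline is just another non-letter character once the file
is one string, the same pattern covers both the single-line and the
cross-line case.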

Philip

-- 
Quantum Mechanics is God's version of "Trust me."


Visit my webpage at http://www.ncst.ernet.in/~philip/
Read my writings at http://www.ncst.ernet.in/~philip/writings/

  MSN  philiptellis                         Yahoo!  philiptellis
  AIM  philiptellis                         ICQ     129711328




