Extract all email addresses from your GMail IMAP

I wanted to extract all the email addresses in my GMail account. Sadly, GMail has no such facility; it only lets you export the addresses you have written emails to. So I tried http://vallery.net/gmail/ about a week ago, but I am yet to get the results! So I decided to get my hands dirty over the weekend.

I tried some Perl modules, but the installation was not clean (it needed 'make', some tests failed, and what not). So I moved to PHP. I installed php5-cli and php5-imap on my Ubuntu machine, and added the line
extension=imap.so
to /etc/php5/cli/php.ini

And then it was just a matter of playing around with the API described at http://in2.php.net/imap

This script scans all the emails in the 'All Mail' label of GMail (which includes all the emails in your account, even archived ones, but not those under the Spam and Trash labels). For each mail, it extracts the TO, CC, BCC, etc. fields (all the fields which may contain an email address) and prints the output on the screen in the following format:


SENDER& <timestamp>& <email_address>& <name>
TO& <timestamp>& <email_address>& <name>
REPLY_TO& <timestamp>& <email_address>& <name>


The & is the separator between the fields. The first field shows which part of the email the address was extracted from. The second field shows the timestamp of the email message. The third field shows the email address, and the last field shows the name associated with that address.

For example, if the email was sent like:


From: Bob Lee <bob.lee@ggmail.com>
To: Jack Lee <lee.jack@ggmail.com>


then, the extracted emails will look like:


TO& Sun, January 23, 2008& lee.jack@ggmail.com& Jack Lee
FROM& Sun, January 23, 2008& bob.lee@ggmail.com& Bob Lee


Note that the standards are quite flexible, so only the first two fields are guaranteed to be present; the rest of the fields can be empty.
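
As a rough sketch of how one such line can be taken apart in a shell script, the following splits a line on & and strips the single space that follows each separator. The helper name parse_line and the pl_ variables are made up for illustration and are not part of the script further down; a POSIX shell is assumed.

```shell
# parse_line LINE: split one line of extractor output into its four fields.
# Hypothetical helper; the pl_ variables are plain globals, since POSIX sh
# has no local variables.
parse_line() {
    # With IFS set to '&', read fills the first three variables and puts
    # the rest of the line (even if it contains another '&') into pl_name.
    IFS='&' read -r pl_kind pl_date pl_addr pl_name <<EOF
$1
EOF
    # Every field after the first starts with one space; strip it.
    pl_date=${pl_date# }
    pl_addr=${pl_addr# }
    pl_name=${pl_name# }
    printf 'kind=%s\ndate=%s\naddr=%s\nname=%s\n' \
        "$pl_kind" "$pl_date" "$pl_addr" "$pl_name"
}

parse_line 'TO& Sun, January 23, 2008& lee.jack@ggmail.com& Jack Lee'
```

Because the name is the last field, a name that itself contains an & stays in one piece, and a line with missing trailing fields simply yields empty values for them.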

You can take the output of this script and process it any way you wish. You can use the output to determine who you have talked to most, or how frequently you conversed with a person, etc. For example, I run the output file through a series of tools to get all unique addresses like so:


cat extracted_emails.csv | cut -d '&' -f 3 | grep @ | sort | uniq > all_emails_sort_uniq.txt
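
For the 'who have I talked to most' question, a minimal sketch along the same lines could count how often each address appears. top_correspondents is a hypothetical helper name; it assumes the output was saved in the & separated format shown earlier. Folding addresses to lower case before counting is my assumption, not something the script does.

```shell
# top_correspondents FILE [N]: print "count address" for the N most
# frequent addresses in FILE (default 10), most frequent first.
# Steps: take the third field, drop the space after the separator,
# keep only entries with text on both sides of the @ (the extractor can
# emit a bare "@" when a header has no address), fold case, then count.
top_correspondents() {
    cut -d '&' -f 3 "$1" |
        tr -d ' ' |
        grep -E '^[^@]+@[^@]+$' |
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c | sort -rn |
        head -n "${2:-10}" |
        awk '{ print $1, $2 }'
}
```

Calling it as top_correspondents extracted_emails.csv 20 would list the twenty most frequent addresses with their counts.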


The script can be customized for other IMAP providers, accounts, or folders by changing the first 6 lines of code in the script (after the license header).

Caveats:
.) You should have IMAP enabled in GMail, under Settings > Forwarding and POP/IMAP.
.) If the last two lines of the output look like:


Warning: imap_headerinfo(): Bad message number in mine_emails_addrs_from_imap.php on line 31
empty header found at 18109


that means the extraction is complete.

But if you see only the 'empty header...' line, the connection was broken or something else stopped the extraction before it completed. Take the number in the last line (18109 in this case) and pass it to the script as its only argument, so that the script starts from that message instead of redoing the whole thing (which may cause it to fail again somewhere). Repeat this until you see the Warning message in the last-but-one line.
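
The check described above can be sketched as a small shell helper. next_start is a hypothetical name, and the sketch assumes each run's output, including the PHP warning, was saved to a file.

```shell
# next_start FILE: look at the end of a saved run and print
#   done  if the "Bad message number" warning is the last-but-one line
#   N     otherwise, where N is the number from "empty header found at N"
next_start() {
    if tail -n 2 "$1" | head -n 1 | grep -q 'Bad message number'; then
        echo done
    else
        tail -n 1 "$1" |
            sed -n 's/^empty header found at \([0-9][0-9]*\)$/\1/p'
    fi
}
```

Running next_start on the saved output of each run gives the argument for the next run; once it prints done, the extraction has reached the end of the mailbox. If it prints nothing at all, the file ended in some other way and is worth inspecting by hand.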

Using this script, I am thinking of providing a service similar to vallery.net's, but with more transparency and better response times. Let me know if you really need it.

Finally, here's the script:


#!/usr/bin/php
<?php
# Distributed under GPLv3 License, as published on
# http://www.opensource.org/licenses/gpl-3.0.html
# with the following substitutions:
#
# <AUTHOR> = Gurjeet Singh
# <YEAR> = 2008

$options = '/imap/ssl/novalidate-cert';

$user = 'singh.gurjeet@ggmail.com';
$password = 'xxxxxxxxxx';

$mailbox_string = '{imap.gmail.com:993' . $options . '}[Gmail]/All Mail';

echo "Connecting...\n";

$mbox = imap_open( $mailbox_string, $user, $password )
    or die( "can't connect: " . imap_last_error() . "\n" );

echo "Fetching headers...\n";

# An optional first argument is the number printed by an earlier,
# interrupted run; extraction resumes from that message.
$i = ( $_SERVER["argc"] > 1 ) ? (int) $_SERVER["argv"][1] : 0;

# Output label => property of the object returned by imap_headerinfo().
$fields = array(
    'TO'          => 'to',
    'FROM'        => 'from',
    'CC'          => 'cc',
    'BCC'         => 'bcc',
    'REPLY_TO'    => 'reply_to',
    'SENDER'      => 'sender',
    'RETURN_PATH' => 'return_path',
);

for( ;; ++$i )
{
    #if( $i % 100 == 0 ){ sleep( 1 ); }

    # IMAP message numbers start at 1, hence the + 1.
    $h = imap_headerinfo( $mbox, $i + 1 );

    if( empty( $h ) )
    {
        echo "empty header found at $i\n";
        break;
    }

    $date = isset( $h->date ) ? $h->date : '';

    foreach( $fields as $label => $prop )
    {
        # A header that is absent from the message has no property at all.
        if( empty( $h->$prop ) ) { continue; }

        foreach( $h->$prop as $addr )
        {
            # Any part of an address may be missing; print it as empty.
            $mailbox  = isset( $addr->mailbox )  ? $addr->mailbox  : '';
            $host     = isset( $addr->host )     ? $addr->host     : '';
            $personal = isset( $addr->personal ) ? $addr->personal : '';

            echo $label . '& ' . $date . '& ' . $mailbox . '@' . $host . '& ' . $personal . "\n";
        }
    }
}

if( !imap_close( $mbox ) )
{
    echo "imap_close failed: " . imap_last_error() . "\n";
}
?>
