Using a Puppet realm switch to select between beta and production (Wikimedia clusters)

Since the BounceHandler extension is currently installed only on the beta cluster (Wikimedia's official testing servers, deployment.wikimedia.beta.wmflabs.org), writing a custom router in the exim configs of operations/puppet (the configuration repo managed by Puppet) to collect all the bounce emails and HTTP POST them to the extension API seemed risky. operations/puppet is meant for production, so putting something beta-specific in it did not look sane, and neither did continuous API requests whose traffic we can't estimate yet.
Marius Hoch (WMF) came up with the idea of an exim realm switch, which seemed to solve the issue. It lets you switch variables between beta and production.
To set it up, we edited

$ sudo <editor> templates/exim/manifests/mail.pp

and added this before the inclusion of exim4.conf.SMTP_IMAP_MM.erb.

# config - labs vs. production
case $::realm {
    'labs': {
        $verp_post_connect_server = 'deployment.wikimedia.beta.wmflabs.org'
        $verp_bounce_post_url     = 'http://deployment.wikimedia.beta.wmflabs.org/w/api.php'
    }
    'production': {
        # Double quotes are required here: Puppet does not interpolate
        # variables inside single-quoted strings.
        $verp_post_connect_server = "appservers.svc.${::mw_primary}.wmnet"
        $verp_bounce_post_url     = 'http://meta.wikimedia.org/w/api.php'
    }
    default: {
        fail('unknown realm, should be labs or production')
    }
}

The server is selected per realm to make sure the bounce gets POSTed to the right wiki. These variables can now be used in exim4.conf.SMTP_IMAP_MM.erb by editing the command of the bouncehandler router:

command = /usr/bin/curl -H 'Host: <%= @verp_post_connect_server %>' <%= @verp_bounce_post_url %> -d "action=bouncehandler" --data-urlencode "email@-"
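To make it clearer what the API receives, here is a minimal PHP sketch of the form body that the `-d "action=bouncehandler" --data-urlencode "email@-"` part of the curl command produces, assuming exim pipes the raw message to curl's standard input. The helper name `buildBouncePostBody` and the sample email are made up for illustration and are not part of BounceHandler.

```php
<?php
// Illustrative only: mirrors the form body that the curl command sends.
// buildBouncePostBody() is a hypothetical helper, not part of BounceHandler.
function buildBouncePostBody( $rawEmail ) {
	return http_build_query(
		array(
			'action' => 'bouncehandler',
			'email' => $rawEmail,
		),
		'',
		'&',
		PHP_QUERY_RFC3986 // percent-encode spaces as %20
	);
}

// Made-up sample message standing in for a real bounce.
$rawEmail = "To: wiki@example.org\r\nSubject: Delivery failed\r\n\r\nbody";
$body = buildBouncePostBody( $rawEmail );

// Decode the body again the way a form parser would, to show the round trip.
parse_str( $body, $decoded );
```

The whole email, headers and body included, travels as the single `email` parameter, which is why the extension has to extract the headers itself on the receiving side.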

Hope it helps someone in beta 🙂

Using the Plancake::MailParse library to extract headers from an email

Due to some issues with Composer loading, we had to switch our mail parsing library from PEAR::mimeDecode to a nicer-looking email parser library by Plancake. Here is how we use the library to extract headers from an email:
Add the dependency to your composer.json

    "require":
    {
        "floriansemm/official-library-php-email-parser": "dev-master"
    }

Now in the mail decode class, add this to perform the extraction:

/**
  * Extract headers from the received bounce email using Plancake mail parser
  *
  * @param string $email
  * @return array $emailHeaders.
  */
public function extractHeaders( $email ) {
	$emailHeaders = array();
	$decoder = new PlancakeEmailParser( $email );

	$emailHeaders[ 'to' ] = $decoder->getHeader( 'To' );
	$emailHeaders[ 'subject' ] = $decoder->getSubject();
	$emailHeaders[ 'date' ] = $decoder->getHeader( 'Date' );
	$emailHeaders[ 'x-failed-recipients' ] = $decoder->getHeader( 'X-Failed-Recipients' );

	return $emailHeaders;
}

Now you have the headers ready in $emailHeaders 🙂 yay!
* The library has no other dependencies and looks neat as a whole. The code can be found here: https://github.com/plancake/official-library-php-email-parser. Enjoy! Happy Hacking!

Using PEAR::mimeDecode to extract headers from a bounce email

Writing our own header-extraction functions involves tedious work and a lot of regex, since the headers of a bounce email can be encoded and upper- or lower-cased. To save that effort, we planned to include the Mail_mimeDecode class, a pretty straightforward approach. Header extraction is quite easy with it, and it can be installed via Composer too.
* Prepare your composer.json

{
    "require":
    {
        "pear/mail_mime-decode": "1.5.5",
        "pear/pear_exception": "1.0.x-dev"
    }
}

* Now, in the mail decode class, add the following to initialize the mimeDecode object

<?php
	// mimeDecode configurations
	$params['include_bodies'] = true;
	$params['decode_bodies'] = true;
	$params['decode_headers'] = true;

	$decoder = new Mail_mimeDecode( $email );
	$structure = $decoder->decode( $params );

	$emailHeaders = $structure->headers;
	
	$to = $emailHeaders[ 'to' ];
	$subject = $emailHeaders[ 'subject' ];
	$emailDate = $emailHeaders[ 'date' ];
	$permanentFailure = $emailHeaders[ 'x-failed-recipients' ];
?>

Now you have the To, Subject, Date and X-Failed-Recipients headers ready. Note that to fetch a header you just use its lower-cased name as the ‘key’ in the $emailHeaders array.
Yay! Happy Hacking
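To illustrate why lower-cased keys are convenient, here is a small self-contained sketch of a header parser that normalises names the same way. It is a simplified stand-in written for this post, not how Mail_mimeDecode is implemented: the function name `parseHeadersLowercase` is hypothetical, it does not decode encoded (RFC 2047) headers, and a repeated header simply overwrites the earlier value.

```php
<?php
// Simplified stand-in for illustration; NOT how Mail_mimeDecode is implemented.
// parseHeadersLowercase() is a hypothetical helper name.
function parseHeadersLowercase( $rawEmail ) {
	// The header block ends at the first blank line.
	$parts = preg_split( "/\r?\n\r?\n/", $rawEmail, 2 );
	// Unfold continuation lines (a line break followed by a space or tab).
	$headerBlock = preg_replace( "/\r?\n[ \t]+/", ' ', $parts[0] );

	$headers = array();
	foreach ( preg_split( "/\r?\n/", $headerBlock ) as $line ) {
		$pos = strpos( $line, ':' );
		if ( $pos === false ) {
			continue; // not a "Name: value" line
		}
		// Lower-case the name so lookups do not depend on the sender's casing.
		$name = strtolower( trim( substr( $line, 0, $pos ) ) );
		$headers[$name] = trim( substr( $line, $pos + 1 ) );
	}
	return $headers;
}

// Made-up sample with mixed-case header names and one folded header.
$sample = "To: wiki@example.org\r\n"
	. "X-Failed-Recipients: user@example.com\r\n"
	. "SUBJECT: Mail delivery\r\n failed\r\n"
	. "\r\n"
	. "Message body";

$headers = parseHeadersLowercase( $sample );
```

With names normalised like this, `$headers['subject']` is found whether the sender wrote `Subject`, `SUBJECT` or `subject`.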

Writing a Job queue to deal with load when POST-ing from exim to MediaWiki API

Yesterday, Tim Landscheidt from Wikimedia commented on my earlier post that I should use a job queue to handle the load on the bounce handling API. I talked with Legoktm about this, and he said it was a great idea, as multiple email bounces may reach the API simultaneously. I will jot down how we made that happen.
* First, register and load the job handler class

//Register and Load Jobs
$wgAutoloadClasses['BounceHandlerJob'] = $dir. '/includes/job/BounceHandlerJob.php';
$wgJobClasses['BounceHandlerJob'] = 'BounceHandlerJob';

* Now, create the file BounceHandlerJob.php, defining the class BounceHandlerJob, which extends Job.
I wanted to get the $email, which will be passed from ApiBounceHandler::execute();

class BounceHandlerJob extends Job {
	public function __construct( Title $title, array $params ) {
		parent::__construct( 'BounceHandlerJob', $title, $params );
	}

	/**
	 * Process the bounce email passed in the job parameters
	 * @return bool
	 */
	public function run() {
		$email = $this->params[ 'email' ];

		if ( $email ) {
			// The function in the API where the header 
			// stripping and other stuff happen
			ApiBounceHandler::processEmail( $email );
		}

		return true;
	}
}

* Now, we need to make the ApiBounceHandler class receive the POST request and push a job onto the job queue, so that our objective is accomplished.

$email = $this->getMain()->getVal( 'email' );
$params = array ( 'email' => $email );
$title = Title::newFromText( 'BounceHandler Job' );
$job = new BounceHandlerJob( $title, $params );
JobQueueGroup::singleton()->push( $job );

Yay! Done. Now the public static processEmail function needs to be defined with the necessary actions, and you are good to go.
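The idea of enqueueing on the API request and processing later can be modelled without MediaWiki at all. The sketch below is only an in-memory illustration of that pattern: `SketchQueue` and `SketchBounceJob` are hypothetical names, and MediaWiki's real Job and JobQueueGroup classes, which persist jobs, are not reproduced here.

```php
<?php
// Illustrative in-memory model of "enqueue now, process later".
// SketchQueue and SketchBounceJob are hypothetical names, not MediaWiki classes.
class SketchBounceJob {
	/** @var int[] lengths of the emails processed so far */
	public static $processed = array();
	private $params;

	public function __construct( array $params ) {
		$this->params = $params;
	}

	public function run() {
		if ( $this->params['email'] !== '' ) {
			// Stand-in for ApiBounceHandler::processEmail( $email )
			self::$processed[] = strlen( $this->params['email'] );
		}
		return true;
	}
}

class SketchQueue {
	private $jobs = array();

	public function push( SketchBounceJob $job ) {
		$this->jobs[] = $job; // cheap: nothing is processed yet
	}

	public function size() {
		return count( $this->jobs );
	}

	public function runAll() {
		$done = 0;
		while ( ( $job = array_shift( $this->jobs ) ) !== null ) {
			if ( $job->run() ) {
				$done++;
			}
		}
		return $done;
	}
}

// Two bounces arrive at almost the same time; the "API" only enqueues them.
$queue = new SketchQueue();
$queue->push( new SketchBounceJob( array( 'email' => 'first bounce' ) ) );
$queue->push( new SketchBounceJob( array( 'email' => 'second bounce' ) ) );
$pendingBefore = $queue->size();
$processedBefore = count( SketchBounceJob::$processed );

// Later, a runner drains the queue (the role runJobs.php plays in MediaWiki).
$ran = $queue->runAll();
```

The point of the design is that the request handler returns as soon as the job is pushed, so a burst of bounces costs only a few cheap pushes up front.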
PS: Add the following to LocalSettings.php so that jobs are not run during web requests and stay queued until you run them yourself.

$wgRunJobRate = 0;

To check that things are working, run the runJobs.php maintenance script:

php runJobs.php

Correct any errors. A successful run will output something like:

2014-07-13 15:02:38 BounceHandlerJob BounceHandler_Job email=string(1862) STARTING
2014-07-13 15:02:38 BounceHandlerJob BounceHandler_Job email=string(1862) t=100 good

Thanks