AntispamLab(TM) - a tool for testing email spam filters

Copyright 2007-2009, Laboratory for Communications and Applications (LCA),
Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland.
All rights reserved.

Contact address: slavisa.sarafijanovic@epfl.ch, jean-yves.leboudec@epfl.ch

This file is part of AntispamLab(TM). AntispamLab(TM) is free open source
software distributed under GPL license for non-commercial use only; you
can redistribute it and/or modify it under terms of GNU General Public
License (as published by the Free Software Foundation) for non-commercial
use only. Another option is to contact us to purchase a commercial
license.

AntispamLab(TM) is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with AntispamLab(tm); if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA

*** THE AUTHORS:

Slavisa Sarafijanovic
Luis Hernandez
Raphael Naefen
Jean-Yves Le Boudec

*** ABOUT THE TOOL

The AntispamLab(TM) tool is a software package for testing email spam filters. The tool uses a network of machines (such as Planet Lab) to deploy multiple email servers, simulated email clients and users, simulated spammers, and spam filters. The numbers of email servers, email users per server, and spammers are configurable.

The complete deployment is triggered by a single command, executed on one machine to which the tool has been downloaded beforehand. After the deployment, a completely automated "test" may be launched (again using just one command) that consists of multiple independent "runs" of the complete system.
Within one run, users and spammers are instructed to perform their activities (such as sending, receiving, and deleting emails as spam) for a configurable amount of time. After each run, the numbers of true and false positives observed by each user are collected and logged. After all the runs are completed, all the logged data is parsed, and the mean values of the true positives and false positives, along with 95% confidence intervals, are computed and displayed on the screen from which the test was launched. The number of runs affects the width of the obtained confidence intervals and should preferably be 20 or more.

The tool uses a time scale mechanism that allows the execution time of one run to be much shorter than the configured simulated-system time of the run. The execution time is approximately "time scale factor" times shorter than the simulated-system time. The time scale factor is a configurable parameter. If it is set too high, the test may exit with an error message (a typical value that worked well for us is 60).

*** USAGE REQUIREMENTS

To use the AntispamLab(TM) tool (software) you need a slice in Planet Lab (www.planet-lab.org), or another network of real or virtual machines in which each machine has:

- Fedora Core 4 operating system (other systems are untested with the latest tool version);
- Python, version or later (Python provides backward compatibility);
- SSH v2 (v1 should work but is untested);
- the rsync program (sftp to planetlab is broken);
- sudo (working without passwords, i.e. the command "sudo python set.py" must work);
- the rpm installer ("rpm -i whatever.rpm" must work);
- an Internet connection.

*** USAGE INSTRUCTIONS

Download antispamlab.tar.gz to the master machine (if you use Planet Lab, use a Planet Lab machine as the master machine, as your local machine might not meet the requirements listed above). Unpack it (tar -xzvf antispamlab.tar.gz).
If you use Planet Lab, place your Planet Lab RSA keys (id_rsa and id_rsa.pub) in the same folder where you unpacked antispamlab.tar.gz.

Copy and paste the list of nodes of your slice/network into a file named machines.txt, and put this file in the same folder where you unpacked antispamlab.tar.gz. The file machines.txt must contain one hostname per line.

The tool comes with a default corpus of spam and ham emails. The corpus consists of two folders (ham and spam) packed into a single gzipped archive. The emails are taken from the SpamAssassin public corpus (http://spamassassin.apache.org/publiccorpus/). To use another corpus, replace the file spamham.tar.gz with a similar one that contains a ham and a spam folder. Bear in mind that each email must be in a separate file inside these folders.

The tool comes with a simple example-filter. Running the tool with the provided example-filter helps you check whether the tool works well for you (the expected results are known for the example-filter). To replace the provided example-filter with a filter you want to test, replace the two files filter.tar.gz and filter_install.sh with the two files corresponding to your desired filter (more details below).

The tool enables you to supply your filter with feedback from the users it protects. In the default configuration, the feedback is simply logged by a logger program that runs on the machines on which the filter is installed. If you want to turn off the logger and feed your filter with this feedback, read the section "USER-FEEDBACK LOGGER CHANGE" (below) and follow the instructions you find there.

Change any configuration values you wish in setupvalues.cfg and defaults.cfg. Be sure to check those marked as IMPORTANT.

Deploy the system specified in setupvalues.cfg: change into the directory where you unpacked antispamlab.tar.gz and execute the command "python build.py". Wait until the Python script does its job.
If everything goes well, when the script stops you should see a list of deployed systems (email servers, email clients-users, spammers and filters).

Launch a test: execute "python manyruns.py x y", where x is the number of runs to perform and y is the starting seed for the random number generators used within the tool.

Wait for the test results: they are printed to the screen in the form of means and confidence intervals for the observed true positives and false positives. In addition, XML logs are saved to the folder from which the test was launched. Make sure to remove all logs (by default named globalLog*.xml) before re-invoking manyruns.py.

If you want to discard the current deployment and deploy a different setup, type "python reuse.py". Then change any parameters you wish in setupvalues.cfg and run "python setsystem.py -y".

*** FILTER CHANGE

The following two tool files are used for installing the tested filter instances when the testing system is deployed:

- filter.tar.gz, a gzipped archive which contains all needed installation files. It will be uploaded to its corresponding machine and then unpacked automatically.
- filter_install.sh, a shell script which contains all commands needed to install the filter (and any software packages specific to it). It will be run with root privileges.

If the default filter.tar.gz and filter_install.sh provided within antispamlab.tar.gz are used, the example-filter will be installed and tested. To test another filter, create a filter.tar.gz and a filter_install.sh that correspond to that filter and put them into the directory where you unpacked antispamlab.tar.gz (replacing the default filter.tar.gz and filter_install.sh files).
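
As an illustration of what a replacement filter could look like under the default configuration in this README (filtertype=PIPE, DASKeyword = X-Spam-Status: Yes), here is a minimal sketch. It assumes that the pipe filter receives the complete message on standard input and must write the tagged message to standard output; that contract is an assumption, not something this README states, so compare it with the bundled example-filter before relying on it. The function names and the keyword test in looks_like_spam are hypothetical placeholders.

```python
import sys
from email import message_from_bytes


def looks_like_spam(msg) -> bool:
    """Hypothetical placeholder classifier: flags one fixed subject keyword."""
    return "viagra" in str(msg.get("Subject", "")).lower()


def tag_message(raw: bytes) -> bytes:
    """Return the message with exactly one X-Spam-Status header set."""
    msg = message_from_bytes(raw)
    # Remove any X-Spam-Status header already present (e.g. forged by a sender).
    del msg["X-Spam-Status"]
    # "X-Spam-Status: Yes" matches the default DASKeyword in defaults.cfg.
    msg["X-Spam-Status"] = "Yes" if looks_like_spam(msg) else "No"
    return msg.as_bytes()


def main() -> None:
    """Assumed pipe contract: message in on stdin, tagged message out on stdout."""
    sys.stdout.buffer.write(tag_message(sys.stdin.buffer.read()))
```

To use a script like this as the filter command, it would end with a call to main() and be installed by filter_install.sh at the path named in filtercommand.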
Additionally, there are several configuration values that are specific to each filter and that you may need to change:

- In setupvalues.cfg (default config shown):

    filtertype=PIPE #change to SMTP if you don't want procmail + pipe usage
    filtercommand=python /usr/local/pipefilter.py #the program that must be run by the pipe
    filter_reset=NONE #a shell command to reset the filter at the beginning of each test; NONE if no reset is needed
    filterport=12125 #only needed for SMTP filters: the port from which the filter receives all mail to be filtered

- In defaults.cfg:

    [filter]
    DASKeyword = X-Spam-Status: Yes #the header and value that your filter uses to mark spam; procmail needs this info

*** USER-FEEDBACK LOGGER CHANGE

The tool enables you to supply your filter with feedback from the users it protects. In the default configuration, the feedback is simply logged by a logger program that runs on the machines on which the filter is installed.

The tool uses a patched version of Dovecot (an open-source IMAP server) which reports to the tested filter the copy operations performed between mailboxes. Each report consists of two lines of text (the Message-Id header of the copied mail and its recipient) written into a local socket. The socket port is specified in setupvalues.cfg via the dasport parameter.

To receive the feedback signals, a program (ds_listener.py) is uploaded to the machines on which the filters are deployed and is executed at the beginning of each run of the testing system. By default, ds_listener.py is executed via "python ds_listener.py &". If you want to specify a different listener and a different command, change the values of dslistener (the file) and listener_cmd (the command) in setupvalues.cfg. Make sure to place the specified file in the same directory where you unpacked antispamlab.tar.gz.
Most probably you want the feedback to be received directly by your tested filter, which means that you should leave the listener_cmd parameter empty, make sure your filter is listening on a fixed UDP port number, and set the value of the dasport parameter to that port number.

Remark: in the current implementation only the copy-operations feedback is implemented, and the only copy operations performed by users are those related to deleting emails as spam. You might want to change the syntax of the copy-operations feedback (for example, to provide "from" and "to" folders), or to implement some other feedback and syntax specific to your filter. For this you need to make changes in the source code of the tool.
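
The feedback format described in this section (two lines of text, the Message-Id and the recipient, sent to a UDP port) could be received by a filter with code along the following lines. This is only a sketch: it assumes that both lines of a report arrive together in a single UDP datagram and are passed on as-is, which this README does not confirm. The names parse_feedback, listen, handle and max_reports are hypothetical and are not part of the tool.

```python
import socket


def parse_feedback(datagram: bytes):
    """Split one feedback report into (message_id, recipient).

    Assumption: both lines of the report are contained in one datagram.
    """
    text = datagram.decode("utf-8", errors="replace")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected two lines: Message-Id and recipient")
    return lines[0], lines[1]


def listen(port: int, handle, host: str = "127.0.0.1", max_reports=None):
    """Receive reports on a UDP port and call handle(message_id, recipient).

    The port should be the one configured as dasport in setupvalues.cfg.
    max_reports is a hypothetical convenience for stopping after N reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        received = 0
        while max_reports is None or received < max_reports:
            datagram, _sender = sock.recvfrom(4096)
            try:
                message_id, recipient = parse_feedback(datagram)
            except ValueError:
                continue  # skip malformed reports without counting them
            handle(message_id, recipient)
            received += 1
```

A filter would typically run listen() in a background thread or process and, in handle, look up the message by its Message-Id and record it as spam for the given recipient.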