$Id: CHANGELOG,v 1.462 2007/03/18 17:08:49 jonz Exp $

Version 3.8.0
-------------

[20061210.1435] jonz: fixed message corruption problems with direct delivery
  when using direct delivery (e.g. DeliveryHost), certain servers require a
  linefeed after carriage return otherwise the message will become
  malformatted.

[20060818.0700] jonz: added msg tagging support
  added ability to add tagline to messages based on their classification;
  see tagSpam and tagNonspam preferences in README

[20060607.1200] jonz: removed deprecated oracle driver
  removed outdated oracle driver; no maintainer, lack of interest

[20060606.0000] jonz: added ldap client to build
  added ldap client headers to makefile, would not build on some systems

[20060601.0500] jonz: fix for dynamic storage drivers api
  fixed _ds_pref_del call to storage library

[20060601.0300] jonz: webui history fix for 12:00 noon
  bugfix to display 12 noon as 12p, not 12a

[20060530.0145] jonz: added connect check for pgsql
  added a connection check for pgsql, to reconnect on failure in daemon mode

[20060530.0130] jonz: added logging of viruses
  added logging of viruses (and the source) to agent

[20060527.1700] jonz: added HashPctIncrease option in dspam.conf
  HashPctIncrease: Increase the next extent size by n% from the size of the
  last extent. The default behavior, when HashPctIncrease is not used, is to
  always use HashExtentSize with no increase. This is useful in
  accommodating systems where the default HashExtentSize can be too small
  for certain high-volume users.

[20060527.1530] jonz: cache runtime user information
  added caching of runtime user information, so this information is not
  polled every message when running in daemon mode. also eliminates the need
  for getpwuid_r when running in daemon mode (unless using mysql or pgsql),
  which some operating systems do not have.

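  note on HashPctIncrease: the growth rule described in the 20060527.1700
  entry can be sketched as below. this is only an illustration of the
  arithmetic, not the hash_drv code; the function name next_extent_size and
  its parameters are hypothetical, and sizes are assumed to be record counts.

```c
#include <assert.h>

/* Hypothetical helper (not DSPAM's code): size of the next extent.
 *   base_size  the configured HashExtentSize
 *   last_size  size of the most recently added extent (0 if none yet)
 *   pct        the HashPctIncrease value; 0 means the option is unset
 * With pct == 0 every extent uses base_size, matching the default
 * behavior described above. Overflow of last_size * pct is not handled. */
unsigned long next_extent_size(unsigned long base_size,
                               unsigned long last_size,
                               unsigned int pct)
{
    assert(base_size > 0);
    if (pct == 0 || last_size == 0)
        return base_size;
    return last_size + (last_size * pct) / 100;
}
```

  under these assumptions, a base size of 1000 with a 10% increase would
  give successive extents of 1000, 1100 and 1210 records.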
[20060527.1530] jonz: moved TIME_ME into DEBUG
  when debug is active, TIME_ME automatically runs, reporting processing
  time to debug

[20060526.1900] jonz: fix for library TIME_ME measurements
  fixed bug where negative processing times were reported using TIME_ME

[20060526.1600] jonz: turned off locking when not using syslog or logging
  no need to lock on LOG() when not logging

[20060526.0230] jonz: rewrite for hash_drv offset caching
  rewrote offset caching in hash_drv; fixed some bugs which may have caused
  a crash on extent addition

[20060525.1100] jonz: fix for segfault on undefined DeliveryHost or ClientHost
  fix for segfault in daemon mode when DeliveryHost or ClientHost is not
  specified

[20060524.0300] jonz: added --client support for dspam_train
  use --client after username

[20060523.0300] jonz: more code optimizations
  various optimizations to:
  - tokenizer core
  - hash_drv driver (store offset for writes)
  - libdspam (preference lookups)
  - optimizations for osb/sbph

[20060522.0300] jonz: added ProcessorURLContext
  ProcessorURLContext creates Url* context-specific tokens for URLs; this is
  the default in previous (and current) versions

[20060522.0300] jonz: optimized osb/sbph tokenizer
  replaced several strlcat's with simple len counting to eliminate thousands
  of unnecessary calls to strlen() and speed up osb/sbph tokenization
  process

[20060519.0300] jonz: fix for segfault in vsyslog()
  fix segfault caused by bad use of va_args when vsyslog is called

[20060519.0130] jonz: fix for segfault in dlopen() failure
  fixed bug causing segfault when dlopen() to storage driver library fails.
  dspam still won't work any better if dlopen is failing but huzzah.

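  note on the osb/sbph tokenizer optimization: the idea of replacing
  repeated concatenation calls with length counting can be sketched as
  below. this is a generic illustration, not the tokenizer's code; the
  helper name append_tracked is hypothetical. a concatenation call has to
  rescan the whole destination on every append, while a tracked length
  only measures the new piece.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append src to buf, whose current length is *len and
 * whose capacity is cap bytes (including the terminating NUL).
 * Returns 0 on success, -1 if src does not fit (buf is left unchanged). */
int append_tracked(char *buf, size_t cap, size_t *len, const char *src)
{
    size_t n = strlen(src);          /* only the new piece is measured */

    assert(*len < cap);
    if (n >= cap - *len)             /* need room for n bytes plus NUL */
        return -1;
    memcpy(buf + *len, src, n + 1);  /* copy including the NUL */
    *len += n;
    return 0;
}
```

  the caller keeps len alongside the buffer and passes it to every append,
  so the existing contents are never walked again.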
[20060519.0100] jonz: fix for performance template / local domain
  added fix to display correct local domain in performance template, and
  only display local domain if the username doesn't include an @ sign

[20060517.0700] jonz: fix for preference delete
  fixed infinite loop on all non-preference-extension calls to delete a
  preference

[20060516.0200] jonz: changed SupressWebStats
  SupressWebStats is now WebStats in dspam.conf, and setting is inverted.

[20060516.0200] jonz: fix for agent flags
  discovered that agent flags required a 64-bit variable to hold all flags,
  but only 32-bit variable was being used; this may have caused
  unpredictable behavior when using SBPH, "unlearning" a message, or
  processing summaries.

[20060516.0200] jonz: added OSB tokenizer
  osb (orthogonal sparse bigram) is similar to sbph, however only bigrams
  are used to form sparse tokens; this uses far fewer resources than sbph
  with very similar results

[20060516.0200] jonz: interface change: added tokenizer variable
  added tokenizer variable to DSPAM_CTX and added following tokenizer flags:

    DSZ_WORD   Use WORD (uniGram) tokenizer
    DSZ_CHAIN  Use CHAIN (biGram) tokenizer
    DSZ_SBPH   Use SBPH (Sparse BP Hashing) tokenizer
    DSZ_OSB    Use OSB (Orthogonal Sparse biGram)

  WARNING: This is an API change and constitutes a new major version.
  Third party applications may fail to compile/run against this.

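  note on the agent flags fix: the class of bug described in the
  20060516.0200 "fix for agent flags" entry can be reproduced in a few
  lines. the flag names and bit positions below are hypothetical and chosen
  only for illustration; they are not DSPAM's actual agent flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define FLAG_LOW   (UINT64_C(1) << 3)    /* fits in 32 bits */
#define FLAG_HIGH  (UINT64_C(1) << 33)   /* needs more than 32 bits */

/* Returns 1 if flag is still set after the flags pass through a 32-bit
 * variable, 0 otherwise. */
int survives_32bit(uint64_t flags, uint64_t flag)
{
    uint32_t narrow = (uint32_t)flags;   /* bits 32 and up are discarded */

    assert(flag != 0);
    return (narrow & flag) != 0;
}

/* Same check with a 64-bit variable, which keeps every bit. */
int survives_64bit(uint64_t flags, uint64_t flag)
{
    uint64_t wide = flags;

    assert(flag != 0);
    return (wide & flag) != 0;
}
```

  any flag defined above bit 31 silently reads as "off" in the 32-bit case,
  which is consistent with the unpredictable behavior the entry describes.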
[20060414.1145] jonz: fix for segfault on log write err
  when using --with-logfile, if file cannot be opened, dspam segfaulted

[20060513.1100] jonz: fixed compiler warnings on sqlite drivers
  signed-ness warnings, nothing significant

[20060514.0900] jonz: discontinued support for berkeley db
  deprecated bdb drivers finally removed from distribution

[20060512.2105] jonz: copyright modifications
  reassignment to Jonathan Zdziarski instead of using my corporate face

[20060512.2100] jonz: removed some legacy pieces
  - removed dspam_corpus (replaced by newer dspam_train)
  - removed dspam_genaliases (replaced by parse-to-headers, virtual users,
    etc)

[20060512.0100] jonz: segfault fix for UIDInSignature
  fixed a critical bug that can cause segfaults when correcting messages
  using UIDInSignature options. database handle is refreshed, but new
  pointer is never used.

[20060510.0800] jonz: fix to recognize trainPristine "off" in preferences
  preference turned "off" should override config turned "on"

Version 3.6.8
-------------

[20060606.0000] jonz: fixes for pgsql_drv
  fixed bugs from last release causing pgsql to fail on connection

[20060606.0000] jonz: added ldap_client headers to build
  some operating systems refused to build ldap client due to missing header
  in makefile

Version 3.6.7
-------------

[20060602.2300] jonz: fix for UIDInSignature with groups
  fixed a bug causing the wrong uid to be written when UIDInSignature is
  used in conjunction with groups

[20060530.0145] jonz: added connect check for pgsql
  added a connection check for pgsql, to reconnect on failure in daemon mode

[20060530.1100] jonz: fix for incorrect reporting of X-DSPAM-Probability
  fixed a bug causing X-DSPAM-Probability to be misreported when using
  multiple algorithms

[20060525.1100] jonz: fix for segfault on undefined DeliveryHost or ClientHost
  fix for segfault in daemon mode when DeliveryHost or ClientHost is not
  specified

[20060519.0300] jonz: fix for segfault in vsyslog()
  fix segfault caused by bad use of
  va_args when vsyslog is called

[20060519.0130] jonz: fix for segfault in dlopen() failure
  fixed bug causing segfault when dlopen() to storage driver library fails.
  dspam still won't work any better if dlopen is failing but huzzah.

[20060517.0700] jonz: fix for preference delete
  fixed infinite loop on all non-preference-extension calls to delete a
  preference

[20060516.0200] jonz: fix for agent flags
  discovered that agent flags required a 64-bit variable to hold all flags,
  but only 32-bit variable was being used; this may have caused
  unpredictable behavior when using SBPH, "unlearning" a message, or
  processing summaries.

Version 3.6.6
-------------

[20060513.1100] jonz: fixed compiler warnings on sqlite drivers
  signed-ness warnings, nothing significant

[20060514.0900] jonz: discontinued support for berkeley db
  deprecated bdb drivers finally removed from distribution

[20060512.2105] jonz: copyright modifications
  reassignment to Jonathan Zdziarski instead of using my corporate face

[20060512.2100] jonz: removed some legacy pieces
  - removed dspam_corpus (replaced by newer dspam_train)
  - removed dspam_genaliases (replaced by parse-to-headers, virtual users,
    etc)

[20060512.0100] jonz: segfault fix for UIDInSignature
  fixed a critical bug that can cause segfaults when correcting messages
  using UIDInSignature options. database handle is refreshed, but new
  pointer is never used.

[20060510.0800] jonz: fix to recognize trainPristine "off" in preferences
  preference turned "off" should override config turned "on"

Version 3.6.5
-------------

[20060421.1645] jonz: do not quarantine when delivering summary
  bugfix to prevent quarantining of message when delivering summary

[20060421.1630] jonz: pgsql performance enhancements
  improvements to purge scripts and object creation script

[20060419.1300] jonz: admin graph fixes
  prevents carriage returns in subjects/fromlines from being written
  improves parsing of admin graphs to avoid "last day stackup" scenario

[20060419.1200] jonz: webui patch
  Applied patch submitted by Stefan Huelswitt

  Using HTTP redirect to redirect the browser back to the original template
  after the user has executed a link. Doesn't allow the browser to show the
  URL location with the embedded command e.g. retrain=spam. Protects the
  user from accidental re-execute of the command by reloading the page.

  Changed DisplayHistory to scan the entire logfile first and only then
  decide which information has to be displayed based on history_page.
  Avoids wrong display due to incomplete information available.

  Discard 'resent' messages in history display.

  Quote single-quote ' (mostly from subject) in javascript command for
  fragment display. Prevents execution of the command.

  Added links to previous/next history page.

  Added 'history_page=1' to all templates for consistency.

  Apply MAX_COL_LEN to history display as well.

  Added DATE_FORMAT to configure.pl to allow customized date format in
  history and quarantine (using strftime).

  Added OPTMODE to configure.pl to customize preferences tab for OptIn,
  OptOut or no selectable option.

  Touching a mailbox timestamp file every time the user displays the
  quarantine. This file can be used in a report_quarantine script.

  Removed 'sortby=Rating' from all templates as it renders SORT_DEFAULT
  useless.

  In quarantine, keep selected sort method after processing mails.

  In GetPrefs, read default prefs first and overlay them with user prefs.

[20060418.0830] jonz: dspam.cgi to use MAX_COL_LEN
  MAX_COL_LEN used for calculating column length in WebUI

[20060418.0830] jonz: dspam_admin patch
  corrects the output of "dspam_admin aggr pref"

[20060418.1435] jonz: bugfix for flat preference read
  fixed a bug causing writing of flat-file preferences to fail on some
  systems

[20060418.1435] jonz: fix for segfault on clamav connect error
  fixed a bug where certain problems establishing connectivity to clamav
  can segfault dspam

[20060412.0900] jonz: fix for segfault on empty username
  fixed a bug where a NULL username can sneak in and cause a segfault on
  strdup

[20060331.0800] jonz: fix for ClamAV
  applied patch to fix clamav issues

[20060324.0845] jonz: fragment overwrite bug
  fixed a bug where a fragment file is overwritten on retrain

[20060324.0845] jonz: fixed invalid read/segfault
  dspam.c:3284

[20060222.0830] jonz: fixed segfault on bad configuration
  fixed a segfault which can occur if TrainingMode is not specified in
  dspam.conf

[20060216.1545] jonz: added syslog and logfile flags
  added --disable-syslog function to turn off syslogging
  added --with-logfile= function to define a flat file for logging

[20060215.1230] jonz: dspam_stats to total all stats displayed with -t
  dspam_stats now displays a total of all stats included in the original
  query when -t is used

[20060215.1230] jonz: Markovian result used as X-DSPAM-Confidence
  X-DSPAM-Confidence is set using markovian result, whenever markovian
  pvalues are used.

[20060215.1200] jonz: bugfix for dspamc and --deliver=summary
  fixed a bug causing --deliver=summary to return no output when used in
  dspamc

[20060215.1200] jonz: support for read/write servers in mysql_drv
  added support for separate read/write servers to be used with mysql_drv.
  see dspam.conf for more information.

Version 3.6.4
-------------

[20060211.1515] jonz: added index support for dspam_train
  added support for training using an index file to define the order of
  ham/spam by specifying dspam_train [username] -i [indexname]. format of
  index file is "class filename" where class can be spam/nonspam.

[20060209.1930] jonz: cgi mass retraining patch
  applied mass retraining patch submitted by Cove Schneider

[20060207.0400] jonz: cgi improvements
  - added Undo option to undo retraining
  - added support for existing storeFragments option to recall message in
    history

[20060206.1430] jonz: documented user preference options
  documented all available user preferences in 2.5 of README

[20060202.1630] jonz: added ClassAlias options
  added ClassAlias options to dspam.conf to alias spam/nonspam classes

[20060202.1200] jonz: bugfix for segfault on UIDInSignature with bad UID
  fixed a bug which causes a segfault when using UIDInSignature if a bad uid
  is specified in the signature

[20060131.0830] jonz: bugfix in --classify in client/server mode
  fixed a bug causing no output when using --classify in client/server mode

[20060129.0000] jonz: dramatic reduction of token separators
  changed token separators in config.h, made noticeable improvement in
  accuracy across a few different corpora. old delimiters are still there
  if we need to change back.

[20060124.0830] jonz: added dspam_train
  a true training and testing mechanism, useful for building pretrained
  databases or training a user with their own corpus. also provides a test
  jig for measuring efficiency/accuracy with a corpus over a configuration.

[20060124.0830] jonz: fixes for dspam_corpus
  fixes for dspam_corpus:
  - uses default settings for features and training modes, instead of its
    own
  - now requires --spam or --nonspam arguments

[20060124.0830] jonz: removed neural networking functions
  experimental, needed a rewrite, no support, and high maintenance

[20060122.1700] jonz: more enhancements to accuracy
  - extended range for probabilities from .01/.99 to .0001/.9999
  - if a single-corpus token would have a stronger probability with one hit
    than none, use the stronger probability

[20060121.1835] jonz: code cleanup / performance improvements
  cleaned up text preprocessors (decoders and html scrubbers), avoided using
  repeated strlen() functions which were consuming around 25% of the total
  processing time. renamed _ds_message_block to _ds_message_part (no
  reason).

[20060120.0500] jonz: packaging problem with preferences-extension support
  packagers trying to build all available storage drivers, but using
  preferences-extension support would end up with a bombed build if they
  included any drivers that didn't support preferences-extensions. this has
  been corrected so that each driver has a stub to the flat-file preferences
  code, which will be called if preferences extensions are disabled or
  unsupported for that driver. in other words, it should be possible to
  build all drivers now with one build, even using preferences-extensions.

Version 3.6.3
-------------

[20060117.0608] jonz: enhancements to accuracy, performance
  made several optimizations to enhance accuracy and performance:
  - rewrote some routines that were strdup'ing message body repeatedly
  - changes to tokenization and probability assignment make a noticeable
    difference in accuracy

[20060113.0400] jonz: change for dspam_stats output
  updated dspam_stats "-S" output to use more widely accepted readings:

    SHR: Spam Hit Rate (true positive rate)
    HSR: Ham Strike Rate (false positive rate)
    OCA: Overall Classification Accuracy

[20060111.0830] jonz: bugfix for commandline agent error
  fixed bug causing "no trusted delivery agent configured" error when
  calling dspam without an agent configured, but not delivering - or when
  using --classify

[20060110.0830] jonz: bugfix for ChangeUserOnParse
  fixed minor bug causing ChangeUserOnParse to format incorrectly

[20060110.0830] jonz: patch to support multiple users with logrotate
  applied patch by Norman Maurer to add large-scale support to
  dspam_logrotate

[20051213.1200] jonz: memory leak in bayesian noise reduction
  corrected a memory leak generated when using bayesian noise reduction

[20051201.0000] jonz: fix for ldap calls
  fix to close connections to ldap after calls
  fix to fail database creation on ldap failure

Version 3.6.2
-------------

[20051124.0900] jonz: bugfixes for token value calculations
  two bugs in how token values are calculated caused a significant rise in
  false positives. this is now fixed, cutting false positives nearly in
  half.

[20051123.1905] jonz: fix for get_nexttoken and hash_drv
  fixed calloc(0) oddity in get_nexttoken in hash_drv

[20051123.0231] jonz: fix for hash_drv in daemon mode
  when hash_drv is used in daemon mode without HashConcurrentUser option,
  segfaults can occur due to a failure to initialize the locking subsystem.

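  note on the dspam_stats -S readings listed under 3.6.3: taking the
  parenthetical definitions at face value (true positive rate, false
  positive rate, overall accuracy), the three readings can be computed from
  the TP/TN/FP/FN counters as sketched below. the struct and function names
  are hypothetical, and whether dspam_stats prints ratios or percentages is
  not stated in this file; the sketch returns ratios between 0 and 1.

```c
/* Hypothetical container for the three readings (ratios in 0..1). */
struct rates {
    double shr;   /* Spam Hit Rate:   TP / (TP + FN) */
    double hsr;   /* Ham Strike Rate: FP / (FP + TN) */
    double oca;   /* Overall Classification Accuracy:
                     (TP + TN) / (TP + TN + FP + FN) */
};

/* Compute the readings; a rate with an empty denominator is left at 0. */
struct rates compute_rates(unsigned long tp, unsigned long tn,
                           unsigned long fp, unsigned long fn)
{
    struct rates r = { 0.0, 0.0, 0.0 };
    unsigned long total = tp + tn + fp + fn;

    if (tp + fn > 0)
        r.shr = (double)tp / (double)(tp + fn);
    if (fp + tn > 0)
        r.hsr = (double)fp / (double)(fp + tn);
    if (total > 0)
        r.oca = (double)(tp + tn) / (double)total;
    return r;
}
```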
Version 3.6.1
-------------

[20051029.2230] jonz: fix for parsetoheaders
  fix for parsetoheaders which could have caused a segfault on malformatted
  "To" header

[20051029.2230] jonz: qmail support for tracksources
  applied patch contributed by Doug Miller to add support for parsing qmail
  headers

[20051025.0130] jonz: added check for strcasestr
  for some operating systems that have strcasestr, use the os's version
  instead of our own

[20051026.0130] jonz: fix for x-dspam-reclassified heading
  added fix for x-dspam-reclassified heading, which appears blank after
  corrective training

[20051025.0800] jonz: plused-detail to work with domains
  plused-detail would previously chop off anything after the +, which
  presented a problem for systems using a full email address as a username.
  this fix will cause user@domain.com to be used if user+mailbox@domain.com
  is specified, and plused-detail support is enabled.

[20051025.0800] jonz: fixed 8-byte alignment for hash databases
  using hash_drv on 64-bit processors caused crashes due to the structures
  in the file not being 8-byte aligned. added cssconvert tool to convert all
  3.6.0 databases to aligned format.

[20051022.0945] jonz: added train.pl script
  added train.pl script to scripts/ as an example of how to properly train.
  updated markovian discrimination documents

[20051020.0800] jonz: added processorBias preference
  added preference option to set processorBias

[20051020.0800] jonz: fixed daemon-mode streaming issues
  fixed minor bugs causing trailing periods to be outputted after summaries
  causing streaming tools to break

[20051020.0800] jonz: fixed document source bug
  fixed a bug causing all documents processed with "DataSource document" to
  fail

[20051019.0300] jonz: fixed segfault on malformed Content-Type header
  fixed a segfault caused by invalid read on malformed Content-Type header

[20051017.0800] jonz: fix for history in dspam.cgi
  fixed a typo causing the history to display blank in dspam.cgi

Version 3.6.0
-------------

[20051016.1930] jonz: automatic whitelisting now trusted sender system
  Instead of senders having to send you zero spam in order to be
  whitelisted, they will have to send you less than one spam for every
  fifteen legitimate messages.

[20051013.0820] jonz: fix for header truncation bug
  fixed condition where headers could be truncated if > 4k in length

[20051002.2100] jonz: dynamic storage driver support for linux and freebsd
  added fix for linux and freebsd to work with dynamic storage drivers;
  added -rdynamic to LDFLAGS

[20051001.1100] jonz: fixed dspam_merge tool
  fixed (rewrote) dspam_merge tool to work correctly.

[20050930.0300] jonz: fixed spurious tabs in user log
  removed tabs from subject and sender in user log to avoid cgi
  malformatting

[20050930.0300] jonz: added hash autoextend, csscompress tool
  added hash autoextend options to make hashes automatically grow.
  added csscompress tool to compress extents.
  made hash_drv default driver.

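  note on the trusted sender rule from the 20051016.1930 entry: read
  literally, "less than one spam for every fifteen legitimate messages"
  means spam / legitimate < 1/15. a minimal sketch of that comparison
  follows; the function name is hypothetical, and any additional conditions
  dspam applies (such as a minimum number of messages seen) are not
  described here and are not modeled.

```c
/* Hypothetical predicate: returns 1 if a sender with the given per-sender
 * counters satisfies "fewer than one spam per fifteen legitimate
 * messages", 0 otherwise. Integer cross-multiplication avoids division
 * and makes a sender with no legitimate mail untrusted. */
int is_trusted_sender(unsigned long innocent_hits, unsigned long spam_hits)
{
    return spam_hits * 15 < innocent_hits;
}
```

  under this reading a sender with 16 legitimate messages and 1 spam would
  qualify, while one with 15 legitimate messages and 1 spam would not.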
[20050929.0700] jonz: added scan for dlopen/-ldl
  added a check to see if -ldl is needed to use dlopen/dlsym/etc

[20050929.0700] jonz: fixed termination boundary bug
  fixed a bug causing a termination boundary to be written at end of html
  segment instead of on separate line, under certain conditions

[20050928.2100] jonz: fixed bugs in _ds_pref_set dynamic functions
  bug causing segfault, was calling the driver inappropriately

[20050928.2100] jonz: tokenizer tweak: don't ignore digits
  stopped ignoring digits to further improve accuracy (old, paranoid code)

Version 3.6.rc3
---------------

[20050928.1800] jonz: removed token reassembly from ngram tokenizer
  removed token reassembly (individual letters, and chained letters appear
  to be 1% more accurate during training)

[20050928.1800] jonz: added signature to deliver=summary
  added signature=[sig] when specifying --deliver=summary

[20050925.2200] jonz: added infinite improbability drive
  added infinite improbability drive, ImprobabilityDrive on

[20050924.2040] jonz: removed legacy algorithm switches from configure
  removed vintage (old) algorithm switches from autoconf, e.g.
  --enable-chi-square as they are now defined in dspam.conf. developers
  should now start using the CTX->algorithms context member and the DSA_
  and DSP_ selections

[20050924.2030] jonz: renamed SM/IM in dspam_stats
  renamed SM/IM in dspam_stats to TP (true positives) and TN (true
  negatives)

[20050924.1345] jonz: relicensed; bound to GPL v2
  after reading rms' latest responses hinting at the GPL3, which to some
  degree mandates feature continuity for commercial use and other limits on
  use rather than distribution, I thought I would take the opportunity while
  I have it to bind to the GPL v2. should the GPLv3 turn out to be sane, we
  can always open it back up to license under it. I have removed the "either
  version 2 or later versions" and replaced it with "version 2".
  ianal, but since no later versions existed during the period which dspam
  was available under old terms, it is my understanding that previous
  versions of dspam may similarly _not_ be applied to the GPLv3 as it would
  create a temporal paradox. even if this is incorrect, 3.6 will be released
  specifically under the GPLv2.

[20050923.2100] jonz: renamed css_drv to hash_drv
  to avoid confusion, the CRM Sparse Spectra driver is now known as simply
  the hash driver, and is configured as hash_drv.

[20050923.0800] jonz: added bias mode for markovian discrimination
  added support for ProcessorBias to markovian discrimination calculations.
  NOTE: there is a significant change in results, and you may wish to leave
  bias turned off (the way Bill intended it). See these sample results:

    Without Bias: sample TP: 956 TN: 992  FN: 49  FP: 13 SC: 0 IC: 0
    With Bias:    sample TP: 863 TN: 1004 FN: 140 FP: 3  SC: 8 IC: 0

[20050923.0800] jonz: added ProcessorBias and TestConditionalTraining
  added these options to dspam.conf, instead of using as configure options.
  be sure to add these to your existing dspam.conf to avoid changes in
  dspam's behavior (see UPGRADING).

Version 3.6.rc2
---------------

[20050922.0400] jonz: added MySQLSupressQuote for MySQL 4.1 quoting bug
  documented MySQL quoting bug in some versions of 4.1 (see doc/mysql.txt),
  and added MySQLSupressQuote option to compensate, or alternatively you
  could just upgrade to a better version of MySQL

[20050920.1800] jonz: added persistent mode support for css_drv
  NOTE: css_drv has since been renamed to hash_drv

  added daemon mode support for css_drv when using CSSConcurrentUser
  (permanently mmap's user's database into memory). no need for daemon mode
  when NOT using this feature so don't bother, but is very fast if you are
  using a single global css file.

[20050916.0800] jonz: added override for css_rec_max to dspam.conf
  CSSRecMax can now be configured in dspam.conf

[20050914.2215] jonz: added DataSource and ProcessorWordOccurence options
  if "DataSource document" is used in dspam.conf, all input will be treated
  as a message "body" (e.g. a document) rather than split up between headers
  and body. this is useful if classifying things other than email.

  if "ProcessorWordOccurence" is set to 'occurrence', all word counts are
  based on occurrence rather than per-message. this may be useful when
  classifying large documents.

Version 3.6.rc1
---------------

[20050914.0700] jonz: applied history paging patch
  applied patch contributed by Norman Maurer to add paging to webui history

[20050913.0700] jonz: added css tools
  in tools.css_drv:

    cssgrow  - grow (or shrink) the capacity of a css file
    cssstat  - report css file statistics (records free, used, max, etc)
    cssclean - free all tokens which have only been seen once (there is no
               date in css files, so only run this once every 30-90 days)

[20050910.2145] jonz: css_drv using single .css file
  moved counters for both spam and nonspam into single .css file, saving
  considerable disk space by removing extra set of keys. also added header
  to css files containing record count, for cssgrow tool.
  NOTE: new format is incompatible with older development versions

[20050910.1400] jonz: minor dspam_stats output changes
  SM and IM changed to FN and FP, respectively

[20050910.1400] jonz: commandline enhancements
  added --class=nonspam which is the same as --class=innocent, depending on
  how users like to specify
  added --deliver=stdout which is a shortcut for
  --deliver=innocent,spam --stdout

[20050910.1400] jonz: completed markovian discrimination
  Completed markovian discrimination algorithms and implemented 'naive'
  combination. Accuracy tests show significant improvement.

[20050830.1500] jonz: RCPT TO and Broken Case bugfix
  fixed a bug where Broken Case wasn't working when in LMTP RCPT mode

Version 3.6.beta.2
------------------

[20050827.2130] jonz: dynamic storage driver library support
  support for dynamic storage driver library builds (including multiple
  driver builds for packagers) is now supported through the existing
  --with-storage-driver function. specifying a single storage driver, such
  as:

    --with-storage-driver=mysql_drv

  will build a statically linked storage driver (the same behavior as
  previous versions of dspam). specifying multiple drivers, however, will
  build dynamic libraries - one of which can then be dynamically loaded at
  run-time by setting the StorageDriver parameter in dspam.conf (see the new
  dspam.conf for more information). for example:

    --with-storage-driver=mysql_drv,pgsql_drv,ora_drv

  NOTE: required parameters for all activated drivers must be provided

  users wishing to build only one storage driver, but dynamically loaded
  instead of statically linked, may supply the same driver name twice to
  enable dynamic loading:

    --with-storage-driver=mysql_drv,mysql_drv

[20050822.2350] jonz: added new X-DSPAM-Result / X-DSPAM-Reclassified values
  The following values may now appear under X-DSPAM-Result /
  X-DSPAM-Reclassified

    Spam, Innocent, Whitelisted, Blacklisted, Blocklisted, Virus

  This is made possible by the addition of CTX->class (char[32]), whose
  constants are made available in language.h. Previous CTX->result options
  specifying DSR_ISWHITELISTED and DSR_ISVIRUS values are now gone, leaving
  only DSR_ISINNOCENT and DSR_ISSPAM for backward-compatibility (whitelisted
  messages will be marked as DSR_ISINNOCENT). Future applications should use
  CTX->class instead of CTX->result for more specific results info.

[20050822.1000] jonz: added optOutClamAV option to opt out of virus scanning
  User-configurable preference optOutClamAV added (set to 'on') to opt out
  of A/V, if active.

[20050815.0800] jonz: added custom tags to dspam.conf for virtual uids
  added tags to dspam.conf to use custom table/field names for
  dspam_virtual_uids allowing you to use a postfix table or some other table
  for username/uid lookups.

Version 3.5.3
-------------

[20050803.0700] jonz: changes to pgsql driver for 8.0+
  changes to the pgsql driver have been made increasing performance by three
  times on versions 8.0+. the driver will auto-detect which version you are
  running in and take advantage of a new lookup_tokens function.

  IMPORTANT: If you're upgrading dspam on your pgsql 8.0+ system, you must
  create the lookup_tokens function added to the pgsql_objects.sql script or
  everything will break miserably.

[20050801.0600] jonz: updated postfix documentation
  added model using postfix's content filter

[20050801.0600] jonz: fixed bug in mysql driver
  fixed a bug causing a segfault in mysql_drv when certain preferences are
  null or have null values

Version 3.5.2
-------------

[20050714.0715] jonz: added extern "C" for C++ compilers
  added extern "C" to libdspam.h for C++ compilers

[20050714.0715] jonz: more work on css_drv
  more work on css_drv; functionality added to make dspam_stats work

[20050714.0715] jonz: added WEB_ROOT to CGI
  added WEB_ROOT to configure.pl, location to webui's htdocs contents

[20050714.0715] jonz: sqlite 3 purge script changes
  added a sqlite 3 specific purge script in tools.sqlite_drv

[20050702.0030] jonz: cgi review and fixes
  reviewed cgi, made minor fixes to history display and admin graphs

Version 3.5.1
-------------

[20050608.0330] jonz: fixed socket file descriptor leak on delivery failure
  fixed a file descriptor leak which occurs on socket delivery connect
  failure, leading to the daemon exiting when it has exceeded its file
  descriptor limit

[20050607.2200] jonz: added preliminary ldap support
  added ldap verification function which will verify existence of user on an
  ldap server before adding to virtual uid table.
  dspam links to openldap libraries and is also compatible with os x's
  openldap libs. use --enable-ldap to enable. see dspam.conf for more
  information.

  STATUS: functionality is presently limited to lookup of dspam username
  only. user and domain as separate variables will come later. additional
  ldap mode to support actual uid lookups on ldap server will be added later
  as well. authentication also coming later

[20050521.1600] jonz: smtp/lmtp delivery to accept all 2xx delivery responses
  DBmail appears to be the only MTA that sends a 215 response to a
  successful delivery, instead of 250. After reviewing RFC, the
  deliver_socket() code has been modified to accept any 2xx message in
  response to a final delivery instruction.

[20050521.0730] jonz: improved performance of css_drv
  made additional performance improvements to css_drv. plan to finish up
  driver later this weekend.

Version 3.5.0
-------------

[20050519.0600] jonz: made preferences case-insensitive
  user preferences now case insensitive

[20050512.0400] jonz: added experimental css_drv (CRM114 sparse spectra driver)
  added *experimental* css_drv which is a flat-file based driver using Bill
  Yerazunis' "sparse spectra" approach to storage. each user will have two
  16MB fixed-size files containing up to a million tokens (this will be
  adjustable later).

[20050511.2045] jonz: added markovian weighting functions
  added functions to support markovian weighting (the algorithm used in
  crm114). markovian discrimination has shown to be very effective at
  classifying text, and with the new css_drv driver, performs very fast.

  note: dspam_dump and dspam_clean -p presently do not function with
  markovian discrimination, because weights are not stored in the database
  (they are computed when a message is processed). dspam_dump will
  inaccurately show a value of 0.5000 for all tokens.
  do not run dspam_clean -p at all when using markovian weighting, as the
  band of interestingness is entirely different and dspam_clean -p is likely
  to wipe out /all/ markov-based tokens.

  finally, see doc/markov.txt for more information on configuring dspam as a
  markovian filter

[20050505.0800] jonz: added plused detail support
  applied patch submitted by Arnaldo Viegas de Lima to add plused-detail
  support (username+mailbox).

[20050501.0800] jonz: LMTP error codes to include more descriptive text
  LMTP error codes should include relevant text from error where possible

[20050501.0800] jonz: added preliminary support for Clam/AV
  added support for clamd virus checking via TCP via ClamAVHost and
  ClamAVPort properties (see dspam.conf). added ClamAVResponse to control
  how DSPAM should act if a virus is found. use --enable-clamav to enable

[20050426.2300] jonz: added storeFragments preference
  when set to 'on', dspam will store 1k of the message body in
  user.frag/sig.frag

[20050425.0700] jonz: changed copyright notice
  changed copyright notice to reflect my company name change (I am ditching
  Network Dweebs Corp. in exchange for Jonathan A. Zdziarski)

[20050422.0700] jonz: got rid of report_error and report_error_printf
  all logging now calls LOG() which syslogs, prints to stderr, and debug
  LOGDEBUG remains
  updated language.h with more sensible error codes partitioned by service
  components.

[20050419.0200] jonz: got rid of retrain.log entirely, reworked logs
  reworked the system and user logs to be a little more useful and removed
  the retrain.log entirely. it's pretty simple:

    1 entry for a message process
    1 entry for a retrain using same signature id
    1 entry for a delivery error using same signature id

  the message id is also stored to detect resends. updated dspam.cgi.

[20050417.1400] jonz: added support for domain-based delivery hosts you can now configure: DeliveryHost.domain.com a.b.c.da in dspam.conf [20050415.0700] jonz: added RBLInoculate option default is now changed to handle RBL spam like all other spam, but when setting RBLLearnAsSpam to "on", RBL spam will be inoculated [20050414.1900] jonz: added extended logging to system.log added recipient and beginnings of a "status" or remarks section for system.log. [20050413.2005] jonz: added PgSQLUIDInSignature option applied patch contributed by Alexandre Biancalana to mirror UID support for postgres [20050413.2005] jonz: added MySQLUIDInSignature option for mysql users who want to put the user id in the signature (and effectively have only one spam address for all users), this feature uses uid,signature instead of just signature. [20050412.0800] jonz: fix for cgi during time zone changes applied patch for cgi losing data during time zone changes [20050410.2000] jonz: adjusted mysql/pgsql objects/purge scripts for performance made adjustments based on performance without token counter indexes [20050409.1400] jonz: added 'date' to quarantine display, reversed ordering added 'date' field to quarantine display using X-DSPAM-Processed header. reversed 'Date' ordering to list most recent at top [20050405.1815] jonz: minor code cleanup minor cleanup of static -1 values in code, replaced with #define'd values removed configure help strings and checking notifications for developer-only features. will phase out in a few versions. [20050402.1945] jonz: changed domain-based groups to use *@domain domain-based groups have been changed to require *@domain.tld instead of @domain.tld in the group file. [20050401.0100] jonz: added domain blocklisting added domain blocklisting via data/user.blocklist file; does not train. useful for filtering unwanted nonspam. may expand for full support later via cgi.
[20050331.0600] jonz: external default preferences merged external preferences (default preferences stored in the database or .pref file) were not previously used when merging system and user preferences. this now changes behavior so that dspam.conf preferences are only loaded in absence of default preferences set in the database. [20050331.0600] jonz: added fallback domain support fallback domain support is for systems using a user's full email address as a username, and allows users being processed who have no preferences to fall back to using @domain.tld as the username (which should be appropriately trained and set to notrain). this is only useful under limited circumstances, and would require that all valid system users be given at least one preference when provisioned. To use: 1. Add "FallbackDomains on" to dspam.conf 2. Create wildcard domains (@domain.tld) as users in dspam 3. Set each wildcard domain with the "fallbackDomain on" preference Version 3.4.9 ------------- [20050808.0000] jonz: upgraded streamlined blacklisting support to RABL changed configuration options for streamlined blacklisting to support the RABL [20050714.0700] jonz: add X-DSPAM-User when using localStore added X-DSPAM-User heading to spam when localStore preference is used, so that destination address will be right [20050714.0700] jonz: bugfix for header decoding fixed a bug where the decoded headers (used internally) were delivered, instead of the original headers Version 3.4.8 ------------- [20050612.2200] jonz: fixed null decoding bug fixed a bug where placing %00 or =00 in an encoded message would decode the NUL character, causing the message to become truncated.
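The NUL decoding bug above can be illustrated with a small sketch. It is written in Python rather than DSPAM's C, and `decode_qp_skip_nul` is a hypothetical helper, not DSPAM's decoder. It assumes one plausible form of the fix: an `=00` escape is dropped instead of being decoded, so that C string handling further down never sees an early terminator.

```python
import re

# Matches a quoted-printable escape such as =3D or =00.
_QP_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def decode_qp_skip_nul(data: bytes) -> bytes:
    """Decode =XX escapes, dropping any escape that decodes to NUL."""

    def replace(match):
        byte = bytes([int(match.group(1), 16)])
        # A decoded NUL would end a C string early, so emit nothing for it.
        return b"" if byte == b"\x00" else byte

    # Remove soft line breaks first, then decode the remaining escapes.
    unfolded = data.replace(b"=\r\n", b"").replace(b"=\n", b"")
    return _QP_ESCAPE.sub(replace, unfolded)
```

For example, `decode_qp_skip_nul(b"buy=00now=21")` yields `b"buynow!"`, where a naive decoder would produce a NUL after `buy`.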
[20050611.1000] jonz: fixed broken managed group functions fixed many files (such as logs and corpus maker) which were not written to the correct directory when used with managed groups [20050610.0530] jonz: message truncation fix fixed a problem where some mta's would fail to quote a single dot on a line by itself, leading to message truncation. solution was to check for quoting inside dspam too. [20050606.0420] jonz: fixes for encoding shifts 7bit encoding tag changed to 8bit, due to some 8bit data in decoded messages [20050606.0420] jonz: fix for multiple-recipient decoding bug bug addresses situation where messages addressed to multiple recipients may fail to be completely decoded Version 3.4.7 ------------- [20050522.2300] jonz: fix for managed groups reclassification fix for managed groups where delivery of false positives would fail due to the managed group not being recognized [20050512.0600] jonz: fix for signature embedding with malformatted boundary fixed a bug where messages lacking a terminating boundary would fail to receive a signature in the message body Version 3.4.6 ------------- [20050503.2000] jonz: fixed obscure malformatted signature crash bug fixed an obscure bug causing dspam to crash under certain conditions when the loose signature was provided without the appropriate delimiter Version 3.4.5 ------------- [20050417.0900] jonz: fixed firstrun/firstspam messages with groups fixed condition where users in groups would never receive firstrun or firstspam messages. IMPORTANT: this fix will cause all users to receive the notifications for the first time. if you want to avoid this, touch user.firstrun and user.firstspam files in each user's data directory.
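For the message truncation fix in [20050610.0530] above, the dot-quoting rule itself can be sketched as follows. This is Python, not DSPAM's C code, and `dot_stuff` is a hypothetical name. It only shows the SMTP/LMTP transparency rule (a line starting with "." gets one extra leading "."), not where DSPAM applies the check.

```python
def dot_stuff(lines):
    """Return the lines with SMTP-style dot quoting applied.

    Any line that begins with "." gets one extra leading ".", so a lone
    "." in the body is never mistaken for the end-of-data marker.
    """
    return ["." + line if line.startswith(".") else line for line in lines]
```

A body line consisting only of "." goes out as "..", and the receiving side strips the extra dot again.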
[20050408.0300] jonz: set proper permissions for socket file set proper permissions (rwxrwxrwx) when creating socket file [20050419.2200] jonz: applied signature retrieval failure error patch applied patch to fix some false signature retrievals [20050415.2300] jonz: LMTP error codes for temporary failures changed relevant LMTP error codes to report temporary failures instead of permanent failures, to fix mail loss problem. for permanent failures, we still send a permanent failure [20050412.2130] jonz: fix for notspam- aliases and changeuseronparse corrected parsing error causing notspam- aliases to fail to identify username when using changeuseronparse Version 3.4.4 ------------- [20050412.0800] jonz: critical fix for signature-in-body fixed a critical bug that caused the signature to NOT be written to the message body on quoted-printable or base64 messages Version 3.4.3 ------------- [20050407.0800] jonz: fix for LMTP and QuarantineAgent bugfix for bug causing spam to be delivered via LMTP when a quarantine agent is specified [20050406.2300] jonz: fix for domain-scale and admin.cgi fix for admin.cgi when using domain-scale; could not previously find data dirs [20050406.0700] jonz: fixed cygwin build errors fixed cygwin build errors by removing support for ip blacklisting (cygwin does not have the necessary functions to perform these lookups, at least nothing standard + mt-safe) [20050406.0700] jonz: fixed optIn + preferences extension support fixed a bug where dspam_admin 'ch pref' would fail on nonexistent users, making it impossible to opt-in users when using preferences extension [20050404.0700] jonz: applied various cgi patches applied various cgi patches to fix: - showFactors preference not setting/showing correctly in admin.cgi - empty spamSubject raised errors when writing preferences - global default.prefs file should not be written with pref extensions [20050402.1000] jonz: changed log status of signature errors changed logging status of signature-based 
errors so they show up in the error and system logs, rather than just debug [20050402.0900] jonz: fixed system.log formatting bug fixed a bug causing the system.log to be malformatted when reclassifying a message Version 3.4.2 ------------- [20050328.0600] jonz: memory leak fix for chained token fixed a very small memory leak in chained tokens implementation [20050327.2330] jonz: updates to cgi added retrain log to keep track of messages retrained, which allows the cgi to display 'Retrained' in the history next to messages that have already been retrained. history retraining and quarantine retraining both log this info. history retraining now forwards false positives and removed from quarantine made top and bottom row of buttons symmetric in quarantine view (deliver checked, delete checked, delete all) [20050327.2330] jonz: no logs when using --classify no logs should be written when using --classify [20050327.2330] jonz: users with signatureLocation=headers should prefer header users using signatureLocation=headers should prefer the X-DSPAM-Signature header over any signature in the body (this can apparently cause problems when conversing with another dspam user) [20050327.1400] jonz: fix for history retraining fix for history based retraining that allows non-admins to use the feature [20050327.1400] jonz: applied pgsql patch applied pgsql patch to remove duplicate key warnings and improve performance [20050326.1730] jonz: added support for non-ascii character sets added support for non-ascii character sets [20050325.2345] jonz: fixed avg. processing time reporting fixed avg. processing time reporting in daemon mode; was previously reporting time since daemon start [20050324.0730] jonz: added ChangeUserOnParse 'full' option setting ChangeUserOnParse to 'full' now uses the entire email address after the initial {spam,notspam}- identifier allowing the domain to be used. using 'on' or 'user' continues to default to just username.
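The difference between the ChangeUserOnParse modes above can be sketched like this. The function is Python for illustration only; `user_from_alias` is a hypothetical name and the real parsing lives in DSPAM's C agent. It assumes the alias always starts with `spam-` or `notspam-`.

```python
def user_from_alias(address: str, mode: str = "on") -> str:
    """Derive the DSPAM username from a spam-/notspam- alias address."""
    # Check "notspam-" first for clarity; neither prefix starts with the other.
    for prefix in ("notspam-", "spam-"):
        if address.startswith(prefix):
            rest = address[len(prefix):]
            break
    else:
        raise ValueError("not a spam-/notspam- alias: " + address)

    if mode == "full":
        # 'full' keeps the whole address, so the domain can be used.
        return rest
    if mode in ("on", "user"):
        # 'on' and 'user' keep only the part before the '@'.
        return rest.split("@", 1)[0]
    raise ValueError("unknown mode: " + mode)
```

With `full`, `spam-bob@example.com` maps to `bob@example.com`; with `on` or `user` it maps to `bob`.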
[20050324.0700] jonz: added long usernames support to dspam stats if --enable-long-usernames is used, formatting will add additional padding for username in stats output [20050322.2330] jonz: fixed deadlocked connection on null characters fixed a condition where a 300-second deadlock, then timeout occurs when client sends \0 to server [20050322.2330] jonz: fixed some minor memory leaks minor leaks in daemon process [20050320.0400] jonz: fixed pgsql_drv connection linger bug fixed a bug when using pgsql_drv and preferences extension that fails to close a pgsql database connection on exit, leaving connections lingering. Version 3.4.1 ------------- [20050320.1700] jonz: fix for exit codes with daemon+stdout fixed a bug causing Broken returnCodes to fail when using daemon in combination with --stdout. [20050318.1930] jonz: fixed transliteration of "" fixed transliteration of "" so that empty spaces can be passed to LDA [20050318.1900] jonz: implemented DSPAMPROCESSMODE service tag implemented DSPAMPROCESSMODE service tag for proprietary client functions [20050318.0700] jonz: added RSET functionality added RSET functionality to daemon [20050318.0700] jonz: added localStore preference for storage added localStore preference to set local storage username, which can override the DSPAM username (ideal for managing aliases) [20050318.0700] jonz: fixed bug related to corpusfeeding and signatures normally you're not supposed to corpus train with email containing dspam signatures, but that doesn't mean dspam should bomb when you do. fixed a bug where the signature would cause the process to fail (since the signature isn't in the database anymore).
[20050318.0700] jonz: fixed bug sending empty parameter via client fixed a bug where sending an empty parameter via client (for example -a "") would break the arg chain, as the empty argument would not be quoted [20050318.0700] jonz: fixed track sources on report spam bug fixed a bug causing tracksources to report a spam from the user who forwarded it to correct the filter [20050317.0500] jonz: fix for 5.1.0's in daemon mode fixed condition where 5.1.0 error would occur if TrustedDeliveryAgent was not specified, even if DeliveryHost was. fixed condition where 5.1.0 error would be followed by 354 (response to DATA) instead of failing [20050316.0700] jonz: fixed crash on long argv list fixed a crash/potential arg list overflow for authentication DSPAM clients sending too many arguments. [20050316.0100] jonz: reclassification failures should not deliver on failure false positive reclassification failures shouldn't deliver when a signature isn't found, as a spammer could take advantage of a system using false positive aliases and send mail through them. [20050316.0100] jonz: changed fp- to notspam- changed fp- to notspam- in parsetoheaders and docs [20050316.0030] jonz: added admin.cgi opt in/out preferences support added support for opt in/out to admin.cgi [20050315.0800] jonz: fixed broken returnCodes when using dspamc fixed broken return codes when using dspamc, which involved rewriting the return code reporting pieces of the agent to include a classification. when running in 'DSPAM' mode, : CLASSIFICATION is sent back with each response. 
[20050315.0800] jonz: applied patch to fix file descriptor problems when listener fails, file descriptor fails to close, causing a possibly fatal situation in the event of too many failures fixed invalid file descriptor problem [20050315.0730] jonz: added LOG_NOWAIT to syslogging added LOG_NOWAIT to syslogging, to prevent dspam from hanging if syslog is nonresponsive [20050313.0700] jonz: fixed bugs handling multiple rcpt to's fixed bugs in the handling of multiple rcpt to's. strange behavior; some email would get sent to the last recipient only. some email would never be processed for a user. fixed a bug where specifying --user without recipients could cause a crash. fixed it again. [20050313.0700] jonz: changed lmtp commandline options --lmtp-rcpt-to and --lmtp-recipient have been changed to --rcpt-to, which takes a list of users like --user does. For example: --user spot dick jane --rcpt-to spot@domain.com dick@domain.com jane@domain.com This allows for each user to be assigned its own unique recipient. If the recipient is not provided, it will default to the username being processed. --lmtp-mail-from has been changed to --mail-from [20050312.1600] jonz: added SMTP delivery support, changes delivery hosts added SMTP delivery support via DeliveryProto option in dspam.conf NOTE: configuration has changed for LMTP host delivery. See new dspam.conf * fixed crash bug related to an earlier commit [20050312.1530] jonz: added virtual_user_aliases.sql for mysql added virtual_user_aliases.sql, providing an alternative dspam_virtual_users table example where aliases to a single dspam user can be created. ideal for large mail systems with automatic provisioning of uids. this does, however, break dspam's own ability to create new virtual users, so they will need to be added manually if this aliases approach is used.
[20050312.1035] jonz: added ChangeModeOnParse functionality added ChangeModeOnParse option to set the proper --class and --source when ParseToHeaders detects a spam- or fp- address. This allows a system to be implemented without aliases, as all requests can be passed to dspam. [20050313.0700] jonz: added \r to socket communications as per RFC 2822, \r\n should be sent to SMTP/LMTP Version 3.4.0 ------------- [20050311.0800] jonz: moved supplementary documentation into doc/ moved all supplementary documentation into doc/ including storage driver, operating system, and mta configuration documentation. [20050311.0800] jonz: miscellaneous bug fixes many minor bug fixes Version 3.4.pr1 --------------- [20050308.0800] jonz: fixed header duplication on reclassify fixed a bug where a new set of headers would be added to the old set upon reclassification. now, only X-DSPAM-Reclassified header is added. also fixed duplicate signature being written. [20050305.0900] jonz: fixed bug in cgi autodetect of extensions fixed bug causing extensions detection to fail in cgi for users with older configure.pl scripts lacking AUTODETECT property. also propagated ability to "not" autodetect to admin.cgi [20050304.2100] jonz: fixed user duplicate bug in ParseToHeaders+Daemon combo fixed a bug causing a user to be processed twice when using ParseToHeaders and server daemon mode [20050302.0730] jonz: ClientHost to be used for client domain sockets If connecting to DLMTP client via domain socket, ClientHost is now used as the path to the socket on the client side, and ServerDomainSocketPath on the server side.
[20050302.0730] jonz: minor lmtp protocol tweaks minor tweaks to LMTP protocol service [20050302.0730] jonz: fixed a bug with TOE and automatic whitelisting fixed a bug where the "spam hit" from a false positive wouldn't be reversed for users using TOE after their training threshold has been exceeded [20050302.0700] jonz: fixed unknown token value bug fixed a bug where unknown tokens would be assigned an innocent value rather than neutral Version 3.4.rc2a ---------------- [20050301.0700] jonz: fix for segfault on malformatted MAIL FROM fixed a bug causing a segfault when MAIL FROM was malformatted in DLMTP mode [20050301.0700] jonz: fix for pgsql double-innocent bug important bug fix for pgsql driver, where new innocent tokens were learned twice, killing accuracy since RC1 [20050301.0600] jonz: rewrote flat file preference code rewrote flat file preference code after discovering many bugs with patches submitted over past few months Version 3.4.rc2 --------------- [20050227.1515] jonz: LMTP enhancements 1. if --user is specified in ServerParameters when DSPAM is running in standard LMTP mode, then RCPT TO will be used only for delivery, and not processing. 2. added 'auto' mode for server, to auto-detect DLMTP or LMTP based on the LHLO ident 3. added extended status codes and initial LMTP extensions list [20050225.0700] jonz: added support for LMTP delivery via domain socket specifying a path as LMTPDeliveryHost will cause DSPAM to connect and deliver via domain socket (instead of TCP). [20050225.0600] jonz: added %r and %s LDA arguments For LMTP front-end with LDA delivery, the following conventions may be used in ServerParameters: %r - the RCPT TO provided via LMTP %s - the MAIL FROM provided via LMTP in both cases, only the content between < > is actually used [20050225.0500] jonz: added pass-through of mail from added pass-through of contents of MAIL FROM inside < >. for commandline functionality, also added --lmtp-mail-from to define it.
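The %r and %s conventions above, together with the rule that only the content between < > is used, can be sketched as follows. This is illustrative Python, not DSPAM's C implementation; `angle_content` and `expand_lda_args` are hypothetical names, and the argument list in the example is made up.

```python
import re


def angle_content(command_arg: str) -> str:
    """Return the text between the first '<' and the next '>'."""
    match = re.search(r"<([^>]*)>", command_arg)
    if match is None:
        raise ValueError("no <...> found in: " + command_arg)
    return match.group(1)


def expand_lda_args(template, rcpt_to, mail_from):
    """Substitute %r and %s in each LDA argument.

    rcpt_to and mail_from are the raw RCPT TO / MAIL FROM arguments as
    received over LMTP; only the part inside < > is substituted.
    """
    values = {"%r": angle_content(rcpt_to), "%s": angle_content(mail_from)}
    # One pass per argument, so substituted text is never expanded again.
    return [re.sub(r"%[rs]", lambda m: values[m.group(0)], arg) for arg in template]
```

For instance, `expand_lda_args(["-d", "%r", "-f", "%s"], "<bob@example.com>", "<alice@example.org> SIZE=100")` gives `["-d", "bob@example.com", "-f", "alice@example.org"]`.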
[20050224.0700] jonz: added "standard" LMTP inbound support added a ServerMode option to dspam.conf allowing DSPAM to function in either "dspam" proprietary LMTP mode or "standard" LMTP server mode. see README section 2.4 for more information. support for multiple RCPT TO's added. Fixed pass-through of error messages, to show failure instead of "accepted" when a failure actually did occur. [20050224.0700] jonz: fixed dspamc hanging bug applied fix contributed by Peter Santiago to prevent dspamc from hanging [20050224.0700] jonz: added lmtp delivery support added support to deliver via LMTP instead of LDA. although this doesn't require operating in daemon mode, you must compile with --enable-daemon as the code uses some of the daemon's socket components. Version 3.4.rc1 --------------- [20050214.0800] jonz: miscellaneous minor patches miscellaneous minor patches to CGI and some small subroutines Version 3.4.beta.3 ------------------ [20050208.0330] jonz: added CGI spam functionality to history added a retrain link in the history portion of the CGI allowing users to retrain spam misses without having to forward messages. [20050205.1800] jonz: added fast user switching to CGI for admins added the ability for admins to change users dynamically in the cgi [20050204.0800] jonz: applied patch to improve performance in pgsql migration patched for performance improvement in pgsql migration tool [20050203.0800] jonz: signature to write to all text segments reverted back to signature writing to all text segments of an email, and not just the top level parts. 
[20050202.0530] jonz: miscellaneous cgi patches applied many miscellaneous cgi patches submitted by Ben Reser Version 3.4.beta.2 ------------------ [20050129.1945] jonz: applied cgi patch for delivery failure applied patch submitted by Martin Forssen to avoid losing mail in the event of a FP delivery failure within the CGI [20050129.0000] jonz: added doc for osx builds added doc for building dspam on osx [20050128.0245] jonz: fixed segfault on post-signature failure fixed a segfault that only occurs when dspam_process() fails after loading a signature [20050126.0100] jonz: fixed whitelist training bug with notrain when in notrain mode, whitelists were still trained [20050122.2115] jonz: applied bugfixes to sbl lookup fixed sbl lookups to use mt-safe functions, also fixed crashing [20050122.0900] jonz: applied latest postgresql patches applied latest patches to upgrade postgresql drivers. significant change is the use of BIGINT to improve performance. while backward compatible, a migration tool has been provided (see UPGRADING). [20050119.0300] jonz: set default training buffer to off sedation should not be instantiated unless specifically specified [20050119.0245] jonz: applied patch adding aggregation to dspam_admin applied a patch by Philip Champon adding a preference aggregation function to dspam_admin, which combined system preferences with user preferences before outputting. [20050119.0230] jonz: fixed bug in statisticalSedation preference statisticalSedation preference has been ignored for 2 versions. this bug has been fixed. [20050118.0730] jonz: removed requirement for dspam_home in dspam_create() some storage drivers don't require dspam_home [20050118.0710] jonz: fixed negative values in web stats fixed a bug causing negative values to be written in web stats [20050118.0700] jonz: applied patch adding full preferences functionality applied a patch by Philip Champon adding full preferences functionality for installs without storage driver extensions.
this allows preferences to be set through dspam_admin for flat file-based systems as well. [20050116.2330] jonz: tweak to statistical sedation added a tweak to statistical sedation so that only spammy tokens are sedated [20050116.2300] jonz: added makeCorpus preference added makeCorpus preference which records all messages processed for the user to dspam_home/data/username/username.corpus/[spam|nonspam]. when an error is corrected, the file is moved to the appropriate corpus. [20050116.1800] jonz: change to TUM training mode TUM training mode to train also when confidence is low [20050115.1200] jonz: fixed token malalignment bug fixed a bug causing token names to be malaligned with their values; should not have affected accuracy, only debug messages [20050115.1000] jonz: added dspam_logrotate added Steve Pellegrin's log rotate script as a tool [20050115.0900] jonz: fixed preference overrides fixed a bug causing preferences to fail to override Version 3.4.beta.1 ------------------ [20050113.0500] jonz: fixed sqlite shared group bug fixed a bug where sqlite was ignoring shared groups [20050112.0600] jonz: added stats functionality to dspam_stats added the following functionality to dspam_stats: - specifying -S will display accuracy levels as percentages - specifying -s will use totals "since last reset" (s = snapshot) - specifying -r will reset the totals in the snapshot to measure accuracy for each month, use dspam_stats -r [username] at the beginning of the month, and dspam_stats -s -S [username] at the end of the month. NOTE: dspam_stats -r uses the same .rstats file as the CGI, so resetting stats in either will affect the other. this is the desired behavior at the moment, rather than have two snapshots running around. [20050111.0700] jonz: fixed preferences to aggregate fixed preferences so that system prefs and user prefs are aggregated, instead of either-or. 
also fixed a bug causing AllowOverrides to not work in preferences extension [20050106.1800] jonz: added SBLQueue dspam.conf option removed hard-coded sbl queue path (/var/spool/sbl) and added to config an option to specify the path [20050106.0000] jonz: added manual override for CGI options for implementations running the cgi as an untrusted user (e.g. cpanel), the cgi needs to manually override autodetection of certain options like filesystem scale and preferences extension detection. added option for this to configure.pl in the cgi. [20050105.2035] jonz: trainingMode preference to be case insensitive made training mode preference (toe, tum, etc) to be case insensitive [20050102.1800] jonz: added --client flag for client operations added --client flag for client operations. failure to use --client causes dspam to operate in independent mode IMPORTANT: --client should be added to the client arguments on daemon-based setups [20050102.2100] jonz: rewrite hash-code into 'diction' and 'term' structures rewrote the lht code (long-hash-token) into a more specialized 'diction' structure, which has terms. a diction represents a subset of lexical data within the user's dataset (namely, the data representing a message). [20041230.0800] jonz: memory leak and thread-safe cleanup - fixed memory leak related to bayesian noise reduction calls - fixed use of inet_ntoa, which is not thread-safe, replaced with inet_ntoa_r - inet_ntoa_r should never be called if using domain sockets [20041228.2300] jonz: integrated libbnr integrated bayesian noise reduction in DSPAM with libbnr sources [20041225.1400] jonz: added support for trainPristine as user preference Set trainPristine to 'on' in a user's preference to enable for that user. If master configuration has TrainPristine on, this will override the preference option. 
[20041224.1400] jonz: implemented daemon SIGHUP reload upon receiving SIGHUP, daemon will: - stop listening for new requests - allow all threads to finish processing - terminate all connections to the database - reload dspam.conf - reestablish all connections to the database - start listening for new requests [20041223.2106] jonz: fixed segfault on unreadable dspam.conf fixed a bug causing a segfault when dspam.conf is unreadable [20041223.1504] jonz: made url and tag scans case insensitive url and tag scans now case-insensitive, so HTTP:// and HREF= should also be detected and tokenized [20041221.2230] jonz: rewrote signature embedding code rewrote signature embedding code - fixed malformatting bug with signed messages - made body and html tag searches case insensitive [20041221.0600] jonz: added streamlined blackhole list "learn as spam" support added functionality to automatically perform streamlined blackhole list lookups and learn as spam if blacklisted. for more information on the streamlined blackhole list, see http://www.nuclearelephant.com/projects/sbl/ [20041221.0600] jonz: fixed preferences extensions and storage profiles fixed preferences extensions so that they will work with storage profiles [20041217.0700] jonz: replaced btree sort with heap sort, major speedup replaced btree sort of tokens by delta with a heap window-size sort, major speedup in performance; a 200k text message down to 2.2s from 12s. [20041215.1845] jonz: added sqlite3_drv added sqlite3 storage driver, use --with-storage-driver=sqlite3_drv [20041213.0600] jonz: added build hooks for NodalCore(tm) C-Series Accelerator added build hooks for the NodalCore(tm) C-Series Accelerator Card, the hardware platform used for Accelerated DSPAM(R). actual adapter code remains proprietary, so build hooks are only useful if you've licensed the accelerator. 
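The idea behind the heap window-size sort in [20041217.0700] above can be sketched in Python (DSPAM itself is C). Instead of fully sorting every token, only the `window` tokens with the largest delta are kept. The sketch assumes delta means distance from the neutral probability 0.5; `most_interesting` and the default window of 15 are hypothetical.

```python
import heapq


def most_interesting(token_probs, window=15):
    """Return the `window` (token, probability) pairs furthest from 0.5.

    heapq.nlargest keeps a bounded heap, so the full token set is never
    sorted; results come back ordered by descending delta.
    """
    return heapq.nlargest(
        window,
        token_probs.items(),
        key=lambda item: abs(item[1] - 0.5),
    )
```

Given `{"viagra": 0.99, "the": 0.5, "meeting": 0.05, "free": 0.8}` and a window of 2, the result is `[("viagra", 0.99), ("meeting", 0.05)]`.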
[20041209.0800] jonz: added profile failover added failover support for storage profiles [20041203.0800] jonz: added patch to sort by subject or from in CGI applied patch adding hotlinks to subject/from fields in dspam.cgi for sorting. [20041202.2024] jonz: added dspamc thin client added dspamc thin client for those who would prefer something lighter than running DSPAM in client mode (linked in with libdspam and other unnecessary code). [20041130.0800] jonz: dspam daemonized LMTP server and client added --daemon functionality to put DSPAM into daemonized LMTP server mode; configure client in dspam.conf to talk to daemon, or speak LMTP without the client. authentication configurable in dspam.conf. implemented stateful database connections. --stdout and --classify supported. NOTE: Daemon mode is multithreaded, and therefore requires a multithreaded driver: mysql_drv or pgsql_drv. [20041127.1700] jonz: normalized process_message() normalized process_message() by breaking up code into many smaller subroutines [20041121.1530] jonz: implemented bayesian noise reduction 2.0 see http://bnr.nuclearelephant.com [20041101.0700] jonz: added MaxMessageSize option to dspam.conf added a max message size configuration option to specify a maximum message size to process [20041024.1700] jonz: moved sources to src/ moved all build sources to src/ for easier management [20041024.1545] jonz: repeat linestripping as necessary changed broken linestripping code to repeat as necessary, for some dos systems with multiple ^M's. [20041020.2345] jonz: enhanced TUM mode training greatly enhanced TUM-mode training by setting a dirty bit for each token in memory; instead of writing all tokens and then adding a conditional where clause for TUM (where total hits < 50), only tokens whose total hits are below 50 are included in the sql statement, the extra where clause is no longer necessary. this helps tum outperform teft in resources.
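The dirty-bit approach described for TUM above can be sketched as follows. This is Python for illustration; the `Token` class and both functions are hypothetical, and "total hits" is assumed to be spam hits plus innocent hits. The threshold of 50 comes from the entry itself.

```python
from dataclasses import dataclass

TUM_HIT_LIMIT = 50  # threshold named in the changelog entry


@dataclass
class Token:
    spam_hits: int
    innocent_hits: int
    dirty: bool = False


def mark_dirty_for_tum(tokens):
    """Set the dirty bit only on tokens still below the hit limit."""
    for token in tokens.values():
        total = token.spam_hits + token.innocent_hits  # assumed definition
        token.dirty = total < TUM_HIT_LIMIT


def tokens_to_write(tokens):
    """Names of tokens that belong in the update statement."""
    return sorted(name for name, token in tokens.items() if token.dirty)
```

Because the selection happens in memory, the statement sent to the database only names the dirty tokens and needs no extra where clause.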
[20041020.1400] jonz: normalized main() normalized main() by breaking up code into smaller subroutines Version 3.2.4 ------------- [20041229.1925] jonz: fix for broken boundary rfc added fix for intentionally broken mime boundaries [20041203.0800] jonz: performance fixes for pgsql_drv minor performance fixes for pgsql_drv that may have a big effect on some implementations [20041203.0800] jonz: applied patch to fix build fail when CFLAGS defined fixed a bug causing tools to fail to build when CFLAGS was specified [20041203.0745] jonz: fixed addition of spurious colons after delimiter fixed bug causing a colon to be added to lines after -- delimiters that were not actual boundary delimiters. this also caused certain encoded portions of the body to not be decoded, giving the appearance of equal signs (=) in message bodies. Version 3.2.3 ------------- [20041125.0925] jonz: rewrote boundary extraction in decode.c rewrote boundary extraction in decode.c to fix a bug where messages could get mangled if boundary was specified without quotes, but other tags used quotes [20041123.1830] jonz: fixed multipart blocks with no content-type fixed bug causing the DSPAM signature to NOT be written to multipart blocks without a content-type (broken RFC) [20041123.0800] jonz: fixed bug in _ds_get_spamrecord broken on mysql 4.1 changed token = '' to token in('') in _ds_get_spamrecord, to fix bug in mysql 4.1 with respect to numeric fields and quoted conditionals. in('') seems to work without problem.
[20041117.0800] jonz: fixed critical bug in Bayesian Noise Reduction fixed a critical bug in Bayesian Noise Reduction causing the algorithm to never instantiate Version 3.2.2 ------------- [20041114.2245] jonz: fixed optOut preferences option fixed a bug causing optOut preference to be ignored [20041112.0800] jonz: fixed source address tracking bug w/TOE fixed a bug causing source address tracking to fail when TOE used [20041112.0800] jonz: fixed LocalMX bug fixed a bug causing LocalMX to be ignored in dspam.conf [20041110.0800] jonz: set permissions on dspam.conf to 640 permissions on dspam.conf were defaulted to 750, changed to 640 [20041101.0700] jonz: changed loose signature matching changed loose signature to X-DSPAM-Signature from the ever useless DSPAM: to allow the use of signature headers in forwarded attachments. [20041109.0800] jonz: fixed source address tracking fixed source address tracking by removing an old #ifdef that never got defined in 3.2. also changed 'ham' to 'nonspam' in dspam.conf. [20041109.0800] jonz: adjusted chi-square cutoff changed chi-square cutoff from 0.5000 to 0.5010 to avoid erroneous classifications when there is no data [20041108.0800] jonz: fixed multiline token bug fixed a bug where tokens on a multiline header would be ignored past the first line [20041103.0745] jonz: fixed segfault on signature scan fixed a bug causing segfault during scanning of some messages for a signature [20041103.0745] jonz: fixed signature encoding bug in sqlite_drv fixed a bug causing signature inserts to fail in sqlite. 
Version 3.2.1 ------------- [20041029.0800] jonz: added needed c/r at end of pgp messages added needed c/r at end of pgp messages [20041029.0800] jonz: fixed invalid read of free()'d memory fixed invalid read of free()'d memory caused when parsing multi-line header tokens [20041029.0800] jonz: fixed pragma bug in sqlite_drv fixed a bug in sqlite_drv causing pragma's in dspam.conf to be ignored [20041028.0700] jonz: support for mysql 4.1's ON DUPLICATE KEY added support for mysql 4.1's ON DUPLICATE KEY functionality, so that compiling with 4.1+ will perform a single insert query without causing duplicate key failures [20041025.0600] jonz: memory leaks in dspam_clean fixed minor memory leaks in dspam_clean [20041025.0600] jonz: sqlite fixes fixes to sqlite driver; started using sqlite_[encode|decode]_binary and fixed calls to sqlite_finalize causing segfaults. [20041024.2000] jonz: added patch for parsing signature from body added a patch to parse leading whitespace from signature keys found in messages with malformed signature lines [20041024.1845] jonz: added patch for pgsql and PQfreemem() added patch to search for PQfreemem() and use free() as an alternative [20041024.0650] jonz: fixed bug with mysql_drv and duplicate key entries fixed a bug caused by performing multiple inserts simultaneously on the database [20041023.0923] jonz: fixed memory malpractice in pgsql_drv fixed some bugs in pgsql_drv with memory mishandling [20041021.0730] jonz: fixed attachment for PGP signed messages fixed a bug in the dspam.txt attachment added to PGP signed messages, causing the attachment to have an invalid boundary delimiter [20041021.0700] jonz: put --with-delivery-agent back, minor config fixes put --with-delivery-agent flag back (formerly, would try and just autodetect) and made some fixes to escape comma's. 
[20041020.1400] jonz: changed default logdir to dspam_home/log default logdir has been changed to dspam_home/log; this prevents confusion around permissions on /var/log [20041020.1400] jonz: applied patch for man page install applied patch adding $(DESTDIR) to man page install [20041020.1200] jonz: applied patch for URL parsing bug applied a patch fixing an invalid memory read when an email ends with http:// Version 3.2.0 ------------- [20041020.0100] jonz: fixed mysql performance bottleneck fixed mysql performance bottleneck with inserts by using multi-row inserts instead of hundreds of individual inserts [20041019.2315] jonz: changes to mysql 4.1 purge script changed IN() to left-join query for faster purging; rewrote all fields as 'not null' [20041019.2200] jonz: made all rows not null in mysql made all rows not null in mysql scripts; conserves 1-2 bytes per row and speeds things up just a hair [20041018.0800] jonz: added qmail/vpopmail instructions added qmail/vpopmail instructions contributed by John Peacock [20041018.0800] jonz: split up MTA configuration into multiple README files split up the MTA configuration section of the README into multiple files. [20041018.0730] jonz: fix for write of .stats files on notrain .stats files shouldn't be getting written when in notrain mode [20041018.0700] jonz: memory leak fixes for pgsql many memory leak fixes for postgresql driver [20041017.1725] jonz: added mysql4-initialization configure option added an option to disable mysql client library initialization and cleanup. this is only really useful if you're using libdspam with a third party application that requires this (e.g. the application accesses libmysqlclient itself, and therefore needs to manage startup and shutdown of the library). [20041016.1935] jonz: fixed massive number of memory leaks fixed a massive number of memory leaks in libdspam and the agent and incorrect memory management practices.
[20041015.1700] jonz: fixed sedation deactivation fixed a bug causing statistical sedation to _not_ deactivate even when the training buffer level was set to 0. [20041015.0800] jonz: fixed dspam_admin segfaults on invalid syntax fixed bugs in dspam_admin causing a segfault (instead of print of usage information) when too few arguments for a function were specified [20041015.0800] jonz: fixed preferences extensions in admin.cgi fixed bugs in admin.cgi causing server errors when preferences extensions was used. [20041014.1130] jonz: bugfix for dspam_dump applied bugfix for dspam_dump to username correctly when commandline options are specified. Version 3.2.pr1 --------------- [20041014.0800] jonz: added WITHOUT OIDS to all pgsql tables turned off OIDS for all pgsql tables, speeding up table access significantly [20041014.0000] jonz: added DSPAM_BIN to path in CGI added DSPAM_BIN to the path in configure.pl; some CGIs weren't finding the DSPAM binaries [20041013.1800] jonz: added mysql 4.1 objects script, renamed .sql files did some minor renaming of .sql files. added a mysql 4.1 object script which uses bigint/unsigned instead of char(20) for tokens. put neural networking in its own file. [20041013.0900] jonz: added mysql/postgres purge scripts for TOE and TUM added purge-pe.sql for mysql/postgres with preferences extensions. additional logic skips certain purges for TOE and TUM-mode users. [20041013.0830] jonz: consolidated error messages in language.h consolidated error messages and other important output in language.h to centralize most commonly used output, and to make translation easier. [20041012.2330] jonz: dspam_clean fix for toe training mode added patch to dspam_clean to skip certain unused token operations for users with toe training mode, since their last_hit value is never updated. left token probability operation, as it will use the date the token was first hit which is still useful. 
[20041012.0300] jonz: removed !DSPAM tag from X-DSPAM-Signature header when using signatureLocation=headers, the !DSPAM tag is no longer written to the header, just the signature. backward-compatible. [20041012.0300] jonz: changed location of dspam_home and dspam.conf dspam.conf now defaulted to sysconfdir (default: /usr/local/etc) dspam home now defaulted to prefix/var/dspam (default /usr/local/var/dspam) can still override dspam home using --with-dspam-home [20041011.0300] jonz: added --signature= functionality for commandline signature correction where the admin would prefer to just specify the signature on the commandline, --signature=[signature] can be used. only the signature itself should be provided, and not the !DSPAM tag. [20041009.1830] jonz: bugfixes for inline decoding fixed a bug which caused a segfault on malformed inline encoding blocks added better support for inline encoding; now encodes all blocks in a header and not just the first block. [20041009.1345] jonz: fix for sqlite permissions added fix for sqlite permissions to create database as 0660 [20041009.1100] jonz: added storage profile support Implemented this from my blog: 5. Distributed database configurations. I'd like to add a commandline option or environment variable to set a storage "profile". This profile would refer to a MySQL or PgSQL server config in dspam.conf. For example:
    MySQLServer.Sun420R 10.0.0.5
    MySQLPort.Sun420R 3306
    ...
    MySQLServer.DECAlpha 10.0.0.6
    MySQLPort.DECAlpha 3306
    ...
Providing --profile=DECAlpha on the commandline would cause DSPAM to use that particular storage profile. This is especially useful in distributed environments where a user might be mapped to a particular server. [20041008.0420] jonz: added new logo added new logo for cgi [20041008.0420] jonz: added index to dspam_signature_data for created_on dates added index and updated purge.sql to use for dspam_signature_data, which greatly improves purge speed.
Version 3.2.rc2 --------------- [20041007.2300] jonz: made LARGE_SCALE and DOMAIN_SCALE autodetect in CGI made filesystem scaling auto-detect based on dspam --version [20041007.1950] jonz: added preferences verbose output added verbose debugging of preference attributes and values loaded [20041007.0400] jonz: added autogen support for freebsd added support for freebsd to autogen script, so freebsd users can build from cvs. [20041006.2245] jonz: merged group cgi bugfix fixed a bug in the cgi where merged groups would be added to the user totals when displayed under statistics. [20041006.0430] jonz: added algorithm and pvalue choice to dspam.conf added support for selecting the combination algorithm(s) and pvalue technique to dspam.conf. for third-party agent compatibility, configure options have remained active, but the agent will override these if it finds options in dspam.conf. [20041005.1930] jonz: added debug option to dspam.conf added DebugOpt option to specify which types of messages to route to debug [20041005.0400] jonz: syslogging of more error messages added the syslogging of more types of failures [20041005.0355] jonz: libdspam debug: all calculations only on verbose changed libdspam's debugging output so that all calculations are made only when verbose is active; this will allow users to run with standard debug enabled without using as many resources [20041005.0102] jonz: applied pgsql patches for performance/bugfixes applied pgsql patch submitted by Rustam Aliyev to improve performance by an estimated 30% and fix some minor issues. [20041003.0515] jonz: --version to return configure parms only when trusted --version should print configuration parameters only when running as a trusted user [20041002.1730] jonz: added return codes on quarantine failure the agent now returns a failure code if it was unable to quarantine an incoming spam. 
[20041002.1700] jonz: added unlearning functionality added unlearning functionality which can be triggered in one of two ways: 1. using --mode=unlearn on the commandline will unlearn the message passed in; useful for some cases of error and such. Use --source=error and set --class to the original classification that the message was LEARNED WITH (e.g. if it was originally classified as spam, set --class=spam to unlearn it as spam). 2. by setting OnFail to unlearn in dspam.conf, DSPAM will unlearn a message on delivery or quarantine failure. this will fix problems on some servers where the message is requeued, and then reprocessed. [20041002.1040] jonz: minor bugfix for inoculation fixed a minor bug which may have caused message inoculations to be overly inoculated (5 hits instead of 2). [20041001.2145] jonz: algorithm cleanup cleaned up algorithm definitions in configure: 1. --disable-traditional-bayesian is now --disable-graham-bayesian 2. --disable-alternative-bayesian is now --disable-burton-bayesian 3. configure will no longer allow you to enable chi-square without disabling both bayesian calculations; this is to avoid rainstorms of false positives and accidental configurations by users who don't realize you need to disable one to enable the other Version 3.2.rc1 --------------- [20041001.0035] jonz: signatureLocation "headers" preserves encoding when signatureLocation is set to "headers", the message is treated as if signed and the original encoded body is preserved. [20041001.0030] jonz: a few features removed/changed a few features have been removed from the agent and/or libdspam to improve functionality and/or to restore libdspam's function as a text classifier and not an encoding/decoding engine. some functionality has simply been "moved" into the agent and out of libdspam. changes are very minor and shouldn't affect a majority of third party applications or many end-users. 1. 
dropped "attachment" signatureLocation signatureLocation = 'attachment' has been officially dropped. I realize one or two people were using it, but the amount of black magic that had to be used to maintain this function was just too time consuming, and nobody liked having paper clips on every message. 2. dropped DSF_COPYBACK; libdspam no longer permanently decodes anything copyback feature to copy back decoded message no longer used by DSPAM agent, not very useful for any other applications. applications looking to decode should consider either self-actualizing the message (as the dspam agent does) or using a different approach to decoding. 3. libdspam to treat all messages as "signed" libdspam now preserves the original message body and transfer-encoding, treating all messages as signed. the agent is the piece responsible now for modifying the message and appending signatures. [20040930.2215] jonz: added parse-to-headers to dspam.conf setting "ParseToHeaders on" in dspam.conf is now used to parse the To: line for extracting a username when forwarding spam/fp's to catchall domains (see README) [20040930.1900] jonz: changes to homedir option 1. --enable-homedir-dotfiles is now --enable-homedir 2. all .nodspam and .dspam files are now .optout and .optin, respectively 3. when --enable-homedir is used at configure time, not only are opt-in/out files looked for in the user's home directory, but all data files are stored in ~/.dspam including the .inoc file previously stored in ~. NOTE: This option requires dspam to run as setuid root (automatically configured) and is incompatible with the DSPAM CGI (can't read mailboxes). If you require users to be able to opt themselves in/out and use the CGI, use the CGI's opt-in/out preference or configure a small tool to manage them from DSPAM Home.
[20040930.1845] jonz: added TrackSources attribute moved source address tracking into dspam.conf [20040930.1830] jonz: added Opt attribute Opt can be set to in or out in dspam.conf to specify whether the system is opt-in or opt-out. [20040930.1815] jonz: added TrainPristine attribute TrainPristine replaces --enable-webmail and is used to put the DSPAM agent in a training mode where it assumes the original message is provided for retraining. This ceases the writing of any signatures, and is ideal for webmail or imap systems where the original message is preserved on the server and can be used to retrain. [20040930.0129] jonz: implemented thread-safe functionality implemented thread-safe functionality in libdspam and two storage drivers (mysql_drv and pgsql_drv). each thread will require its own context, however if you check out the libdspam man page, you'll see it's possible to set up multiple contexts with the same database handle. [20040929.0710] jonz: added libdspam man page added libdspam man page, API reference. symlinked to: dspam_init, dspam_create, dspam_attach, dspam_addattribute, dspam_process, dspam_getsource, dspam_destroy [20040929.0700] jonz: fixed bugs in preferences extension/dspam.cgi applied fixes submitted by Marty Pauley to fix bugs with dspam.cgi's handling of preferences extension calls [20040928.1900] jonz: added new API functions added new API functions to support the libdspam attribute API. dspam_init has remained in the code for backward compatibility with other applications, however a new set of create/attach functions has been added with the attribute API so that storage attributes (such as server information) can be set prior to connecting to storage. see the updated example.c's example 4 for a more thorough explanation. to retain backward-compatibility, contexts instantiated using dspam_init will revert to their legacy behavior of looking for a [driver].data file in the dspam home.
dspam_init() has been slightly tweaked, however, to require the path to the dspam home as an argument. DSPAM_HOME can be passed in by legacy applications that already have it defined. This functionality also allows multithreaded applications (which must have a separate context) to share a single database handle. [20040928.0501] jonz: added IgnoreHeader option to dspam.conf added IgnoreHeader option to dspam.conf, allowing specific headers from other virus tools/spam filters on the network to be ignored. [20040928.0210] jonz: made dspam.conf operational dspam.conf now operational; will copy at install time into prefix/etc if a copy does not already exist. NOTE: See the UPGRADING section of the README for a full explanation of changes and be sure to read before attempting an upgrade [20040928.0200] jonz: moved show/hide factors as a preference showFactors is now a preference (and suitable as an option for each user or globally); set to "on" to enable factors in the message headers. added to cgi. [20040925.0817] jonz: fixed empty spamSubject bug fixed bug causing subject of spam to be truncated if spamSubject left blank [20040923.0300] jonz: rating sort default quarantine view made "rating sort" the default quarantine view in cgi. if a delete all fails, will revert to a chronological sort so users can see the last spam to be quarantined. Version 3.2.beta-1 ------------------ [20040922.0500] jonz: fixed toe/zero signature bug fixed a bug created in 3.1.2 causing toe-mode training to write zeros for signatures (causing toe users to cease all learning) [20040922.0400] jonz: fixed mysql buffer overrun bug fixed a bug in the mysql driver where escaping a large number of characters caused an unexploitable overrun. Version 3.1.2 ------------- [20040906.0939] jonz: implemented sparse binary polynomial hashing implemented Bill Yerazunis' sparse binary polynomial hashing (tokenizer method only). use --feature=sbph to generate an SBPH-based token set. 
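The SBPH entry above only names the technique. As a rough illustration of the idea, here is a minimal sketch in C. It is not taken from the DSPAM source: the function name sbph_features, the 5-token window, the "+" separator and the "<skip>" placeholder are all assumptions made for this example, and a real tokenizer would hash each feature rather than keep it as a string.

```c
#include <stdio.h>

#define SBPH_WINDOW 5        /* assumed window size, not taken from DSPAM */
#define SBPH_FEATURE_LEN 256 /* assumed buffer size for one feature */

/*
 * Build the sparse features for the newest token in a sliding window.
 * window[0] is the oldest token, window[n - 1] the newest (n <= SBPH_WINDOW).
 * Every feature keeps the newest token; each older position is either kept
 * or replaced by a "<skip>" placeholder, giving 2^(n-1) features.
 * Features that would not fit in the buffer are dropped.
 * Returns the number of features written to out.
 */
static int sbph_features(const char *window[], int n,
                         char out[][SBPH_FEATURE_LEN], int max_out)
{
    int count = 0;
    unsigned mask, combos;

    if (n < 1 || n > SBPH_WINDOW)
        return 0;

    combos = 1u << (n - 1);
    for (mask = 0; mask < combos && count < max_out; mask++) {
        char *dst = out[count];
        size_t used = 0;
        int i, ok = 1;

        dst[0] = '\0';
        for (i = 0; i < n && ok; i++) {
            /* bit i of mask decides whether older token i is kept */
            int keep = (i == n - 1) || ((mask >> i) & 1u);
            const char *piece = keep ? window[i] : "<skip>";
            int w = snprintf(dst + used, SBPH_FEATURE_LEN - used,
                             "%s%s", i ? "+" : "", piece);
            if (w < 0 || (size_t)w >= SBPH_FEATURE_LEN - used)
                ok = 0;            /* feature too long, drop it */
            else
                used += (size_t)w;
        }
        if (ok)
            count++;
    }
    return count;
}
```

With a full 5-token window this produces 16 features for each new token, which is why SBPH token sets are much larger than plain word or chained-token sets.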
[20040901.0800] jonz: bugfix for --debug fixed --debug so it doesn't get passed along to MTA [20040830.0800] jonz: pgsql fixes Applied Rustam Aliyev's patch to fix the following issues with pgsql_drv: - Added support for Preferences Extensions. - BUGFIX: 'length' field's type changed from 'smallint' to 'int'; 'smallint' is not enough for big signatures. - All values passed to columns with 'smallint' type now are quoted. This will enable casting and make indexes on these columns available. - Added new index on dspam_token_data (token) which helps speed up some operations. - Number of fixes to keep memory cleaner. [20040826.2100] jonz: fixed bugs in classify and inoculation fixed a bug where noise reduction and chained tokens weren't applied to user classification and message inoculation [20040826.0600] jonz: tweaked mysql where clauses tweaked mysql where clauses for better indexing [20040825.0225] jonz: added --disable-factors option added option to disable factors in message headers Version 3.1.1 ------------- [20040819.0800] jonz: minor CGI template changes minor changes to CGI templates [20040819.0800] jonz: added X-DSPAM-Factors added determining factors header to emails containing a list of tokens that played a role in the decision. if multiple algorithms are defined, only one is used. if the message is spam, the factor set from an algorithm returning a spam result will be used. [20040818.1900] jonz: cast smallints in postgres cast all smallint's in postgres, so indexes should be used now (major performance increase) [20040818.1845] jonz: fixed memory leaks fixed some miscellaneous memory leaks [20040818.1845] jonz: added optIn / optOut preference added optIn and optOut preference support; whichever one is used depends on whether dspam is configured for opt-in or opt-out.
[20040811.0900] jonz: fixed totals bug with merged groups fixed a small bug preventing totals from traveling < 0 when using merged groups Version 3.1.0 ------------- [20040724.2000] jonz: fixes to Bayesian Noise Reduction made fixes to Bayesian Noise Reduction to fix bugs related to 3.1 Beta [20040723.0630] jonz: added --deliver=summary option added 'summary' delivery option which will deliver (to stdout) a summary identical to the output of message classification: X-DSPAM-Result: user; result="Innocent"; probability=0.0023; confidence=1.00 Obviously, should not be used with --stdout Version 3.1.0-beta-2 -------------------- [20040721.0800] jonz: added single spam hit purge to purge.sql added purge of tokens with single spam hit to purge.sql. adjusted purge times [20040721.0800] jonz: place signature before /HTML tags in spams without a tag, signatures are now placed before /html tags in order to ensure they are passed on with some email clients (such as outlook). this might explain users who receive the same spam over and over. [20040720.2130] jonz: rewrites to bayesian noise reduction rewrite of BNR algorithm with minor tweaks, code cleanup [20040720.0800] jonz: applied patches to CGI applied patches to CGI submitted by Craig Hockenberry to add configure.pl functionality for configuring the CGIs. [20040716.0800] jonz: removed 2500 message threshold for TOE TOE-mode training now kicks in immediately after 100 learned innocent messages, rather than waiting for 2500 messages. as a result, more initial errors are likely to occur (just as with any other filter implementing TOE) but final accuracy should be better. [20040711.2300] jonz: fixed field names in dspam_2sql updated field names in dspam_2sql to reflect present-day database field names.
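For callers that consume the --deliver=summary output from the 3.1.0 entry above, a minimal parsing sketch in C follows. It assumes the line always has exactly the four fields shown in that one example (user, result, probability, confidence) in that order; the struct dspam_summary and the function parse_summary are hypothetical names for this illustration and are not part of libdspam.

```c
#include <stdio.h>

/* Hypothetical container for the fields shown in the changelog example. */
struct dspam_summary {
    char user[64];
    char result[32];
    double probability;
    double confidence;
};

/*
 * Parse a line such as:
 *   X-DSPAM-Result: user; result="Innocent"; probability=0.0023; confidence=1.00
 * Returns 0 on success, -1 if the line does not match that assumed layout.
 */
static int parse_summary(const char *line, struct dspam_summary *s)
{
    int n = sscanf(line,
                   "X-DSPAM-Result: %63[^;]; result=\"%31[^\"]\";"
                   " probability=%lf; confidence=%lf",
                   s->user, s->result, &s->probability, &s->confidence);
    return n == 4 ? 0 : -1;
}
```

The field widths in the format string (63 and 31) match the buffer sizes so an oversized username or result cannot overflow the struct.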
Version 3.1.0.beta.1.1 ---------------------- [20040711.2200] jonz: fixed --disable-trusted-user-security compile errors fixed compiler errors when users disabled trusted user security [20040711.2200] jonz: removed debug output removed a line of debug output causing problems with implementations using stdout Version 3.1.0.beta.1 -------------------- [20040709.0700] jonz: fixed bug with subject encoding and spam tags fixed a bug where spam tags would not be added to encoded subjects [20040709.0700] jonz: added --debug commandline argument if --enable-debug is specified, --debug can be passed on the commandline to activate debugging. alternatively, dropping a .debug file in DSPAM_HOME or user.debug file in the user's DSPAM_HOME data directory will still work. [20040709.0700] jonz: fixed error bug with snprintf fixed a bug in error reporting where not using vsnprintf as required caused crashing on some systems. [20040709.0700] jonz: added header support for automatic whitelisting instead of X-DSPAM-Probability: -2 to identify automatic whitelisted emails, a header of X-DSPAM-Result: Whitelisted will be used, and the original probability (even if guilty) will be provided in each message. [20040707.0730] jonz: added dynamic noise reduction extension support added support for dynamic noise reduction extension; designed to track SNR in emails for each user to dynamically determine noise thresholds and perform calibration. extensions supported in libdspam, but is still experimental and only used for tracking noise margins at the moment. [20040707.0700] jonz: added whitelistThreshold preference the whitelistThreshold preference will set the threshold for innocent hits before automatically whitelisting a recipient. the default value is 10. do not set this value too low! [20040707.0500] jonz: added NOTRAIN preference for trainingMode added NOTRAIN preference for trainingMode, which will result in messages being processed but not trained.
[20040707.0408] jonz: signature location now a preference signature location (headers, message, attachment) now moved to signatureLocation preference and added to CGI. configure-time arguments will set a default preference if user hasn't overridden. [20040706.2000] jonz: applied win32 patches applied patch portion of win32 build supplement; win32/README updated. visual c++ project updated. initial testing shows all systems go =) [20040706.0800] jonz: added ignoreGroups preference ignoreGroups, when set to 'on', will ignore any group memberships the user should belong to (including system-wide). useful to allow some users to remove themselves from any memberships. [20040705.2000] jonz: utilities to require trusted user permissions utilities modified to require the caller be a trusted user. this is normally done with groups, but as an extra security measure is also done with trusted users. [20040705.2000] jonz: rewrote preferences, added preferences extension support preference functions entirely rewritten. added preferences extension support to dspam, added first extension to mysql_drv, and added preference administration to dspam_admin. [20040705.0800] jonz: added sort option to cgi quarantine added ability to sort by rating or date to cgi's quarantine [20040630.0800] jonz: added preliminary win32 support files added Vadim Zeitlin's preliminary win32 files into win32/ directory [20040630.0800] jonz: added transactional blocks to postgres driver applied rustam's patch to add transactional blocks to pgsql_drv for performance increase [20040629.1945] jonz: untrusted user error to report username untrusted user error (specifying --user) should report active username [20040629.0800] jonz: fixed domain scale in dspam.cgi domain scale pathname was missing /data/ in dspam.cgi [20040629.0800] jonz: fixed segfault on empty body fixed a bug causing libdspam to segfault with some email having an empty body. 
[20040628.1945] jonz: added removal option for merged groups added removal option for merged groups, by specifying -username, username is removed from the group. This is useful if you want system-wide merged groups but have a few users who want to unsubscribe [20040628.0700] jonz: fixed bug in spam-subject fixed a bug in spam-subject causing: 1. the last character of the subject to be truncated 2. spam tags to be repeated for each local recipient [20040627.1330] jonz: added sql-formatted output support to dspam_dump added support for sql-formatted output in dspam_dump using the -d [driver] command. only driver supported is sqlite_drv. use dspam_2sql for all other drivers (dspam_dump dumps one user at a time, so is only useful for sqlite at the moment). [20040625.2300] jonz: rewrote locking in bdb drivers rewrote locking in bdb drivers to use fcntl locking instead of db env locking. kernel-level locking works over nfs and automatically removes stale locks if a process should crash or the system fail. [20040625.2200] jonz: fixed a locking bug with fcntllocking/quarantine fixed a quarantine locking bug where fcntl locking was not waiting for a lock, but returning a failure immediately if already locked [20040625.0800] jonz: added configure arguments to --version output added a list of arguments DSPAM was configured with to --version output [20040624.0425] jonz: applied CGI facelift applied CGI facelift submitted by Craig Hockenberry [20040623.0700] jonz: bugfix for encoded multiline header mangling fixed a bug that caused encoded, multiline headers to lose any lines of text after the first. [20040621.2135] jonz: made sqlite_drv default storage driver made sqlite_drv the default storage driver [20040621.2135] jonz: added SQLite storage driver added SQLite storage driver.
see tools.sqlite_drv/README for more information [20040621.0245] jonz: committed minor patch for Solaris builds another patch for declaring u_int32_t's on Solaris [20040617.0220] jonz: fixed configure help text for --enable-webmail fixed configure help text for --enable-webmail, which was mangled [20040617.0211] jonz: fixed type-o in admin.cgi for $CONFIG{'LARGE_SCALE'} fixed a type-o in admin.cgi where $CONFIG{'LARGE_SCLAE'} = 0; Version 3.0.0 ------------- [20040614.0700] jonz: fixed 14-day user graphs fixed a bug causing the 14-day user graphs to appear empty [20040612.0018] jonz: oracle storage driver fixes made several bugfixes to oracle storage driver added --with-oracle-version[=10] configure flag for linking to 10g libraries [20040609.0205] jonz: fixed a bug in --enable-signature-attachments fixed two bugs using --enable-signature attachments; 1 compiler error and 1 segfault (uninitialized value) [20040608.0715] jonz: fixed compile bug with --enable-webmail fixed compile errors resulting from --enable-webmail [20040607.1800] jonz: replaced quarantine locking with fcntl locking replaced quarantine .lock'ing with fcntl locking and also applied it to locking .log files. fcntl should work over NFS. [20040607.0730] jonz: fixed rare segfault (strlen on NULL) fixed a rare segfault in decode.c [20040607.0730] jonz: minor aesthetic changes to cgi minor aesthetic changes to cgi [20040606.1445] jonz: added training left option to dspam_stats -H modified dspam_stats to display # of training messages left when using -H command [20040606.1441] jonz: fixed bug in training threshold fixed a bug in the training threshold, which miscalculated the mail left to train. [20040605.1521] jonz: added statistical sedation to cgi added level of sensitivity-during-training to cgi preferences [20040605.1450] jonz: added ability to edit user preferences from admin suite added the ability to edit user preferences (and the default preferences) from the admin suite. 
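Several entries above move quarantine, log and bdb locking to fcntl, and [20040625.2200] fixes a case where the lock call returned a failure immediately instead of waiting. In POSIX terms that is the difference between F_SETLK and F_SETLKW. The following is a minimal sketch of whole-file advisory locking in C; the helper name file_lock is hypothetical and this is not DSPAM's own locking code.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Take or release a whole-file advisory lock on an open descriptor.
 * type is F_RDLCK, F_WRLCK or F_UNLCK.
 * wait != 0 uses F_SETLKW, which blocks until the lock can be granted;
 * wait == 0 uses F_SETLK, which fails at once (errno EACCES or EAGAIN)
 * if another process holds a conflicting lock.
 * Returns 0 on success, -1 on error.
 */
static int file_lock(int fd, short type, int wait)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* length 0 means "through end of file" */

    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl) == -1 ? -1 : 0;
}
```

Because the kernel owns these locks, they are released automatically when the holding process exits, so a crash does not leave a stale lock behind the way a leftover .lock file can.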
[20040605.1100] jonz: fixed a bug with user processing flag fixed a bug where some parameters may be added as users instead of parameters. this was particularly the case if no mailer flags prepended %u. [20040604.0525] jonz: fixed blank dspam signature on reclassification fixed a problem where reclassified messages would receive: X-DSPAM-Signature: !DSPAM! fixed this by NOT stripping the old X-DSPAM-Signature header, since a new one is not created upon reclassification [20040604.0525] jonz: fixed untrusted.mailer_args fixed a bug where the last argument of untrusted.mailer_args was ignored. Version 3.0.0.rc2 ----------------- [20040603.2215] jonz: added user-logging option added --disable-user-logging option to disable user logging [20040603.0500] jonz: auto-whitelisting now works with toe-mode training added code to cause automatic whitelisting to function with toe-mode training [20040602.0030] jonz: added administration suite cgi added administration suite cgi [20040602.0030] jonz: added system logging of execution time added system logging of execution time [20040602.0025] jonz: fixed spam subject fixed spam subject headings to support variable length titles [20040601.2230] jonz: added system logging added system logging to DSPAM_HOME/system.log for future sysadmin interface [20040601.1822] jonz: removed mysql delay_key_write removed mysql's delay_key_write feature from the sql scripts, because of a bug in mysql that leads to database corruption when using it. [20040601.0330] jonz: added To: header parsing added --enable-parse-to-header, which will parse spam-username and fp-username from the To: header of a message to determine the username. 
This can be used in lieu of using spam/fp aliases by creating a wildcard subdomain (such as spam.yourdomain.com) and piping all email into dspam without a --user flag, for example: wildcard: "|/usr/local/bin/dspam --mode=toe --class=spam --source=error" [20040531.2245] jonz: added pkgconfig files added installation of pkgconfig files submitted by Ronald Hummelink [20040531.2120] jonz: added --enable-broken-return-codes added --enable-broken-return-codes configure option which causes DSPAM to return an exit code of 99 if the message being processed is believed to be spam, 0 if not, and any other code to suggest an error has occurred. this is useful for some MTAs such as qmail. [20040531.2100] jonz: fixed error.h overwrite bug fixed a bug where libc's error.h would be overwritten if --prefix=/usr. DSPAM headers are now written to includedir/dspam. [20040531.1915] jonz: added man pages added man pages to distribution [20040531.0830] jonz: fixed header signature stripping signatures no longer stripped if --enable-signature-headers is used; to allow for re-re-training [20040531.0830] jonz: fixed cgi graphs falling below zero minor fix to cgi graphs preventing data points from falling below zero Version 3.0.0.rc1 ----------------- [20040528.0100] jonz: added logging support added support for message logging (enabled by default). logs all classification calls to $DSPAM_HOME/data/user/user.log. disable with --disable-logging. [20040527.2200] jonz: added new CGI added new CGI [20040527.0730] jonz: added support for profiling added support for profiling using gmon output. this allows developers to use profiling tools such as gprof to analyze the performance of the software. [20040527.0730] jonz: applied patch submitted by Mark Femal applied a patch submitted by Mark Femal which: 1. Includes select *.h files and incorporates them into the installation 2. Fixes some issues in compiling with Sun's Pro C compiler 3.
Makes some minor changes to header files to avoid conflicts Version 3.0.0.beta.3.1 ---------------------- [20040525.0830] jonz: fixed compiler error on verbose debug fixed compiler errors when verbose debug was enabled Version 3.0.0.beta.3 -------------------- [20040524.2024] jonz: bugfix for null bodies applied bugfix causing a segfault when the message body of some parts was null. rare occurrence. [20040524.1903] jonz: implemented Robinson's technique for combining p-values added support for using Robinson's technique for combining p-values, as described at http://www.linuxjournal.com/article.php?sid=6467. This technique is presently used for chi-square calculations, but using --enable-robinson-pvalues will use this technique for *all* calculations in place of Graham's approach. Appears to provide slightly better results (on the order of 1 message per thousand). [20040524.0529] jonz: implemented *real* chi-square implement Fisher-Robinson's Inverse Chi-Square algorithm...the real stuff. use --enable-chi-square to use. [20040522.2350] jonz: renamed chi-square to robinson's naive bayesian renamed chi-square because it really isn't chi-square, but robinson's first algorithm for naive bayesian combination. use --enable-robinson to use. [20040520.0800] jonz: bugfix for attachments fixed a bug that caused message headers in attachment sections to be ignored Version 3.0.0.beta.2.1 ---------------------- [20040518.0630] jonz: bugfix: seg faults on rare occasions fixed a strlen(NULL) bug fixing an occasional segfault [20040514.1130] jonz: applied dspam_genaliases patch applied dspam_genaliases patch supplied by Scott Moorhouse which adds the following functionality: --exclude NAME Do not generate an alias for username / usernames. --excludeuid NUM Do not generate an alias for UID / UIDS. --minuid NUM Minimum UID for which to generate an alias. --maxuid NUM Maximum UID for which to generate an alias. 
It also uses setpwent/getpwent to get passwd information instead of /etc/passwd. This allows the tool to be used with any default system authentication. [20040514.0830] jonz: modified mode=notrain to ignore signature when setting mode=notrain, the signature is NOT stored, and not appended to an email. Version 3.0.0.beta.2 -------------------- [20040513.1845] jonz: updated configure.ac updated configure.ac to work with newer versions of autoconf (with warnings) [20040513.0157] jonz: segfault patch for sql drivers applied patch to prevent segfaults in mysql and pgsql drivers under certain conditions [20040512.0830] jonz: user directories moved to $DSPAM_HOME/data user directories have been moved to $DSPAM_HOME/data. it will be necessary to move all user directories into this folder when upgrading [20040512.0830] jonz: default $DSPAM_HOME changed default dspam home has been changed from /etc/mail/dspam to /var/dspam. use --with-dspam-home to change this. [20040512.0830] jonz: patch for sql drivers applied patch for mysql and pgsql drivers to prevent errors in sql due to lack of commas Version 3.0.0.beta.1.2 ---------------------- [20040504.1835] jonz: bugfix for signed message signature corrected a bug where the boundary for a signed message would be missing a carriage return. [20040504.0548] jonz: bugfix for token storage bug fixed a token storage bug, where some tokens would not be stored if they were preceded by a token that was found in the database [20040503.0830] jonz: bugfix for corpus spam delivery fixed a bug where corpusfed messages would be delivered if a quarantine agent was specified at configure time. [20040501.1052] jonz: added spam-subject feature added a spam-subject feature which can be activated with --enable-spam-subject. when enabled, DSPAM will prepend [SPAM] to the subject headers of all messages suspected to be spam.
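The spam-subject behavior described in the entry above can be sketched in a few lines of C. This is only an illustration of prepending the tag to a subject string; the function name tag_spam_subject and its return convention are assumptions made for this example and are not taken from the DSPAM sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not DSPAM's actual implementation: writes
 * "[SPAM] <subject>" into out. Returns 0 on success, -1 if out is too small. */
int tag_spam_subject(const char *subject, char *out, size_t outlen)
{
    assert(subject != NULL && out != NULL);

    /* snprintf reports the length it wanted to write, so truncation is detectable */
    int n = snprintf(out, outlen, "[SPAM] %s", subject);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A caller would pass the original Subject value and a buffer at least seven bytes longer than it, plus room for the terminating NUL.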
Version 3.0.0.beta.1.1 ---------------------- [20040501.0630] jonz: fixed critical problems with pgsql_drv driver fixed a critical problem with the postgres storage driver to correct sql errors in processing Version 3.0.0.beta.1 -------------------- [20040430.0800] jonz: fix for sql driver subtractions implemented GREATEST(0, [Argument] ) functions for subtractions, which fixes a problem in which error corrections are not made to tokens where there are zero hits for the classification being subtracted from. should also definitively prevent negative values in hit totals. [20040430.0800] jonz: bugfix: corpus feeding invoked test-conditional training fixed a bug where corpus feeding would invoke test-conditional training. [20040430.0800] jonz: test-conditional training to subtract only once test-conditional training modified to subtract from misclassified corpus only once, and corpus feed for all other iterations [20040430.0800] jonz: fixed bug in sql-drivers/test-conditional training fixed a bug in the sql drivers where test condition training would make exponential changes instead of incremental. this was due to not resetting the control token on every call to _ds_getall_spamrecords. [20040430.0745] jonz: fixed bug in web stats fixed bug where merged group web stats wouldn't get written [20040430.0730] jonz: fixed bug in TOE totals fixed a bug where spam/innocent classified wasn't updated when TOE was used [20040427.0433] jonz: fixed bug in mysql and pgsql drivers fixed a bug in mysql and pgsql drivers where dspam_merge was functioning incorrectly, due to the token count on record insertion being set to 1 or 0, and not the actual token value. [20040427.0155] jonz: merged groups shouldn't merge with themselves corrected a situation where the actual user in a merged group could be merged with themselves, if they were the target user. 
[20040427.0119] jonz: applied bdb patch for solaris applied a patch for building on Solaris 9 with BDB drivers [20040425.0757] jonz: updated pgsql drivers applied pgsql_drv storage driver updates submitted by Rustam Aliyev Version 3.0.0.alpha.6 --------------------- [20040424.2235] jonz: fixed header tokenization fixed header tokenization from previous alpha; was suddenly leaving out heading from token names. [20040424.1427] jonz: added merged groups merged groups are similar to global groups, only instead of the global user being used in lieu of per-user statistics, the global user in a merged group is merged with the user's own training data. this allows immediate correction to take place and no training loop. NOTE: merged groups are storage driver dependent. presently they have only been implemented for the mysql driver. [20040422.1900] jonz: messages with empty bodies should still be processed fixed bug where messages with empty bodies failed into delivery [20040422.1829] jonz: added encoding strip patch added patch to fix the stripping of the content-transfer-encoding [20040421.1809] jonz: added training mode 'notrain' added training mode 'notrain' which will process the message, but not train any user data; this is ideal for implementations where a global dictionary is used, but the administrator doesn't want to accumulate training data for each user.
[20040421.0310] jonz: fixed TOE-mode totals updating fixed bug where TOE-mode would update totals when it shouldn't Version 3.0.0.alpha.5 --------------------- [20040421.0100] jonz: fixed totaling problems with classification groups fixed totaling problems with global users and classification groups, where spams wouldn't get counted, and some innocents [20040421.0100] jonz: fix for dspam_stats fix for dspam_stats, identifying individual users [20040420.0734] jonz: fix for builds on Solaris w/BDB fixed compiler error when building on Solaris w/BDB drivers [20040419.0758] jonz: fix for X-DSPAM-Result header problem with TOE TOE resulted in the X-DSPAM-Result header being sent to stdout, which broke all implementations of TOE where --stdout was used. bug fixed. [20040419.0700] jonz: added support for multipart/encrypted messages added the same support for multipart/encrypted messages as is provided for multipart/signed [20040418.1840] jonz: changes to pgsql objects changes to pgsql objects to fix performance issues [20040417.1105] jonz: more global user tweaks if the global user thinks the message was innocent, but the user thinks it was spam, retrain the message as a false positive into the user's dictionary automatically, but don't update FP totals (internal function) [20040417.1050] jonz: implemented totals checking implemented totals checking to ensure no totals fall below 0 [20040417.1045] jonz: don't retrain some classification catches patch added not to retrain some spams in a global user catch if the user's own dictionary already learned it as spam [20040417.1037] jonz: patch for non-user creation patch made to sql-based drivers to avoid creating virtual users in cases where a message isn't being directly processed (e.g. tools, error correction, etc.)
[20040417.2006] jonz: added human-readable patch to dspam_stats added patch for human-readable format to dspam_stats, submitted by Alan Shields Version 3.0.0.alpha.4 --------------------- [20040416.0000] jonz: fix for global users to prevent FPs applied bugfix for global users code where false positives were getting generated because the user's dictionary wasn't completely ignored. [20040416.0000] jonz: applied dspam_corpus division by zero patch applied div by zero patch for dspam_corpus submitted by Nick Burnett [20040415.0010] jonz: added end-of-token truncated symbols added support for end-of-token symbols, such as exclamation point. slight boost in accuracy in testing. [20040414.0052] jonz: added abbreviated feature references the first two letters of a feature can be used instead of the whole feature name; for example --feature=ch,no,wh [20040411.0100] jonz: added X-DSPAM-Confidence header added X-DSPAM-Confidence header to all processed messages to identify the confidence level of the decision made. [20040410.0930] jonz: tum maturity level increased to 50 hits train-until-mature level increased from 25 hits to 50; doesn't appear to work well in classification groups. [20040409.0201] jonz: added support for domain scale added support for domain scale applying patches submitted by Patrick Tudor [20040409.0153] jonz: applied pgsql patches applied more pgsql patches [20040409.0129] jonz: fixed headers to preserve original encoding headers are now delivered with original encodings [20040407.2254] jonz: added mass false positive button to CGI added a button to reverse multiple false positives by clicking on checkboxes. [20040407.2248] jonz: fixed bug in classification groups fixed a bug in classification groups, where a "classify catch" would cause the DSPAM signature to be empty, and thus irreversible.
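The abbreviated feature references from the 20040414.0052 entry above (--feature=ch,no,wh) could be resolved roughly as follows. This is a sketch under assumptions: the full names "chained" and "noise" appear elsewhere in this changelog, while "whitelist", the flag constants, and both function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag values; these are not DSPAM's real constants. */
enum { FEAT_CHAINED = 1, FEAT_NOISE = 2, FEAT_WHITELIST = 4 };

/* Maps one feature name (full, or its first two letters) to a flag; 0 if unknown. */
static unsigned feature_flag(const char *name, size_t len)
{
    static const struct { const char *full; unsigned flag; } table[] = {
        { "chained",   FEAT_CHAINED   },
        { "noise",     FEAT_NOISE     },
        { "whitelist", FEAT_WHITELIST },   /* assumed expansion of "wh" */
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        int is_full   = (len == strlen(table[i].full) &&
                         strncmp(name, table[i].full, len) == 0);
        int is_abbrev = (len == 2 && strncmp(name, table[i].full, 2) == 0);
        if (is_full || is_abbrev)
            return table[i].flag;
    }
    return 0;
}

/* Parses a comma-separated feature list such as "ch,no,wh" into a bitmask. */
unsigned parse_features(const char *list)
{
    assert(list != NULL);
    unsigned flags = 0;
    while (*list) {
        size_t len = strcspn(list, ",");   /* length of the current item */
        flags |= feature_flag(list, len);
        list += len;
        if (*list == ',')
            list++;                        /* skip the separator */
    }
    return flags;
}
```

Unknown names are silently ignored here; a real parser would more likely report them.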
[20040407.0255] jonz: tweaks to postgres m4 tweaks to postgres m4 to test headers and library on configure Version 3.0.0.alpha.3 --------------------- [20040406.0124] jonz: suppress extra newline in message body corrected message reassembly behavior by suppressing newline characters at the end of the message body. [20040405.0524] jonz: added postgresql driver to project added pgsql_drv (PostgreSQL) submitted by Rustam Aliyev to project, added to configure with its own set of configuration commands. see tools.pgsql/README for more information. Applied recent SQL fixes. [20040405.0330] jonz: virtual users should not be created on reclassification if a message is being submitted for reclassification, a virtual user should not be created, but fail instead - e.g. spam could be getting sent to the alias, and shouldn't create new uids. [20040405.0233] jonz: fixed SQL-driver hits-below-zero bug fixed a bug causing some tokens to drop below zero hits using the mysql driver. [20040405.0149] jonz: fixed BNR bug fixed a bug caused by Bayesian Noise Reduction which caused some messages never to get learned if the control token was filtered; or caused filtered tokens never to be learned. [20040403.1745] jonz: rewrite of libdspam API rewrite of libdspam's API. in short: - Operating modes DSM_ADDSPAM and DSM_FALSEPOSITIVE dropped - CTX->classification added: DSR_ISSPAM | DSR_ISINNOCENT | DSR_NONE - CTX->source added: DSS_ERROR | DSS_INOCULATION | DSS_CORPUS | DSS_NONE provides a much cleaner and less ambiguous interface [20040403.1215] jonz: removed signature deletion removed signature deletion from agent, so messages can be re-re-classified. also prevents mysql errors. [20040403.1125] jonz: added dotfile debugging support --enable-debug and --enable-verbose-debug flags now require a .debug file to be dropped in order to log debug messages, providing you with the ability to dynamically activate/deactivate debug messages for some or all users.
A .debug file can either be dropped in DSPAM_HOME to activate debugging for all users, or a username.debug file can be dropped in DSPAM_HOME/userpath/ to activate debugging for a subset of users. [20040402.1839] jonz: added support for domain-name groups added support for groups based on domain name Version 3.0.0.alpha.2 --------------------- [20040402.0730] jonz: improved agent classification output agent classification output improved to include username, result, probability, and confidence level in MIME format for easy parsing [20040402.0730] jonz: added broken MTA support --enable-broken-mta You should enable this if your MTA is broken and passes messages into DSPAM with CTRL-M's (^M) in them. [20040402.0730] jonz: added training loop buffering feature Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop. The training buffer sets the buffer sensitivity, and should be a number between 0 (no buffering whatsoever) and 10 (heavy buffering). The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0. The training buffer can be set using tb=N as a feature, where N is the level of buffering (0-10). For example: --feature=chained,noise,tb=10 Causes the buffer level to be set to 10, the highest level of safety, whereas --feature=chained,noise,tb=0 Removes all buffering constraints [20040402.0723] jonz: fixed bug in dspam_dump fixed a bug in dspam_dump causing unknown tokens to be displayed with uninitialized values [20040402.0720] jonz: fixed bug in agent for signature dropping when a signature can't be found, the message is dropped; unfortunately the agent forgot to shut down the dspam context which caused BDB to lock up.
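For the training loop buffering entry above, the following C sketch shows one way the buffer level might be read out of a feature list such as chained,noise,tb=10, using the tb= spelling from the entry's examples. The function name and the choice to fall back to the default of 5 on missing or out-of-range values are assumptions for illustration, not DSPAM's documented behavior.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_TRAINING_BUFFER 5   /* default level stated in the entry above */

/* Hypothetical parser: returns the tb=N level (0-10) found in a
 * comma-separated feature list, or the default if none is valid. */
int training_buffer_level(const char *features)
{
    assert(features != NULL);
    int level = DEFAULT_TRAINING_BUFFER;

    while (*features) {
        size_t len = strcspn(features, ",");          /* current item length */
        if (len > 3 && strncmp(features, "tb=", 3) == 0) {
            char *end;
            long v = strtol(features + 3, &end, 10);
            /* accept only if the number spans the whole item and is in range */
            if (end == features + len && v >= 0 && v <= 10)
                level = (int)v;
        }
        features += len;
        if (*features == ',')
            features++;
    }
    return level;
}
```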
[20040402.0700] jonz: added switch for webmail The webmail switch is designed for systems where the original message remains server side and can therefore be presented in pristine format for retraining. --enable-webmail This option will cause DSPAM to cease all writing of signatures and DSPAM headers to the message, and deliver the message in as pristine format as possible. This mode REQUIRES that the original message in its pristine format (as of delivery) be presented for retraining, as in the case of webmail or other applications where the message is actually kept server-side during reading, and is preserved. DO NOT use this switch unless the original message can be presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS. [20040401.2243] jonz: fix for signature headers applied patch to fix multipart boundary bug when signature-headers is enabled Version 3.0.0.alpha.1 --------------------- [20040401.1230] jonz: patches to corpus locking made patches for corpus locking, to help prevent corruption with BDB drivers. DSPAM agent now drops a .corpuslock file upon processing a corpus which in turn tells the drivers not to run automatic recovery. this should prevent corruption when an email comes in while you are corpus training with the BDB drivers. this was not an issue with the SQL-based drivers. [20040401.1230] jonz: deleted libdb4_purge, libdb3_purge libdb4_purge and libdb3_purge have been obsoleted by the new rewritten dspam_clean tool [20040401.0720] jonz: extended group line length to 10k extended length of a single group line to 10k, from 1k [20040401.0720] jonz: new dspam_clean functionality dspam_clean has been rewritten to support the following different clean operations: 1. Using the -s flag, dspam_clean will continue to perform stale signature purging.
If an age is specified, for example -s14, the age defined as the default will be overridden. Specifying an age of 0 will delete all signatures for the users processed. 2. Using the -p flag, dspam_clean will delete all tokens from a user's database whose probability is between 0.35 and 0.65 (fairly neutral, useless tokens) that fall beyond the default age. If an age is specified, for example -p30, the age defined as the default will be overridden. It is a good idea to use this type of clean with an age of 0 on users after a lot of corpus training. 3. Using the -u flag, dspam_clean will delete all unused tokens from a user's database. There are four different types of unused tokens: - Tokens which have not been used for a long time - Tokens which have a total hit count below 5 - Tokens which have only one spam hit - Tokens which have only one innocent hit Ages may be overridden by specifying a format such as -u30,15,10,10 where each number represents the respective age. Specifying an age of zero will delete all unused tokens in the category. Optionally, usernames may be specified to override the default behavior of processing all users. Examples: Process all users on the system using all clean operations: dspam_clean -s -p15 -u90,30,15,15 Delete all signatures for users 'dick' and 'jane': dspam_clean -s0 dick jane Perform a post-corpus training clean on user 'spot': dspam_clean -p0 -u0,0,0,0 spot Perform nightly maintenance using all default values, for all users, with all options enabled: dspam_clean -p -u -s NOTE: You may wish to only run certain cleaning modes depending on the type of storage driver you are using. For example, the MySQL storage driver includes a purge.sql script which performs signature and unused operations, leaving only the probability operation as a useful operation. If you are using a SQL-based storage driver, it is strongly recommended that you use the maintenance scripts wherever possible.
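The -p rule in the dspam_clean entry above (purge tokens whose probability lies between 0.35 and 0.65 once they are older than the given age) can be expressed as a small predicate. This sketch assumes the bounds are inclusive and that age is measured in whole days since the token's last hit; neither detail is stated in the changelog, and the function name is made up for the example.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical predicate: returns 1 if a token is "neutral" (probability in
 * [0.35, 0.65], bounds assumed inclusive) and at least max_age_days old. */
int should_purge_neutral(double probability, time_t last_hit, time_t now,
                         int max_age_days)
{
    assert(max_age_days >= 0);

    if (probability < 0.35 || probability > 0.65)
        return 0;                               /* token still carries signal */

    double age_days = difftime(now, last_hit) / 86400.0;
    return age_days >= (double)max_age_days;    /* age 0 matches every neutral token */
}
```

With max_age_days set to 0 this matches every neutral token, which corresponds to the -p0 usage recommended after heavy corpus training.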
[20040401.0720] jonz: added _ds_delall_spamrecords and _ds_del_spamrecord added spamrecord deletion functionality to storage driver, increased version to 5:0:0 [20040331.2000] jonz: applied some memory leak patches applied some memory leak patches submitted by William Ahern [20040328.2200] jonz: renamed USERDIR to DSPAM_HOME all references to USERDIR are now known as DSPAM_HOME, including the --with-dspam-home configure flag, and mode settings. [20040328.2200] jonz: moved several features to commandline many features have been REMOVED from the configure script and into the commandline including chained tokens, bayesian noise reduction, automatic whitelisting, and training modes. please see the documentation for a complete list of commandline arguments. configure functions which have changed: --with-userdir-* changed all to dspam-home --with-local-delivery-agent changed to --with-delivery-agent --enable/disable-chained-tokens removed from configure --enable/disable-bnr removed from configure --enable/disable-whitelist removed from configure --enable/disable-toe removed from configure --enable/disable-tum removed from configure --enable/disable-spam-delivery removed from configure --enable-deliver-to-stdout removed from configure [20040328.1745] jonz: completely reworked commandline arguments please see documentation for new commandline arguments. 
[20040328.1745] jonz: removed free-pass of arguments by untrusted users removed ability to pass in arguments by untrusted users, when the file untrusted.mailer_args didn't exist [20040327.2230] jonz: CGI to allow logo-click to return changed CGI to allow a click on the DSPAM logo to return the user to the main page [20040327.2222] jonz: thresholds to include all totals thresholds changed to include all 3 totals: learned, classified, corpusfed [20040327.2221] jonz: test-conditional training threshold dropped test-conditional training threshold dropped to 1000 messages [20040326.0730] jonz: extended DAF flagset extended DAF flagset to four bytes [20040326.0730] jonz: temporarily removed blackbox framework archived and removed blackbox framework from cvs; not likely i'll be working on it any time soon [20040325.2129] jonz: extended context flags to u_int32_t extended context flags to 4 bytes, to add additional commandline features [20040325.2129] jonz: compatibility fixes for TOE compatibility fixes for TOE for web client and stats [20040325.1939] jonz: code cleanup commented headers, cleaned up code [20040325.1930] jonz: converted total_spam, total_innocent converted total_spam, total_innocent to spam_learned, innocent_learned, and added spam_classified, innocent_classified for stats use with TOE. 
NOTE: changes are required to SQL-based drivers for this version MySQL Example: alter table dspam_stats add spam_learned int; alter table dspam_stats add innocent_learned int; alter table dspam_stats add spam_classified int; alter table dspam_stats add innocent_classified int; update dspam_stats set spam_learned = total_spam; update dspam_stats set innocent_learned = total_innocent; update dspam_stats set spam_classified = 0; update dspam_stats set innocent_classified = 0; alter table dspam_stats drop column total_spam; alter table dspam_stats drop column total_innocent; alter table dspam_stats add spam_misclassified int; alter table dspam_stats add innocent_misclassified int; update dspam_stats set spam_misclassified = spam_misses; update dspam_stats set innocent_misclassified = false_positives; alter table dspam_stats drop column spam_misses; alter table dspam_stats drop column false_positives; [20040325.1930] jonz: addspam to fail on failed signature retrieval due to a lot of misconfigurations of dspam, addspam will now fail if a signature cannot be retrieved. this should help pinpoint problem installs and clients, and prevent poor accuracy. Version 2.11.1 -------------- [20040325.0757] jonz: added --help added --help commandline argument [20040325.0757] jonz: fixed division by zero bug in dspam.cgi small chance of division by zero bug fixed [20040325.0740] jonz: fixed toe fixed toe, which has been accidentally disabled in testing [20040325.0740] jonz: provided runtime arguments for training mode added run-time arguments --toe --tum --teft to specify training mode. the default is based on configure-time options. also added training_mode variable to dspam context, should not affect compatibility. Version 2.10.2 -------------- [20040319.2138] jonz: added shell quoting of special characters special characters are now quoted, instead of filtered, when calling the LDA. 
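The 2.10.2 entry above says special characters are quoted rather than filtered before the LDA is invoked. The changelog does not describe the exact scheme, so the following is only a sketch of one common POSIX shell technique: wrap the argument in single quotes and rewrite each embedded single quote as '\''. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Appends s to out at *pos, keeping room for a terminating NUL.
 * Returns 0 on success, -1 if the buffer would overflow. */
static int put(char *out, size_t outlen, size_t *pos, const char *s)
{
    size_t n = strlen(s);
    if (*pos + n >= outlen)
        return -1;
    memcpy(out + *pos, s, n);
    *pos += n;
    return 0;
}

/* Hypothetical helper: writes a single-quoted copy of arg into out so that a
 * POSIX shell treats it as one literal word. Returns 0 on success, -1 on overflow. */
int shell_quote(const char *arg, char *out, size_t outlen)
{
    assert(arg != NULL && out != NULL);
    size_t pos = 0;

    if (put(out, outlen, &pos, "'") != 0)
        return -1;
    for (; *arg; arg++) {
        char one[2] = { *arg, '\0' };
        /* a single quote cannot appear inside single quotes: close, escape, reopen */
        const char *piece = (*arg == '\'') ? "'\\''" : one;
        if (put(out, outlen, &pos, piece) != 0)
            return -1;
    }
    if (put(out, outlen, &pos, "'") != 0)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

Passing arguments through an exec-style call without a shell avoids the quoting problem entirely; the sketch only applies where a shell command line has to be built.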
Version 2.11.0 / Version 2.10.2 ------------------------------- [20040319.1845] jonz: fixed bash special characters problem fixed special characters problem in bash by encapsulating all arguments in quotes [20040319.0730] jonz: added train-on-mature training option --enable-tum train-on-mature (TuM) is a hybrid of train-everything and train-on-error. all tokens are candidates for training as in train-everything, but only tokens whose total number of "hits" doesn't exceed 100 are trained. on error, all tokens are trained. this provides a good balance between the volatility of train-everything and the lack of behavioral learning in train-on-error. it also has the added benefit of not breaking the things that toe presently breaks in dspam (whitelists, stats, etc). [20040319.0700] jonz: fixed source address bug fixed a bug in source address tracking where messages were reported as innocent even if they were guilty, if the user had < 2500 messages in corpus [20040318.1932] jonz: fixed compile-time warning in dspam_tools.c fixed warning for uninitialized crc variable [20040318.0259] jonz: post-training features dropped to 2500 post-training features such as TOE and BNR have had their prerequisite ham count dropped from 4000 to 2500. [20040318.0241] jonz: fixed up headers so developers only need libdspam.h fixed up header dependencies so developers only need include libdspam.h to use libdspam. [20040318.0124] jonz: added support for header-based signatures for implementations where a signature in the body is unacceptable, using --enable-signature-headers will place the signature in the header, and not in the body. IMPORTANT: This will -require- that the headers be forwarded with the message when being reported as spam. This usually requires bouncing the message, forwarding it as an attachment, or using a macro. The header will otherwise be lost with standard forwarding.
[20040316.2315] jonz: added support for userlist termination userlist can now be terminated using -- Version 2.10.1 -------------- [20040314.0128] jonz: bugfix for segfaults in dspam.c segfaults can occur on some systems (predominantly Solaris) when mail is sent to multiple local recipients. bugfix required the header insert pointer to be reset. Version 2.10.0 -------------- [20040307.1828] jonz: new dspam_corpus tool by Gary Funck replaced old dspam_corpus tool with a better one contributed by Gary Funck [20040305.0320] jonz: added postfix documentation added documentation for postfix local delivery [20040305.0320] jonz: added support for domain filesystem structure use of --enable-domain-scale configures filesystem for domain-based support. when used, username@domain should be passed in as the userid and $USERDIR/domain/username/ will be used instead of $USERDIR/username or $USERDIR/u/us/username as done with large scale [20040303.2208] jonz: applied bugfix patch by dennis pedersen applied a bugfix to libdb3 and libdb4 fixing a bug that was presented in rc2 causing loop hangs. submitted by dennis pedersen [20040303.0243] jonz: added long username support by default, the username length uses the same limits as the operating system. if --enable-long-usernames is specified, however, the limit will be set to 256. Version 2.10-rc2 ---------------- [20040302.0007] jonz: implemented auto-whitelisting implemented auto-whitelisting using --enable-whitelist function. automatic whitelisting will automatically whitelist any full 'From' addresses (including the name) that have appeared in at least 10 innocent messages and zero spams. when a message is forwarded as a spam, any automatic whitelisting for that address is permanently deactivated. [20040301.2339] jonz: fixed purge.sql fixed some bugs in MySQL's purge.sql, optimized for speed thanks to another patch submitted by bob glamm. 
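The domain filesystem layout from the 20040305.0320 entry above ($USERDIR/domain/username for a userid of username@domain) can be sketched as a path builder. The function name and error convention are assumptions for illustration; the sketch omits the trailing slash and does not cover the large-scale u/us/username layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<home>/<domain>/<username>" from a userid of
 * the form username@domain. Returns 0 on success, -1 on a malformed userid
 * or if out is too small. */
int domain_scale_path(const char *home, const char *userid,
                      char *out, size_t outlen)
{
    assert(home != NULL && userid != NULL && out != NULL);

    const char *at = strchr(userid, '@');
    if (at == NULL || at == userid || at[1] == '\0')
        return -1;                       /* need both a username and a domain */

    /* "%.*s" prints only the part of userid before the '@' */
    int n = snprintf(out, outlen, "%s/%s/%.*s",
                     home, at + 1, (int)(at - userid), userid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```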
[20040229.1245] jonz: applied patch submitted by Sascha Blank applied patch submitted by Sascha Blank for dspam_dump to allow lookup of individual tokens. [20040228.1618] jonz: train-on-error to perform source address tracking train-on-error mode fixed to perform source address tracking [20040224.2008] jonz: fixed high cpu utilization on large messages fixed an iteration problem which caused high cpu utilization on large (2MB+) text messages [20040223.0350] jonz: fixed compile error in libdspam.c fixed compile error in libdspam.c when HAVE_ISO_VARARGS isn't defined Version 2.10-rc1 ---------------- [20040222.1606] jonz: added support for global groups global groups allows DSPAM to provide a "SpamAssassin type out-of-the-box filtering" for all new users until they have built their own useful dictionaries. to create a global classification group, add something like this to $USERDIR/group: groupname:classification:*globaluser This will automatically add globaluser as a classification peer to all users. Any user who has fewer than 1000 innocent messages or 250 spam messages in their corpus, or whose filter is uncertain about a particular message will consult the global dictionary for an answer. global groups will need to be trained using corpus or other means, or by using the dspam_merge tool. the global user (in this case 'globaluser') is treated just as any other user on the system. [20040221.2155] jonz: format changes to dspam_dump dspam_dump formatting changes + display of token probability [20040220.1700] jonz: added quick fix for \r stripping in dspam_corpus added a quick fix to strip \r's in mailboxes when using dspam_corpus [20040220.1700] jonz: fixed segfault bug fixed a bug that caused DSPAM to segfault on empty MIME delimiters. This generally only occurred with spams, as legitimate messages have RFC-compliant delimiters.
[20040219.0150] jonz: added support for neural networking see README for more details [20040218.2300] jonz: added tweaking to BNR for small text samples added tweaking of thresholds to BNR for small text samples < 3.5k [20040217.0724] jonz: fixed some miscellaneous compile warnings fixed some miscellaneous compile warnings. 2 for when trusted user security is disabled, 1 for dspam_2mysql.c:126 Version 2.10-beta-2 ------------------- [20040214.1632] jonz: added TOE support added TOE (Train on Error) support using the --enable-toe configure function. see the README file for more details. [20040213.1549] jonz: fixed X-DSPAM header duplication bug fixed a bug which caused X-DSPAM headers to be cumulatively appended when a single message addresses multiple local users. [20040214.1327] jonz: added --enable-client-compression configure flag added option --enable-client-compression to use compression option between data source and its clients (where available). presently only available with the mysql_drv storage driver. you should enable this if the data source is on a separate machine from the DSPAM agent(s), as it conserves bandwidth at the expense of a few CPU cycles. [20040214.1258] jonz: created speed and space optimized MySQL scripts created both speed and space optimized mysql_objects.sql scripts. [20040214.1235] jonz: added new stats to CGI added FP stats + overall accuracy to CGI [20040214.1235] jonz: added debug output for noise filtering added noise level, spammy tokens, and eliminations to debug output Version 2.10-beta-1 ------------------- [20040212.2208] jonz: added stale data purge / PURGE_ANY added stale data purge to libdb3 and libdb4 purge tools. based on PURGE_ANY, defined in config.h, any stale data is removed after six months. [20040212.2205] jonz: added DSF_NOISE flag added DSF_NOISE flag to libdspam interface for activating Bayesian Noise Reduction.
[20040211.0158] jonz: disabled mysql_drv _ds_delete_signature disabled _ds_delete_signature in mysql_drv due to errors; added signature purge to purge.sql script. no longer necessary to run dspam_clean if using the mysql storage driver. [20040211.0155] jonz: mysql_drv get_one update check to ensure there was at least one token to be loaded, otherwise do not perform query Version 2.9.6 ------------- [2004020