Thursday, July 17, 2008

Blame it on sendmail

A new day, a new challenge: recently, automated emails from some webservers just stopped arriving. The webservers have (almost) identical configurations, are load balanced, and host the same sites. It is also worth knowing that mail destined for other domains was getting through just fine (new user registration confirmations), but mail going to our own domain was not (support form emails, internal notifications).

The first place to look was maillog:

Jul 17 10:15:35 www3 sendmail[25808]: m6HHCt7S025798: to=, ctladdr= (500/404), delay=00:02:20, xdelay=00:02:20, mailer=esmtp, pri=120342, relay=mail.example.com. [10.0.0.22], dsn=4.0.0, stat=Deferred: Connection timed out with mail.example.com.
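When there are several webservers to check, it can help to pull the relay and stat fields out of lines like the one above instead of reading them by eye. The following is a minimal sketch, assuming every delivery line has the same "key=value, key=value" shape as the sample; the function names parse_maillog_line and is_deferred are made up for illustration, and real logs with commas inside addresses would need more careful parsing.

```python
import re

# Sample line copied from the log excerpt above (the addresses are blank in the post).
SAMPLE = (
    "Jul 17 10:15:35 www3 sendmail[25808]: m6HHCt7S025798: to=, ctladdr= (500/404), "
    "delay=00:02:20, xdelay=00:02:20, mailer=esmtp, pri=120342, "
    "relay=mail.example.com. [10.0.0.22], dsn=4.0.0, "
    "stat=Deferred: Connection timed out with mail.example.com."
)


def parse_maillog_line(line):
    """Split a sendmail delivery log line into a dict of key=value fields.

    Hypothetical helper: returns None when the line does not look like
    "... sendmail[pid]: <queue id>: key=value, key=value, ...".
    """
    match = re.search(r"sendmail\[\d+\]: (\w+): (.*)$", line)
    if match is None:
        return None
    fields = {"qid": match.group(1)}
    # Assumption: fields are separated by ", " and contain no embedded ", ".
    for part in match.group(2).split(", "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_deferred(fields):
    """True when the parsed line reports a deferred delivery."""
    return fields is not None and fields.get("stat", "").startswith("Deferred")


if __name__ == "__main__":
    parsed = parse_maillog_line(SAMPLE)
    if is_deferred(parsed):
        print(parsed["qid"], parsed["relay"], parsed["stat"])
```

Run over a whole maillog, the same two functions would give a quick list of which relays the deferred messages were trying to reach.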

That seemed strange, because mail.example.com is on the same LAN as www3 and is pingable. I could telnet to port 25 of mail.example.com and get a message through that way. After hitting some dead ends (and wrong ends) messing with sendmail configuration files, the problem presented itself to me: it was a DNS problem. I had explicitly defined the mail server with its internal IP address in /etc/hosts, but it seems sendmail was ignoring /etc/hosts, consulting a name server, and getting the external IP address of said mail server. The mail server is not reachable at its external IP from hosts on the internal network, hence the timeouts.

The fix was a small change to /etc/resolv.conf to add an internal name server, followed by a restart of the sendmail service (I don't know if the restart was absolutely necessary), and messages started flowing. Inboxes started exploding with queued mail from the last five days. I'm sure there is a way to configure sendmail to use /etc/hosts, but frankly, editing sendmail configuration files scares me, so I'm leaving well enough alone.
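The mismatch described above, where /etc/hosts says one thing and the name server says another, is easy to check for once you know to look. Here is a rough sketch, assuming you have already obtained the name server's answer separately (for example with dig +short mail.example.com). The function names are invented for this example, and 203.0.113.10 is a placeholder standing in for the external address, not the real one.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its first listed IP."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry for a name, as hosts lookups normally do.
            table.setdefault(name.lower(), ip)
    return table


def resolution_conflict(hosts_text, name, dns_addresses):
    """Return (hosts_ip, dns_addresses) when DNS disagrees with the hosts file.

    Returns None when the name is not in the hosts file, or when DNS
    returns the same address the hosts file lists.
    """
    hosts_ip = parse_hosts(hosts_text).get(name.lower().rstrip("."))
    if hosts_ip is None or hosts_ip in dns_addresses:
        return None
    return hosts_ip, list(dns_addresses)


if __name__ == "__main__":
    # Hypothetical hosts file contents; 10.0.0.22 is the internal IP from the log.
    hosts_text = "127.0.0.1 localhost\n10.0.0.22 mail.example.com mail  # internal\n"
    # Placeholder for whatever the external name server actually returned.
    print(resolution_conflict(hosts_text, "mail.example.com.", ["203.0.113.10"]))
```

A non-None result means a program that skips /etc/hosts and asks the name server directly will connect to a different address than ping or telnet did, which matches the symptoms here.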
