In some environments Outlook is very slow to notify that there is new mail.
The flaw shows up with every email service accessed over IMAP (the synchronization protocol), whether it is a SaaS email service or mail server software. It also happens on all-Microsoft stacks: with Exchange over IMAP and, more relevant because it can be demonstrated, with Office 365 itself.
This article explains the Outlook flaw and presents some ways to mitigate it. The flaw is unique to Outlook, and nothing can be done on the server side to fix it.
The following video demonstrates the failure, with explanation of the causes, in the scenario where the stack is all Microsoft (Outlook connected to an Office 365 account).
To mitigate the problem we suggest the following alternatives:
- Use either of the two workarounds shown in the video: opening any other Outlook folder and returning to the Inbox forces a refresh (clicking the Send/Receive button does not help), as does taking Outlook offline and then back online;
- Keep Outlook and complement it with an email notifier, an application that sits in the Windows tray and only watches for new mail;
- Disregard Outlook notifications and rely on cell phone notifications;
- Change email application entirely (we recommend Thunderbird from the Mozilla Foundation, which is free for commercial use).
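As an illustration of the notifier alternative, here is a minimal sketch of a polling watcher built on Python's standard `imaplib`. The function names, the 60-second interval and the host and credentials passed to `watch` are assumptions made for this example, and a real notifier would show a tray notification rather than print. It opens a fresh connection on every poll, so it never depends on a long-lived idle connection surviving.

```python
import imaplib
import time


def unseen_count(conn) -> int:
    """Return the number of unread messages in the Inbox of an open IMAP connection."""
    conn.select("INBOX", readonly=True)          # read-only: do not alter message flags
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("IMAP SEARCH failed")
    # data[0] holds space-separated message numbers, e.g. b"4 7 9" (empty if none)
    return len(data[0].split()) if data and data[0] else 0


def watch(host: str, user: str, password: str, interval: float = 60.0) -> None:
    """Poll the mailbox forever (hypothetical helper; interval is an assumption)."""
    while True:
        try:
            conn = imaplib.IMAP4_SSL(host)       # new connection on every poll
            conn.login(user, password)
            count = unseen_count(conn)
            conn.logout()
            if count:
                print(f"{count} unread message(s) in Inbox")
        except (imaplib.IMAP4.error, OSError) as exc:
            # A failed poll is not fatal: the next poll simply reconnects.
            print(f"poll failed, will retry: {exc}")
        time.sleep(interval)
```

Polling is less immediate than push mail, but a dropped connection costs at most one polling interval.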
Technical description of the Outlook failure
At the root of the failure is Outlook’s inability to recover from a broken push-mail TCP connection. When a push-mail connection is dropped, Outlook never recovers without user intervention, and therefore stops receiving notifications of new mail.
The description that follows is long because it explains the failure from the ground up. It serves as a reference and is not essential reading for an Outlook user who only wants to act on the problem.
What is a TCP connection and how can it break?
The Internet is a packet-switched network. The basic unit of information is a packet of data. Imagine an envelope with a sender, a receiver, and an indivisible set of data (typically around 1.5 KB, and never more than 64 KB). The network promises to try to deliver each packet and, if it does deliver it, to deliver it to the right destination. It does not guarantee that a packet is actually delivered, nor does it guarantee the order of consecutive packets. The sender can issue A, B, C and the destination can receive C, A, B, or only C, B.
One layer up, TCP uses the Internet Protocol (IP) network to produce an ordered channel of data. It basically adds two guarantees: packet ordering and guaranteed delivery. To do this, it keeps state at the sender and at the receiver: a sequence number. The sender always knows the number of the last packet whose reception has been acknowledged and the number of the last packet sent. The receiver knows the number of the last packet received. This is enough to trigger the retransmissions that guarantee delivery and to put the packets back in order at the receiver. Put it all together, and we have a circuit similar to the old telephone circuits (but digital).
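The mechanism can be sketched with a toy simulation. This is a simplification for illustration, not real TCP (which numbers bytes rather than packets and uses timers); the `Receiver` class, the `transfer` function and the idea of describing the network as a list of "rounds" are assumptions made for this example.

```python
from typing import Dict, List, Tuple


class Receiver:
    """Toy receiver: buffers numbered packets and releases them in order."""

    def __init__(self) -> None:
        self.next_expected = 0            # sequence number we are waiting for
        self.buffer: Dict[int, str] = {}  # packets that arrived ahead of their turn
        self.delivered: List[str] = []    # data handed to the application, in order

    def receive(self, seq: int, data: str) -> int:
        """Accept one packet; return the ACK (the next sequence number wanted)."""
        if seq >= self.next_expected:     # older numbers are duplicates: ignore them
            self.buffer[seq] = data
        # Release every packet that is now contiguous with what was delivered.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected


def transfer(packets: List[str], rounds: List[List[int]]) -> Tuple[List[str], int]:
    """Simulate a sender that retransmits until everything is acknowledged.

    In each round the sender (re)sends every unacknowledged packet, and
    rounds[r] lists the sequence numbers the network actually delivers in
    that round, in arrival order. Anything not listed is lost.
    """
    receiver = Receiver()
    acked = 0   # sender state: every packet below this number is acknowledged
    used = 0    # number of rounds consumed
    for arriving in rounds:
        used += 1
        for seq in arriving:
            if seq >= acked:              # only unacknowledged packets are resent
                acked = receiver.receive(seq, packets[seq])
        if acked == len(packets):
            break
    return receiver.delivered, used
```

With `transfer(["A", "B", "C"], [[2, 0], [1]])`, packet B is lost in the first round and C arrives before A, yet the application still receives A, B, C in order once B is retransmitted in the second round.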
A TCP connection “drops” when one of the endpoints loses state. For example, if a client changes networks and gets a new IP address, its TCP connections drop, because the server knows nothing about the “new” client that appears to it. This is why a cell phone switching from Wi-Fi to the mobile network loses all its TCP connections.
To complicate things further, because there are not enough IP addresses for all the devices connected to the network, private networks and NAT (Network Address Translation) were introduced, which added more actors to a TCP connection. If you look at a PC on an internal network, you see that its IP is from a private range (typically 192.168.*.* or 10.*.*.*). These addresses do not exist on the public Internet; they are reserved for local networks. Before a device can “talk” to the public Internet, its address has to be translated to a public IP. Typically this is done by the local network’s outgoing router, provided by the ISP, which has one public IP shared by all the PCs on the local network. To do this, the router keeps a state table that associates each TCP connection with an IP address inside the internal network.
If a router runs out of memory for this state table (called a NAT table), it frees memory by “forgetting” some TCP connections. These connections “drop”. Routers pick the connections that have been idle the longest in an attempt to cause little disruption of service, but nothing guarantees that such a connection is really unused. Sender and receiver do not learn about the drop until they try to communicate.
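To check whether a given address falls in a private range, Python's standard `ipaddress` module is enough. A small sketch follows; the helper name `needs_nat` is an assumption for this example, and the list also includes 172.16.0.0/12, the third private block defined in RFC 1918 alongside the two mentioned above.

```python
import ipaddress

# The three IPv4 blocks reserved for private networks (RFC 1918).
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def needs_nat(address: str) -> bool:
    """True if the address is private and must be translated to reach the Internet."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)
```

For example, `needs_nat("192.168.1.20")` is true, while `needs_nat("8.8.8.8")` is false because that address is public.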
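The eviction behavior can be sketched as a toy NAT table. This is an illustration under simplifying assumptions, not how any particular router is implemented: the `NatTable` class, its capacity parameter and the starting port number are all invented for this example.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (internal IP address, internal port)


class NatTable:
    """Toy NAT table: maps a public port to the internal endpoint behind it."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[int, Endpoint] = {}  # public port -> internal endpoint
        self.last_seen: Dict[int, float] = {}   # public port -> time of last packet
        self.next_port = 40000                  # arbitrary first public port

    def open(self, internal: Endpoint, now: float) -> int:
        """Register an outgoing connection; evict the idlest entry if full."""
        if len(self.entries) >= self.capacity:
            idlest = min(self.last_seen, key=self.last_seen.get)
            del self.entries[idlest]            # this connection silently "drops"
            del self.last_seen[idlest]
        port = self.next_port
        self.next_port += 1
        self.entries[port] = internal
        self.last_seen[port] = now
        return port

    def inbound(self, public_port: int, now: float) -> Optional[Endpoint]:
        """Route a packet from the Internet; None means the mapping was forgotten."""
        internal = self.entries.get(public_port)
        if internal is not None:
            self.last_seen[public_port] = now   # traffic keeps the entry fresh
        return internal
```

In a table with room for two connections, an idle mail connection opened at time 0 is the one evicted when a third connection arrives, and a later packet from the server for that connection finds no mapping. Neither endpoint was told.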
IMAP email connections are TCP connections. They basically work like this: the client opens the connection, brings itself up to date with the server state (which messages are new, which have been deleted, the folder list, etc.), and then goes into idle mode. In this mode it waits for a notification from the server that the state has changed (typically, that a new message has arrived). The TCP connection stays open but carries no traffic. It is an obvious candidate for a router that wants to free up NAT table space, so it is dropped regularly.
It is considered normal for an IMAP connection to drop while in the IDLE state. There is no point in tuning the network to prevent connections from dropping: it is fruitless work, because you would be fighting default behaviors that are difficult to change.
When an IMAP IDLE connection drops, the client does not receive the new mail notification when it is sent. The server, on sending the notification, gets a response from the router saying that there is no such open connection, and closes its side (the error is usually “connection reset by peer”). The connection has to be reopened by the client, so the server can do nothing but wait for the client to repair the drop. Because drops are considered normal, the connection is tested regularly (every 5 min) with a NOOP, basically a command that says “I’m still here”. If the command fails, the protocol expects the client to open a new connection.
This is where Outlook’s failure lies. When the NOOP fails, Outlook does not open a new connection. It loses the connection it had to the server and does not open another until a user action forces it to. The Send/Receive button is ineffective: all it does is list the special-use folders (Drafts, Trash, etc.), but not the Inbox. Outlook is “lazy” and assumes that it does not need to check the Inbox because that is covered by the idle IMAP connection (which has since failed).
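For contrast, the recovery step that a client is expected to perform can be sketched in a few lines with Python's standard `imaplib`. This is a minimal illustration, not a description of how any real mail client is written: the function names are assumptions for this example, and the host and credentials are supplied by the caller.

```python
import imaplib
from typing import Callable


def open_inbox(host: str, user: str, password: str) -> imaplib.IMAP4_SSL:
    """Open a fresh IMAP connection and select the Inbox (hypothetical helper)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select("INBOX")
    return conn


def keepalive_once(conn, connect: Callable[[], object]):
    """Probe the connection with NOOP and reopen it if the probe fails.

    Returns the connection to keep using: the same one if it is still
    alive, or a new one produced by `connect` if it was dropped.
    """
    try:
        conn.noop()                       # "I'm still here"
        return conn
    except (imaplib.IMAP4.abort, OSError):
        # The connection is gone (for example "connection reset by peer").
        # The server cannot repair this, so the client must reconnect.
        return connect()
```

A client would call `keepalive_once(conn, lambda: open_inbox(host, user, password))` on a timer and continue with whatever connection it returns. The failure described above corresponds to skipping the `except` branch.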
The two actions that have the side effect of causing Outlook to reopen the IMAP connection are:
- Open another folder and then open the Inbox again (e.g. click on Drafts and then back on Inbox);
- Put Outlook into Offline mode and then back into Online mode.