
Push notifications are the lifeline of modern mobile apps, delivering real-time updates, alerts, and personalized experiences even when the app isn’t running. But how do these tiny messages travel across the globe from your server to a user’s screen, often in under a second?
If you’re a developer eager to master the internal mechanics of Apple Push Notification Service (APNs) and Firebase Cloud Messaging (FCM), this guide is for you. We’ll peel back the layers and show you exactly how push notifications work under the hood, from authentication to persistent connections.

Why Push Notifications Matter in Modern Apps
Push notifications aren’t just messages — they’re user engagement engines. Whether it’s a breaking news alert, a chat message, or a critical app update, push notifications keep users connected without draining device resources.
To truly optimize push delivery, you need to understand the entire pipeline. Let’s break it down.
How a Notification Travels: From Your Server to the User’s Screen
[Figure: High-level push notification delivery workflow]
The 5 Critical Steps of Push Notification Delivery
- Backend Request Initiation
Your server composes a notification payload (JSON structure) and sends it to the appropriate push service API endpoint. This request includes authentication credentials and a device identifier (device token for APNs, registration token for FCM).
- Authentication & Queuing
The push service validates credentials and acknowledges the request with an HTTP 200 status and message ID. The message is then queued for delivery. Important: A success response only indicates acceptance for delivery, not that the device has received it yet.
- Network Routing
Both APNs and FCM maintain persistent, secure connections to devices. The service determines the target device and routes the notification through its network infrastructure.
- Device Reception
The user’s device, silently listening on the persistent connection, receives the notification data. The operating system wakes the appropriate service to handle the message.
- Notification Processing and Display
Finally, the notification is presented to the user or processed by the app, depending on the app’s current state and the notification’s content specifications.
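To make step 1 concrete, here is a minimal Python sketch of what the backend request might look like for each service. It only builds a description of the request and sends nothing. The `PushRequest` class, the function names, and all argument values are hypothetical; the URLs and header names follow the public APNs HTTP/2 API and the FCM HTTP v1 API.

```python
import json
from dataclasses import dataclass


@dataclass
class PushRequest:
    """A plain description of an HTTP POST; nothing is sent here."""
    url: str
    headers: dict
    body: bytes


def build_apns_request(device_token: str, topic: str,
                       provider_jwt: str, alert: str) -> PushRequest:
    # APNs: the device token goes in the URL path, the payload in the body.
    payload = {"aps": {"alert": alert}}
    return PushRequest(
        url=f"https://api.push.apple.com/3/device/{device_token}",
        headers={
            "authorization": f"bearer {provider_jwt}",
            "apns-topic": topic,  # usually the app's bundle ID
            "apns-push-type": "alert",
        },
        body=json.dumps(payload).encode("utf-8"),
    )


def build_fcm_request(registration_token: str, project_id: str,
                      access_token: str, title: str, text: str) -> PushRequest:
    # FCM HTTP v1: the registration token goes inside the JSON message.
    message = {
        "message": {
            "token": registration_token,
            "notification": {"title": title, "body": text},
        }
    }
    return PushRequest(
        url=f"https://fcm.googleapis.com/v1/projects/{project_id}/messages:send",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        body=json.dumps(message).encode("utf-8"),
    )
```

Either request can then be handed to whatever HTTP client your backend already uses (APNs requires one that speaks HTTP/2).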
Apple Push Notification Service (APNs)
[Figure: Apple Push Notification Service (APNs) internal architecture and delivery pipeline]
Authentication Mechanisms
APNs supports two authentication methods:
- Token-based authentication: Uses JWT signed with an APNs key (newer, recommended)
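- Certificate-based authentication: Uses TLS client certificates (the older method)
Prefer token-based authentication for better scalability and security: a single signing key can serve all of your apps, while certificates are issued per app and must be renewed periodically.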
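As a rough illustration of token-based authentication, the sketch below assembles an APNs provider token and caches it. APNs expects the JWT header to carry `alg: ES256` and the key ID (`kid`), and the claims to carry the team ID (`iss`) and the issue time (`iat`). Apple asks providers to refresh tokens no more often than every 20 minutes and rejects tokens older than an hour, so the cache reuses a token for 50 minutes. The `sign` callable is an assumption: it stands in for an ES256 signer (from a crypto library of your choice) that returns the raw 64-byte signature. The class and function names are hypothetical.

```python
import base64
import json
import time
from typing import Callable, Optional


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def signing_input(key_id: str, team_id: str, issued_at: int) -> str:
    """Return the 'header.claims' part of the JWT that gets signed."""
    header = {"alg": "ES256", "kid": key_id}
    claims = {"iss": team_id, "iat": issued_at}

    def encode(obj: dict) -> str:
        return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

    return f"{encode(header)}.{encode(claims)}"


class ProviderTokenCache:
    """Reuses one provider token until it is max_age_seconds old."""

    def __init__(self, key_id: str, team_id: str,
                 sign: Callable[[bytes], bytes],  # hypothetical ES256 signer
                 max_age_seconds: int = 50 * 60,
                 clock: Callable[[], float] = time.time):
        self._key_id = key_id
        self._team_id = team_id
        self._sign = sign
        self._max_age_seconds = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._issued_at = 0

    def get(self) -> str:
        now = int(self._clock())
        if self._token is None or now - self._issued_at >= self._max_age_seconds:
            unsigned = signing_input(self._key_id, self._team_id, now)
            signature = self._sign(unsigned.encode("ascii"))
            self._token = f"{unsigned}.{b64url(signature)}"
            self._issued_at = now
        return self._token
```

The returned string is what goes into the `authorization: bearer <token>` header of each APNs request.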
HTTP/2 API Integration
APNs exposes an HTTP/2 API endpoint for optimal performance. Your server opens a connection and sends HTTP/2 POST requests with the device token in the URL path and the JSON payload in the body. Critical best practice: Keep connections open across multiple requests, because rapidly opening and closing a connection for each message can be treated as a denial-of-service attempt.
Message Validation and Response Handling
Upon receiving requests, APNs immediately validates:
- Authentication credentials
- Message structure integrity
- Device token format
- Payload size (maximum 4 KB)
- Topic authorization
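You can catch the payload size problem before APNs does. The sketch below serializes a payload compactly and checks its encoded size against the 4 KB limit mentioned above. The function name is hypothetical, and the limit is measured in UTF-8 bytes, not characters, which matters for emoji and non-Latin text.

```python
import json

APNS_MAX_PAYLOAD_BYTES = 4096  # 4 KB limit for regular notifications


def encode_payload(payload: dict) -> bytes:
    """Serialize compactly and reject payloads APNs would refuse."""
    body = json.dumps(
        payload, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    if len(body) > APNS_MAX_PAYLOAD_BYTES:
        raise ValueError(
            f"payload is {len(body)} bytes, limit is {APNS_MAX_PAYLOAD_BYTES}"
        )
    return body
```

Using compact separators and `ensure_ascii=False` keeps the encoded body as small as possible, which leaves more room for actual content.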
Common APNs error responses include:
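- 400 Bad Request: Malformed request, such as invalid payload JSON or a badly formed device token
- 403 Forbidden: Authentication problem with the certificate or provider token
- 410 Gone: Device token is no longer active for the topic (for example, the app was uninstalled)
- 413 Payload Too Large: Exceeded size limit
- 429 Too Many Requests: Too many requests sent for the same device token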
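One way to act on these responses is to map each status code to a follow-up action. The sketch below is one possible policy, not an official one; the `Action` names and the function are hypothetical. Treating 5xx responses as retryable is an added assumption.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"                  # accepted for delivery
    FIX_REQUEST = "fix_request"    # our bug: retrying unchanged will not help
    REFRESH_AUTH = "refresh_auth"  # check the certificate or provider token
    DROP_TOKEN = "drop_token"      # stop sending to this device token
    RETRY_LATER = "retry_later"    # back off, then try again


def classify_apns_status(status: int) -> Action:
    if status == 200:
        return Action.DONE
    if status == 410:
        return Action.DROP_TOKEN
    if status == 403:
        return Action.REFRESH_AUTH
    if status == 429 or status >= 500:
        return Action.RETRY_LATER
    # 400, 413, and other client errors need a code or payload fix.
    return Action.FIX_REQUEST
```

For example, `classify_apns_status(410)` yields `Action.DROP_TOKEN`, which is the signal to delete that token from your database.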
Persistent Connection Architecture
Each Apple device maintains a persistent encrypted connection to APNs when network connectivity is available. This connection remains idle most of the time, using minimal power, and activates only when APNs delivers notifications. This design eliminates the need for devices to poll for messages.
Message Coalescing and Priority Handling
APNs supports message coalescing using apns-collapse-id. Multiple notifications with the same collapse ID result in only the most recent message being displayed. APNs may drop expired messages or lower-priority notifications if devices remain unreachable.
Priority levels:
- High priority (apns-priority: 10): Immediate delivery, user-visible alerts
- Low priority (apns-priority: 5): Background updates, batched delivery for battery optimization
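The sketch below shows how these options map to APNs request headers, plus a toy model of coalescing. The header names (`apns-collapse-id`, `apns-priority`, `apns-expiration`) are the real ones; the function names are hypothetical, and `coalesce` only imitates the visible effect on the device, not how APNs implements it.

```python
from typing import List, Optional, Tuple


def delivery_headers(collapse_id: Optional[str] = None,
                     high_priority: bool = True,
                     expires_at: Optional[int] = None) -> dict:
    """Build the APNs headers that control coalescing, priority, and expiry."""
    headers = {"apns-priority": "10" if high_priority else "5"}
    if collapse_id is not None:
        if len(collapse_id.encode("utf-8")) > 64:
            raise ValueError("apns-collapse-id must not exceed 64 bytes")
        headers["apns-collapse-id"] = collapse_id
    if expires_at is not None:
        # UNIX epoch seconds after which APNs stops trying to deliver.
        headers["apns-expiration"] = str(int(expires_at))
    return headers


def coalesce(messages: List[Tuple[Optional[str], str]]) -> List[str]:
    """Toy model: a later message replaces an earlier one with the same ID."""
    latest = {}
    for index, (collapse_id, text) in enumerate(messages):
        key = collapse_id if collapse_id is not None else ("unique", index)
        latest.pop(key, None)  # drop the older message, if any
        latest[key] = text     # re-insert so the newest one sorts last
    return list(latest.values())
```

A live score update is a typical use: sending "1-0" and then "2-0" with the same collapse ID leaves only "2-0" on screen.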
Firebase Cloud Messaging (FCM)
[Figure: Firebase Cloud Messaging (FCM) internal architecture and delivery pipeline]
Why FCM Rules Android Push Delivery
Firebase Cloud Messaging (FCM) is Google’s official push notification service for Android, and it dominates for several reasons:
- Native Integration with Google Play Services
FCM is deeply integrated into the Android ecosystem through Google Play Services, making it lightweight and battery-efficient. Developers don’t need to manage custom sockets or polling mechanisms because FCM handles persistent connections behind the scenes.
- Single Persistent Connection for All Apps
Instead of each app opening its own socket (which would drain battery), Android devices maintain one shared connection to FCM servers. This single connection powers notifications for every app on the device, optimizing network usage and improving delivery speed.
- Cross-Platform Power
While FCM is optimized for Android, it also supports web notifications and iOS, where it relays messages through APNs under the hood. This means developers can use one API to target multiple platforms.
- Advanced Features
FCM goes beyond basic notifications and supports:
- Topic Messaging: Send messages to user groups (e.g., all users subscribed to “sports”)
- Device Group Messaging: Send to all devices owned by a user
- Upstream Messaging: Allows client-to-server messages (part of the legacy XMPP API, which Google has since deprecated)
Bottom Line: FCM’s tight integration, efficiency, and scalability make it the default choice for Android push notifications.
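For a sense of what targeting looks like in practice, here is a small sketch that builds an FCM HTTP v1 message body for either a single device or a topic. The field names (`token`, `topic`, `notification`, `android.priority`, `android.ttl`) come from the HTTP v1 message format; the helper function and its parameters are hypothetical.

```python
from typing import Optional


def fcm_message(title: str, body: str,
                token: Optional[str] = None,
                topic: Optional[str] = None,
                high_priority: bool = False,
                ttl_seconds: Optional[int] = None) -> dict:
    """Build the JSON body for a send request to one device or one topic."""
    if (token is None) == (topic is None):
        raise ValueError("provide exactly one of token or topic")

    message = {"notification": {"title": title, "body": body}}
    if token is not None:
        message["token"] = token
    else:
        message["topic"] = topic

    # Android-specific delivery options are optional.
    android = {}
    if high_priority:
        android["priority"] = "HIGH"
    if ttl_seconds is not None:
        android["ttl"] = f"{ttl_seconds}s"  # duration string, e.g. "3600s"
    if android:
        message["android"] = android

    return {"message": message}
```

Calling `fcm_message("Goal!", "2-0", topic="sports")` produces a body that reaches every device subscribed to the "sports" topic.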
How FCM Handles Millions of Notifications Per Minute
Delivering millions of notifications in real-time sounds impossible, but Google’s cloud infrastructure makes it happen with these mechanisms:
- Fan-Out Architecture
When you send a notification to a topic or multiple devices, FCM doesn’t send one message at a time. Instead, it uses a fan-out system that clones the message and delivers it in parallel to thousands or millions of devices.
- Load Balancing Across Global Data Centers
FCM operates on Google’s global network, ensuring low latency by routing notifications through the nearest data center to the device’s region.
- Queued Delivery with Retry Logic
If a device is offline, FCM queues the message for up to 4 weeks (the default and maximum time-to-live) and delivers it when the device reconnects. This improves delivery reliability without overloading the network, although delivery is never strictly guaranteed.
- Message Prioritization
- High Priority → Immediate delivery (can wake up the device)
- Normal Priority → Batched delivery during natural wake cycles for battery optimization
- Horizontal Scalability
Google’s infrastructure can scale horizontally, meaning as the load increases, FCM spins up more servers automatically to handle the traffic.
Result: FCM can process millions of messages per minute with minimal latency and maximum reliability.
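The fan-out idea can be illustrated with a toy model: split the recipient list into batches and hand the batches to parallel workers. This is only a sketch of the general pattern, not a description of Google's internal implementation. The `deliver_batch` callable, the batch size, and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fan_out(message: str, tokens: List[str],
            deliver_batch: Callable[[str, List[str]], int],
            batch_size: int = 500, workers: int = 8) -> int:
    """Deliver one message to many tokens in parallel batches.

    deliver_batch(message, batch) returns how many sends it accepted;
    the total across all batches is returned.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda batch: deliver_batch(message, batch),
            chunked(tokens, batch_size),
        )
        return sum(results)
```

The key property is that the sender provides the message once and the system multiplies it, so total time grows with the number of batches per worker rather than with the number of recipients.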
Managing Device States: Online, Doze, and Offline Scenarios
Android devices aren’t always active — they can be online, in Doze mode, or completely offline. FCM intelligently adapts delivery based on device state:
➔ Device Online & Active
- Both high and normal priority messages are delivered immediately.
- Great for chat apps, real-time alerts.
➔ Device in Doze Mode (Battery Optimization)
- Normal-priority messages are held until the device exits Doze mode or reaches a maintenance window.
- High-priority messages can wake the device, but Google throttles abuse to protect battery life.
Example: Messaging apps use high priority for urgent chat notifications.
➔ Device Offline
- Messages are stored on FCM servers for up to 4 weeks by default.
- When the device reconnects, FCM delivers the stored messages, although it does not guarantee their order.
- Ideal for low-network scenarios where the user is offline for hours or days.
Takeaway: FCM’s state-aware delivery system ensures reliable messaging without compromising battery performance.
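The rules above can be summarized as a small decision function. This is a simplified model of the behavior described in this section, not FCM's actual logic; the enum names, the `decide` function, and its return strings are hypothetical.

```python
from enum import Enum


class DeviceState(Enum):
    ONLINE = "online"
    DOZE = "doze"
    OFFLINE = "offline"


class Priority(Enum):
    HIGH = "high"
    NORMAL = "normal"


MAX_TTL_SECONDS = 4 * 7 * 24 * 3600  # 4 weeks


def decide(state: DeviceState, priority: Priority,
           age_seconds: int, ttl_seconds: int = MAX_TTL_SECONDS) -> str:
    """Return what happens to a message of the given age and priority."""
    if age_seconds >= ttl_seconds:
        return "expire"   # time-to-live elapsed, message is discarded
    if state is DeviceState.OFFLINE:
        return "store"    # keep until the device reconnects
    if state is DeviceState.DOZE and priority is Priority.NORMAL:
        return "hold"     # wait for Doze exit or a maintenance window
    return "deliver"      # online, or high priority waking a dozing device
```

For instance, `decide(DeviceState.DOZE, Priority.HIGH, 0)` returns `"deliver"`, while the same call with `Priority.NORMAL` returns `"hold"`.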
Technical Constraints & Performance
- APNs limit: 4 KB payload
- FCM limit: 4 KB (2 KB for topics)
- Rate limits:
➔ APNs: No published hard limit, but abusive traffic is throttled
➔ FCM: Default quota of about 600,000 messages/min per project
Best Practices for Reliable Delivery
✔ Keep payloads lean — under 4 KB
✔ Use high priority only when necessary
✔ Clean up invalid tokens after errors
✔ Implement retry logic with exponential backoff
✔ Monitor success rates for delivery issues
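As one way to implement the retry advice, the sketch below retries a send with exponential backoff and random jitter. It assumes a hypothetical `send` callable that returns an HTTP status code, and it treats 429 and 5xx as retryable; the base delay, cap, and attempt count are illustrative defaults, not values recommended by either service.

```python
import random
import time
from typing import Callable


def send_with_retry(send: Callable[[], int],
                    max_attempts: int = 5,
                    base: float = 1.0,
                    cap: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> int:
    """Call send() until it succeeds, fails permanently, or attempts run out.

    Returns the last HTTP status code seen.
    """
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if status == 200:
            return status
        if status != 429 and status < 500:
            return status  # client error: retrying unchanged will not help
        if attempt < max_attempts - 1:
            # Delay ceiling doubles each attempt (1s, 2s, 4s, ...) up to cap;
            # the random factor spreads out retries from many senders.
            sleep(rng() * min(cap, base * 2 ** attempt))
    return status
```

Injecting `sleep` and `rng` keeps the function easy to test, and the jitter helps avoid many servers retrying at the same instant after an outage.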
Conclusion: Master Push Notifications Like a Pro
Push notifications are not magic — they’re engineering masterpieces. Both APNs and FCM use persistent connections, smart queuing, and optimized delivery algorithms to balance real-time speed and battery efficiency.
By following best practices, optimizing payloads, and handling errors gracefully, you’ll build a notification system that’s fast, reliable, and user-friendly.