High Availability Configuration
Primary Node:
The node is assigned a unique Node ID.
Configure the Thru Node Health Check port.
Secondary Node:
Deploy an identical node on the secondary machine, either by installing it again using the same script as the primary or by copying the installation folder from the primary to the secondary.
The secondary node must have the same Node ID as the primary.
The secondary node can now also be started. With the latest release of Thru Node, a node can detect other active instances of itself on the network. If a node detects an already active instance, it refrains from accepting new tasks, so multiple active nodes do not conflict.
Configure the Thru Node Health Check port (recommended for local monitoring, but not mandatory).
Heartbeat Monitoring:
Monitor the health of the installed nodes through the Thru cloud control plane. The Health Check port configured on each node can additionally be used for local monitoring.
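For local monitoring, a simple reachability probe against each node's Health Check port can complement the cloud control plane. The sketch below is illustrative only: it assumes the Health Check port accepts plain TCP connections, which this document does not confirm, and the host names and port number in the usage comment are hypothetical placeholders.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_nodes(nodes: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Probe every node and map its label to a reachable / not reachable flag."""
    return {name: port_is_open(host, port) for name, (host, port) in nodes.items()}


# Example usage (hypothetical hosts and port, replace with your own values):
# status = check_nodes({
#     "primary": ("primary.example.internal", 8080),
#     "secondary": ("secondary.example.internal", 8080),
# })
# for name, reachable in status.items():
#     print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```

A probe like this only shows that the port answers. It does not show which node is currently accepting tasks, so it should be read together with the status reported by the Thru cloud control plane.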
Failover Mechanism:
In case the primary node becomes unresponsive (heartbeat down):
The secondary node will automatically take over as the active node.
The updated node runtime ensures that:
A node can detect other running nodes and adjust its behavior accordingly.
Nodes will not restart or re-initiate tasks for at least 3 minutes (by default) after the previous instance shuts down.
This delay is configurable via the HeartbeatTtlSeconds parameter in the node's appsettings.json file (default: 180 seconds).
Multiple identical nodes can run simultaneously.
When this happens:
One node will act as the primary, actively processing tasks.
Other nodes will remain operational but will not accept new tasks unless the primary becomes unavailable.
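The relationship between the heartbeat delay and standby behavior can be modeled in a few lines. This is a simplified sketch, not the node runtime's actual logic: it assumes HeartbeatTtlSeconds is a top-level key in appsettings.json (the real file layout may differ), and the function names load_heartbeat_ttl and may_accept_tasks are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

DEFAULT_HEARTBEAT_TTL_SECONDS = 180  # default stated in this document


def load_heartbeat_ttl(settings_path: str) -> int:
    """Read HeartbeatTtlSeconds, falling back to the default if the file or key is missing.

    Assumption: the key sits at the top level of the JSON document.
    """
    path = Path(settings_path)
    if not path.is_file():
        return DEFAULT_HEARTBEAT_TTL_SECONDS
    settings = json.loads(path.read_text(encoding="utf-8"))
    return int(settings.get("HeartbeatTtlSeconds", DEFAULT_HEARTBEAT_TTL_SECONDS))


def may_accept_tasks(now: float,
                     last_peer_heartbeat: Optional[float],
                     ttl_seconds: int) -> bool:
    """Model of the standby rule: take over only once the peer has been silent for the TTL."""
    if last_peer_heartbeat is None:
        # No other active instance has been observed.
        return True
    return (now - last_peer_heartbeat) >= ttl_seconds


# Example: with the default TTL, a standby that last saw the primary's
# heartbeat 100 seconds ago keeps waiting; after 180 seconds it may take over.
# may_accept_tasks(now=1000.0, last_peer_heartbeat=900.0, ttl_seconds=180)
# may_accept_tasks(now=1000.0, last_peer_heartbeat=820.0, ttl_seconds=180)
```

In this model, lowering the TTL shortens the time to failover but leaves less margin for brief network interruptions, so a standby could take over while the primary is still running.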
When a failover from the primary to the secondary file transfer agent occurs, the secondary agent may not automatically resume partially transferred files in the current implementation. Such files may need to be resent after the failover.
Best Practices:
Regularly test failover and recovery processes to ensure smooth operation.
Maintain robust logging for monitoring and troubleshooting.
Ensure all active nodes are configured to detect and respond appropriately to each other.
This approach eliminates the need for a passive secondary node and enhances system resilience by allowing both nodes to operate actively while preventing task conflicts.