Highly Available Runtime Configuration with Primary & Secondary Nodes

This article explains how to set up a highly available configuration of the Thru Node runtime using a primary and a secondary node:

Highly Available Configuration:

Primary Node:
- Deploy the node on the primary machine (Windows or Linux)
- The node is assigned a unique Node ID
- Configure the Thru Node Health Check port

Secondary Node:
- Deploy an identical node on the secondary machine, either by running the same installation script used for the primary or by copying the installation folder from the primary to the secondary
- The secondary must have the same Node ID as the primary
- Set the secondary to a passive state initially

Heartbeat Monitoring:
- Monitor primary node's heartbeat via:
  a. Local HealthCheck Port
  b. Cloud control plane
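
As an illustration of option (a), the following minimal sketch probes a health check port from a monitoring script. It assumes the Health Check port accepts plain TCP connections, which this article does not confirm; the function name, host, and port are hypothetical placeholders.

```python
import socket


def health_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: the node's Health Check port is a TCP listener. If the real
    port speaks HTTP or another protocol, a protocol-level check is preferable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, and similar errors.
        return False
```

A monitor could call, for example, health_port_is_open("primary-host", 8080) on a schedule, where both the host name and the port number are placeholders for the values configured on the primary.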

Failover Mechanism:
- Detect primary node failure (heartbeat down)
- Activate the secondary node
- The latest release of Node introduces a new feature that lets a node detect an identical node that is already running and refrain from starting itself
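
The detect-then-activate steps above can be sketched as a small monitor that triggers failover only after several consecutive failed health checks, so that a single missed heartbeat does not cause a switch. This is an illustrative sketch, not the product's own mechanism: the class name, the threshold of three failures, and both callbacks are hypothetical, and how the secondary is actually activated depends on your deployment.

```python
from typing import Callable


class FailoverMonitor:
    """Counts consecutive failed health checks and activates the secondary once."""

    def __init__(
        self,
        check: Callable[[], bool],      # returns True while the primary is healthy
        activate: Callable[[], None],   # hypothetical hook that starts the secondary
        max_failures: int = 3,          # assumed threshold, not a product default
    ) -> None:
        self.check = check
        self.activate = activate
        self.max_failures = max_failures
        self.failures = 0
        self.activated = False

    def poll(self) -> bool:
        """Run one health check; return True if failover was triggered by this poll."""
        if self.activated:
            return False
        if self.check():
            self.failures = 0  # a healthy result resets the failure streak
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.activate()
            self.activated = True
            return True
        return False
```

In use, poll() would be called on a fixed interval (for example from a scheduler or a loop with a sleep), with check wired to whichever heartbeat source is monitored.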

Note that a node with the same Node ID will not start until 3 minutes (the default) have passed since the previous one shut down.

This interval can be adjusted with the "HeartbeatTtlSeconds" setting in the node's appsettings.json file; the default is 180 seconds.
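
To estimate when a secondary with the same Node ID can start, a script can read this setting and add it to the time the previous node stopped. The sketch below makes several assumptions: the file is plain JSON without comments, the key may sit at any nesting level (this article does not say where it lives, so the code searches for it), and the helper names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 180  # default stated in this article


def find_setting(obj, key):
    """Depth-first search for `key` in nested dicts and lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_setting(child, key)
        if found is not None:
            return found
    return None


def heartbeat_ttl_seconds(settings_text: str) -> int:
    """Return HeartbeatTtlSeconds from appsettings.json text, or the default."""
    value = find_setting(json.loads(settings_text), "HeartbeatTtlSeconds")
    return DEFAULT_TTL_SECONDS if value is None else int(value)


def earliest_start(previous_stopped_at: datetime, ttl_seconds: int) -> datetime:
    """Earliest time a node with the same Node ID is expected to be able to run."""
    return previous_stopped_at + timedelta(seconds=ttl_seconds)


if __name__ == "__main__":
    # Hypothetical settings content; the "Node" section name is an assumption.
    ttl = heartbeat_ttl_seconds('{"Node": {"HeartbeatTtlSeconds": 60}}')
    stopped = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(ttl, earliest_start(stopped, ttl).isoformat())
```

Lowering the value shortens the wait before the secondary can take over, at the cost of a narrower window for deciding that the previous node is really gone.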

Identical nodes can run at the same time; however, the node that recognizes the other as active will not accept new tasks.

Best Practices:
- Test the failover process regularly
- Implement proper logging for troubleshooting