File Delivery Retry Logic
Endpoint Operations
For all external endpoint types, Thru MFT performs the following operations on the endpoint as needed:
Connection Attempt
Get List of Files
Download File
Upload File
Delete File
Rename File
File Exists Check
Folder Exists Check
Error Types by Entity
Depending on its type, an error may relate to the Endpoint, the Flow Endpoint, or an individual File.
Endpoints
Connection Interrupted / Timeout
Authentication Failure
Bucket (S3) / Container (Azure) - Doesn't exist
Unknown Host
No Permission / Misconfiguration
Flow Endpoints
Connection Interrupted / Timeout
Folder Not Found
No Permission / Misconfiguration
File
Connection Interrupted / Timeout
No Permission / Misconfiguration
Non-Retryable Errors
The following error types are NOT retried and are not covered by the retry policies described below:
Authentication Failure
Unknown Host
Bucket (S3) / Container (Azure) - Doesn't exist
Folder Not Found (during Get List of Files)
No Permission / Misconfiguration
Retryable Errors
Any error not in the list above is retried automatically. Examples include:
Connection Timeout
Connection Interrupted
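A minimal Python sketch of the classification above, reading the lists literally: errors on the non-retryable list are never retried, Folder Not Found is excluded only for Get List of Files, and everything else is handed to a retry policy. The ErrorType names and the is_retryable function are hypothetical illustrations, not Thru MFT identifiers.

```python
from enum import Enum, auto


class ErrorType(Enum):
    # Hypothetical labels for the error types named in this article.
    AUTHENTICATION_FAILURE = auto()
    UNKNOWN_HOST = auto()
    BUCKET_OR_CONTAINER_NOT_FOUND = auto()
    FOLDER_NOT_FOUND = auto()
    NO_PERMISSION = auto()
    CONNECTION_TIMEOUT = auto()
    CONNECTION_INTERRUPTED = auto()


# Errors that are never retried, whatever the operation.
NON_RETRYABLE = {
    ErrorType.AUTHENTICATION_FAILURE,
    ErrorType.UNKNOWN_HOST,
    ErrorType.BUCKET_OR_CONTAINER_NOT_FOUND,
    ErrorType.NO_PERMISSION,
}


def is_retryable(error: ErrorType, operation: str) -> bool:
    """Return True if the error should be handed to a retry policy."""
    if error in NON_RETRYABLE:
        return False
    # Folder Not Found is listed as non-retryable only for Get List of Files.
    if error is ErrorType.FOLDER_NOT_FOUND and operation == "Get List of Files":
        return False
    # Anything not on the non-retryable list is retried.
    return True
```

For example, is_retryable(ErrorType.CONNECTION_TIMEOUT, "Download File") returns True, while an Unknown Host error returns False for every operation.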
Retry Policies
There are two primary retry policies:
One for the 'Connect' operation
One for all other operations (the 'Other' policy)
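The split between the two policies can be sketched as a lookup from operation to policy. This assumes the 'Connect' operation corresponds to Connection Attempt in the operations list; the Operation, RetryPolicy, and policy_for names are hypothetical.

```python
from enum import Enum


class Operation(Enum):
    # Endpoint operations from the "Endpoint Operations" list.
    CONNECTION_ATTEMPT = "Connection Attempt"
    GET_LIST_OF_FILES = "Get List of Files"
    DOWNLOAD_FILE = "Download File"
    UPLOAD_FILE = "Upload File"
    DELETE_FILE = "Delete File"
    RENAME_FILE = "Rename File"
    FILE_EXISTS_CHECK = "File Exists Check"
    FOLDER_EXISTS_CHECK = "Folder Exists Check"


class RetryPolicy(Enum):
    CONNECT = "connect"  # bounded by total elapsed time (3 minutes)
    OTHER = "other"      # bounded by retry count (10 retries)


def policy_for(operation: Operation) -> RetryPolicy:
    """Pick the retry policy that governs an endpoint operation."""
    if operation is Operation.CONNECTION_ATTEMPT:
        return RetryPolicy.CONNECT
    return RetryPolicy.OTHER
```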
In general, both retry policies use an exponentially increasing delay between attempts. The delay intentionally includes some random jitter.
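The following sketch shows one common way to compute an exponential delay with jitter. The base of 2 matches the 2-second (2^1) first wait in the examples below; the ±10% jitter range and the backoff_delay name are assumptions, since the exact jitter formula Thru MFT uses is not documented here. The sketch also applies no upper cap to the delay.

```python
import random
from typing import Optional


def backoff_delay(retry_number: int,
                  base: float = 2.0,
                  jitter_fraction: float = 0.1,
                  rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry number `retry_number` (1-based).

    The nominal delay is base ** retry_number (2, 4, 8, ... seconds), scaled
    by a random factor in [1 - jitter_fraction, 1 + jitter_fraction] so that
    many transfers failing at the same moment do not retry in lockstep.
    """
    if retry_number < 1:
        raise ValueError("retry_number is 1-based")
    if rng is None:
        rng = random.Random()
    nominal = base ** retry_number
    factor = 1.0 + rng.uniform(-jitter_fraction, jitter_fraction)
    return nominal * factor
```

Jitter spreads retries out in time, which matters when one outage causes many files or endpoints to fail together.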
For the 'Connect' retry policy, we retry for a maximum of 3 minutes (180 seconds) of total elapsed time before ultimately failing.
Example 1: Thru attempts to connect to an external S3 bucket. The operation times out after 100 seconds. We wait approximately 2 seconds (2^1) before trying again. The operation fails again after 100 seconds. At this point, the overall attempt has lasted about 202 seconds (100 + 2 + 100), which is more than 3 minutes (180 seconds), so we stop retrying.
Example 2: Thru attempts to connect to an external SFTP server. The operation fails after 5 seconds for an unknown network reason. We wait 2 seconds and try again. The operation fails again, and we wait again, this time for 4 seconds. This process repeats until the total time spent (attempts + waits) exceeds 3 minutes.
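A sketch of the time-bounded 'Connect' policy, assuming the 3-minute budget is checked after each failed attempt, as in the examples above. The clock and sleep functions are injectable so the behavior can be simulated without real waiting; retry_for_duration, RetryExhausted, and the 10% jitter are hypothetical.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    """Raised when the time budget is used up; chained to the last error."""


def retry_for_duration(
    operation: Callable[[], T],
    budget_seconds: float = 180.0,
    base: float = 2.0,
    jitter_fraction: float = 0.1,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `operation` until it succeeds or elapsed time exceeds the budget."""
    rng = rng if rng is not None else random.Random()
    start = clock()
    failures = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # e.g. authentication failure: fail immediately
            failures += 1
            elapsed = clock() - start  # attempts + waits so far
            if elapsed > budget_seconds:
                raise RetryExhausted(
                    f"gave up after {failures} attempts ({elapsed:.0f}s)"
                ) from exc
            # Exponential delay with jitter: ~2 s, ~4 s, ~8 s, ...
            delay = base ** failures * (1 + rng.uniform(-jitter_fraction, jitter_fraction))
            sleep(delay)
```

Driving this with a fake clock in which every attempt takes 100 seconds reproduces Example 1: the second attempt ends at roughly 202 seconds, past the 180-second budget, so RetryExhausted is raised after two attempts.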
For the 'Other' retry policy, we retry based on the number of attempts. If an error is retryable, we retry up to 10 times (a maximum of 11 attempts in total). The total wait time across all retries is approximately 15-17 minutes.
Example: Thru begins to download a file from an external FTP server. During the transfer, the connection is terminated. We wait 2 seconds before trying again. If the failure persists, this repeats with increasing waits until the wait between attempts exceeds 392 seconds.
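A sketch of the count-bounded 'Other' policy. The article gives the retry count (10) and the first wait (2 seconds) but not the exact delay formula, so the delay schedule is passed in as a hypothetical delay_for function; retry_n_times is likewise an illustrative name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_n_times(
    operation: Callable[[], T],
    delay_for: Callable[[int], float],
    max_retries: int = 10,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation` at most max_retries + 1 times (11 with the default)."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except Exception:
            # Give up on non-retryable errors or once every retry is spent.
            if not is_retryable(exc := None) if False else False:
                raise
            raise_now = attempt > max_retries
            if raise_now:
                raise
            sleep(delay_for(attempt))
```

delay_for receives the retry number (1 to 10), so an exponential-with-jitter function can be plugged in directly. Keeping the count limit separate from the delay schedule means the 11-attempt ceiling holds however the waits are tuned.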
For both policies, the total time spent (attempts + waits) also depends on the error, since some errors take longer to occur than others. For example, a 'Timeout' error typically takes longer to occur, sometimes up to 60 seconds.
The user is notified via an Alert for every operation that exhausts its retry attempts. In addition, if the error concerned an individual file, the file is set to the 'Error' state and is not retried further.
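A sketch of what happens once retries are exhausted: an Alert is raised for every operation, and a file-level failure additionally moves the file to the 'Error' state so it is not retried automatically. FileRecord, FileState, the alert callback, and on_retries_exhausted are hypothetical; the real Alert and file-state mechanisms are internal to Thru MFT.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FileState(Enum):
    PENDING = "Pending"
    ERROR = "Error"


@dataclass
class FileRecord:
    name: str
    state: FileState = FileState.PENDING


def on_retries_exhausted(operation_name: str,
                         error: Exception,
                         alert: Callable[[str], None],
                         file: Optional[FileRecord] = None) -> None:
    """Raise an Alert and, for file-level errors, park the file in Error state."""
    alert(f"{operation_name} failed after exhausting retries: {error}")
    if file is not None:
        # No further automatic retries; the file stays in Error state.
        file.state = FileState.ERROR
```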
The system provides a few mechanisms for manually retrying operations:
FlowEndpoint schedule - retries Connection Attempt failures
'Run Endpoint Now' API call (source and target) - same result as the FlowEndpoint schedule
'Replay' feature - retries an individual file