It would be really neat to have a system that attempts SFTP service recovery for user projects if the service goes down for any reason, or because of a configuration change on the storage/backup nodes.
Hey Charlie – This should already exist, so if it’s not working then it’s a bug. Can you describe the scenario in which this did not happen?
There have been many times when something changed. For example, when our backup server was redeployed, the SFTP containers needed to be manually rebuilt to get them working again.
Most likely we still have a few SFTP containers in the affected region that need to be manually rebuilt in this case.
I will try to find a couple of cases so you can check them out.
Before this, SFTP containers have also stopped working here and there and needed a manual rebuild. I don’t remember right now what the related event was, or might have been, on those other occasions.
I found one, check Slack!
Following up on our Slack conversation: my guess is that the agent on each node will need to be restarted after the backup server goes offline.
systemctl restart cs-agent
The reason for this is that our agent handles both backups and firewall rules. If the backup server is gone and the agent is no longer able to manage the backup volumes mounted from it, the agent could end up in a state that requires a restart. While the agent is in that state, firewall rules won’t get applied.
A restart resolves this because the firewall rules are stored in the AZ’s Consul cluster, and as soon as the agent boots it will reload those rules and make the firewall match the state it expects.
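To illustrate the "make the firewall match the expected state" idea, here is a minimal sketch of that kind of reconciliation. It is not the agent's actual code: the function name and the rule format (protocol, port pairs) are assumptions made up for this example.

```python
def plan_firewall_changes(desired, actual):
    """Return (to_add, to_remove) so that applying both makes actual equal desired.

    Hypothetical helper: 'desired' stands in for the rules loaded from the
    cluster store, 'actual' for the rules currently active on the node.
    """
    desired, actual = set(desired), set(actual)
    return sorted(desired - actual), sorted(actual - desired)


# Hypothetical rules expressed as (protocol, port) pairs.
desired = {("tcp", 22), ("tcp", 2222)}
actual = {("tcp", 22), ("tcp", 8080)}
to_add, to_remove = plan_firewall_changes(desired, actual)
```

Because the plan is computed from the stored rules rather than from whatever the agent last did, running it again after a restart is safe: if nothing has drifted, both lists are empty.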
With all of this being said, my preference would be that the backup server going away does not cause our agent to crash. Instead, it should print a pretty error message and wait for the server to come back online. So I’ll make a note for myself to review this scenario and see what improvements we can make.
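As a rough sketch of the "log an error and wait" behaviour described above (not how the agent is implemented today), the loop below keeps probing with a capped exponential backoff instead of exiting. The name `wait_until_available`, the `probe` callback, and the delay values are all assumptions for illustration.

```python
import time


def wait_until_available(probe, delay=1.0, max_delay=60.0,
                         sleep=time.sleep, log=print):
    """Call probe() until it returns True; return the number of attempts made.

    probe is a hypothetical callable that returns True when the backup server
    is reachable and may raise OSError (e.g. ConnectionError) when it is not.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except OSError as exc:
            # Report the failure and keep running instead of crashing.
            log(f"backup server unreachable ({exc}); retrying in {delay:.0f}s")
        else:
            log(f"backup server not ready; retrying in {delay:.0f}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # capped exponential backoff
```

Passing `sleep` and `log` as parameters keeps the loop easy to exercise without real waiting. In a real agent, the firewall handling would presumably keep running independently while this loop waits.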
Ok, this is good to know.
And to follow up, nothing changed after restarting the agent on the nodes.
I know you will investigate this, and I do see that there isn’t even an SFTP container for this project, so there might not be anything to apply firewall rules to.
Ok, if the SFTP container is not being created then this is a different issue. We’ll take this over to Slack and I’ll see what’s going on with your installation.