- Respond to and resolve production incidents promptly to minimize downtime.
- Investigate the root cause of incidents and implement permanent fixes to prevent recurrence.
- Monitor system performance, logs, and metrics to identify potential issues before they affect users.
- Set up and maintain monitoring tools that raise alerts on abnormal behavior or system failures.
- Work closely with development, quality assurance, and other IT teams to ensure smooth deployment of applications and updates.
- Collaborate with third-party vendors or support teams when necessary.
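
As a rough illustration of the monitoring and alerting duties listed above, the following minimal Python sketch flags metrics that exceed a configured limit. The metric names (`cpu_percent`, `error_rate`), the threshold values, and the `Alert` and `check_thresholds` names are all hypothetical, chosen only for this example. A real setup would normally rely on a dedicated monitoring tool rather than hand-written checks.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    """One threshold breach (hypothetical structure for illustration)."""
    metric: str
    value: float
    threshold: float


def check_thresholds(samples: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[Alert]:
    """Return an Alert for each metric whose latest sample exceeds its limit.

    Metrics with no sample are skipped rather than treated as failures.
    """
    alerts = []
    for metric, limit in thresholds.items():
        value = samples.get(metric)
        if value is not None and value > limit:
            alerts.append(Alert(metric, value, limit))
    return alerts


if __name__ == "__main__":
    # Assumed example values, not real measurements.
    latest = {"cpu_percent": 93.0, "error_rate": 0.002}
    limits = {"cpu_percent": 90.0, "error_rate": 0.01}
    for alert in check_thresholds(latest, limits):
        print(f"ALERT {alert.metric}: {alert.value} > {alert.threshold}")
```

Keeping the limits in a plain dictionary, separate from the check itself, is one way to let thresholds be tuned without touching the logic. Whether that suits a given environment depends on the tooling already in place.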