Imagine getting notified of an issue at the worst possible time and scrambling to fix it before it spirals out of control. For DevOps teams, this used to be a daily routine. Without a clear system, the process becomes overwhelming and efficiency suffers. In November 2021, I began redesigning New Relic's entire incident management system, improving its logic and performance. The result was Alerts & AI, a tool that simplifies incident management and acts as the operational hub for hundreds of thousands of DevOps users.
1. Improving noise reduction algorithms;
2. Changing the product logic and consolidating the product experience with NR1 standards;
3. Rebranding the whole platform: marketing positioning, branding, and design system.
1. Increase user engagement by providing value to current users through an experience improved in line with industry trends;
2. Increase traffic usage;
3. Attract new large and mid-size corporate clients;
4. Enter the B2C market.
I led the design process for a product with multiple user flows and complex functionality, ensuring each step was backed by data and research. I collaborated with three product managers and the R&D team (front-end and back-end developers and architects) to implement the design solutions I created, making the product both intuitive for users and technically robust.
The solution had to reduce cognitive load while boosting problem-solving efficiency, and it had to make a real difference in users' daily incident management routine in New Relic. One of the design goals was to structure multiple features and complex configurations so that everything stayed simple and intuitive throughout the user flow. The solution also had to be flexible and compatible with other tools, regardless of the organization's size or management style.
This tool is used by a range of roles, each with different goals:
- When responding to notifications during shifts, I need to quickly analyze and resolve issues, so I can minimize downtime and ensure the system remains operational without disrupting services.
- When setting up monitoring tools and alert systems, I need to configure effective alerts and monitoring workflows, so I can proactively address incidents and ensure the system runs smoothly with minimal disruptions.
- When monitoring system performance, I need to quickly detect issues and identify their root causes, so I can resolve problems efficiently and ensure optimal system functionality.
- When managing the observability process, I need to track my team’s response to incidents and identify areas for improvement, so I can ensure efficient problem-solving, minimize downtime, and optimize team performance over time.
“It’s hard to see the full picture. I get bits and pieces of data, but nothing ties it all together in one view.”
I learned the incident management flow by interviewing developers and DevOps teams to understand their daily routines and the tools they use. Depending on the company's size, users had different responsibilities, workflows, and tools.
These insights were instrumental in shaping the right information architecture, particularly in addressing the most frustrating aspects of issue resolution for DevOps teams:
1. Managing noise;
2. Quickly filtering relevant alerts;
3. The lack of root cause analysis.
I conducted several rounds of user interviews to understand users' habits, personal goals, and motivations. Additionally, I used card sorting to clarify the role of each piece of data and how users intended to use it after receiving it. This research was essential in shaping the design and ensuring it met users' actual needs.
The goal of the Issues feed is to organize issues, giving users a quick summary of problems and letting them analyze trends. Key parameters are organized in a table, with sorting and filtering options tailored to different user roles. For instance, team managers focus on improving monitoring efficiency, while on-call engineers use flexible filters to pinpoint specific issues.
The landing page of Issues & AI gives an overview of the problems affecting the services, applications, and DBs connected to New Relic.
The filtering feature is built to help users find specific issues based on different parameters and configurations. Designed as a wizard with a list of suggestions, it allows users to easily build custom filter queries, so they can quickly pinpoint the exact issues they need to address.
The issue page is the go-to source for all the key data needed to troubleshoot. It provides details like timing, notifications, related incidents, entities (services, apps, DBs), logs, dashboards, and more. The challenge was organizing all this information in a way that’s intuitive and not overwhelming. Research helped us map out user journeys and fine-tune the information architecture to fit their needs.
The Issue page offers critical insights to help users quickly understand the problem, identify the root cause, and take action to resolve it.
The New Relic native app plays a critical role in the user journey, especially during on-call shifts when quick, on-the-go troubleshooting is essential. To support this need, we streamlined the app's functionality to focus on data display only, excluding configuration options to reduce complexity. Within Alerts & AI, the issues feed and the issue detail pages, which feature drill-down options and user actions for managing issues, are central to the experience. The app is available for both iOS and Android, ensuring accessibility across devices.
The redesigned Issues feed and Issue page quickly became central to the product experience, drawing significant user engagement from the outset. The new design led to a 30% increase in web users visiting the Issue page after receiving a notification, compared to the legacy version. Additionally, 10% of users engaged with the action functionality, boosting overall interaction rates. Users spent an average of 15-30 seconds longer on the Issue page than on the previous Alerts Incidents page, and around 35% clicked links to continue their investigations. The Issue page was also aligned with platform-wide UX patterns and integrated into the NR1 home page, leading to around 900 unique users visiting it within the first month after launch. This surge in engagement highlights the effectiveness of the new design and sets a strong foundation for further enhancements.