Hardware Replacement Workflow

Step-by-step workflow guiding data center technicians through parts replacement

Situation: Data center technicians all over the world were using different processes to replace parts. Sometimes failed parts weren’t being properly disposed of, causing security issues. Site managers lacked insight into how these replacement tasks were being performed and how long they were taking.

Task: Design a workflow feature for an existing application that walks technicians through a step-by-step process to scan failed parts and their replacement parts, and that tells technicians how a part should be disposed of and tracks where it goes.

Result: This feature was extremely well received, and technicians have been providing lots of ideas for improvements and additional data points to capture. The inventory management team has better data on how and where parts are being used. The design has proved flexible for creating new workflows in other applications, providing a recognizable and predictable pattern for our users.

Actions:

User research and testing
Wireframes
Interaction design
Prototypes
Light visual design

Beginning

This project was brought to our team by a former technician who was now working as a program manager to bring better tooling to our data center personnel all over the world. He’d recruited a team of experienced technicians to act as experts in helping develop requirements, providing feedback, and acting as beta testers. Once I understand requirements I usually start by creating task flows, but we ended up working through them together over several Zoom sessions. After some initial back and forth, we landed on three distinct steps in the process: scanning spare parts, scanning failed parts, and properly disposing of failed parts.

First Screens

I began by exploring the relationship between spare and failed parts, as the system needed to track which spare part replaced which failed part. Would users want to scan them in pairs or scan one set of either all at once? I showed some different options to stakeholders and learned some important considerations.

The program manager wanted to implement a tracking feature so that when spare parts were removed from inventory, the technician would immediately scan them so the system could track who had possession of them and for how long. The technicians told me the inventory room was often far away from where repairs needed to be made, and they needed to ensure the spares were actually in stock before beginning the replacement. Therefore, I needed to keep the scanning of spare and failed parts separate, and technicians would need to scan spares first.

Step 3 is to handle the failed parts. The process requires re-scanning the failed part and then scanning the destruction bin. I also had to account for possible scanner failures.

Dialing in Step Three

After separating the scanning of spare and failed parts into two steps, stakeholders started focusing in on the third step. Failed parts need to be disposed of, but not all can be disposed of in the same way. Some parts must be returned to the manufacturer for credit, while others, like servers, must be destroyed by a specialty third-party company because they contain potentially sensitive data. Still others require no special handling at all. However, in all cases the system needs to tell the user how to dispose of the failed parts. Each goes into a special bin depending on its category, and parts that are marked for destruction require scanning a specific bin for tracking purposes.
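The disposal rules described above can be sketched as a small lookup. This is only an illustration under stated assumptions, not the production logic: the category names, bin labels, and function names are all hypothetical, and the real system may derive a part's category differently.

```python
from dataclasses import dataclass
from enum import Enum


class DisposalCategory(Enum):
    # Hypothetical names for the three handling categories in the text.
    RETURN_TO_MANUFACTURER = "return_to_manufacturer"
    DESTROY = "destroy"
    NO_SPECIAL_HANDLING = "no_special_handling"


@dataclass(frozen=True)
class DisposalInstruction:
    bin_label: str           # what the screen tells the technician (assumed wording)
    requires_bin_scan: bool  # True only when the bin itself must be scanned


# Assumed mapping: every category gets an instruction, but only parts
# marked for destruction require scanning a specific bin for tracking.
INSTRUCTIONS = {
    DisposalCategory.RETURN_TO_MANUFACTURER: DisposalInstruction("Manufacturer return bin", False),
    DisposalCategory.DESTROY: DisposalInstruction("Destruction bin", True),
    DisposalCategory.NO_SPECIAL_HANDLING: DisposalInstruction("General disposal bin", False),
}


def disposal_instruction(category):
    """Return the instruction to show for one failed part's category."""
    return INSTRUCTIONS[category]


def needs_bin_scan(categories):
    """True if any failed part in this replacement requires a bin scan."""
    return any(INSTRUCTIONS[c].requires_bin_scan for c in categories)
```

Because the need for a bin scan is only known once the failed parts have been scanned, a check like `needs_bin_scan` would run at the final step rather than at the start of the workflow.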

I tried some different ideas, including adding a fourth step for the additional bin scan. But since we can’t know at the start of the process that this step will be required, I had to find a more flexible way to include an additional process flow on the last screen. I re-used one of my earlier layouts and added a box to the left of the table that walks the user through the destruction bin scanning process. Why on the left? In user testing I discovered that users didn’t always see it when I had placed it on the right.

“Can you just add…”

“…a way to order parts to replace the faileds in order to keep inventory stocked?”

“…a way to look up inventory at the start of the process so I know if we even have the necessary parts in stock?”

“…a way to use ‘donor’ parts when we have no spares in inventory?”

“…a way to scan the server chassis before replacing the server to ensure techs are working on the right one?”

These are some of the requests we’ve been getting now that this workflow is in regular use. Some ideas, like ordering replacements, I determined would be better handled behind the scenes so the user didn’t even need to make this decision. Others, like scanning the server chassis before replacing a server and looking up inventory at the start of the process, were validated by the expert team and so I created ways to insert them into the process. As for using donor parts when no spares are available, that process is still being worked out because it should only be a last-resort option and not an action that’s easily available to take.

The first screen is the addition of the Locate Spares step. The second shows the addition of the Scan Server step. You can also see some additional data added in the header that helps techs ensure they’re repairing the correct asset.

Expanding the Concept

To solve one of the initial requirements for tracking how long it takes a technician to perform each step, we are looking at expanding this stepped workflow process into more areas. I’ve been exploring how to use existing runbooks created by the expert team to further direct each step of specific processes, and then how to break them up into measurable steps that higher-ups can use to identify problem points and improve training and performance.

I began by choosing a runbook with tasks that require using multiple tools and mapping out each tool a technician would have to visit to complete the workflow.

I then designed the first five screens as a proof of concept to show to higher-ups. Each time a user clicks “Next” the timer records the duration for that step and starts over for the next. This helps managers understand how long certain tasks are taking so they can optimize processes or provide additional training. I opted not to show the timer to users because they said they resented “gamification” type features.
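The per-step timing behavior can be sketched roughly as follows. This is a minimal illustration, assuming a simple in-memory timer; the class name `StepTimer`, the step names, and the injectable `clock` parameter are hypothetical and not taken from the actual application.

```python
import time


class StepTimer:
    """Records how long each workflow step takes, advancing on "Next"."""

    def __init__(self, step_names, clock=time.monotonic):
        self._steps = list(step_names)
        self._clock = clock          # injectable so the timer can be tested
        self._index = 0
        self._started_at = clock()   # timing starts when the workflow opens
        self.durations = {}          # step name -> seconds spent on that step

    @property
    def current_step(self):
        """Name of the step in progress, or None once the workflow is done."""
        if self._index < len(self._steps):
            return self._steps[self._index]
        return None

    def next(self):
        """Record the current step's duration and restart the timer."""
        if self.current_step is None:
            raise RuntimeError("workflow already complete")
        now = self._clock()
        self.durations[self.current_step] = now - self._started_at
        self._index += 1
        self._started_at = now
        return self.current_step
```

For example, a workflow could create `StepTimer(["Scan spares", "Scan failed parts", "Dispose"])` and call `next()` from the “Next” button handler. The durations are collected silently and never rendered, which matches the decision not to show the timer to technicians.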

Overall, the higher-ups love this workflow and will be adding it to upcoming roadmap planning sessions. It still needs more design and testing, but it’s an example of how “show, don’t tell” helps prove the value of design.

Conclusion

This stepped workflow process has proven successful and flexible and is currently being expanded into our other applications and for other processes. Working directly with my colleagues on features that help them do their jobs better is always extremely rewarding. Now that they see someone cares about making their lives easier, they don’t hesitate to reach out with, “Hey, can you design me a new feature?” or, “Can you make this easier to do?” Yes, you bet.