To drive this point home: a good starting point for implementing AI responsibly is to ask questions, especially around key decision points. Ask them early; ask them often.
Ask the same question over and over again. (Answers might change as the team learns.) Ask different people on the team to get a collection of answers. Sometimes, you may not have an answer to a question immediately. That’s ok. Plan to get the answer, and be able to explain it, as the project progresses.
Because each project differs, the questions needed to assess responsibility and trustworthiness will differ as well. The questions outlined in this module are designed to guide teams that are building and implementing AI systems; they are not official standards or policy. Rather, they are good questions to consider as your team progresses through the AI lifecycle and begins to embed responsible and trustworthy AI in the process. These questions are intended to foster discussion around broad ethics topics. Ask them often throughout the design-develop-deploy cycle, and combine them with testing to help reduce unintended consequences.
It’s too late to start asking about responsible and trustworthy AI implementation when you’re readying a complex system for production. Even if you are only one person playing with data to see whether AI might be a possible solution, ask these questions. Some of them may not apply in the early stages of discovery. That’s ok. Continue to ask them as the project evolves, and document the answers to track your project’s progress. These answers will help identify when in the AI lifecycle each question becomes relevant and can inform future systems.
Here are some suggested questions that any team attempting to develop responsible and trustworthy AI needs to consider:
1. Focus on the root problem
Government research projects and pilots all aim to improve the function of our government, whether through a better chatbot to help with customer service or through faster, more efficient detection of cybersecurity threats. Whatever their purpose, exploring the use of new technologies such as AI must be done in a way that evaluates whether AI is actually the best-fit solution. Teams that are building models and systems need to clearly understand the problem to be solved, who is affected by this problem, and how AI may — or may not — be a solution.
Questions to consider include:
- Why are you considering using an AI solution in the first place?
- Is it the best option to solve this particular problem? Have you evaluated alternative solutions?
- Will it actually solve the problem? What metrics are important to assess this hypothesis and how will you measure them?
- Will it benefit all users equally, or will it disproportionately help some, possibly at a cost to others?
As previously highlighted, creating a team environment where all stakeholders are educated and empowered to participate in evaluating these kinds of questions is essential. For example, if the metrics don’t require an assessment of the chatbot tool’s accessibility, the right questions were not asked.
2. Be accountable to the users
AI systems cannot exist in isolation. The outcomes they produce must be justifiable to the users who interact with them. In the case of government use, users range from government employees to recipients of benefits. This may also mean the systems must be able to demonstrate how an answer was reached, which is critical to identifying the cause of negative outcomes. It also means that a person, or a team, needs to own the decisions that go into creating the systems.
Questions to consider include:
- When something deviates from the intended output or behavior, who is responsible for noticing and correcting this?
- Is someone responsible for making sure that every step is not just done, but done correctly?
The process starts with establishing clear roles and responsibilities for data and model management. At a minimum, an aberrant outcome should be traceable to its training source. This is often significantly harder than you would think, especially in the case of deep learning. Nevertheless, it should be an ongoing effort.
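As one way to make this traceability concrete, the sketch below tags each prediction with the model version and a fingerprint of its training data so that an aberrant outcome can later be traced back to its source. It is a minimal illustration only; the record fields, the `log_prediction` helper, and the file names are hypothetical, not part of any standard or required tooling.

```python
# Minimal provenance sketch: tag every prediction with the model version and a
# hash of the training dataset so aberrant outcomes can be traced back to their
# training source. All names and fields here are illustrative, not a standard.
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


def fingerprint_file(path: str) -> str:
    """Return a SHA-256 fingerprint of a training data file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class PredictionRecord:
    model_version: str        # e.g., a git tag or model registry ID
    training_data_hash: str   # fingerprint of the dataset the model was trained on
    inputs: dict              # the features the model actually saw
    output: str               # the prediction or decision produced
    timestamp: str            # when the prediction was made (UTC)


def log_prediction(record: PredictionRecord, log_path: str = "predictions.jsonl") -> None:
    """Append the record as one JSON line so it can be audited later."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


# Example usage with hypothetical values (assumes training_data.csv exists):
record = PredictionRecord(
    model_version="chatbot-intent-v1.3",
    training_data_hash=fingerprint_file("training_data.csv"),
    inputs={"utterance": "How do I renew my benefits?"},
    output="intent: benefits_renewal",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
log_prediction(record)
```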
3. Define and avoid harm
Researchers, advocates, and technologists on AI teams have concerns about numerous types of harm caused by ill-designed AI systems. These include risks of injury (both physical and psychological), denial of consequential services (such as opportunity loss or economic loss), infringement on human rights (such as loss of dignity, liberty, or privacy), environmental impact, and even possible erosion of social and democratic structures. In looking at these harms, it is important to remember that bias can influence who suffers from any of these types of AI harm.
Bias can enter an AI system in many ways. While some of the most commonly discussed bias issues involve discriminatory opportunity loss, as seen in employment, housing, healthcare, and many other fields, it’s important to remember that bias occurs in many forms. For example, a biased AI system in a self-driving car could cause increased rates of physical harm to people with disabilities who use mobility aids, simply because the pedestrian training data consists mostly of able-bodied subjects.
Though it may be impossible to eliminate all bias completely (and that may not even be the goal), an AI team must be able to evaluate what the possible harms of its system could be and how bias might cause disparate negative impacts across different populations. To reduce this possibility, the team must evaluate for bias in the datasets, the model, and the design choices throughout the product lifecycle. It must also evaluate for bias in the outcomes the system produces to ensure the output does not disproportionately affect certain users; a minimal example of one such outcome check follows the questions below.
Questions to consider include:
- What are the possible negative impacts of these systems? How do we measure this harm and what could we do to mitigate that impact?
- What are the demographics of the people involved in the domain the AI system operates in? Who is directly and indirectly impacted?
- What data is required to ensure equitable outcomes across the universe of people affected?
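One way to begin answering these questions with data is to compare favorable-outcome rates across the groups an AI system affects. The sketch below is a minimal illustration of such a disparate-impact style check, not a complete bias evaluation; the group labels, field names, and 0.8 ratio threshold are assumptions a team would need to adapt to its own context.

```python
# Minimal sketch of one fairness signal: compare favorable-outcome rates across
# groups and flag large gaps. This is one signal among many, not a complete bias
# evaluation. Group labels and the 0.8 threshold are illustrative assumptions.
from collections import defaultdict


def favorable_rate_by_group(records: list[dict], group_key: str, outcome_key: str) -> dict[str, float]:
    """Return the share of favorable (truthy) outcomes for each group."""
    totals: dict[str, int] = defaultdict(int)
    favorable: dict[str, int] = defaultdict(int)
    for r in records:
        group = r[group_key]
        totals[group] += 1
        if r[outcome_key]:
            favorable[group] += 1
    return {g: favorable[g] / totals[g] for g in totals}


def flag_disparities(rates: dict[str, float], ratio_threshold: float = 0.8) -> list[str]:
    """Flag groups whose favorable rate falls below ratio_threshold of the best-off group."""
    best = max(rates.values())
    return [g for g, rate in rates.items() if best > 0 and rate / best < ratio_threshold]


# Example with made-up decision records:
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]
rates = favorable_rate_by_group(decisions, "group", "approved")
print(rates)                    # {'A': 1.0, 'B': 0.33...}
print(flag_disparities(rates))  # ['B'] -- a gap worth human investigation, not a verdict
```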
4. Monitor the outcomes
Even after asking essential questions during system design and development, the team must rigorously monitor and evaluate AI systems. The team should create structured oversight mechanisms and policies, ideally developed throughout the design process and in place before implementation, to identify anything that is potentially causing a problem so they can intervene quickly.
Questions to consider include:
- Are there regular management reviews of changes made to the input, throughput, or output of the developed system?
- Are there clear roles and responsibilities for the management of the AI system?
- Are there automated system checks for issues such as model drift, anomalous behavior, or other unexpected changes? (One way to check for drift is sketched after this list.)
- Are the systems auditable so that the drivers of incorrect or inequitable outcomes can be identified and fixed?
- Does the AI system provide clear notice of its use to impacted people, including which relevant factors are important to any decisions or determinations? Is there a mechanism for impacted people to contest, correct, or appeal a decision, or even opt out of the use of an AI system?
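One way to implement the automated drift check mentioned above is to compare the distribution of the model’s recent prediction scores against a baseline captured at deployment time. The sketch below uses the Population Stability Index (PSI) for that comparison; the bin count, the example scores, and the 0.2 alert threshold are illustrative assumptions a team would tune for its own system.

```python
# Simplified model-drift check: compare the distribution of recent prediction
# scores against a baseline using the Population Stability Index (PSI).
# The bin count, example scores, and alert threshold are illustrative assumptions.
import math


def psi(baseline: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""

    def proportions(scores: list[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            idx = min(int(s * bins), bins - 1)  # clamp s == 1.0 into the last bin
            counts[idx] += 1
        # Smooth empty bins slightly so the log term stays defined.
        return [max(c / len(scores), 1e-6) for c in counts]

    base_p = proportions(baseline)
    curr_p = proportions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_p, curr_p))


# Example: scores captured at deployment vs. scores observed this week (made up).
baseline_scores = [0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]
recent_scores = [0.6, 0.65, 0.7, 0.72, 0.75, 0.8, 0.82, 0.85, 0.9, 0.95]

score = psi(baseline_scores, recent_scores)
# A common rule of thumb: PSI above ~0.2 suggests meaningful drift worth a human review.
if score > 0.2:
    print(f"Drift alert: PSI = {score:.2f} -- route to the responsible reviewer")
```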
Of course, oversight will not solve all potential problems that may arise with an AI system, but it does create a plan to watch for, and catch, some of the foreseeable issues before they become harmful.