Data Policy and Ethics 2020: The Year in Review

After living through 2020 and the first weeks of 2021, it is no wonder that people across the globe are questioning the ethical standards of everyday organizations and the people in charge of them. Understanding the consequences of the actions of people in power and their followers, or the lack thereof, feels daunting after a year of so many tragedies: the murders of countless Black and brown people, the pandemic, one of the hottest years on record, political unrest, and economic struggles. Looking back at the past year, we have made great strides in data ethics, but circumstances have also raised new questions. Admittedly I, like most people, was preoccupied with the ever-changing state of our world: managing work-from-home arrangements, constant meeting interruptions, and trying to retain some ounce of normalcy. Keeping up with the latest news out of the data science world was not a top priority. Now that we are in a new year, we can look back at some of the most impactful changes in data science and data policy, changes whose implications reach from marketing and sales to government, entertainment, and every other facet of life well into the future.

The Trump Administration Makes its Mark on AI Policy

The Trump administration has had a tumultuous relationship with ethics over the past four years, and in the first few days of 2021 it only seemed to double down. The President's involvement with the riot at the Capitol and his subsequent calls for action raise questions about the decision-making and ethics of this administration, as have many decisions made throughout his presidency. Despite these actions, Trump made an interesting move in the world of data policy and ethics. At the end of 2020 the President signed an executive order directing agencies of the United States government to use AI in an ethical and trustworthy way and to remain competitive in the industry. Over the past year, agencies of the US government have hired Chief Data Officers and begun creating data standards, practices, and codes of ethics. The Department of Defense led the way with a comprehensive document on how the agency will ensure ethical standards around data integrity and data use.

While these comprehensive plans are exciting developments in a government that has historically been slow to adopt technology, they are not compulsory or cohesive across agencies, and the ethical standards in this order are vague at best. Unfortunately, President Trump's executive order does little to remedy this; however, the order does create a registry of models deployed within the government, set up a timeline for creating policy guidance, encourage agencies to hire tech-focused teams and individuals, and encourage transparency in AI use throughout the government in areas outside of R&D and national security.

As with any outgoing administration, there is a chance that this order will be modified or thrown out completely; however, the basis on which this executive order stands feels solid to me. As in most cases involving AI policy, my greatest concern is transparency: allowing people to understand when and how AI might be affecting their lives and giving watchdogs the ability to call out machine learning models and AI systems that could harm them. This executive order sets up a structure that begins to do just that.

The Facebook and Google Lawsuit Means a Reckoning for the Tech Industry... No Matter the Outcome

After the terrifying events at the Capitol in early January, some of the fastest groups to act in condemning the actions of the president and his followers were the big tech companies, banning Trump, blocking some of his content, and blocking groups that helped organize the riots. One has to wonder how these actions, as well as past half-baked attempts to quell misinformation and hate groups, will affect upcoming litigation against big tech.

Several states and the US federal government have filed lawsuits against Facebook and Google for antitrust violations. While many people understand antitrust to be rooted in the traditional idea of a monopoly that raises prices and reduces supply, thus harming consumers, the lawyers in this case are arguing that these companies are not harming consumers economically but wield so much data power that they make it essentially impossible for competitors without the same terabytes of data on each "customer" to enter the market competitively. Additionally, these lawyers argue that access to these data stores through APIs could give large players like Facebook and Google a bargaining chip to use against smaller companies trying to enter the market.

While this lawsuit could take several years to reach a conclusion, I think it shows that there are vulnerabilities in the big-tech data-as-a-service model. Because these companies rely on economies of scale (more users means more data, which means a better product for users, which attracts customers who want to buy user data, which brings more users… and the cycle continues), and this cycle is exactly what the lawyers are targeting, big tech may need to revise its business model. It may mean that this data can no longer be considered part of a company's secret sauce. If the lawsuit fails, it may simply mean that companies must come to terms with the fact that the legal system that once shielded them, or at the very least looked away, may no longer be as favorable as it once was. Companies may begin to scrutinize their own systems and innovate. Or perhaps it will take another big lawsuit to shake up big tech. We probably will not see the conclusion of this case and all its implications until 2022.

Data Science Implications for Climate Change Policy

In other areas, we are seeing the consequences of our actions in real time. In 2020 we witnessed fires in Australia and California, an overly active hurricane season, and one of the warmest years on record. These extremes and outliers make an already difficult forecasting challenge even harder. Adversarial training is a machine learning methodology that has helped make understanding climate change easier.

What is adversarial training? It isn't new. My first encounter with the idea was in the book Zero History, in which the character Pep wears a shirt that "...compels erasure. That which the camera sees, bearing the sigil, it deletes from the recalled image." This was brought to life by Belgian researchers (whether or not they read the book, I can't say) who created patterns that fool facial recognition and other image recognition software into "ignoring" part of the image. Another public incident was the Tesla self-driving vulnerability in which adding a small piece of electrical tape to a 35 mile per hour speed limit sign caused the software to read it as an 85 mile per hour sign. Generally, the technique works by feeding a machine learning model inputs designed to deceive it. While it is often used as a hacking tool, in this case it was used to strengthen climate forecasting models and increase their accuracy.
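To make the idea concrete, here is a minimal, hypothetical sketch of an adversarial input: a toy linear classifier is fooled by a small, targeted perturbation. The weights, input, and step size are all invented for illustration and have nothing to do with the Tesla or facial recognition systems mentioned above.

```python
import numpy as np

def predict(w, x):
    """Toy classifier: positive score -> class 1, else class 0."""
    return int(w @ x > 0)

rng = np.random.default_rng(0)
w = rng.normal(size=16)          # stand-in "trained" weights
x = w / np.linalg.norm(w) * 0.1  # an input the model classifies as 1

# Fast-gradient-sign-style attack: nudge every feature a tiny amount
# in the direction that most decreases the score. For a linear model,
# that direction is simply -sign(w).
epsilon = 0.2
x_adv = x - epsilon * np.sign(w)

# The per-feature change is at most epsilon, yet the label flips.
print(predict(w, x))      # 1
print(predict(w, x_adv))  # 0
```

The same recipe, scaled up to deep networks and images, is what produces adversarial patches like the sticker on the speed limit sign: a perturbation too small for a human to care about, aimed precisely at the model's decision boundary.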

Researchers at the US Department of Energy's National Renewable Energy Laboratory used this technique by creating competing models: one learns to distinguish real inputs from fake, pushing the other to produce more realistic, higher-resolution outputs. This matters particularly for climate data, which often lacks high-resolution coverage across different scenarios. Additionally, the computational methods traditionally used are slow and cumbersome, requiring computationally dense physics equations and additional data storage that cost organizations money and time. This improved efficiency means high-resolution climate data will be available at both global and regional scales. As these high-resolution scenarios become more widely available and an administration more favorable to addressing climate change takes office, climate change policy will become a high-priority item for government and industry alike.
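The competing-models setup described above is the generative adversarial network pattern. As a purely illustrative toy (not NREL's actual models, and one-dimensional rather than climate-scale), here is a sketch of that dynamic: a generator learns to mimic "real" data while a discriminator learns to tell real from generated, and each update makes the other's job harder.

```python
import numpy as np

rng = np.random.default_rng(1)
REAL_MEAN = 4.0      # stand-in for the "real" high-resolution data distribution

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

g_shift = 0.0        # generator parameter: shifts noise toward the real data
d_w, d_b = 0.0, 0.0  # discriminator parameters: logistic real-vs-fake score
lr = 0.05

for step in range(2000):
    real = rng.normal(REAL_MEAN, 1.0)
    fake = rng.normal(0.0, 1.0) + g_shift

    # Discriminator step: push score(real) toward 1, score(fake) toward 0.
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = sigmoid(d_w * x + d_b)
        d_w += lr * (label - p) * x   # gradient ascent on the log-likelihood
        d_b += lr * (label - p)

    # Generator step: adjust g_shift so the fake sample fools the
    # discriminator (push score(fake) toward 1, via the chain rule).
    p = sigmoid(d_w * fake + d_b)
    g_shift += lr * (1.0 - p) * d_w

print(round(g_shift, 2))  # drifts toward REAL_MEAN as the two models compete
```

In the NREL work the generator is a deep network upscaling coarse climate fields instead of a single scalar, but the adversarial loop is the same: the discriminator's ability to spot fakes is what forces the generated high-resolution fields to look physically plausible.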

2020 required all businesses and organizations to adapt to new and ever-changing market circumstances, embracing technology in order to weather a storm that will carry on into the coming year. As organizations continue to look for ways to become more efficient and stay competitive in this new environment, I think we will see many more advancements and a democratization of data science as companies of varying sizes and industries embrace the technology and use it in different ways. With this use will come policy changes and a renewed focus on, and need for, data ethics. I'm cautiously optimistic about what we might see in 2021 and beyond. As the past year has shown, when there is a void in policy and procedure, the systems we rely on can quickly be misinterpreted or decimated entirely. These systems have far-reaching effects in our everyday lives, going beyond data policy to lasting effects on human rights. I strongly advise Biden and his new administration to think carefully about the human impact as they review and create new AI and data ethics policies: our climate, our social interactions, the work of our government agencies, and so much more could be at stake.