MIT Latest News


AI-enabled control system helps autonomous drones stay on target in uncertain environments

Mon, 06/09/2025 - 4:40pm

An autonomous drone carrying water to help extinguish a wildfire in the Sierra Nevada might encounter swirling Santa Ana winds that threaten to push it off course. Rapidly adapting to these unknown disturbances in flight presents an enormous challenge for the drone’s flight control system.

To help such a drone stay on target, MIT researchers developed a new, machine learning-based adaptive control algorithm that could minimize the drone’s deviation from its intended trajectory in the face of unpredictable forces like gusty winds.

Unlike standard approaches, the new technique does not require the person programming the autonomous drone to know anything in advance about the structure of these uncertain disturbances. Instead, the control system’s artificial intelligence model learns all it needs to know from a small amount of observational data: about 15 minutes of flight time.

Importantly, the technique automatically determines which optimization algorithm it should use to adapt to the disturbances, which improves tracking performance. It chooses the algorithm that best suits the geometry of the specific disturbances the drone is facing.

The researchers train their control system to do both things simultaneously using a technique called meta-learning, which teaches the system how to adapt to different types of disturbances.

Taken together, these ingredients enable their adaptive control system to achieve 50 percent less trajectory tracking error than baseline methods in simulations and perform better with new wind speeds it didn’t see during training.

In the future, this adaptive control system could help autonomous drones more efficiently deliver heavy parcels despite strong winds or monitor fire-prone areas of a national park.

“The concurrent learning of these components is what gives our method its strength. By leveraging meta-learning, our controller can automatically make choices that will be best for quick adaptation,” says Navid Azizan, who is the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), a principal investigator of the Laboratory for Information and Decision Systems (LIDS), and the senior author of a paper on this control system.

Azizan is joined on the paper by lead author Sunbochen Tang, a graduate student in the Department of Aeronautics and Astronautics, and Haoyuan Sun, a graduate student in the Department of Electrical Engineering and Computer Science. The research was recently presented at the Learning for Dynamics and Control Conference.

Finding the right algorithm

Typically, a control system incorporates a function that models the drone and its environment, and includes some existing information on the structure of potential disturbances. But in a real world filled with uncertain conditions, it is often impossible to hand-design this structure in advance.

Many control systems use an adaptation method based on a popular optimization algorithm, known as gradient descent, to estimate the unknown parts of the problem and determine how to keep the drone as close as possible to its target trajectory during flight. However, gradient descent is only one member of a larger family of algorithms to choose from, known as mirror descent.

“Mirror descent is a general family of algorithms, and for any given problem, one of these algorithms can be more suitable than others. The name of the game is how to choose the particular algorithm that is right for your problem. In our method, we automate this choice,” Azizan says.
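
For intuition, a mirror-descent update maps the parameters into a dual space defined by a distance-generating function, takes the gradient step there, and maps back. Choosing the squared Euclidean norm as that function recovers plain gradient descent; other choices yield different members of the family. A minimal numerical sketch (illustrative only, not the paper’s controller):

```python
import numpy as np

def mirror_descent_step(theta, grad, lr, grad_psi, grad_psi_inv):
    """One mirror-descent update: map theta to the dual space via the
    distance-generating function psi, take a gradient step, map back."""
    return grad_psi_inv(grad_psi(theta) - lr * grad)

# With psi(x) = 0.5 * ||x||^2, both maps are the identity and the update
# reduces to ordinary gradient descent:
identity = lambda x: x
theta = np.array([1.0, -2.0])
grad = np.array([0.5, -0.25])
print(mirror_descent_step(theta, grad, 0.1, identity, identity))

# With psi(x) = sum(x * log x) (negative entropy), the same recipe yields the
# exponentiated-gradient update, better suited to simplex-constrained problems:
p = np.array([0.3, 0.7])
p = p * np.exp(-0.1 * grad)
p /= p.sum()
print(p)
```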

In their control system, the researchers replaced the function that encodes some structure of potential disturbances with a neural network model that learns to approximate them from data. In this way, they don’t need to specify the structure of the wind speeds the drone could encounter in advance.
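
As a rough illustration of that substitution, a small network can be fit to the residual force between what the drone’s nominal dynamics predict and what is actually observed in flight. Everything below (the architecture, dimensions, and random stand-in data) is hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the authors' architecture): a small network that
# maps the drone's state to an estimated disturbance force, learned from data.
class DisturbanceModel(nn.Module):
    def __init__(self, state_dim=6, hidden=64, force_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, force_dim),
        )

    def forward(self, state):
        return self.net(state)

# Training pairs could come from logged flights: the target is the observed
# acceleration minus what the nominal dynamics model predicts.
model = DisturbanceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
states = torch.randn(256, 6)     # stand-in for logged flight states
residuals = torch.randn(256, 3)  # stand-in for measured force residuals
opt.zero_grad()
loss = nn.functional.mse_loss(model(states), residuals)
loss.backward()
opt.step()
```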

Their method also uses an algorithm to automatically select the right mirror-descent function while learning the neural network model from data, rather than assuming a user has the ideal function picked out already. The researchers give this algorithm a range of functions to pick from, and it finds the one that best fits the problem at hand.

“Choosing a good distance-generating function to construct the right mirror-descent adaptation matters a lot in getting the right algorithm to reduce the tracking error,” Tang adds.

Learning to adapt

While the wind speeds the drone may encounter could change every time it takes flight, the controller’s neural network and mirror function should stay the same so they don’t need to be recomputed each time.

To make their controller more flexible, the researchers use meta-learning, teaching it to adapt by showing it a range of wind speed families during training.
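
The paper’s training procedure is more involved, but the core idea can be sketched on a toy problem: treat each wind condition as a task, adapt the shared parameters with an inner gradient step on that task, and update the shared parameters based on how well the adapted version performs. A MAML-style toy sketch, where a scalar w stands in for a wind condition:

```python
import torch

# Toy meta-learning sketch (MAML-style; the paper's exact scheme differs).
# Each "task" is a wind condition w relating state x to disturbance y = w * x.
# We meta-learn an initialization theta that adapts well after one inner step.
theta = torch.zeros(1, requires_grad=True)
meta_opt = torch.optim.SGD([theta], lr=0.01)
inner_lr = 0.1

for meta_step in range(2000):
    w = torch.empty(1).uniform_(1.0, 3.0)   # sample a wind condition
    x = torch.randn(32, 1)
    y = w * x                               # task-specific disturbance data

    inner_loss = ((theta * x - y) ** 2).mean()          # loss before adapting
    grad, = torch.autograd.grad(inner_loss, theta, create_graph=True)
    theta_fast = theta - inner_lr * grad                # one adaptation step

    meta_loss = ((theta_fast * x - y) ** 2).mean()      # loss after adapting
    meta_opt.zero_grad()
    meta_loss.backward()                    # update what is shared across winds
    meta_opt.step()
```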

“Our method can cope with different objectives because, using meta-learning, we can learn a shared representation through different scenarios efficiently from data,” Tang explains.

In the end, the user feeds the control system a target trajectory, and it continuously recalculates, in real time, how the drone should produce thrust so it stays as close as possible to that trajectory while accommodating the uncertain disturbances it encounters.
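
In spirit, this loop resembles a classical adaptive tracking controller, with the researchers’ method generalizing the adaptation step. A toy one-dimensional sketch with hypothetical gains and dynamics, using a plain gradient-style adaptation law rather than the learned mirror descent described above:

```python
import numpy as np

# Hypothetical 1-D tracking loop (not the paper's controller): each step
# recomputes thrust to follow a reference while adapting an online wind
# estimate d_hat from the tracking error.
dt, lam, k, gamma = 0.01, 2.0, 10.0, 20.0
pos, vel, d_hat = 0.0, 0.0, 0.0
true_wind = 1.5                                  # unknown disturbance force

for t in np.arange(0.0, 5.0, dt):
    ref, ref_vel, ref_acc = np.sin(t), np.cos(t), -np.sin(t)
    e, e_dot = ref - pos, ref_vel - vel
    s = e_dot + lam * e                          # combined tracking error
    thrust = ref_acc + lam * e_dot + k * s - d_hat   # cancel estimated wind
    acc = thrust + true_wind                     # true dynamics feel the wind
    vel += acc * dt
    pos += vel * dt
    d_hat -= gamma * s * dt                      # adaptation drives d_hat -> wind

print(f"final tracking error {e:+.4f}, wind estimate {d_hat:.2f}")
```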

In both simulations and real-world experiments, the researchers showed that their method led to significantly less trajectory tracking error than baseline approaches at every wind speed they tested.

“Even if the wind disturbances are much stronger than we had seen during training, our technique shows that it can still handle them successfully,” Azizan adds.

In addition, the margin by which their method outperformed the baselines grew as the wind speeds intensified, showing that it can adapt to challenging environments.

The team is now performing hardware experiments to test their control system on real drones with varying wind conditions and other disturbances.

They also want to extend their method so it can handle disturbances from multiple sources at once. For instance, changing wind speeds could cause the weight of a parcel the drone is carrying to shift in flight, especially if the parcel contains a sloshing payload.

They also want to explore continual learning, so the drone could adapt to new disturbances without needing to be retrained on all the data it has seen so far.

“Navid and his collaborators have developed breakthrough work that combines meta-learning with conventional adaptive control to learn nonlinear features from data. Key to their approach is the use of mirror descent techniques that exploit the underlying geometry of the problem in ways prior art could not. Their work can contribute significantly to the design of autonomous systems that need to operate in complex and uncertain environments,” says Babak Hassibi, the Mose and Lillian S. Bohn Professor of Electrical Engineering and Computing and Mathematical Sciences at Caltech, who was not involved with this work.

This research was supported, in part, by MathWorks, the MIT-IBM Watson AI Lab, the MIT-Amazon Science Hub, and the MIT-Google Program for Computing Innovation.

Envisioning a future where health care tech leaves some behind

Mon, 06/09/2025 - 4:10pm

Will the perfect storm of potentially life-changing, artificial intelligence-driven health care and the desire to increase profits through subscription models alienate vulnerable patients?

For the third year in a row, MIT's Envisioning the Future of Computing Prize asked students to describe, in 3,000 words or fewer, how advancements in computing could shape human society for the better or worse. All entries were eligible to win a number of cash prizes.
 
Inspired by recent research on the significant effect microbiomes have on overall health, Annaliese Meyer, a PhD candidate in the MIT-WHOI Joint Program in Oceanography and Applied Ocean Science and Engineering, created the concept of “B-Bots,” a synthetic bacterial mimic designed to regulate gut biomes and activated by Bluetooth.
 
For the contest, which challenges MIT students to articulate their visions of what a future driven by advances in computing holds, Meyer submitted a work of speculative fiction about how recipients of a revolutionary new health-care technology find their treatment in jeopardy after the introduction of a subscription-based pay model.

In her winning paper, titled “(Pre/Sub)scribe,” Meyer chronicles the usage of B-Bots from the perspective of both their creator and a B-Bots user named Briar. They celebrate the effects of the supplement, helping them manage vitamin deficiencies and chronic conditions like acid reflux and irritable bowel syndrome. Meyer says that the introduction of a B-Bots subscription model “seemed like a perfect opportunity to hopefully make clear that in a for-profit health-care system, even medical advances that would, in theory, be revolutionary for human health can end up causing more harm than good for the many people on the losing side of the massive wealth disparity in modern society.”

As a Canadian, Meyer has experienced the differences between the health care systems in the United States and Canada. She recounts her mother’s recent cancer treatments, emphasizing the cost and coverage of treatments in British Columbia when compared to the U.S.

Aside from a cautionary tale of equity in the American health care system, Meyer hopes readers take away an additional scientific message on the complexity of gut microbiomes. Inspired by her thesis work in ocean metaproteomics, Meyer says, “I think a lot about when and why microbes produce different proteins to adapt to environmental changes, and how that depends on the rest of the microbial community and the exchange of metabolic products between organisms.”

Meyer had hoped to participate in the previous year’s contest, but the time constraints of her lab work put her submission on hold. Now in the midst of thesis work, she saw the contest as a way to add some variety to what she was writing while keeping engaged with her scientific interests. However, writing has always been a passion. “I wrote a lot as a kid (‘author’ actually often preceded ‘scientist’ as my dream job while I was in elementary school), and I still write fiction in my spare time,” she says.

Named the winner of the $10,000 grand prize, Meyer says the essay and presentation preparation were extremely rewarding.

“The chance to explore a new topic area which, though related to my field, was definitely out of my comfort zone, really pushed me as a writer and a scientist. It got me reading papers I’d never have found before, and digging into concepts that I’d barely ever encountered. (Did I have any real understanding of the patent process prior to this? Absolutely not.) The presentation dinner itself was a ton of fun; it was great to both be able to celebrate with my friends and colleagues as well as meet people from a bunch of different fields and departments around MIT.”
 

Envisioning the Future of Computing Prize
 

Co-sponsored by the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative of the MIT Schwarzman College of Computing and the School of Humanities, Arts, and Social Sciences (SHASS), with support from MAC3 Philanthropies, the contest this year attracted 65 submissions from undergraduate and graduate students across various majors, including brain and cognitive sciences, economics, electrical engineering and computer science, physics, anthropology, and others.

Caspar Hare, associate dean of SERC and professor of philosophy, launched the prize in 2023. He says that the object of the prize was “to encourage MIT students to think about what they’re doing, not just in terms of advancing computing-related technologies, but also in terms of how the decisions they make may or may not work to our collective benefit.”

He emphasized that the Envisioning the Future of Computing Prize will remain “interesting and important” to the MIT community. There are plans in place to tweak next year’s contest, offering more opportunities for workshops and guidance for those interested in submitting essays.

“Everyone is excited to continue this for as long as it remains relevant, which could be forever,” he says, suggesting that in years to come the prize could give us a series of historical snapshots of what computing-related technologies MIT students found most compelling.

“Computing-related technology is going to be transforming and changing the world. MIT students will remain a big part of that.”

Crowning a winner

As part of a two-stage evaluation process, all the submitted essays were reviewed anonymously by a committee of faculty members from the college, SHASS, and the Department of Urban Studies and Planning. The judges advanced three finalists whose papers were deemed the most articulate, thorough, grounded, imaginative, and inspiring.
 
In early May, a live awards ceremony was held where the finalists gave 20-minute presentations on their entries and took questions from the audience. Nearly 140 MIT community members, family members, and friends attended in support of the finalists. The audience and judging panel asked the presenters challenging, thoughtful questions about the societal impact of their fictional computing technologies.
 
A final tally, weighting the essay at 75 percent and the presentation at 25 percent, determined the winner.

This year’s judging panel included:

  • Marzyeh Ghassemi, associate professor in electrical engineering and computer science;
  • Caspar Hare, associate dean of SERC and professor of philosophy;
  • Jason Jackson, associate professor in political economy and urban planning;
  • Brad Skow, professor of philosophy;
  • Armando Solar-Lezama, associate director and chief operating officer of the MIT Computer Science and Artificial Intelligence Laboratory; and
  • Nikos Trichakis, interim associate dean of SERC and associate professor of operations management.

The judges also awarded $5,000 to each of the two runners-up: Martin Staadecker, a graduate student in the Technology and Policy Program in the Institute for Data, Systems, and Society, for his essay on a fictional token-based system to track fossil fuels, and Juan Santoyo, a PhD candidate in the Department of Brain and Cognitive Sciences, for his short story about a field-deployed AI designed to support the mental health of soldiers in times of conflict. In addition, eight honorable mentions were recognized, each receiving a cash prize of $1,000.

Helping machines understand visual content with AI

Mon, 06/09/2025 - 3:45pm

Data should drive every decision a modern business makes. But most businesses have a massive blind spot: They don’t know what’s happening in their visual data.

Coactive is working to change that. The company, founded by Cody Coleman ’13, MEng ’15 and William Gaviria Rojas ’13, has created an artificial intelligence-powered platform that can make sense of data like images, audio, and video to unlock new insights.

Coactive’s platform can instantly search, organize, and analyze unstructured visual content to help businesses make faster, better decisions.

“In the first big data revolution, businesses got better at getting value out of their structured data,” Coleman says, referring to data from tables and spreadsheets. “But now, approximately 80 to 90 percent of the data in the world is unstructured. In the next chapter of big data, companies will have to process data like images, video, and audio at scale, and AI is a key piece of unlocking that capability.”

Coactive is already working with several large media and retail companies to help them understand their visual content without relying on manual sorting and tagging. That’s helping them get the right content to users faster, remove explicit content from their platforms, and uncover how specific content influences user behavior.

More broadly, the founders believe Coactive serves as an example of how AI can empower humans to work more efficiently and solve new problems.

“The word coactive means to work together concurrently, and that’s our grand vision: helping humans and machines work together,” Coleman says. “We believe that vision is more important now than ever because AI can either pull us apart or bring us together. We want Coactive to be an agent that pulls us together and gives human beings a new set of superpowers.”

Giving computers vision

Coleman met Gaviria Rojas in the summer before their first year through the MIT Interphase Edge program. Both would go on to major in electrical engineering and computer science and work on bringing MIT OpenCourseWare content to Mexican universities, among other projects.

“That was a great example of entrepreneurship,” Coleman recalls of the OpenCourseWare project. “It was really empowering to be responsible for the business and the software development. It led me to start my own small web-development businesses afterward, and to take [the MIT course] Founder’s Journey.”

Coleman first explored the power of AI at MIT while working as a graduate researcher with the Office of Digital Learning (now MIT Open Learning), where he used machine learning to study how humans learn on MITx, which hosts massive, open online courses created by MIT faculty and instructors.

“It was really amazing to me that you could democratize this transformational journey that I went through at MIT with digital learning — and that you could apply AI and machine learning to create adaptive systems that not only help us understand how humans learn, but also deliver more personalized learning experiences to people around the world,” Coleman says of MITx. “That was also the first time I got to explore video content and apply AI to it.”

After MIT, Coleman went to Stanford University for his PhD, where he worked on lowering barriers to using AI. The research led him to work with companies like Pinterest and Meta on AI and machine-learning applications.

“That’s where I was able to see around the corner into the future of what people wanted to do with AI and their content,” Coleman recalls. “I was seeing how leading companies were using AI to drive business value, and that’s where the initial spark for Coactive came from. I thought, ‘What if we create an enterprise-grade operating system for content and multimodal AI to make that easy?’”

Meanwhile, Gaviria Rojas moved to the Bay Area in 2020 and started working as a data scientist at eBay. As part of the move, he needed help transporting his couch, and Coleman was the lucky friend he called.

“On the car ride, we realized we both saw an explosion happening around data and AI,” Gaviria Rojas says. “At MIT, we got a front row seat to the big data revolution, and we saw people inventing technologies to unlock value from that data at scale. Cody and I realized we had another powder keg about to explode with enterprises collecting tremendous amounts of data, but this time it was multimodal data like images, video, audio, and text. There was a missing technology to unlock it at scale. That was AI.”

The platform the founders went on to build — what Coleman describes as an “AI operating system” — is model agnostic, meaning the company can swap out the AI systems under the hood as models continue to improve. Coactive’s platform includes prebuilt applications that business customers can use to do things like search through their content, generate metadata, and conduct analytics to extract insights.

“Before AI, computers would see the world through bytes, whereas humans would see the world through vision,” Coleman says. “Now with AI, machines can finally see the world like we do, and that’s going to cause the digital and physical worlds to blur.”

Improving the human-computer interface

Reuters’ database of images supplies the world’s journalists with millions of photos. Before Coactive, the company relied on reporters manually entering tags with each photo so that the right images would show up when journalists searched for certain subjects.

“It was incredibly slow and expensive to go through all of these raw assets, so people just didn’t add tags,” Coleman says. “That meant when you searched for things, there were limited results even if relevant photos were in the database.”

Now, when journalists on Reuters’ website select ‘Enable AI Search,’ Coactive can pull up relevant content based on its AI system’s understanding of the details in each image and video.
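
Coactive has not published its internals, but tag-free search of this kind is commonly built on joint image-text embeddings. A minimal sketch using an off-the-shelf CLIP model (the file names and query are hypothetical):

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Common tag-free search recipe (not necessarily Coactive's implementation):
# embed images and text queries into a shared space with a CLIP-style model,
# then rank images by cosine similarity to the query.
model = SentenceTransformer("clip-ViT-B-32")

image_paths = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]  # hypothetical assets
image_embeddings = model.encode([Image.open(p) for p in image_paths])

query_embedding = model.encode("firefighters battling a wildfire at night")
scores = util.cos_sim(query_embedding, image_embeddings)[0]

ranked = sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1])
print(ranked)  # best-matching photos first, no manual tags required
```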

“It’s vastly improving the quality of results for reporters, which enables them to tell better, more accurate stories than ever before,” Coleman says.

Reuters is not alone in struggling to manage all of its content. Digital asset management is a huge undertaking for many media and retail companies, which today often rely on manually entered metadata to sort and search through that content.

Another Coactive customer is Fandom, one of the world’s largest platforms for information about TV shows, video games, and movies, with more than 300 million monthly active users. Fandom is using Coactive to understand the visual data in its online communities and help remove excessive gore and sexualized content.

“It used to take 24 to 48 hours for Fandom to review each new piece of content,” Coleman says. “Now with Coactive, they’ve codified their community guidelines and can generate finer-grain information in an average of about 500 milliseconds.”
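
One common way to codify guidelines without per-image tags is zero-shot labeling: score each image against textual policy descriptions in a shared embedding space. A hedged sketch (the labels and file name are hypothetical, and this is not necessarily Coactive’s pipeline):

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Zero-shot moderation sketch: guidelines are written as text labels, and each
# image is assigned the label it most resembles in CLIP embedding space.
model = SentenceTransformer("clip-ViT-B-32")

labels = ["graphic gore", "sexualized content", "ordinary fan art"]  # hypothetical policy labels
label_embeddings = model.encode(labels)

image_embedding = model.encode(Image.open("upload.png"))  # hypothetical upload
scores = util.cos_sim(image_embedding, label_embeddings)[0]
print(labels[int(scores.argmax())], scores.tolist())
```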

With every use case, the founders see Coactive as enabling a new paradigm in the ways humans work with machines.

“Throughout the history of human-computer interaction, we’ve had to bend over a keyboard and mouse to input information in a way that machines could understand,” Coleman says. “Now, for the first time, we can just speak naturally, we can share images and video with AI, and it can understand that content. That’s a fundamental change in the way we think about human-computer interactions. The core vision of Coactive is that because of that change, we need a new operating system and a new way of working with content and AI.”
