Feed aggregator
Q&A: David Whelihan on the challenges of operating in the Arctic
To most, the Arctic can feel like an abstract place, difficult to imagine beyond images of ice and polar bears. But researcher David Whelihan of MIT Lincoln Laboratory's Advanced Undersea Systems and Technology Group is no stranger to the Arctic. Through Operation Ice Camp, a U.S. Navy–sponsored biennial mission to assess operational readiness in the Arctic region, he has traveled to this vast and remote wilderness twice over the past few years to test low-cost sensor nodes developed by the group to monitor loss in Arctic sea ice extent and thickness. The research team envisions establishing a network of such sensors across the Arctic that will persistently detect ice-fracturing events and correlate these events with environmental conditions to provide insights into why the sea ice is breaking up. Whelihan shared his perspectives on why the Arctic matters and what operating there is like.
Q: Why do we need to be able to operate in the Arctic?
A: Spanning approximately 5.5 million square miles, the Arctic is huge, and one of its salient features is that the ice covering much of the Arctic Ocean is decreasing in volume with every passing year. Melting ice opens up previously impassable areas, resulting in increasing interest from potential adversaries and allies alike for activities such as military operations, commercial shipping, and natural resource extraction. Through Alaska, the United States has approximately 1,060 miles of Arctic coastline that is becoming much more accessible because of reduced ice cover. So, U.S. operation in the Arctic is a matter of national security.
Q: What are the technological limitations to Arctic operations?
A: The Arctic is an incredibly harsh environment. The cold kills battery life, so collecting sensor data at high rates over long periods of time is very difficult. The ice is dynamic and can easily swallow or crush sensors. In addition, most deployments involve "boots-on-the-ice," which is expensive and at times dangerous. One of the technological limitations is how to deploy sensors while keeping humans alive.
Q: How does the group's sensor node R&D work seek to support Arctic operations?
A: A lot of the work we put into our sensors pertains to deployability. Our ultimate goal is to free researchers from going onto the ice to deploy sensors. This goal will become increasingly necessary as the shrinking ice pack becomes more dynamic, unstable, and unpredictable. At the last Operation Ice Camp (OIC) in March 2024, we built and rapidly tested deployable and recoverable sensors, as well as novel concepts such as using UAVs (uncrewed aerial vehicles), or drones, as "data mules" that can fly out to and interrogate the sensors to see what they captured. We also built a prototype wearable system that cues automatic download of sensor data over Wi-Fi so that operators don't have to take off their gloves.
Q: The Arctic is the northernmost region on Earth. How do you reach this remote place?
A: We usually fly on commercial airlines from Boston to Seattle to Anchorage to Prudhoe Bay on the North Slope of Alaska. From there, the Navy flies us on small prop planes, like Single and Twin Otters, about 200 miles north and lands us on an ice runway built by the Navy's Arctic Submarine Lab (ASL). The runway is part of a temporary camp that ASL establishes on floating sea ice for their operational readiness exercises conducted during OIC.
Q: Think back to the first time you stepped foot in the Arctic. Can you paint a picture of what you experienced?
A: My first experience was at Prudhoe Bay, coming out of the airport, which is a corrugated metal building with a single gate. Before you open the door to the outside, a sign warns you to be on the lookout for polar bears. Walking out into the sheer desolation and blinding whiteness of everything made me realize I was experiencing something very new.
When I flew out onto the ice and stepped out of the plane, I was amazed that the area could somehow be even more desolate. Bright white snowy ice goes in every direction, broken up by pressure ridges that form when ice sheets collide. The sun is low, and seems to move horizontally only. It is very hard to tell the time. The air temperature is really variable. On our first trip in 2022, it really wasn't (relatively) that cold — only around minus 5 or 10 degrees during the day. On our second trip in 2024, we were hit by minus 30 almost every day, and with winds of 20 to 25 miles per hour. The last night we were on the ice that year, it warmed up a bit to minus 10 to 20, but the winds kicked up and started blowing snow onto the heaters attached to our tents. Those heaters started failing one by one as the blowing snow covered them, blocking airflow. After our heater failed, I asked myself, while warm in my bed, whether I wanted to go outside to the command tent for help or try to make it until dawn in my thick sleeping bag. I picked the first option, but mostly because the heater control was beeping loudly right next to my bunk, so I couldn’t sleep anyway. Shout-out to the ASL staff who ran around fixing heaters all night!
Q: How do you survive in a place generally inhospitable to humans?
A: In partnership with the native population, ASL brings a lot of gear — from insulated, heated tents and communications equipment to large snowblowers to keep the runway clear. A few months before OIC, participants attend training on the conditions they will be exposed to, how to protect themselves with appropriate clothing, and how to use survival gear in case of an emergency.
Q: Do you have plans to return to the Arctic?
A: We are hoping to go back this winter as part of OIC 2026! We plan to test a through-ice communication device. Communicating through 4 to 12 feet of ice is pretty tricky but could allow us to connect underwater drones and stationary sensors under the ice to the rest of the world. To support the through-ice communication system, we will repurpose our sensor-node boxes deployed during OIC 2024. If this setup works, those same boxes could be used as control centers for all sorts of undersea systems and relay information about the under-ice world back home via satellite.
Q: What lessons learned will you bring to your upcoming trip, and any potential future trips?
A: After the first trip, I had a visceral understanding of how hard operating there is. Prototyping of systems becomes a different game. Prototypes are often fragile, but fragility doesn't go over too well on the ice. So, there is a robustification step, which can take some time.
On this last trip, I realized that you have to really be careful with your energy expenditure and pace yourself. While the average adult may require about 2,000 calories a day, an Arctic explorer may burn several times more than that exerting themselves (we do a lot of walking around camp) and keeping warm. Usually, we live on the same freeze-dried food that you would take on camping trips. Each package only has so many calories, so you find yourself eating multiple of those and supplementing with lots of snacks such as Clif Bars or, my favorite, Babybel cheeses (which I bring myself). You also have to be really careful of dehydration. Your body's reaction to extreme cold is to reduce blood flow to your skin, which generally results in less liquid in your body. We have to drink constantly — water, cocoa, and coffee — to avoid dehydration.
We only have access to the ice every two years with the Navy, so we try to make the most of our time. In the several-day lead-up to our field expedition, my research partner Ben and I were really pushing ourselves to ready our sensor nodes for deployment and probably not eating and drinking as regularly as we should. When we ventured to our sensor deployment site about 5 kilometers outside of camp, I had to learn to slow down so I didn't sweat under my gear, as sweating in the extremely cold conditions can quickly lead to hypothermia. I also learned to pay more attention to exposed places on my face, as I got a bit of frostnip around my goggles.
Operating in the Arctic is a fine balance: you can't spend too much time out there, but you also can't rush.
Hacking Electronic Safes
Vulnerabilities in electronic safes that use Securam Prologic locks:
While both their techniques represent glaring security vulnerabilities, Omo says it’s the one that exploits a feature intended as a legitimate unlock method for locksmiths that’s the more widespread and dangerous. “This attack is something where, if you had a safe with this kind of lock, I could literally pull up the code right now with no specialized hardware, nothing,” Omo says. “All of a sudden, based on our testing, it seems like people can get into almost any Securam Prologic lock in the world.”...
Chevron: Lawyer in $51B lawsuit failed to disclose support for climate research
Conspiracy theories dominate Marjorie Taylor Greene’s weather hearing
Climate change boosted summer heat that killed thousands in Europe
Trump administration asks court to kill Vermont ‘climate Superfund’ law
Brits tell Starmer to confront Trump on climate ‘ignorance’
Texas AG investigating shareholder advisers over climate policies
King Charles will warn Trump about the fate of the planet. Trump probably won’t listen.
Trump lieutenants tell Europe to stop worrying about climate change
EU likely to miss latest UN deadline to submit climate pledge
Sweden, Finland urge EU to rethink climate targets for forests
Environmentalists urge Petrobras to speed up shift to renewables
Health losses attributed to anthropogenic climate change
Nature Climate Change, Published online: 17 September 2025; doi:10.1038/s41558-025-02399-7
The authors assess the growing field of climate change health impact attribution. They show literature bias towards direct heat effects and extreme weather in high-income countries, highlighting the lack of global representation in current efforts.
California, Tell Governor Newsom: Regulate AI Police Reports and Sign S.B. 524
The California legislature has passed a necessary piece of legislation, S.B. 524, which starts to regulate police reports written by generative AI. Now, it’s up to us to make sure Governor Newsom will sign the bill.
We must make our voices heard. These technologies obscure certain records and drafts from public disclosure, and vendors have invested heavily in their ability to sell genAI to police.
AI-generated police reports are spreading rapidly. The most popular product on the market is Draft One from Axon, already one of the country’s biggest purveyors of police tech, including body-worn cameras. By bundling its products together, Axon has capitalized on its existing customer base to spread its opaque and potentially harmful genAI product.
Many things can go wrong when genAI is used to write narrative police reports. First, because the product relies on body-worn camera audio, there’s a big chance the AI draft will miss context such as sarcasm, culturally specific vocabulary and slang, or speech in languages other than English. While police are expected to edit the AI’s version of events to make up for these flaws, many officers will simply defer to the AI. Police are also supposed to make an independent decision before arresting a person identified by face recognition, and police mess that up all the time. The prosecutor of King County, Washington, has forbidden local officers from using Draft One out of fear that it is unreliable.
Then, of course, there’s the matter of dishonesty. Many public defenders and criminal justice practitioners have voiced concerns about what this technology would do to cross-examination. If caught on the stand with a story different from the one in their police report, an officer can easily say, “the AI wrote that, and I didn’t edit it well enough.” The genAI creates a layer of plausible deniability: carelessness is a very different offense from lying on the stand.
To make matters worse, an investigation by EFF found that Axon’s Draft One defies transparency by design. The technology is deliberately built to obscure which portions of a finished report were written by AI and which were written by an officer, making it difficult to determine whether an officer is lying about which portions the AI wrote.
But now, California has an important chance to join other states, like Utah, that are passing laws to rein in these technologies and to set the minimum safeguards and transparency that must accompany their use.
S.B. 524 does several important things: It mandates that police reports written by AI include disclaimers on every page or within the body of the text that make it clear that this report was written in part or in total by a computer. It also says that any reports written by AI must retain their first draft. That way, it should be easier for defense attorneys, judges, police supervisors, or any other auditing entity to see which portions of the final report were written by AI and which parts were written by the officer. Further, the bill requires officers to sign and verify that they read the report and its facts are correct. And it bans AI vendors from selling or sharing the information a police agency provided to the AI.
These common-sense, first-step reforms are important: watchdogs are struggling to figure out where and how AI is being used in a police context. In fact, Axon’s Draft One would be out of compliance with this bill, which would require Axon to redesign the tool to make it more transparent, a small win for communities everywhere.
So now we’re asking you: help us make a difference. Use EFF’s Action Center to tell Governor Newsom to sign S.B. 524 into law!
Decoding the sounds of battery formation and degradation
Before batteries lose power, fail suddenly, or burst into flames, they tend to produce faint sounds over time that provide a signature of the degradation processes going on within their structure. But until now, nobody had figured out how to interpret exactly what those sounds meant, and how to distinguish between ordinary background noise and significant signs of possible trouble.
Now, a team of researchers at MIT’s Department of Chemical Engineering has performed a detailed analysis of the sounds emanating from lithium-ion batteries and has been able to correlate particular sound patterns with specific degradation processes taking place inside the cells. The new findings could provide the basis for relatively simple, fully passive, and nondestructive devices that continuously monitor the health of battery systems, for example in electric vehicles or grid-scale storage facilities, to predict useful operating lifetimes and forecast failures before they occur.
The findings were reported Sept. 5 in the journal Joule, in a paper by MIT graduate students Yash Samantaray and Alexander Cohen, former MIT research scientist Daniel Cogswell PhD ’10, and Chevron Professor of Chemical Engineering and professor of mathematics Martin Z. Bazant.
“In this study, through some careful scientific work, our team has managed to decode the acoustic emissions,” Bazant says. “We were able to classify them as coming from gas bubbles that are generated by side reactions, or by fractures from the expansion and contraction of the active material, and to find signatures of those signals even in noisy data.”
Samantaray explains, “I think the core of this work is to look at a way to investigate internal battery mechanisms while they’re still charging and discharging, and to do this nondestructively.” He adds, “Out there in the world now, there are a few methods that exist, but most are very expensive and not really conducive to batteries in their normal format.”
To carry out their analysis, the team coupled electrochemical testing with recording of the acoustic emissions, under real-world charging and discharging conditions, using detailed signal processing to correlate the electrical and acoustic data. By doing so, he says, “we were able to come up with a very cost-effective and efficient method of actually understanding gas generation and fracture of materials.”
Gas generation and fracturing are two primary mechanisms of degradation and failure in batteries, so being able to detect and distinguish those processes, just by monitoring the sounds produced by the batteries, could be a significant tool for those managing battery systems.
Previous approaches have simply monitored the sounds and recorded times when the overall sound level exceeded some threshold. But in this work, by simultaneously monitoring the voltage and current as well as the sound characteristics, Bazant says, “We know that [sound] emissions happen at a certain potential [voltage], and that helps us identify what the process might be that is causing that emission.”
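To make the idea concrete, here is a minimal, hypothetical sketch (synthetic data, not the paper’s code) of how detected emission times can be tagged with the cell voltage at which they occurred, so that events clustering near a known side-reaction potential can be attributed to gassing rather than fracture:

```python
# Hedged illustration: align acoustic-emission event times with a voltage
# trace and apply a crude rule of thumb. All numbers are synthetic, and the
# ~4.1 V "electrolyte decomposition" threshold is an assumption for
# illustration only.
import numpy as np

t_v = np.linspace(0, 3600, 3600)                 # 1 h charge, 1 Hz voltage log
voltage = 3.0 + 1.2 * t_v / t_v[-1]              # simple ramp from 3.0 to 4.2 V
event_times = np.array([400.0, 1900.0, 3500.0])  # detected emissions (s)

# Look up the voltage at each event by interpolation, then label events near
# the hypothetical gassing potential as "gas" candidates.
v_at_event = np.interp(event_times, t_v, voltage)
labels = np.where(v_at_event > 4.05, "gas?", "fracture?")
for et, v, lab in zip(event_times, v_at_event, labels):
    print(f"t={et:6.0f} s  V={v:.2f}  -> {lab}")
```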
After these tests, they would then take the batteries apart and study them under an electron microscope to detect fracturing of the materials.
In addition, they took a wavelet transform — essentially, a way of encoding the frequency and duration of each signal that is captured, providing distinct signatures that can then be more easily extracted from background noise. “No one had done that before,” Bazant says, “so that was another breakthrough.”
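As an illustration of the wavelet idea, here is a small hedged sketch (synthetic signal, not the authors’ pipeline) using the PyWavelets library: a continuous wavelet transform gives a brief high-frequency burst a joint time-frequency signature that a simple threshold can pull out of background noise:

```python
# Hypothetical sketch: detect a short acoustic "click" in noise via a CWT.
# The 30 kHz burst stands in for a fracture or gassing event; sample rate,
# scales, and threshold are illustrative assumptions.
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(0)
fs = 100_000                                  # sample rate (Hz), assumed
t = np.arange(0, 0.1, 1 / fs)                 # 100 ms recording
signal = 0.1 * rng.standard_normal(t.size)    # background noise
burst = (t > 0.050) & (t < 0.0505)            # 0.5 ms event
signal[burst] += np.sin(2 * np.pi * 30_000 * t[burst])

# CWT over small scales (high frequencies, where short bursts live); each
# row of `coeffs` is the response at one scale.
scales = np.arange(1, 17)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

# Crude detector: flag times whose peak wavelet response across scales far
# exceeds the typical (median) background level.
energy = np.abs(coeffs).max(axis=0)
events = np.where(energy > 4 * np.median(energy))[0]
if events.size:
    print(f"{events.size} samples flagged near t = {t[events[0]] * 1e3:.2f} ms")
```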
Acoustic emissions are widely used in engineering, he points out, for example to monitor structures such as bridges for signs of incipient failure. “It’s a great way to monitor a system,” he says, “because those emissions are happening whether you’re listening to them or not,” so by listening, you can learn something about internal processes that would otherwise be invisible.
With batteries, he says, “we often have a hard time interpreting the voltage and current information as precisely as we’d like, to know what’s happening inside a cell. And so this offers another window into the cell’s state of health, including its remaining useful life, and safety, too.” In a related paper with Oak Ridge National Laboratory researchers, the team has shown that acoustic emissions can provide an early warning of thermal runaway, a situation that can lead to fires if not caught. The new study suggests that these sounds can be used to detect gas generation prior to combustion, “like seeing the first tiny bubbles in a pot of heated water, long before it boils,” says Bazant.
The next step will be to take this new knowledge of how certain sounds relate to specific conditions, and develop a practical, inexpensive monitoring system based on this understanding. For example, the team has a grant from Tata Motors to develop a battery monitoring system for its electric vehicles. “Now, we know what to look for, and how to correlate that with lifetime and health and safety,” Bazant says.
One possible application of this new understanding, Samantaray says, is “as a lab tool for groups that are trying to develop new materials or test new environments, so they can actually determine gas generation or active material fracturing without having to open up the battery.”
Bazant adds that the system could also be useful for quality control in battery manufacturing. “The most expensive and rate-limiting process in battery production is often the formation cycling,” he says. This is the process where batteries are cycled through charging and discharging to break them in, and part of that process involves chemical reactions that release some gas. The new system would allow detection of these gas formation signatures, he says, “and by sensing them, it may be easier to isolate well-formed cells from poorly formed cells very early, even before the useful life of the battery, when it’s being made.”
The work was supported by the Toyota Research Institute, the Center for Battery Sustainability, the National Science Foundation, and the Department of Defense, and made use of the facilities of MIT.nano.
A new community for computational science and engineering
For the past decade, MIT has offered doctoral-level study in computational science and engineering (CSE) exclusively through an interdisciplinary program designed for students applying computation within a specific science or engineering field.
As interest grew among students focused primarily on advancing CSE methodology itself, it became clear that a dedicated academic home for this group — students and faculty deeply invested in the foundations of computational science and engineering — was needed.
Now, with a stand-alone CSE PhD program, they have not only a space for fostering discovery in the cross-cutting methodological dimensions of computational science and engineering, but also a tight-knit community.
“This program recognizes the existence of computational science and engineering as a discipline in and of itself, so you don’t have to be doing this work through the lens of mechanical or chemical engineering, but instead in its own right,” says Nicolas Hadjiconstantinou, co-director of the Center for Computational Science and Engineering (CCSE).
Offered by CCSE and launched in 2023, the stand-alone program blends both coursework and a thesis, much like other MIT PhD programs, yet its methodological focus sets it apart from other Institute offerings.
“What’s unique about this program is that it’s not hosted by one specific department. The stand-alone program is, at its core, about computational science and cross-cutting methodology. We connect this research with people in a lot of different application areas. We have oceanographers, people doing materials science, students with a focus on aeronautics and astronautics, and more,” says outgoing co-director Youssef Marzouk, now the associate dean of the MIT Schwarzman College of Computing.
Expanding horizons
Hadjiconstantinou, the Quentin Berg Professor of Mechanical Engineering, and Marzouk, the Breene M. Kerr Professor of Aeronautics and Astronautics, have led the center’s efforts since 2018, and developed the program and curriculum together. The duo was intentional about crafting a program that fosters students’ individual research while also exposing them to all the field has to offer.
To expand students’ horizons and continue to build a collaborative community, the PhD in CSE program features two popular seminar series: weekly community seminars that focus primarily on internal speakers (current graduate students, postdocs, research scientists, and faculty), and monthly distinguished seminars in CSE, which are Institute-wide and bring external speakers from various institutions and industry roles.
“Something surprising about the program has been the seminars. I thought it would be the same people I see in my classes and labs, but it’s much broader than that,” says Emily Williams, a fourth-year PhD student and a Department of Energy Computational Science graduate fellow. “One of the most interesting seminars was around simulating fluid flow for biomedical applications. My background is in fluids, so I understand that part, but seeing it applied in a totally different domain than what I work in was eye-opening,” says Williams.
That seminar, “Astrophysical Fluid Dynamics at Exascale,” presented by James Stone, a professor in the School of Natural Sciences at the Institute for Advanced Study and at Princeton University, represented one of many opportunities for CSE students to engage with practitioners in small groups, gaining academic insight as well as a wider perspective on future career paths.
Designing for impact
The interdisciplinary PhD program served as a departure point from which Hadjiconstantinou and Marzouk created a new offering that was uniquely its own.
For Marzouk, that meant designing the stand-alone program to keep growing and pivoting so it stays relevant as the technology accelerates: “In my view, the vitality of this program is that science and engineering applications nowadays rest on computation in a really foundational way, whether it’s engineering design or scientific discovery. So it’s essential to perform research on the building blocks of this kind of computation. This research also has to be shaped by the way that we apply it so that scientists or engineers will actually use it,” Marzouk says.
The curriculum is structured around six core focus areas, or “ways of thinking,” that are fundamental to CSE:
- Discretization and numerical methods for partial differential equations;
- Optimization methods;
- Inference, statistical computing, and data-driven modeling;
- High performance computing, software engineering, and algorithms;
- Mathematical foundations (e.g., functional analysis, probability); and
- Modeling (i.e., a subject that treats computational modeling in any science or engineering discipline).
Students select and build their own thesis committee, which consists of faculty from across MIT, not just those associated with CCSE. The combination of a curriculum that’s “modern and applicable to what employers are looking for in industry and academics,” according to Williams, and the ability to build one’s own group of engaged advisors allows for a level of specialization that’s hard to find elsewhere.
“Academically, I feel like this program is designed in such a flexible and interdisciplinary way. You have a lot of control in terms of which direction you want to go in,” says Rosen Yu, a PhD student. Yu’s research is focused on engineering design optimization, an interest she discovered during her first year of research at MIT with Professor Faez Ahmed. The CSE PhD was about to launch, and it became clear that her research interests skewed more toward computation than the existing mechanical engineering degree; it was a natural fit.
“At other schools, you often see just a pure computer science program or an engineering department with hardly any intersection. But this CSE program, I like to say it’s like a glue between these two communities,” says Yu.
That “glue” is strengthening, with more students matriculating each year and more Institute faculty and staff becoming affiliated with CSE. While students’ thesis topics range from Williams’ stochastic methods for model reduction of multiscale chaotic systems to scalable and robust GPU-based optimization for energy systems, the goal of the program remains the same: develop students and research that will make a difference.
“That's why MIT is an ‘Institute of Technology’ and not a ‘university.’ There’s always this question, no matter what you’re studying: what is it good for? Our students will go on to work in systems biology, simulators of climate models, electrification, hypersonic vehicles, and more, but the whole point is that their research is helping with something,” says Hadjiconstantinou.
How to build AI scaling laws for efficient LLM training and budget maximization
When researchers are building large language models (LLMs), they aim to maximize performance under a particular computational and financial budget. Since training a model can amount to millions of dollars, developers need to be judicious with cost-impacting decisions about, for instance, the model architecture, optimizers, and training datasets before committing to a model. To anticipate the quality and accuracy of a large model’s predictions, practitioners often turn to scaling laws: using smaller, cheaper models to try to approximate the performance of a much larger target model. The challenge, however, is that there are thousands of ways to create a scaling law.
New work from MIT and MIT-IBM Watson AI Lab researchers addresses this by amassing and releasing a collection of hundreds of models and metrics concerning training and performance to approximate more than a thousand scaling laws. From this, the team developed a meta-analysis and guide for how to select small models and estimate scaling laws for different LLM model families, so that the budget is optimally applied toward generating reliable performance predictions.
“The notion that you might want to try to build mathematical models of the training process is a couple of years old, but I think what was new here is that most of the work that people had been doing before is saying, ‘can we say something post-hoc about what happened when we trained all of these models, so that when we’re trying to figure out how to train a new large-scale model, we can make the best decisions about how to use our compute budget?’” says Jacob Andreas, associate professor in the Department of Electrical Engineering and Computer Science and principal investigator with the MIT-IBM Watson AI Lab.
The research was recently presented at the International Conference on Machine Learning by Andreas, along with MIT-IBM Watson AI Lab researchers Leshem Choshen and Yang Zhang of IBM Research.
Extrapolating performance
No matter how you slice it, developing LLMs is an expensive endeavor: from decision-making regarding the numbers of parameters and tokens, data selection and size, and training techniques to determining output accuracy and tuning to the target applications and tasks. Scaling laws offer a way to forecast model behavior by relating a large model’s loss to the performance of smaller, less-costly models from the same family, avoiding the need to fully train every candidate. Mainly, the differences between the smaller models are the number of parameters and the number of training tokens. According to Choshen, elucidating scaling laws not only enables better pre-training decisions, but also democratizes the field by enabling researchers without vast resources to understand and build effective scaling laws.
The functional form of scaling laws is relatively simple, incorporating components from the small models that capture the number of parameters and their scaling effect, the number of training tokens and their scaling effect, and the baseline performance for the model family of interest. Together, they help researchers estimate a target large model’s performance loss; the smaller the loss, the better the target model’s outputs are likely to be.
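The article doesn’t give the exact parameterization, but a common form in the literature is the Chinchilla-style law L(N, D) = E + A·N^(-alpha) + B·D^(-beta). Below is a hedged sketch (synthetic numbers throughout, not the paper’s data) of the basic recipe: fit such a law to a few small pilot models, extrapolate to a larger hypothetical target, and score the prediction with the absolute relative error (ARE) used later in the study:

```python
# Hypothetical scaling-law fit: all runs are synthetic and noise-free, and
# the "true" coefficients below are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(x, E, A, alpha, B, beta):
    N, D = x                                   # parameter count, tokens
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Six synthetic pilot runs: parameter counts, tokens, observed final loss.
N = np.array([1e7, 2e7, 5e7, 1e8, 3e8, 1e9])
D = 20 * N                                     # fixed tokens-per-param ratio
loss = scaling_law((N, D), 1.7, 400.0, 0.34, 4e3, 0.28)  # "ground truth"

popt, _ = curve_fit(scaling_law, (N, D), loss,
                    p0=[1.5, 300.0, 0.3, 3e3, 0.3], maxfev=50_000)

# Extrapolate to a hypothetical 7B-parameter target trained on 140B tokens,
# then compute ARE against the (here, known) true loss.
target = (np.array([7e9]), np.array([1.4e11]))
pred = scaling_law(target, *popt)[0]
true = scaling_law(target, 1.7, 400.0, 0.34, 4e3, 0.28)[0]
print(f"predicted loss {pred:.3f}, ARE {abs(pred - true) / true:.2%}")
```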
These laws allow research teams to weigh trade-offs efficiently and to test how best to allocate limited resources. They’re particularly useful for evaluating scaling of a certain variable, like the number of tokens, and for A/B testing of different pre-training setups.
In general, scaling laws aren’t new; however, in the field of AI, they emerged as models grew and costs skyrocketed. “It’s like scaling laws just appeared at some point in the field,” says Choshen. “They started getting attention, but no one really tested how good they are and what you need to do to make a good scaling law.” Further, scaling laws were themselves also a black box, in a sense. “Whenever people have created scaling laws in the past, it has always just been one model, or one model family, and one dataset, and one developer,” says Andreas. “There hadn’t really been a lot of systematic meta-analysis, as everybody is individually training their own scaling laws. So, [we wanted to know,] are there high-level trends that you see across those things?”
Building better
To investigate this, Choshen, Andreas, and Zhang created a large dataset. They collected LLMs from 40 model families, including Pythia, OPT, OLMO, LLaMA, Bloom, T5-Pile, ModuleFormer mixture-of-experts, GPT, and others. These included 485 unique pre-trained models and, where available, data about their training checkpoints, computational cost (FLOPs), training epochs, and random seed, along with 1.9 million performance metrics of loss and downstream tasks. The models differed in their architectures, weights, and so on. Using these models, the researchers fit over 1,000 scaling laws and compared their accuracy across architectures, model sizes, and training regimes, also testing how the number of models, the inclusion of intermediate training checkpoints, and partial training affected the predictive power of scaling laws for target models. They scored predictions with absolute relative error (ARE): the absolute difference between a scaling law’s prediction and the observed loss of the large, trained model, relative to that observed loss. With this, the team compared the scaling laws and distilled practical recommendations for AI practitioners about what makes an effective scaling law.
Their shared guidelines walk developers through the steps and options to consider, and set expectations. First, it’s critical to decide on a compute budget and a target model accuracy. The team found that 4 percent ARE is about the best achievable accuracy one could expect due to random seed noise, but up to 20 percent ARE is still useful for decision-making. The researchers identified several factors that improve predictions, such as including intermediate training checkpoints rather than relying only on final losses, which made scaling laws more reliable. However, data from very early in training, before about 10 billion tokens, is noisy, reduces accuracy, and should be discarded. They recommend prioritizing training more models across a spread of sizes, not just larger ones, to improve the robustness of the scaling law’s predictions; selecting five models provides a solid starting point.
Generally, including larger models improves prediction, but costs can be saved by partially training the target model to about 30 percent of its dataset and using that for extrapolation. If the budget is considerably constrained, developers should consider training one smaller model within the target model family and borrow scaling law parameters from a model family with similar architecture; however, this may not work for encoder–decoder models. Lastly, the MIT-IBM research group found that when scaling laws were compared across model families, there was strong correlation between two sets of hyperparameters, meaning that three of the five hyperparameters explained nearly all of the variation and could likely capture the model behavior. Together, these guidelines provide a systematic approach to making scaling law estimation more efficient, reliable, and accessible for AI researchers working under varying budget constraints.
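As a rough illustration of these budgeting guidelines (all numbers, and the common 6·N·D FLOPs rule of thumb, are illustrative assumptions rather than figures from the paper), the cost split might be sketched like this:

```python
# Hedged budgeting sketch: five log-spaced pilot models plus a 30% partial
# run of a hypothetical target, costed with the standard C ~ 6*N*D estimate.
import numpy as np

target_N, target_D = 7e9, 1.4e11          # hypothetical target model
pilot_N = np.logspace(7, 9, num=5)        # five sizes, 10M..1B params
pilot_D = 20 * pilot_N                    # fixed tokens-per-parameter ratio

flops = lambda n, d: 6 * n * d            # common training-FLOPs estimate
pilot_cost = sum(flops(n, d) for n, d in zip(pilot_N, pilot_D))
partial_cost = flops(target_N, 0.3 * target_D)   # 30% partial target run

print(f"pilot models:   {pilot_cost:.2e} FLOPs")
print(f"partial target: {partial_cost:.2e} FLOPs")
print(f"full target:    {flops(target_N, target_D):.2e} FLOPs")
```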
Several surprises arose during this work: small models partially trained are still very predictive, and further, the intermediate training stages from a fully trained model can be used (as if they are individual models) for prediction of another target model. “Basically, you don’t pay anything in the training, because you already trained the full model, so the half-trained model, for instance, is just a byproduct of what you did,” says Choshen. Another feature Andreas pointed out was that, when aggregated, the variability across model families and different experiments jumped out and was noisier than expected. Unexpectedly, the researchers found that it’s possible to utilize the scaling laws on large models to predict performance down to smaller models. Other research in the field has hypothesized that smaller models were a “different beast” compared to large ones; however, Choshen disagrees. “If they’re totally different, they should have shown totally different behavior, and they don’t.”
While this work focused on model training time, the researchers plan to extend their analysis to model inference. Andreas says it’s not, “how does my model get better as I add more training data or more parameters, but instead as I let it think for longer, draw more samples. I think there are definitely lessons to be learned here about how to also build predictive models of how much thinking you need to do at run time.” He says the theory of inference time scaling laws might become even more critical because, “it’s not like I'm going to train one model and then be done. [Rather,] it’s every time a user comes to me, they’re going to have a new query, and I need to figure out how hard [my model needs] to think to come up with the best answer. So, being able to build those kinds of predictive models, like we’re doing in this paper, is even more important.”
This research was supported, in part, by the MIT-IBM Watson AI Lab and a Sloan Research Fellowship.
Microsoft Still Uses RC4
Senator Ron Wyden has asked the Federal Trade Commission to investigate Microsoft over its continued use of the RC4 encryption algorithm. The letter focuses on a hacker technique called Kerberoasting, which exploits the Kerberos authentication system.
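For context, here is a hedged sketch (not from Wyden’s letter) of why RC4 matters for Kerberoasting: the RC4-HMAC Kerberos key is simply the unsalted NT hash of the password, so each guess against a captured service ticket costs one fast MD4, whereas AES encryption types derive keys through salted, iterated PBKDF2:

```python
# Illustrative key-derivation contrast. Note: MD4 support in hashlib depends
# on the underlying OpenSSL build (it is a legacy algorithm), and the salt
# below is a placeholder; real Kerberos AES salts combine realm and principal.
import hashlib

def rc4_hmac_key(password: str) -> bytes:
    """RC4-HMAC (etype 23) long-term key: unsalted MD4 of UTF-16LE password."""
    return hashlib.new("md4", password.encode("utf-16-le")).digest()

def aes_style_key(password: str, salt: bytes) -> bytes:
    """Contrast: AES etypes start from salted PBKDF2-HMAC-SHA1, 4096 rounds."""
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt, 4096, 32)

print(rc4_hmac_key("Summer2025!").hex())                 # one fast MD4 per guess
print(aes_style_key("Summer2025!", b"REALMuser").hex())  # far costlier per guess
```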