News Dashboard

US

Officials warn of 'moderate' risk after TB outbreak in SF high school - SFGATE

Three active cases have been reported at Archbishop Riordan High School since last November.
Read more →

S&P 500 futures are little changed as traders weigh tech giants' earnings: Live updates - CNBC

"Magnificent Seven" companies Meta Platforms, Microsoft and Tesla posted earnings results after Wednesday's close.
Read more →

Bruce Springsteen sings out against Trump in ‘Streets of Minneapolis’ - AP News

Bruce Springsteen has released a new song, “Streets of Minneapolis,” criticizing President Donald Trump's immigration enforcement. The song describes Minneapolis as “a city aflame” under “King Trump’s private army.” Springsteen says he wrote and recorded it o…
Read more →

Trump administration finds California’s ban on ‘forced outing’ of students violates federal law - Politico

Federal officials threatened to pull education funding unless the state takes steps to amend its rules.
Read more →

'Halide' Co-Founder Sebastiaan de With Joins Apple's Design Team - MacRumors

Sebastiaan de With, co-founder of the popular iPhone camera app Halide, today announced that he has joined the Human Interface Design team at Apple. ...
Read more →

Satena: Colombia launches search for missing plane carrying 15 people - BBC

State airline Satena says its aircraft carrying 13 passengers and two crew "suffered a fatal accident".
Read more →

OpenAI Wants To Create Biometric Social Network To Kill X’s Bot Problem - Forbes

OpenAI is quietly building a social network and considering using biometric verification like World’s eyeball scanning orb or Apple’s Face ID to ensure its users are people, not bots.
Read more →

First pediatric flu death in Washington state highlights rising cases across the state - komonews.com

A school-age teenager has died after becoming ill from influenza last week, marking the first pediatric influenza death in the state this season.
Read more →

Brandon Sanderson’s Literary Fantasy Universe ‘Cosmere’ Picked Up by Apple TV (Exclusive) - hollywoodreporter.com

It's an unprecedented deal for the author, whose 'Mistborn' series and 'The Stormlight Archive' are being eyed for film and television adaptation, respectively.
Read more →

A Seat on Trump’s “Board of Peace” Costs $1 Billion. Guess Who Gets the Money. - Slate

The leaders of China, India, and Russia are among those who haven’t yet responded.
Read more →

Home Depot to cut 800 corporate jobs, require workers back to office full time - ajc.com

Home Depot says it is eliminating about 800 corporate jobs tied to its Vinings headquarters.
Read more →

Where a nor’easter will bring heavy snow, strong winds and waves this weekend - The Washington Post

A potent system will bring the potential for blizzard-like conditions and coastal flooding from parts of the Southeast to New England.
Read more →

Trump's National Guard deployments could cost over $1 billion this year, CBO projects - NPR

The operation in Washington, D.C. alone is projected to cost upwards of $660 million if it runs through the end of this year as expected, according to new data released by the nonpartisan Congressional Budget Office.
Read more →

What it’s like each day in Minneapolis - CNN

Residents of the Twin Cities region share their personal accounts of what it’s like to live in the midst of an ICE surge.
Read more →

DEBRIEF: What happened on Day 3 of the Barcelona Shakedown? - F1 - The Official Home of Formula 1® Racing

With the third day of the Barcelona Shakedown done and dusted, F1.com has the lowdown on which teams ran and what the drivers said.
Read more →

The U.S. measles outbreaks. - Tangle News

A closer look at the rise in measles cases across the country.
Read more →

A Comprehensive Network for the Discovery and Characterization of Interstellar Objects Like… - Avi Loeb – Medium

Inspired by the unresolved anomalies displayed by the latest interstellar visitor 3I/ATLAS (as listed here), I co-authored a new paper with…
Read more →

Portland Fire reveals home and away jerseys for 2026 season - oregonlive.com

The 2026 Portland Fire jerseys are here.
Read more →

Business

Fed holds rates steady, signaling risks to economy are dropping - The Washington Post

The central bank opted for its first pause after three rate cuts last year, as officials wait for clearer signs that inflation is cooling.
Read more →

Futures: Meta Jumps, Microsoft Skids; Tesla CapEx To Soar - Investor's Business Daily

Fed chief Jerome Powell said rates may be on pause for longer.
Read more →

Microsoft’s Earnings Surge Is Overshadowed by Data-Center Spending - The Wall Street Journal

No content available
Read more →

Tesla profits slumped 46% last year, as it lost its crown as the top EV seller - NPR

The company announced it was ending production of its higher-end Model S and Model Y, and turning that production space over to making humanoid robots.
Read more →

Amazon announces 16,000 corporate job cuts, shaking Seattle's economy - komonews.com

Amazon has announced mass layoffs that will affect nearly one in 10 members of its corporate workforce, cutting about 16,000 jobs in a move that is already shaking Seattle's economy.
Read more →

Zuckerberg teases agentic commerce tools and major AI rollout in 2026 - TechCrunch

Mark Zuckerberg says 2026 will be "a big year for delivering personal super intelligence."
Read more →

Meta Platforms Stock Investors Just Got Fantastic News from CEO Mark Zuckerberg - The Motley Fool

The social media giant continues to fire on all cylinders.
Read more →

Carvana Dives After Short Seller Criticizes Ties To Lenders - Investor's Business Daily

Carvana stock fell about 20% on Wednesday after a short-seller report alleged the company's earnings depended on shaky loans.
Read more →

Tesla to invest $2B in Elon Musk’s xAI - TechCrunch

Elon Musk's AI company xAI disclosed earlier this month it had raised $20 billion.
Read more →

Bitcoin price news: BTC stuck at $89,000 as gold surges to fresh record - CoinDesk

Gold fans rushed in to buy as the Fed chair said he took no macro signal from the raging bull market in precious metals.
Read more →

OpenAI Wants To Create Biometric Social Network To Kill X’s Bot Problem - Forbes

OpenAI is quietly building a social network and considering using biometric verification like World’s eyeball scanning orb or Apple’s Face ID to ensure its users are people, not bots.
Read more →

ASML Stock Surges On Strong Sales Forecast For 2026 - Investor's Business Daily

No content available
Read more →

Home Depot to cut 800 corporate jobs, require workers back to office full time - ajc.com

Home Depot says it is eliminating about 800 corporate jobs tied to its Vinings headquarters.
Read more →

Starbucks scraps $250,000 cap on boss's use of company jet - BBC

The coffee chain changes Brian Niccol's travel budget due to media attention and "credible threat actors".
Read more →

Bank of America, JPMorgan Chase to contribute $1,000 to Trump Accounts for their employees - CBS News

Two of the biggest U.S. banks said they would match a $1,000 federal contribution for employees who open a Trump Account, touting the plan as a way to save money.
Read more →

The economy's pressure relief valve: the U.S. Dollar - Axios

The U.S. dollar index is down 3.2% since Jan. 16.
Read more →

Trump proposal signals Medicare austerity - Politico

Health insurers who thought Trump would rescue their Medicare businesses got a rude awakening Monday.
Read more →

Anthropic, Apple, OpenAI CEOs condemn ICE violence, praise Trump - TechCrunch

Anthropic's Dario Amodei and OpenAI's Sam Altman spoke out against ICE enforcement tactics following Minneapolis violence, with one addressing concerns publicly and the other in an internal message.
Read more →

UPS Is Firing Its Biggest Customer -- And Wall Street Finally Understands Why - The Motley Fool

UPS is doubling down on reducing its relationship with Amazon.
Read more →

Technology

9front OS

Comments
Read more →

‘Backseat Software’

Mike Swanson: What if your car worked like so many apps? You’re driving somewhere important…maybe running a little bit late. A few minutes into the drive, your car pulls over to the side of the road and asks: “How are you enjoying your drive so far?” Annoyed by the interruption, and even more behind schedule, you dismiss the prompt and merge back into traffic. A minute later it does it again. “Did you know I have a new feature? Tap here to learn more.” It blocks your speedometer with an overlay tutorial about the turn signal. It highlights the wiper controls and refuses to go away until you demonstrate mastery. Ridiculous, of course. And yet, this is how a lot of modern software behaves. Not because it’s broken, but because we’ve normalized an interruption model that would be unacceptable almost anywhere else. I’ve started to think of this as backseat software: the slow shift from software as a tool you operate to software as a channel that operates on you. Once a product learns it can talk back, it’s remarkably hard to keep it quiet. This post is about how we got here. Not overnight, but slowly. One reasonable step at a time. If that lede pulls you in, like it did for me, you’re going to love the rest of the essay. This is one for the ages. It’s so good. ★
Read more →

Let’s Keep an Eye on Apple’s Own iOS Adoption Numbers

When I wrote last week about the false narrative that iOS 26 is seeing bizarrely low adoption rates compared to previous years, I neglected one source: Apple itself. Apple’s Developer site publishes a page with iOS and iPadOS usage for devices that “transacted on the App Store”. The hitch is that they only seem to update those numbers twice a year — once right around now, and once again right before WWDC. As of today, those numbers are still from 4 June 2025. Last year, going from the Internet Archive, the numbers were still from iOS 17 (June 2024) on 23 January last year, but were updated for iOS 18 on 24 January. Here are those iOS 18 numbers from one year ago this week.

iPhones released in the previous four years: iOS 18: 76%; iOS 17: 19%; iOS < 17: 5%
All iPhones: iOS 18: 68%; iOS 17: 19%; iOS < 17: 13%
iPads released in the previous four years: iPadOS 18: 63%; iPadOS 17: 27%; iPadOS < 17: 10%
All iPads: iPadOS 18: 53%; iPadOS 17: 28%; iPadOS < 17: 19%

(Apple itself manages to present these statistics without ever using the plurals iPhones or iPads, instead referring only to “devices”.) I presume, or at least hope, that they’ll update these numbers for iOS 26 any day now. ★
Read more →

Box Office Expectations for ‘Melania’

Jeremy Fuster, reporting for TheWrap: But save for some theaters in Republican-heavy states, the film is unlikely to leave much of an impact at a slumping box office, with theatrical sources telling TheWrap that “Melania” is projected for an opening of around $3 million this weekend. That would put it below the last right-wing documentary, the Daily Wire-produced Matt Walsh film “Am I Racist?,” which opened to $4.5 million from 1,517 locations in September 2024, finishing with a $12.3 million total that made it the highest-grossing doc that year. The highest projections are coming from NRG with an estimate of around $5 million, though audience interest polls from the company have 30% saying they are “definitely not” interested in watching the film, an unusually high count for any wide release. These projections are with a $35 million promotional campaign, for a movie Amazon paid $40 million to purchase. (Via Taegan Goddard.) ★
Read more →

Amazon’s Spending on ‘Melania’ Is a Barely Concealed Bribe

Nicole Sperling and Brooks Barnes, reporting for The New York Times: Amazon paid Ms. Trump’s production company $40 million for the rights to “Melania,” about $26 million more than the next closest bidder, Disney. The fee includes a related docuseries that is scheduled to air later this year. The budget for “Melania” is unknown, but documentaries that follow a subject for a limited amount of time usually cost less than $5 million to produce. The $35 million for marketing is 10 times what some other high-profile documentaries have received. All of which has a lot of Hollywood questioning whether Amazon’s push is anything more than the company’s attempt to ingratiate itself with President Trump. This is a good story, with multiple industry sources with experience making political documentaries, but the Times’s own subhead downplays Amazon’s spending on the film: “The tech giant is spending $35 million to promote its film about the first lady, far more than is typical for documentaries.” They’re spending $35 million now, to promote it, but they already paid $40 million for the rights to the film, $28 million of which is believed to have gone to Melania Trump herself. A $35 million total spend would be a lot compared to other high-profile documentaries, but it’s a $75 million total spend. This is not just a little fishy — it’s a veritable open air seafood market. Back to the Times: To grasp just how uncustomary Amazon’s marketing push for “Melania” is, consider how Magnolia Pictures handled “RBG,” a portrait of Ruth Bader Ginsburg during her 25th year as a Supreme Court justice, in 2018. CNN Films produced “RBG” for around $1 million. The promotional budget, including an awards campaign that helped it land two Oscar nominations, totaled about $3 million. The film debuted in 34 theaters and expanded into 432 locations over several weeks. It ultimately collected $14 million, enough to rank as the year’s No. 1 political documentary. And: On Friday, “Melania” will also be released in 1,600 theaters overseas, where FilmNation, a New York company, is handling distribution in more than 20 countries. International ticket sales are expected to be weak, according to box office analysts. Shocker. ★
Read more →

Kickstarter for Ollie’s Arcade Expansion

Ged Maheux, The Iconfactory: This week we announced a new Kickstarter that’s aimed at expanding the game offerings of Ollie’s Arcade, the fun, ad-free retro gaming app we introduced back in 2023. Ollie’s Arcade has always been a great way to escape doomscrolling, even if just for a little while, and now we have an opportunity to bring these retro games to even more people on iOS. The Kickstarter aims to raise enough money to make all of the in-app purchase games in the app completely free for everyone to enjoy. We also want to bring our beloved puzzle game, Frenzic, to life once again. Frenzic was one of the very first games available on iOS back in 2008, then was reborn as Frenzic: Overtime on Apple Arcade. Since it left, people have been asking us for a new version that they can just pick up and play. We couldn’t agree more! I linked to the Kickstarter for the original Ollie’s Arcade project back in 2023, which was a big success. And I first linked to Frenzic all the way back in 2008, when the App Store was only a few months old. It’s just a great concept for a casual game on a small screen, implemented with all of The Iconfactory’s exquisite attention to detail. That’s true for all the games in Ollie’s Arcade, but Frenzic is special. This new Kickstarter for the Ollie’s Arcade expansion has already hit its funding goal, but it’s approaching the stretch goal for an additional game. There are a zillion games for iOS, but it’s sad how few are ad-free and don’t require a subscription. If you think well-crafted fun games that you can pay for once (for a very reasonable price) should be rewarded, you should join me (and others) in backing this Kickstarter. ★
Read more →

Comparing the Classic and Unified Views in iOS 26’s Phone App

Adam Engst, back in November, at TidBITS: Did you know that, regardless of view, you can now swipe left on any call to reveal a blue clock icon that lets you create a reminder to call back in 1 hour, tonight, tomorrow, or at any custom time (below left, slightly doctored)? Reminders appear at the top of the Calls list and in your default Reminders list. You can also touch and hold a call associated with a contact to connect with them in other ways (below right), or touch and hold a call from an unknown caller to add them to Contacts. I did not know this, until I read Engst’s article. One criticism I’ve seen a few times (but to be clear, not from Engst) ever since Apple debuted the new Unified interface for the Phone app back at WWDC, is that it’s somehow wrong that Apple offers it as option alongside the Classic interface. “When does Apple ever offer options like this?” I’d argue that Apple used to offer options like this all the time. The Music app on the original iPhone (which app was actually named “iPod” for a while) let you customize all the tabs at the bottom. All of Apple’s good Mac apps (the AppKit ones, primarily) still let you customize the entire toolbar. The problem isn’t that Apple now offers two very different interfaces for the Phone app. The problem is that Apple stopped offering users ways to significantly tailor apps to their own needs and tastes — and the proof that they stopped is that so many people now think it’s so strange that they’re offering two options for how the Phone app should look and work. Overall, I like the new Unified layout in the Phone app. But what I love is there remains an option for those who don’t, and that you can switch between the two in a very obvious, easily discoverable (dare I say, hard to miss) way right in the app itself. No need to dig two or three levels deep into the Settings app. You can just switch right there in the main screen of the Phone app itself. It’s things like this that keep me optimistic that Apple is still capable of great new work in UI design. ★
Read more →

Software is mostly all you need

Comments
Read more →

Grid: Forever free, local-first, browser-based 3D printing/CNC/laser slicer

Comments
Read more →

Cutting Up Curved Things

Comments
Read more →

Backseat Software

Comments
Read more →

The WiFi only works when it's raining (2024)

Comments
Read more →

The Hallucination Defense

Comments
Read more →

Gemini CLI gets its hooks into the agentic development loop

Google has added hooks to Gemini CLI, its terminal-based competitor to Anthropic’s Claude Code. Hooks ensure that Gemini CLI runs a given script or program inside of the agentic loop and bring a larger degree of control to the agentic development loop. These could be used, for example, to run security scanners or compliance checks, log tool interactions, inject relevant information into the context window, or even adjust the model’s parameters on the fly. As the Gemini CLI team notes in the announcement, “efficiency in the age of agents isn’t just about writing code faster; it’s about building custom tools that adapt to your specific environment.”

[Figure: Hooks in Gemini CLI (Credit: Google)]

While a developer could try to instruct the agent to run a specific script at certain times within the loop in the prompt or AGENTS.md file, given the non-deterministic nature of those agent models, there’s no guarantee that this will actually happen or that the agent won’t forget about this instruction over time.

Claude Code did it first

If this sounds familiar, it’s likely because you already know about Claude Code Hooks, which first introduced this idea last September (though there is also a GitHub issue from July 2025 that proposes this feature). Google’s implementation is not quite a one-to-one match to Anthropic’s, but it should only take a few minutes to adapt an existing Claude hook to Gemini CLI.

Setting up hooks

Like with hooks in Claude Code, Gemini CLI also implements roughly a dozen lifecycle events where a hook can fire. That may be right at the session start, after the user submits a prompt but before the agent starts planning (to add context, for example), before tools are selected (to optimize the tool selection or filter available tools), and similar moments in the agent loop.

[Figure: Defining a Gemini CLI hook (Credit: Google)]

The hooks are defined as JSON files that describe when they are invoked and which script they should run. Those scripts are standard Bash scripts and Google notes that it is essential to keep those hooks fast because they do run synchronously and delays in the script will also delay the agent response. Google recommends that developers use parallel operations and caching when possible to keep the operations fast. One interesting use case for hooks is to utilize the ‘AfterAgent’ hook, which fires when the agent loop ends, to force the agent into a continuous loop to work on a difficult task — while also refreshing the context between those runs to avoid context rot.

As for security, it’s important to stress that hooks will have the user’s privileges, and Google notes that developers should review the source code of any third-party hooks. Hooks, which are now available as part of the Gemini CLI v0.26.0 update, can also be packaged inside Gemini CLI extensions. That’s Google’s format for packaging prompts, MCP servers, sub-agents, and agent skills — and now hooks — into a single sharable package.

The post Gemini CLI gets its hooks into the agentic development loop appeared first on The New Stack.
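To make the lifecycle-hook idea more concrete, here is a minimal Python sketch of the pattern the article describes: a config maps lifecycle events to scripts that run synchronously inside the agent loop. The event names and config fields are illustrative assumptions, not Gemini CLI's documented schema; the real hooks are defined in JSON files and invoke Bash scripts, as noted above.

```python
import json
import subprocess

# Illustrative hook config; the field names ("event", "command") are assumptions,
# not Gemini CLI's documented schema -- consult Google's docs for the real format.
HOOKS_CONFIG = json.loads("""
{
  "hooks": [
    {"event": "BeforeTool", "command": "./scan_security.sh"},
    {"event": "AfterAgent", "command": "./log_session.sh"}
  ]
}
""")

def run_hooks(event: str, payload: dict) -> None:
    """Run every hook registered for `event`, synchronously.
    As the article notes, slow hooks directly delay the agent's response."""
    for hook in HOOKS_CONFIG["hooks"]:
        if hook["event"] == event:
            # Pass the event payload to the script on stdin as JSON.
            subprocess.run(hook["command"], input=json.dumps(payload),
                           text=True, shell=True, check=False)

def agent_loop(prompt: str) -> str:
    """Toy agent loop showing where lifecycle hooks could fire."""
    run_hooks("SessionStart", {"prompt": prompt})
    # ... plan, select tools ...
    run_hooks("BeforeTool", {"tool": "shell", "args": ["pytest"]})
    # ... execute the tool, generate a response ...
    response = "draft answer"
    run_hooks("AfterAgent", {"response": response})
    return response
```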
Read more →

Flameshot

Comments
Read more →

Meet Gravitino, a geo-distributed, federated metadata lake

In the new world of agentic AI, the discussion has revolved around data: governance, storage, and compute. But what about metadata — the data about data? Metadata has been a second-class citizen, according to Junping (JP) Du, founder and CEO of Datastrato, a data and AI infrastructure company. AI is changing how data — and metadata — is consumed, understood, and governed, so Datastrato created Apache Gravitino, an open source project that serves as a high-performance, geo-distributed, federated metadata lake. The project is designed to be a single, engine-neutral control plane for metadata and governance, tailored to the needs of multimodal, multi-engine AI workloads.

Last year was a big one for Gravitino. In June, it graduated as an Apache Top Level Project. In December, it delivered its first major stable release, version 1.1.0. At the start of 2026, it joined the brand new Agentic AI Foundation.

Gravitino, Du says in this episode of The New Stack Makers, is a “catalog of catalogs, because we try to solve the problems of running the data and AI platforms more safely and consistently.” In the age of AI, Du says, “We need more engine-friendly or agent-friendly metadata and try to unify everything together and [provide] the technical metadata to the engine support as a first-class citizen.”

Gravitino builds a unified data catalog, regardless of whether the data is traditional, structured, or multi-modal. “We all take [these] kind of data formats, and we allow the multiple engines to access this kind of data, so there’s no data silo anymore,” Du says. “And also it can be easy to consume by AI agents — instead of previously, having to be building everything to be at the data warehouse and consume from there.”

Tackling metadata’s governance problem

Du — who spent about 15 years building data infrastructure for the Apache Hadoop project — and Jerry (Saisai) Shao, co-founder and CTO of Datastrato, leaned on their long experience in building cloud data warehouses and lake houses in creating Gravitino. As data and AI systems grew in complexity, engineers encountered recurring problems. “The first [problem] is actually data: It’s spread across multiple engines like Spark, Trino, or even some runtimes like Ray, PyTorch.

“And another problem is the metadata … It’s a siloed catalog instead of a unified catalog to know everything. So, that means the governance, access controls, and even the semantics are hard to build in efficient ways.”

Metadata, Du adds, can be duplicated or inconsistent. AI makes the problem worse, he says, “especially for unstructured data, because it’s hard to manage in a typical way.” In a production environment, especially at enterprise scale, he added, it’s hard to find a single point of truth to define what data exists, how it can be accessed, and how it can be governed. Gravitino was designed to solve those issues. It was built with Java, but supports Python clients.

The use cases for Gravitino include multi-cloud data consolidation, Du says. One of Datastrato’s customers is among the largest internet technology companies in the United States. “They have tons of data,” he says, including a lot of abstracted data. “The data is distributed on-prem and to public clouds. So their compute resources, especially a GPU resource, are distributed over, you know, several clouds and regions. They want the same data, right? It’s available for all these kinds of clouds and regions, so then they can trigger the training jobs or inference jobs or their applications anywhere.” Therefore, “A unified data catalog is very critical, right in this case, to make sure all this data is secure and consistent right across all the locations.”

Check out the full episode to learn more about Gravitino’s use cases, how it fits into the existing commercial and open source tooling landscape, and why the project’s founders decided to donate it to the Agentic AI Foundation.

The post Meet Gravitino, a geo-distributed, federated metadata lake appeared first on The New Stack.
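For a sense of how that "catalog of catalogs" surfaces to client code, here is a rough sketch assuming the Apache Gravitino Python client exposes a GravitinoClient with list/load catalog calls; the endpoint, metalake name, and method details below are placeholders, so treat it as illustrative and check the project docs for the actual API.

```python
# Rough sketch of the "catalog of catalogs" idea described above.
# Assumption: the Apache Gravitino Python client exposes GravitinoClient with
# list_catalogs() and load_catalog(); exact package, constructor, and return
# types may differ from the released API.
from gravitino import GravitinoClient

client = GravitinoClient(
    uri="http://gravitino.internal:8090",  # placeholder metadata-lake endpoint
    metalake_name="enterprise_metalake",   # one metalake spanning clouds and regions
)

# One control plane fronting many underlying catalogs (Hive, Iceberg, Kafka,
# filesets...), so Spark, Trino, Ray, or PyTorch data loaders all resolve the
# same names and governance rules live in one place instead of per-engine silos.
for catalog in client.list_catalogs():
    print(catalog)

lakehouse = client.load_catalog("lakehouse")  # assumed catalog name, for illustration
```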
Read more →

Ramp’s Inspect shows closed-loop AI agents are software’s future

The recent release of the background coding agent Inspect by Ramp’s engineering team serves as a definitive proof point that closed-loop agentic systems are the future of software development. It has transformed coding agents into truly autonomous engineering partners, and it is fundamentally changing the way agents deliver software. Whether teams use a custom cloud development environment (CDE) like Ramp or another approach, the signal is clear: Teams need to solve for this kind of autonomy or risk getting left behind.

Modern engineers need access to coding agents that do not just generate code but also run it, verify the output, and iterate on the solution until it works. This distinction represents a fundamental shift. The industry has been focused on optimizing the “brain” of agents, solving for context windows and reasoning. Ramp’s success validates that the “body” matters just as much. The ability to interact with a runtime environment is what transforms code from a hypothesis into a solution. This verification loop separates truly autonomous coding agents from those that rely on humans to validate their work.

The open-loop bottleneck

Modern coding agents are impressive. They can plan complex refactors and generate thousands of lines of code. However, these agents typically operate in an open loop. They rely on the developer to act as the runtime environment. The agent proposes a solution. The human must compile, test, and interpret error messages or feed them back to the agent. The cognitive load of verification remains with the user.

This workflow caps developer velocity. The speed of the agent is irrelevant if the verification process is slow. We have optimized code generation to be near instantaneous, but verification remains bound by human bandwidth and linear CI pipelines.

Inspect demonstrates that closing that loop unlocks a new category of velocity. By giving the agent access to a sandbox to run builds and tests, the agent transitions from text generator to task completer. It hands off a verified solution rather than a draft. The impact is measurable. Ramp reported vertical internal adoption charts. Within months, approximately 30% of all pull requests merged to its frontend and backend repositories were written by Inspect. This penetration suggests closed-loop agents are a step function change in productivity, not a marginal improvement.

The economics of curiosity

The value proposition of closed-loop agents is not just delivering code faster. It is about the parallelization of solution discovery. In traditional workflows, exploring refactors or library upgrades is expensive. It requires context switching, stashing work and fighting dependency conflicts. Because experimentation costs are high, we experiment less. We stick to safe patterns to avoid the time sink of failure. Background agents change the economics of curiosity. If an engineer can spin up 10 concurrent agent sessions to explore 10 architectural approaches, the cost of failure drops significantly.

Consider a team migrating a legacy component. Currently, this is a multiweek spike. In the new paradigm, a developer could instead task a fleet of agents to attempt the migration using different strategies. One agent might try a strangler fig pattern. Another might attempt a hard cutover. A third might focus on integration tests. The developer then reviews results rather than typing code. The agents run in isolated sandboxes. They build, catch syntax errors, and run test suites until they achieve a green state. The developer wakes up to three potential pull requests verified against the CI pipeline and chooses the best one.

Verification beyond localhost

Ramp’s Inspect platform validates within a custom-built CDE. To ensure these environments start quickly despite their complexity, a sophisticated snapshotting system keeps images warm and ready to launch. Ramp was able to extend this CDE infrastructure to also support integration testing, a brilliant engineering feat that works well for its specific context. However, for many organizations building complex, cloud native applications with high levels of dependencies, this approach faces significant hurdles. Often, the entire stack is too large to be spun up on a single virtual machine (VM) or devpod. In these scenarios, while CDEs remain excellent for replacing local development laptops, high-fidelity integration testing requires a different approach.

To enable true autonomy in these complex environments, we need a way to perform integration testing without replicating the entire world. We can connect agents directly to a shared baseline environment using existing Kubernetes infrastructure. In this model, the agent deploys only the modified service to a lightweight sandbox. The infrastructure uses dynamic routing and context propagation to direct specific test traffic to that sandbox while fulfilling all other dependencies from a shared, stable baseline. This approach gives coding agents the power to execute autonomous end-to-end testing, regardless of the stack’s size or complexity. It leverages the existing cluster to provide high-fidelity context. An agent can then run integration tests against real upstream and downstream services. It sees how the change interacts with the actual message queue schema and the latency of the live database. This closes the loop with higher fidelity while lowering the infrastructure barrier. By testing against a shared cluster, the agent can catch integration regressions that might pass in a hermetic VM without requiring the platform team to build a custom orchestration engine to support it.

The future of software delivery

The release of Inspect is a clear signal of where software development is heading. The era of the human engineer as the sole verifier is ending. We are moving toward a world where agents operate as autonomous partners capable of exploring solutions and verifying their own work. Ramp has proven that this workflow is not science fiction. It is working in production today and is driving massive efficiency gains. The question for the rest of the industry is not whether to adopt this workflow, but how. Whether a team chooses to build a custom platform like Ramp or adopt an existing cloud native solution like Signadot to give their agents a runtime, the imperative is the same. We must provide our agents with a body. We must close the loop between generation and verification. Once we do, we unlock a level of velocity that will define the next generation of high-performing engineering teams.

The post Ramp’s Inspect shows closed-loop AI agents are software’s future appeared first on The New Stack.
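As a concrete illustration of the closed loop the article argues for, here is a minimal Python sketch of a generate-verify-iterate cycle; generate_patch stands in for the LLM call, and the whole thing is a conceptual sketch of the pattern, not Ramp's Inspect implementation.

```python
import pathlib
import subprocess

def apply_patch(sandbox_dir: str, patch: dict[str, str]) -> None:
    """Write model-proposed file contents into the sandbox (stand-in for a real patch step)."""
    for rel_path, contents in patch.items():
        target = pathlib.Path(sandbox_dir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents)

def run_tests(sandbox_dir: str) -> tuple[bool, str]:
    """Run the suite inside the sandbox; return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], cwd=sandbox_dir,
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def closed_loop_agent(task: str, sandbox_dir: str, generate_patch, max_iters: int = 5) -> bool:
    """Generate -> run -> read failures -> regenerate, until the suite is green.
    `generate_patch(task, feedback)` stands in for the model call."""
    feedback = ""
    for _ in range(max_iters):
        patch = generate_patch(task, feedback)   # model proposes code
        apply_patch(sandbox_dir, patch)
        passed, output = run_tests(sandbox_dir)  # the agent verifies its own work
        if passed:
            return True                          # hand off a verified change, not a draft
        feedback = output                        # failures feed the next attempt
    return False
```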
Read more →

PlayStation 2 Recompilation Project Is Absolutely Incredible

Comments
Read more →

County pays $600k to pentesters it arrested for assessing courthouse security

Comments
Read more →

My Mom and Dr. DeepSeek (2025)

Comments
Read more →

Prompting vs. RAG vs. fine-tuning: Why it’s not a ladder

Teams usually assume there’s a straightforward progression from prompt engineering through retrieval-augmented generation (RAG) to fine-tuning (the last rung on the ladder) when customizing large language models (LLMs). This is an easy-to-understand, frequently repeated narrative that is true for some developers but not for all teams working with LLMs in production environments. Prompt engineering, RAG, and fine-tuning are not sequential upgrades in real-world enterprise systems. Instead, they represent different architectural methods for addressing different types of problems and introduce their own limitations and failure modes. Viewing them as a linear progression creates a false narrative that can lead to brittle systems that cannot adapt to changing requirements.

To assess the success or failure of LLM architectures in production environments, a six-dimensional framework outlines the actual constraints that affect whether an LLM system will function well or poorly in production: data privacy, latency, degree of control, update frequency, deployment target, and cost.

When LLM architecture decisions are judged

Most architectural decisions regarding LLMs are made based on assumptions rather than evaluation. It is typically only after releasing the LLM applications that teams realize that the architecture is failing to meet its intended goals. At this time, teams may face difficult questions about the performance of their released LLM: “Why is our response time inconsistent?” “Why did our costs go up this week?” “How is sensitive information showing up in our logs?”

After a poorly performing architecture choice is identified, teams often use weak excuses to justify it, such as “We chose the most advanced architecture available,” or “We are doing things the way everyone else is.” These excuses do not provide sufficient detail to help a team understand why the architecture failed to meet expectations. A good architecture makes its trade-offs visible. A good architecture allows a team to articulate why a particular approach was selected, what benefits it provides, and the potential trade-offs. As a result, teams need to make informed decisions about which approach to select for their specific environment.

The problem with the linear ladder model for LLMs

Each of the three major approaches to customizing large language models — prompt engineering, RAG, and fine-tuning — provides a different set of capabilities and/or constraints. Each is a structural decision that will have significant implications for how the team will interact with the LLM going forward. Many teams receive recommendations on building their LLM systems that are based on a ladder model: Start with prompt engineering; if that doesn’t work, move on to RAG; if RAG doesn’t work, move on to fine-tuning. The ladder model is attractive because it is easy to understand, offers direction and purpose for teams, and conveys a sense of progress.

However, the ladder model fails to account for the reality that teams are not judged on the sophistication of their architectures; instead, they are judged on whether their architectures violate the constraints of their environment. Teams are expected to meet performance, security, and reliability standards. If a team’s architecture prevents its LLM system from meeting these standards, it does not matter whether the team used the “latest and greatest” approach to building its application.

Many of the failures associated with LLMs occur because the architecture does not align with the problem domain’s needs. Examples of architectural failures include:

- Teams experiencing high response latency and unpredictable tail times
- Teams experiencing rapidly increasing operational costs
- Teams experiencing data privacy violations and sensitive information risks
- Teams with systems that are difficult to update without experiencing regression

None of these failures can be addressed by moving to the next rung on the ladder. In fact, many of these failures occur specifically because a team followed the ladder without accounting for the constraints of their environment.

6 dimensions that matter in production

Production success is defined by multiple independent limits rather than a single “quality” limit. The six dimensions listed below generally define which architectures are viable. There is no hierarchy of these dimensions. Generally, improving one dimension will degrade another. As there is no universally best configuration of these dimensions, there is only an intentional trade-off based on the system’s needs. These six dimensions — data privacy, latency, degree of control, update frequency, deployment target, and cost — serve as constraints on the development of LLM architectures. The following figure illustrates how these dimensions may interact without falling into a “linear ladder” trap. The figure groups these dimensions into the initial feasibility gate (non-negotiable barrier), the optimization dimension (tunable trade-off), and resultant architecture-building block combinations that may be used as hybrid models.

Data privacy: The first feasibility barrier

Data privacy is often the first serious constraint production teams encounter and it’s generally non-negotiable. The question is not whether the model vendor is “secure.” The question is whether sensitive data can ever leave the organization’s boundaries. Generally, prompt engineering sends the entire prompt, including user input, contextual information, etc., to an external inference provider. Even fine-tuning can create more privacy risk since the training data or derived gradients need to be sent to a tuning pipeline, thus providing longer-lived access than a single inference call. RAG alters the privacy surface by enabling sensitive data to remain within internal systems, while only its fragments are sent to the model.

In practice, data privacy is determined by data classification. If an application handles regulated data (such as personal health information or confidential data), many architectures may quickly become infeasible unless the model is self-hosted or hosted in a controlled environment. On the other hand, if the application is public-facing and does not handle sensitive data, external APIs may be acceptable. The key takeaway is that data privacy is a barrier, not a tunable parameter. Once data privacy is identified as a barrier to using an external inference service, the entire architecture collapses.

Latency: The constraint users notice first

Once the data privacy constraint is addressed, latency becomes the constraint users notice. Users will perceive the system as unreliable if latency is excessive or unpredictable. The primary difference in latency among models is due to the number of architectural stages in the request path, rather than the model’s intelligence. For example, prompt engineering typically has the lowest latency since the request is only a single inference call. In addition, RAG introduces multiple stages (embedding search, retrieval, reranking, and chunk selection) that increase latency and can also generate high tail times under load. Fine-tuning typically yields fast inference paths by eliminating the need for retrieval and embedding, and by integrating them directly into the model.

Using the fastest architecture as the sole basis for selecting an architecture is a mistaken approach. More often than not, the correct design is a hybrid approach. An example of this is using a low-latency routing mechanism — a small, tuned model identifies the user’s intent, classifies the query, and then fires off a higher-latency RAG pipeline only when knowledge grounding is necessary. That type of hybrid architecture protects the user experience while enabling high-precision answers when needed. In production, latency is rarely just about average response time, but rather an issue of predictably low tail latency under concurrent workloads.

Degree of control: Constraining behavior and knowledge

Responding quickly is irrelevant if system behavior is unstable. Degree of control, the third dimension, refers to how reliably architects can constrain the model’s behavior, outputs, and knowledge boundaries. Prompt engineering constrains the model’s behavior primarily at the output layer. While prompt engineering can constrain the structure (such as JSON schema), formatting, and localizable behavior of the output, prompt-based control is fragile because it competes with model priors, user messages, and long-term context effects.

RAG constrains the model at the level of knowledge boundaries. RAG is not primarily used to make the model smarter. Rather, RAG is used to constrain what the model is allowed to know in a particular request. Therefore, RAG is particularly useful in regulatory environments, where it provides a transparent, governable knowledge path.

Fine-tuning constrains the model’s behavior to provide consistent behavior for each request. Fine-tuning defines the tone, style, reasoning patterns, classification thresholds, and domain-specific preferences that the model uses to respond to each request. It is most valuable when the desired behavior is stable and should be baked into the model, rather than being inserted at runtime.

Here again, degree of control is not one thing. Degree of control can mean:

- Controlling output structure
- Controlling knowledge sources
- Controlling behavioral consistency

Each of these techniques constrains a different layer, and that determines what types of failures can occur.

Update frequency: The cost of keeping your system current

Control generally makes things rigid, and over time, the dominant cost of an architecture is not deployment, but updating it. Update frequency describes how often a system has to add new information or modify previously acceptable behavior. Prompt engineering is useful for rapid updates because modifying a prompt is simple. But as prompts expand, maintaining them becomes hard, and versioning becomes a nightmare, along with the issues that arise when prompts interact with each other. RAG is useful for quick, scalable updates because the knowledge base can change independently of the model. If your domain changes every week — such as policy changes, new product documentation, new HR procedures — RAG provides a clear mechanism to update the corpus rather than the model. Fine-tuning is slow and costly to update because it involves training and validation cycles. Fine-tuning is worthwhile only when you have stable, highly valuable behavior. When you need to frequently change the underlying knowledge, fine-tuning will be a hindrance. This is why you should follow this general rule: Keep all knowledge that changes over time outside your model. Use tuning for stable behavioral patterns; use retrieval for dynamic knowledge.

Deployment target: Where the model runs

Even though an architecture appears flawless on paper, deployment constraints can prevent implementation. Cloud API deployments can maximize speed to market. However, these deployments are subject to limitations related to privacy, regulatory compliance, and network latency. Deployments within the virtual private cloud/on-premises environment enable data sovereignty and internal controls, but add significant complexity to both infrastructure and operations. Edge deployments often limit model size and direct development teams toward either small, tuned models or specialized inference runtimes. Where the workload is to be deployed can limit feasibility. For example, if an organization has a data sovereignty requirement and does not permit external inference, prompt engineering via public APIs is no longer an option. For such organizations, self-hosted RAG or tuning would likely be the default, regardless of the position of either approach on the ladder.

Cost: What eliminates ‘successful’ pilot projects

Most LLM projects do not fail during the prototype phase. Most LLM projects fail after successful adoption when traffic grows and costs become non-linear. Cost is not just “what does the model cost per token?” It can be influenced by:

- The length of the prompt and the retrieved context
- The class/model used and the pricing of the model provider
- Concurrency/scaling strategy
- Caching efficiency
- GPU/CPU resource utilization for self-hosted deployments
- The engineering overhead required for maintaining retrieval pipelines

While prompt engineering is often the least expensive initial approach, its cost can become unpredictable as the prompt and context sizes grow. RAG increases operational cost because the retrieval pipeline must always be running — vector databases, indexing jobs, and the reranker — but it can also decrease inference cost by enabling the use of smaller models and reducing the amount of work the LLM must do to fill in hallucinations. Fine-tuning has very high up-front costs (training and evaluation), but it can also reduce inference costs and latency by eliminating the need to retrieve content or reducing the number of tokens required in the prompt.

The major difference here is the predictability of cost. The most dangerous systems are those that incur increased cost in proportion to their usage, such as those that frequently include large retrieved contexts or multistep LLM calls without strict budgets. In production, cost should be considered as an architectural dimension from Day 1, not a billing shock discovered after launch.

Putting the dimensions together: A decision framework

There isn’t a single “right” solution for all applications. The appropriate architecture will depend on the relative importance of the six dimensions. You can use the six dimensions in a particular sequence when determining which method(s) best fit your application:

1. Data privacy: Are you allowing sensitive information to cross the application boundary? If not, you need to eliminate any external API calls.
2. Deployment target: Will your application run on your required platform? Eliminate any methods that don’t support your application’s deployment target.
3. Latency: Can your architecture deliver the necessary low latency during periods of high load? Can your architecture meet the performance expectations of your users or customers during high-load situations?
4. Cost: Will your architecture be economically viable under high production traffic loads? Will your architecture remain economically viable as request volume increases?
5. Update frequency: How difficult is it to adapt your architecture as customer expectations evolve over time? How costly will changing your architecture be when changes occur due to evolving customer requirements?
6. Degree of control: To what extent do you want to control potential failure points of your architecture to minimize downtime and the associated lost revenue?

Once you have determined the relative importance of each dimension, you should create an architecture composed of multiple mechanisms that work together, rather than simply choosing a single “best” mechanism. In real-world enterprise applications, many successful architectures are hybrids:

- Fine-tuning (or lightweight adapters) establishes stable behavior patterns.
- RAG provides governable and regularly updated knowledge.
- Prompt engineering enforces structured output and task-level runtime constraints.

The six-dimensional framework explains why “prompt vs. RAG vs. fine-tuning” is the wrong question. Instead, ask yourself: Which mechanisms should I include in my architecture based upon the constraints identified above? To make this decision-making process more tangible, the following flowchart outlines a practical approach to evaluating your LLM architecture across the six dimensions.

Conclusion

The “ladder” of developing a production LLM system — prompt engineering → RAG → fine-tuning — may appear attractive as it greatly simplifies the process of making decisions about how to develop your architecture. However, the reality is that production LLM systems are not developed by being sophisticated. Production LLM systems are developed by being constrained. A six-dimensional framework helps identify the trade-offs involved in developing an architecture and ensures that development teams do not treat technology selection as a matter of ideology. When developing an architecture, teams can use the six dimensions to determine which mechanisms to incorporate and design hybrid systems that will withstand real user interactions. Do not try to pursue the most advanced technique. Pursue building an application that is both safe and economically viable.

The post Prompting vs. RAG vs. fine-tuning: Why it’s not a ladder appeared first on The New Stack.
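As one way to picture the hybrid the article recommends, here is a short Python sketch of the routing pattern from the latency section: a cheap intent classifier decides whether a query needs knowledge grounding, the higher-latency RAG path runs only when it does, and prompt-level constraints shape the output. classify_intent, retrieve, and call_llm are placeholder functions, not a specific vendor API.

```python
def answer(query: str, classify_intent, retrieve, call_llm) -> str:
    """Hybrid routing sketch: route cheap queries straight to the model and
    reserve the retrieval pipeline for queries that need grounding."""
    intent = classify_intent(query)              # low-latency first hop (small, tuned model)
    if intent == "needs_grounding":
        context = retrieve(query, top_k=5)       # RAG path: governable, independently updatable corpus
        prompt = (
            "Answer ONLY from the provided context and reply as JSON "
            '{"answer": ..., "sources": [...]}.\n\n'
            f"Context:\n{context}\n\nQuestion: {query}"
        )
    else:
        # Stable-behavior path: no retrieval stages, lowest latency and cost.
        prompt = f'Reply as JSON {{"answer": ...}}.\n\nQuestion: {query}'
    return call_llm(prompt)                      # prompt constraints shape the output layer
```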
Read more →

Project Genie: Experimenting with infinite, interactive worlds

Comments
Read more →

Reflex (YC W23) Senior Software Engineer Infra

Comments
Read more →

Launch HN: AgentMail (YC S25) – An API that gives agents their own email inboxes

Comments
Read more →

Drug trio found to block tumour resistance in pancreatic cancer in mouse models

Comments
Read more →

Is the RAM shortage killing small VPS hosts?

Comments
Read more →

Deep dive into Turso, the “SQLite rewrite in Rust”

Comments
Read more →

How to choose colors for your CLI applications (2023)

Comments
Read more →

Moltworker: a self-hosted personal AI agent, minus the minis

Comments
Read more →

Waymo robotaxi hits a child near an elementary school in Santa Monica

Comments
Read more →

Claude Code daily benchmarks for degradation tracking

Comments
Read more →

A lot of population numbers are fake

Comments
Read more →

AGENTS.md outperforms skills in our agent evals

Comments
Read more →

NeuroAI and Beyond

arXiv:2601.19955v1 Announce Type: new Abstract: Neuroscience and Artificial Intelligence (AI) have made significant progress in the past few years but have only been loosely inter-connected. Based on a workshop held in August 2025, we identify current and future areas of synergism between these two fields. We focus on the subareas of embodiment, language and communication, robotics, learning in humans and machines, and neuromorphic engineering to take stock of the progress made so far, and possible promising new future avenues. Overall, we advocate for the development of NeuroAI, a type of Neuroscience-informed Artificial Intelligence that, we argue, has the potential for significantly improving the scope and efficiency of AI algorithms while simultaneously changing the way we understand biological neural computations. We include personal statements from several leading researchers on their diverse views of NeuroAI. Two Strengths-Weaknesses-Opportunities-Threats (SWOT) analyses by researchers and trainees are appended that describe the benefits and risks offered by NeuroAI.
Read more →

Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning

arXiv:2601.20014v1 Announce Type: new Abstract: Inference-time planning with large language models frequently breaks under partial observability: when task-critical preconditions are not specified at query time, models tend to hallucinate missing facts or produce plans that violate hard constraints. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status (Sat/Viol/Unk) and resolves unknowns via (i) targeted self-queries to an oracle/user or (ii) bridging hypotheses that establish the missing condition through an additional action. SQ-BCP performs bidirectional search and invokes a pullback-based verifier as a categorical certificate of goal compatibility, while using distance-based scores only for ranking and pruning. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements; under bounded branching and finite resolution depth, SQ-BCP finds an accepting plan when one exists. Across WikiHow and RecipeNLG tasks with withheld preconditions, SQ-BCP reduces resource-violation rates to 14.9% and 5.8% (vs. 26.0% and 15.7% for the best baseline), while maintaining competitive reference quality.
Read more →

Fuzzy Categorical Planning: Autonomous Goal Satisfaction with Graded Semantic Constraints

arXiv:2601.20021v1 Announce Type: new Abstract: Natural-language planning often involves vague predicates (e.g., suitable substitute, stable enough) whose satisfaction is inherently graded. Existing category-theoretic planners provide compositional structure and pullback-based hard-constraint verification, but treat applicability as crisp, forcing thresholding that collapses meaningful distinctions and cannot track quality degradation across multi-step plans. We propose Fuzzy Category-theoretic Planning (FCP), which annotates each action (morphism) with a degree in [0,1], composes plan quality via a Łukasiewicz t-norm, and retains crisp executability checks via pullback verification. FCP grounds graded applicability from language using an LLM with k-sample median aggregation and supports meeting-in-the-middle search using residuum-based backward requirements. We evaluate on (i) public PDDL3 preference/oversubscription benchmarks and (ii) RecipeNLG-Subs, a missing-substitute recipe-planning benchmark built from RecipeNLG with substitution candidates from Recipe1MSubs and FoodKG. FCP improves success and reduces hard-constraint violations on RecipeNLG-Subs compared to LLM-only and ReAct-style baselines, while remaining competitive with classical PDDL3 planners.
Read more →

Insight Agents: An LLM-Based Multi-Agent System for Data Insights

arXiv:2601.20048v1 Announce Type: new Abstract: Today, E-commerce sellers face several key challenges, including difficulties in discovering and effectively utilizing available programs and tools, and struggling to understand and utilize rich data from various tools. We therefore aim to develop Insight Agents (IA), a conversational multi-agent Data Insight system, to provide E-commerce sellers with personalized data and business insights through automated information retrieval. Our hypothesis is that IA will serve as a force multiplier for sellers, thereby driving incremental seller adoption by reducing the effort required and increasing the speed at which sellers make good business decisions. In this paper, we introduce this novel LLM-backed end-to-end agentic system built on a plan-and-execute paradigm and designed for comprehensive coverage, high accuracy, and low latency. It features a hierarchical multi-agent structure, consisting of a manager agent and two worker agents: data presentation and insight generation, for efficient information retrieval and problem-solving. We design a simple yet effective ML solution for the manager agent that combines Out-of-Domain (OOD) detection using a lightweight encoder-decoder model and agent routing through a BERT-based classifier, optimizing both accuracy and latency. Within the two worker agents, a strategic planning step is designed for the API-based data model that breaks down queries into granular components to generate more accurate responses, and domain knowledge is dynamically injected to enhance the insight generator. IA has been launched for Amazon sellers in the US, and has achieved high accuracy of 90% based on human evaluation, with P90 latency below 15s.
Read more →

Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control

arXiv:2601.20090v1 Announce Type: new Abstract: Large language model (LLM)-powered agents can translate high-level user intents into plans and actions in an environment. Yet after observing an outcome, users may wonder: What if I had phrased my intent differently? We introduce a framework that enables such counterfactual reasoning in agentic LLM-driven control scenarios, while providing formal reliability guarantees. Our approach models the closed-loop interaction between a user, an LLM-based agent, and an environment as a structural causal model (SCM), and leverages test-time scaling to generate multiple candidate counterfactual outcomes via probabilistic abduction. Through an offline calibration phase, the proposed conformal counterfactual generation (CCG) yields sets of counterfactual outcomes that are guaranteed to contain the true counterfactual outcome with high probability. We showcase the performance of CCG on a wireless network control use case, demonstrating significant advantages compared to naive re-execution baselines.
Read more →

Towards Intelligent Urban Park Development Monitoring: LLM Agents for Multi-Modal Information Fusion and Analysis

arXiv:2601.20206v1 Announce Type: new Abstract: As an important part of urbanization, the development monitoring of newly constructed parks is of great significance for evaluating the effect of urban planning and optimizing resource allocation. However, traditional change detection methods based on remote sensing imagery have obvious limitations in high-level and intelligent analysis, and thus struggle to meet the requirements of current urban planning and management. In the face of the growing demand for complex multi-modal data analysis in urban park development monitoring, these methods often fail to provide flexible analysis capabilities for diverse application scenarios. This study proposes a multi-modal LLM agent framework, which aims to make full use of the semantic understanding and reasoning capabilities of LLMs to meet the challenges in urban park development monitoring. In this framework, a general horizontal and vertical data alignment mechanism is designed to ensure the consistency and effective tracking of multi-modal data. At the same time, a specific toolkit is constructed to alleviate the hallucination issues of LLMs due to the lack of domain-specific knowledge. Compared to vanilla GPT-4o and other agents, our approach enables robust multi-modal information fusion and analysis, offering reliable and scalable solutions tailored to the diverse and evolving demands of urban park development monitoring.
Read more →

Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning

arXiv:2601.20221v1 Announce Type: new Abstract: Large language models have achieved strong performance on medical reasoning benchmarks, yet their deployment in clinical settings demands rigorous verification to ensure factual accuracy. While reward models offer a scalable approach for reasoning trace verification, existing methods face two limitations: they produce only scalar reward values without explicit justification, and they rely on single-pass retrieval that precludes adaptive knowledge access as verification unfolds. We introduce an agentic framework that addresses these limitations by training medical reasoning verifiers to iteratively query external medical corpora during evaluation. Our approach combines tool-augmented verification with an iterative reinforcement learning paradigm that requires only trace-level supervision, alongside an adaptive curriculum mechanism that dynamically adjusts the training data distribution. Across four medical reasoning benchmarks, our method achieves substantial gains over existing approaches, improving MedQA accuracy by 23.5% and MedXpertQA by 32.0% relative to the base generator. Crucially, it demonstrates an 8× reduction in sampling budget requirement compared to prior reward model baselines. These findings establish that grounding verification in dynamically retrieved evidence offers a principled path toward more reliable medical reasoning systems.
Read more →

Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models

arXiv:2601.20305v1 Announce Type: new Abstract: Unified Multimodal Models (UMMs) exhibit strong understanding, yet this capability often fails to effectively guide generation. We identify this as a Cognitive Gap: the model lacks the understanding of how to enhance its own generation process. To bridge this gap, we propose Endogenous Reprompting, a mechanism that transforms the model's understanding from a passive encoding process into an explicit generative reasoning step by generating self-aligned descriptors during generation. To achieve this, we introduce SEER (Self-Evolving Evaluator and Reprompter), a training framework that establishes a two-stage endogenous loop using only 300 samples from a compact proxy task, Visual Instruction Elaboration. First, Reinforcement Learning with Verifiable Rewards (RLVR) activates the model's latent evaluation ability via curriculum learning, producing a high-fidelity endogenous reward signal. Second, Reinforcement Learning with Model-rewarded Thinking (RLMT) leverages this signal to optimize the generative reasoning policy. Experiments show that SEER consistently outperforms state-of-the-art baselines in evaluation accuracy, reprompting efficiency, and generation quality, without sacrificing general multimodal capabilities.
Read more →

ECG-Agent: On-Device Tool-Calling Agent for ECG Multi-Turn Dialogue

arXiv:2601.20323v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models have rapidly expanded to electrocardiograms, focusing on classification, report generation, and single-turn QA tasks. However, these models fall short in real-world scenarios, lacking multi-turn conversational ability, on-device efficiency, and precise understanding of ECG measurements such as the PQRST intervals. To address these limitations, we introduce ECG-Agent, the first LLM-based tool-calling agent for multi-turn ECG dialogue. To facilitate its development and evaluation, we also present ECG-Multi-Turn-Dialogue (ECG-MTD) dataset, a collection of realistic user-assistant multi-turn dialogues for diverse ECG lead configurations. We develop ECG-Agents in various sizes, from on-device capable to larger agents. Experimental results show that ECG-Agents outperform baseline ECG-LLMs in response accuracy. Furthermore, on-device agents achieve comparable performance to larger agents in various evaluations that assess response accuracy, tool-calling ability, and hallucinations, demonstrating their viability for real-world applications.
Read more →

AMA: Adaptive Memory via Multi-Agent Collaboration

arXiv:2601.20352v1 Announce Type: new Abstract: The rapid evolution of Large Language Model (LLM) agents has necessitated robust memory systems to support cohesive long-term interaction and complex reasoning. Benefiting from the strong capabilities of LLMs, recent research focus has shifted from simple context extension to the development of dedicated agentic memory systems. However, existing approaches typically rely on rigid retrieval granularity, accumulation-heavy maintenance strategies, and coarse-grained update mechanisms. These design choices create a persistent mismatch between stored information and task-specific reasoning demands, while leading to the unchecked accumulation of logical inconsistencies over time. To address these challenges, we propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA employs a hierarchical memory design that dynamically aligns retrieval granularity with task complexity. Specifically, the Constructor and Retriever jointly enable multi-granularity memory construction and adaptive query routing. The Judge verifies the relevance and consistency of retrieved content, triggering iterative retrieval when evidence is insufficient or invoking the Refresher upon detecting logical conflicts. The Refresher then enforces memory consistency by performing targeted updates or removing outdated entries. Extensive experiments on challenging long-context benchmarks show that AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods, demonstrating its effectiveness in maintaining retrieval precision and long-term memory consistency.
Read more →

Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution

arXiv:2601.20379v1 Announce Type: new Abstract: Large language models (LLMs) struggle with complex, long-horizon reasoning due to instability caused by their frozen policy assumption. Current test-time scaling methods treat execution feedback merely as an external signal for filtering or rewriting trajectories, without internalizing it to improve the underlying reasoning strategy. Inspired by Popper's epistemology of "conjectures and refutations," we argue that intelligence requires real-time evolution of the model's policy through learning from failed attempts. We introduce Policy of Thoughts (PoT), a framework that recasts reasoning as a within-instance online optimization process. PoT first generates diverse candidate solutions via an efficient exploration mechanism, then uses Group Relative Policy Optimization (GRPO) to update a transient LoRA adapter based on execution feedback. This closed-loop design enables dynamic, instance-specific refinement of the model's reasoning priors. Experiments show that PoT dramatically boosts performance: a 4B model achieves 49.71% accuracy on LiveCodeBench, outperforming GPT-4o and DeepSeek-V3 despite being over 50× smaller.
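A rough sketch of the within-instance update the abstract describes, assuming binary execution rewards, group-relative advantages, and a simple policy-gradient surrogate; the tensors stand in for the adapter-bearing model, and the exact PoT objective may differ:

```python
import torch

# Sketch: one within-instance GRPO-style step on a transient adapter.
# Assumptions: rewards come from execution feedback on sampled solutions,
# seq_logprobs are the adapter-bearing model's log-probs of those samples.

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """Center and scale rewards within the sampled group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def pot_step(seq_logprobs: torch.Tensor, rewards: torch.Tensor, optimizer):
    adv = group_relative_advantages(rewards).detach()
    loss = -(adv * seq_logprobs).mean()      # policy-gradient surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # would update only the LoRA parameters
    return loss.item()

# Toy usage with stand-in tensors (a real run would use the model's log-probs).
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])   # pass/fail execution feedback
seq_logprobs = torch.randn(6, requires_grad=True)
opt = torch.optim.AdamW([seq_logprobs], lr=1e-4)
print(pot_step(seq_logprobs, rewards, opt))
```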
Read more →

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

arXiv:2601.20380v1 Announce Type: new Abstract: Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a general-purpose GUI agent model for autonomous task execution on both mobile and desktop platforms, supporting computer-use and phone-use scenarios. Building an effective GUI agent model relies on two factors: (1) high-quality data and (2) effective training methods. To address these, we introduce a carefully engineered data-construction pipeline and a decoupled training paradigm. For data construction, we leverage rigorously curated open-source datasets and introduce a novel automated synthesis framework that integrates bottom-up autonomous exploration with top-down taxonomy-guided generation to create high-fidelity synthetic data. For training, to better leverage these data, we adopt a two-stage strategy: Supervised Fine-Tuning (SFT) to establish fundamental interaction syntax, followed by Group Relative Policy Optimization (GRPO) to improve spatial grounding and sequential planning. To balance computational efficiency with agentic reasoning capacity, OmegaUse is built on a Mixture-of-Experts (MoE) backbone. To evaluate cross-terminal capabilities in an offline setting, we introduce OS-Nav, a benchmark suite spanning multiple operating systems: ChiM-Nav, targeting Chinese Android mobile environments, and Ubu-Nav, focusing on routine desktop interactions on Ubuntu. Extensive experiments show that OmegaUse is highly competitive across established GUI benchmarks, achieving a state-of-the-art (SOTA) score of 96.3% on ScreenSpot-V2 and a leading 79.1% step success rate on AndroidControl. OmegaUse also performs strongly on OS-Nav, reaching 74.24% step success on ChiM-Nav and 55.9% average success on Ubu-Nav.
Read more →

CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning

arXiv:2601.20467v1 Announce Type: new Abstract: Chain-of-thought (CoT) prompting improves LLM reasoning but incurs high latency and memory cost due to verbose traces, motivating CoT compression with preserved correctness. Existing methods either shorten CoTs at the semantic level, which is often conservative, or prune tokens aggressively, which can miss task-critical cues and degrade accuracy. Moreover, combining the two is non-trivial due to sequential dependency, task-agnostic pruning, and distribution mismatch. We propose CtrlCoT, a dual-granularity CoT compression framework that harmonizes semantic abstraction and token-level pruning through three components: Hierarchical Reasoning Abstraction produces CoTs at multiple semantic granularities; Logic-Preserving Distillation trains a logic-aware pruner to retain indispensable reasoning cues (e.g., numbers and operators) across pruning ratios; and Distribution-Alignment Generation aligns compressed traces with fluent inference-time reasoning styles to avoid fragmentation. On MATH-500 with Qwen2.5-7B-Instruct, CtrlCoT uses 30.7% fewer tokens while achieving accuracy 7.6 percentage points higher than the strongest baseline, demonstrating more efficient and reliable reasoning. Our code will be publicly available at https://github.com/fanzhenxuan/Ctrl-CoT.
Read more →

Normative Equivalence in human-AI Cooperation: Behaviour, Not Identity, Drives Cooperation in Mixed-Agent Groups

arXiv:2601.20487v1 Announce Type: new Abstract: The introduction of artificial intelligence (AI) agents into human group settings raises essential questions about how these novel participants influence cooperative social norms. While previous studies on human-AI cooperation have primarily focused on dyadic interactions, little is known about how integrating AI agents affects the emergence and maintenance of cooperative norms in small groups. This study addresses this gap through an online experiment using a repeated four-player Public Goods Game (PGG). Each group consisted of three human participants and one bot, which was framed either as human or AI and followed one of three predefined decision strategies: unconditional cooperation, conditional cooperation, or free-riding. In our sample of 236 participants, we found that reciprocal group dynamics and behavioural inertia primarily drove cooperation. These normative mechanisms operated identically across conditions, resulting in cooperation levels that did not differ significantly between human and AI labels. Furthermore, we found no evidence of differences in norm persistence in a follow-up Prisoner's Dilemma, or in participants' normative perceptions. Participants' behaviour followed the same normative logic across human and AI conditions, indicating that cooperation depended on group behaviour rather than partner identity. This supports a pattern of normative equivalence, in which the mechanisms that sustain cooperation function similarly in mixed human-AI and all human groups. These findings suggest that cooperative norms are flexible enough to extend to artificial agents, blurring the boundary between humans and AI in collective decision-making.
Read more →

PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs

arXiv:2601.20539v1 Announce Type: new Abstract: Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing frameworks' reliance on fixed evolutionary rules and static prompt templates often leads to myopic heuristic generation, redundant evaluations, and limited reasoning about how new heuristics should be derived. We propose a novel multi-agent reasoning framework, referred to as Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs (PathWise), which formulates heuristic generation as a sequential decision process over an entailment graph serving as a compact, stateful memory of the search trajectory. This approach allows the system to carry forward past decisions and reuse or avoid derivation information across generations. A policy agent plans evolutionary actions, a world model agent generates heuristic rollouts conditioned on those actions, and critic agents provide routed reflections summarizing lessons from prior steps, shifting LLM-based AHD from trial-and-error evolution toward state-aware planning through reasoning. Experiments across diverse COPs show that PathWise converges faster to better heuristics, generalizes across different LLM backbones, and scales to larger problem sizes.
Read more →

Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function

arXiv:2601.20554v1 Announce Type: new Abstract: We study risk-sensitive planning under partial observability using the dynamic risk measure Iterated Conditional Value-at-Risk (ICVaR). A policy evaluation algorithm for ICVaR is developed with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, three widely used online planning algorithms--Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW)--are extended to optimize the ICVaR value function rather than the expectation of the return. Our formulations introduce a risk parameter $\alpha$, where $\alpha = 1$ recovers standard expectation-based planning and $\alpha < 1$ induces increasing risk aversion. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive objective, which further enable a novel exploration strategy tailored to ICVaR. Experiments on benchmark POMDP domains demonstrate that the proposed ICVaR planners achieve lower tail risk compared to their risk-neutral counterparts.
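A minimal sketch of sample-based CVaR and one iterated-CVaR backup, matching the abstract's convention that alpha = 1 recovers expectation-based planning; the numbers are illustrative:

```python
import numpy as np

# Sketch: sample-based CVaR at level alpha (average of the worst alpha-fraction
# of outcomes) and a single iterated-CVaR Bellman backup. Values are illustrative.

def cvar(samples, alpha):
    """CVaR_alpha of a return distribution given Monte Carlo samples.
    alpha = 1.0 recovers the plain mean; smaller alpha is more risk-averse."""
    s = np.sort(np.asarray(samples, dtype=float))       # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(s))))
    return s[:k].mean()

def icvar_backup(reward, next_value_samples, alpha, gamma=0.95):
    """One step of the iterated CVaR value recursion:
    V(s) = r + gamma * CVaR_alpha over sampled next-state values."""
    return reward + gamma * cvar(next_value_samples, alpha)

next_vals = np.random.default_rng(0).normal(10.0, 3.0, size=1000)
print(icvar_backup(1.0, next_vals, alpha=1.0))   # risk-neutral
print(icvar_backup(1.0, next_vals, alpha=0.2))   # risk-averse: lower estimate
```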
Read more →

Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies

arXiv:2601.20604v1 Announce Type: new Abstract: This paper introduces a methodological framework for empirically testing AI alignment strategies through structured multi-model dialogue. Drawing on Peace Studies traditions - particularly interest-based negotiation, conflict transformation, and commons governance - we operationalize Viral Collaborative Wisdom (VCW), an approach that reframes alignment from a control problem to a relationship problem developed through dialogical reasoning. Our experimental design assigns four distinct roles (Proposer, Responder, Monitor, Translator) to different AI systems across six conditions, testing whether current large language models can engage substantively with complex alignment frameworks. Using Claude, Gemini, and GPT-4o, we conducted 72 dialogue turns totaling 576,822 characters of structured exchange. Results demonstrate that AI systems can engage meaningfully with Peace Studies concepts, surface complementary objections from different architectural perspectives, and generate emergent insights not present in initial framings - including the novel synthesis of "VCW as transitional framework." Cross-architecture patterns reveal that different models foreground different concerns: Claude emphasized verification challenges, Gemini focused on bias and scalability, and GPT-4o highlighted implementation barriers. The framework provides researchers with replicable methods for stress-testing alignment proposals before implementation, while the findings offer preliminary evidence about AI capacity for the kind of dialogical reasoning VCW proposes. We discuss limitations, including the observation that dialogues engaged more with process elements than with foundational claims about AI nature, and outline directions for future research including human-AI hybrid protocols and extended dialogue studies.
Read more →

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

arXiv:2601.20614v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) offers a robust mechanism for enhancing mathematical reasoning in large models. However, we identify a systematic lack of emphasis on more challenging questions in existing methods from both algorithmic and data perspectives, despite their importance for refining underdeveloped capabilities. Algorithmically, the widely used Group Relative Policy Optimization (GRPO) suffers from an implicit imbalance where the magnitude of policy updates is lower for harder questions. Data-wise, augmentation approaches primarily rephrase questions to enhance diversity without systematically increasing intrinsic difficulty. To address these issues, we propose MathForge, a framework that improves mathematical reasoning by targeting harder questions from both perspectives; it comprises a Difficulty-Aware Group Policy Optimization (DGPO) algorithm and a Multi-Aspect Question Reformulation (MQR) strategy. Specifically, DGPO first rectifies the implicit imbalance in GRPO via difficulty-balanced group advantage estimation, and further prioritizes harder questions by difficulty-aware question-level weighting. Meanwhile, MQR reformulates questions across multiple aspects to increase difficulty while maintaining the original gold answer. Overall, MathForge forms a synergistic loop: MQR expands the data frontier, and DGPO effectively learns from the augmented data. Extensive experiments show that MathForge significantly outperforms existing methods on various mathematical reasoning tasks. The code and augmented data are all available at https://github.com/AMAP-ML/MathForge.
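One way to picture the difficulty-aware weighting is the sketch below, which treats the group failure rate as difficulty and scales group-relative advantages accordingly; the weighting form is an assumption, not the paper's exact estimator:

```python
import numpy as np

# Sketch: difficulty-aware weighting on top of group-relative advantages.
# Assumptions: each question has a group of sampled solutions with 0/1 rewards;
# 'difficulty' is taken as the group failure rate. The weighting form is illustrative.

def group_advantages(rewards, eps=1e-6):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def dgpo_weighted_advantages(rewards, beta=1.0):
    difficulty = 1.0 - np.mean(rewards)          # fraction of failed samples
    weight = 1.0 + beta * difficulty             # harder questions count more
    return weight * group_advantages(rewards)

easy = [1, 1, 1, 0, 1, 1]    # mostly solved
hard = [0, 0, 1, 0, 0, 0]    # rarely solved
print(dgpo_weighted_advantages(easy))
print(dgpo_weighted_advantages(hard))            # larger-magnitude updates
```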
Read more →

Investigating the Development of Task-Oriented Communication in Vision-Language Models

arXiv:2601.20641v1 Announce Type: new Abstract: We investigate whether LLM-based agents can develop task-oriented communication protocols that differ from standard natural language in collaborative reasoning tasks. Our focus is on two core properties such task-oriented protocols may exhibit: Efficiency, conveying task-relevant information more concisely than natural language, and Covertness, becoming difficult for external observers to interpret, raising concerns about transparency and control. To investigate these aspects, we use a referential-game framework in which vision-language model (VLM) agents communicate, providing a controlled, measurable setting for evaluating language variants. Experiments show that VLMs can develop effective, task-adapted communication patterns. At the same time, they can develop covert protocols that are difficult for humans and external agents to interpret. We also observe spontaneous coordination between similar models without explicitly shared protocols. These findings highlight both the potential and the risks of task-oriented communication, and position referential games as a valuable testbed for future work in this area.
Read more →

Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry

arXiv:2601.20696v1 Announce Type: new Abstract: Combinatorial optimization problems such as the Job-Shop Scheduling Problem (JSP) and Knapsack Problem (KP) are fundamental challenges in operations research, logistics, and enterprise resource planning (ERP). These problems often require sophisticated algorithms to achieve near-optimal solutions within practical time constraints. Recent advances in deep learning have introduced transformer-based architectures as promising alternatives to traditional heuristics and metaheuristics. We leverage the Multi-Type Transformer (MTT) architecture to address these benchmarks in a unified framework. We present an extensive experimental evaluation across standard benchmark datasets for JSP and KP, demonstrating that MTT achieves competitive performance on different sizes of these benchmark problems. We showcase the potential of multi-type attention on a real application in the ferro-titanium industry. To the best of our knowledge, we are the first to apply multi-type transformers in real manufacturing.
Read more →

Implementing Metric Temporal Answer Set Programming

arXiv:2601.20735v1 Announce Type: new Abstract: We develop a computational approach to Metric Answer Set Programming (ASP) to allow for expressing quantitative temporal constraints, like durations and deadlines. A central challenge is to maintain scalability when dealing with fine-grained timing constraints, which can significantly exacerbate ASP's grounding bottleneck. To address this issue, we leverage extensions of ASP with difference constraints, a simplified form of linear constraints, to handle time-related aspects externally. Our approach effectively decouples metric ASP from the granularity of time, resulting in a solution that is unaffected by time precision.
Read more →

REASON: Accelerating Probabilistic Logical Reasoning for Scalable Neuro-Symbolic Intelligence

arXiv:2601.20784v1 Announce Type: new Abstract: Neuro-symbolic AI systems integrate neural perception with symbolic reasoning to enable data-efficient, interpretable, and robust intelligence beyond purely neural models. Although this compositional paradigm has shown superior performance in domains such as reasoning, planning, and verification, its deployment remains challenging due to severe inefficiencies in symbolic and probabilistic inference. Through systematic analysis of representative neuro-symbolic workloads, we identify probabilistic logical reasoning as the inefficiency bottleneck, characterized by irregular control flow, low arithmetic intensity, uncoalesced memory accesses, and poor hardware utilization on CPUs and GPUs. This paper presents REASON, an integrated acceleration framework for probabilistic logical reasoning in neuro-symbolic AI. REASON introduces a unified directed acyclic graph representation that captures common structure across symbolic and probabilistic models, coupled with adaptive pruning and regularization. At the architecture level, REASON features a reconfigurable, tree-based processing fabric optimized for irregular traversal, symbolic deduction, and probabilistic aggregation. At the system level, REASON is tightly integrated with GPU streaming multiprocessors through a programmable interface and multi-level pipeline that efficiently orchestrates compositional execution. Evaluated across six neuro-symbolic workloads, REASON achieves 12-50x speedup and 310-681x better energy efficiency than desktop and edge GPUs in a TSMC 28 nm node. REASON enables real-time probabilistic logical reasoning, completing end-to-end tasks in 0.8 s with 6 mm² area and 2.12 W power, demonstrating that targeted acceleration of probabilistic logical reasoning is critical for practical and scalable neuro-symbolic AI and positioning REASON as a foundational system architecture for next-generation cognitive intelligence.
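A toy illustration of evaluating a probabilistic logic query over a DAG with AND/OR nodes and neural leaf probabilities; the paper's unified representation and pruning are far richer, and the node semantics here are assumptions:

```python
# Sketch: evaluating a probabilistic logical query over a DAG whose internal
# nodes are independent AND (product) / disjoint OR (sum) gates and whose
# leaves carry neural-network probabilities. Node semantics are illustrative.

def eval_dag(nodes, order):
    """nodes: id -> ('leaf', p) | ('and', children) | ('or', children).
    order: ids in topological order (children before parents)."""
    val = {}
    for nid in order:
        kind, payload = nodes[nid]
        if kind == "leaf":
            val[nid] = payload
        elif kind == "and":
            p = 1.0
            for c in payload:
                p *= val[c]
            val[nid] = p
        elif kind == "or":
            val[nid] = sum(val[c] for c in payload)
    return val[order[-1]]

# Two disjoint "worlds", each a conjunction of two neural predictions.
nodes = {
    "a": ("leaf", 0.8), "b": ("leaf", 0.6),
    "c": ("leaf", 0.1), "d": ("leaf", 0.3),
    "w1": ("and", ["a", "b"]), "w2": ("and", ["c", "d"]),
    "q": ("or", ["w1", "w2"]),
}
print(eval_dag(nodes, ["a", "b", "c", "d", "w1", "w2", "q"]))   # 0.51
```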
Read more →

MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents

arXiv:2601.20831v1 Announce Type: new Abstract: Foundation models rely on in-context learning for personalized decision making. The limited size of this context window necessitates memory compression and retrieval systems like RAG. These systems, however, often treat memory as large offline storage spaces, which is unfavorable for embodied agents that are expected to operate online under strict memory and compute constraints. In this work, we propose MemCtrl, a novel framework that uses Multimodal Large Language Models (MLLMs) for pruning memory online. MemCtrl augments MLLMs with a trainable memory head μ that acts as a gate to determine which observations or reflections to retain, update, or discard during exploration. We evaluate two ways of training μ: 1) via an offline expert, and 2) via online RL, and observe significant improvement in overall embodied task completion ability for μ-augmented MLLMs. In particular, on augmenting two low-performing MLLMs with MemCtrl on multiple subsets of the EmbodiedBench benchmark, we observe that μ-augmented MLLMs show an improvement of around 16% on average, with over 20% on specific instruction subsets. Finally, we present a qualitative analysis of the memory fragments collected by μ, noting the superior performance of μ-augmented MLLMs on long and complex instruction types.
Read more →

Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)

arXiv:2601.20843v1 Announce Type: new Abstract: This paper introduces a novel Deep Researcher architecture designed to generate detailed research reports on complex PhD level topics by addressing the inherent limitations of the Parallel Scaling paradigm. Our system utilizes two key innovations: Sequential Research Plan Refinement via Reflection and a Candidates Crossover algorithm. The sequential refinement process is demonstrated as an efficient method that allows the agent to maintain a centralized Global Research Context, enabling it to look back at current progress, reason about the research plan, and intelligently make changes at runtime. This dynamic adaptation contrasts with parallel approaches, which often suffer from siloed knowledge. The Candidates Crossover algorithm further enhances search efficiency by deploying multiple LLM candidates with varied parameters to explore a larger search space, with their findings synthesized to curate a comprehensive final research response. The process concludes with One Shot Report Generation, ensuring the final document is informed by a unified narrative and high fact density. Powered by the Gemini 2.5 Pro model, our Deep Researcher was evaluated on the DeepResearch Bench, a globally recognized benchmark of 100 doctoral level research tasks. Our architecture achieved an overall score of 46.21, demonstrating superior performance by surpassing leading deep research agents such as Claude Researcher, Nvidia AIQ Research Assistant, Perplexity Research, Kimi Researcher and Grok Deeper Search present on the DeepResearch Bench actively running leaderboard. This performance marginally exceeds our previous work, Static DRA, and reinforces the finding that sequential scaling consistently outperforms the parallel self consistency paradigm.
Read more →

SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models

arXiv:2601.20856v1 Announce Type: new Abstract: Although the capabilities of large language models have been increasingly tested on complex reasoning tasks, their long-horizon planning abilities have not yet been extensively investigated. In this work, we provide a systematic assessment of the planning and long-horizon reasoning capabilities of state-of-the-art Large Reasoning Models (LRMs). We propose a novel benchmark based on Sokoban puzzles, intentionally simplified to isolate long-horizon planning from state persistence. Our findings reveal a consistent degradation in planning performance when more than 25 moves are required to reach the solution, suggesting a fundamental constraint on forward planning capacity. We show that equipping LRMs with Planning Domain Definition Language (PDDL) parsing, validation, and solving tools allows for modest improvements, suggesting inherent architectural limitations which might not be overcome by test-time scaling approaches alone.
Read more →

STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification

arXiv:2601.19903v1 Announce Type: cross Abstract: Formal Verification (FV) relies on high-quality SystemVerilog Assertions (SVAs), but the manual writing process is slow and error-prone. Existing LLM-based approaches either generate assertions from scratch or ignore structural patterns in hardware designs and expert-crafted assertions. This paper presents STELLAR, the first framework that guides LLM-based SVA generation with structural similarity. STELLAR represents RTL blocks as AST structural fingerprints, retrieves structurally relevant (RTL, SVA) pairs from a knowledge base, and integrates them into structure-guided prompts. Experiments show that STELLAR achieves superior syntax correctness, stylistic alignment, and functional correctness, highlighting structure-aware retrieval as a promising direction for industrial FV.
Read more →

DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

arXiv:2601.19904v1 Announce Type: cross Abstract: The exponential growth of large language models has outpaced the capabilities of traditional CPU and GPU architectures due to the slowdown of Moore's Law. Dataflow AI accelerators present a promising alternative; however, there remains a lack of in-depth performance analysis and standardized benchmarking methodologies for LLM training. We introduce DABench-LLM, the first benchmarking framework designed for evaluating LLM workloads on dataflow-based accelerators. By combining intra-chip performance profiling and inter-chip scalability analysis, DABench-LLM enables comprehensive evaluation across key metrics such as resource allocation, load balance, and resource efficiency. The framework helps researchers rapidly gain insights into underlying hardware and system behaviors, and provides guidance for performance optimizations. We validate DABench-LLM on three commodity dataflow accelerators, Cerebras WSE-2, SambaNova RDU, and Graphcore IPU. Our framework reveals performance bottlenecks and provides specific optimization strategies, demonstrating its generality and effectiveness across a diverse range of dataflow-based AI hardware platforms.
Read more →

GTAC: A Generative Transformer for Approximate Circuits

arXiv:2601.19906v1 Announce Type: cross Abstract: Targeting error-tolerant applications, approximate circuits introduce controlled errors to significantly improve the performance, power, and area (PPA) of circuits. In this work, we introduce GTAC, a novel generative Transformer-based model for producing approximate circuits. By leveraging principles of approximate computing and AI-driven EDA, our model innovatively integrates error thresholds into the design process. Experimental results show that, compared with a state-of-the-art method, GTAC reduces area by a further 6.4% under the error rate constraint while being 4.3x faster.
Read more →

Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study

arXiv:2601.19912v1 Announce Type: cross Abstract: Large language models (LLMs) are highly compute- and memory-intensive, posing significant demands on high-performance GPUs. At the same time, advances in GPU technology driven by shrinking transistor sizes and lower operating voltages have made these devices increasingly susceptible to soft errors. While prior work has examined GPU reliability, most studies have focused on general-purpose applications or conventional neural networks mostly used for vision tasks such as classification and detection. In contrast, systematic analysis of modern large-scale LLMs remains limited, despite their rapid adoption in diverse application scenarios. Given the unique characteristics of LLMs, their resilience to soft errors may differ substantially from earlier models. To bridge this gap, we conduct the first instruction-level fault injection study of LLM inference. Our approach reveals reliability characteristics from multiple perspectives, highlighting the effects of model architecture, parameter scale, and task complexity. These findings provide new insights into LLM reliability and inform the design of more effective fault tolerance mechanisms.
Read more →

From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text

arXiv:2601.19913v1 Announce Type: cross Abstract: Distinguishing human-written Korean text from fluent LLM outputs remains difficult even for linguistically trained readers, who can over-trust surface well-formedness. We study whether expert detection can be treated as a learnable skill and improved through structured calibration. We introduce LREAD, a rubric derived from national Korean writing standards and adapted to target micro-level artifacts (e.g., punctuation optionality, spacing behavior, and register shifts). In a three-phase longitudinal blind protocol with Korean linguistics majors, Phase 1 measures intuition-only detection, Phase 2 enforces criterion-level scoring with explicit justifications, and Phase 3 evaluates domain-focused mastery on held-out elementary essays. Across phases, majority-vote accuracy increases from 60% to 100%, accompanied by stronger inter-annotator agreement (Fleiss' kappa: -0.09 --> 0.82). Compared to state-of-the-art LLM detectors, calibrated humans rely more on language-specific micro-diagnostics that are not well captured by coarse discourse priors. Our findings suggest that rubric-scaffolded expert judgment can serve as an interpretable complement to automated detectors for non-English settings, and we release the full rubric and a taxonomy of calibrated detection signatures.
Read more →

Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments

arXiv:2601.19914v1 Announce Type: cross Abstract: Synthetic data has proven itself to be a valuable resource for tuning smaller, cost-effective language models to handle the complexities of multi-turn tool calling conversations. While many frameworks and systems for producing synthetic multi-turn tool calling data have been proposed, prior works have frequently assumed that any tool calling interactions will take place in an execution environment that maintains state. When such an environment is available, this is advantageous as it allows for the validity of an interaction to be determined by whether or not the state of the execution environment matches to some prespecified objective. Unfortunately, this does not hold in many real-world tool use settings, e.g., in enterprise settings where data security is of the utmost importance or in cases where tool specifications are synthesized from multiple sources. In this work, we address this gap by introducing a data generation method, DiGiT-TC, that is designed to produce tool calling conversations that have the characteristics of conversations generated through search in a stateful environment. The key to our technique lies in a novel generation pattern that allows our approach to implicitly represent certain tool calls in the user request. We validate our approach on standard tool calling benchmarks and demonstrate that, even in stateful problem settings, our approach results in strong performance gains.
Read more →

Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication

arXiv:2601.19915v1 Announce Type: cross Abstract: We introduce the Arrow Language Model, a neural architecture derived from an intuitionistic-logic interpretation of next-token prediction. Instead of representing tokens as additive embeddings mixed by attention, we encode a prefix as a left-nested implication chain whose structure preserves order through non-commutative composition. Next-token prediction corresponds to modus ponens, and sequence processing becomes constructive proof extension under the Curry–Howard correspondence. Our Prolog-based specialized theorem provers validate fundamental properties of the neural models, among them the relations between commutative vs. non-commutative sequencing and single-token vs. multi-token prediction choices. We show that a neural architecture equivalent to multiplicative RNNs arises naturally from a proof-theoretic interpretation of next-token prediction as nested intuitionistic implication, present a practical low-rank neural realization, and position the model relative to Transformers and state-space models. Keywords: logic-based derivation of neural architectures, intuitionistic implicational logic, token-as-operator neural models, state-space models, alternatives to transformer-based foundational models.
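A small numerical sketch of the token-as-operator view, assuming each token is a small matrix applied to a state by ordered (non-commutative) products; dimensions, vocabulary, and readout are placeholders rather than the paper's construction:

```python
import numpy as np

# Sketch: tokens as operators. A prefix is encoded by the ordered product of
# per-token matrices, so composition is non-commutative and order-preserving.
# Vocabulary, dimensions, and the readout are illustrative placeholders.

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat"]
d = 8
ops = {w: np.eye(d) + 0.1 * rng.standard_normal((d, d)) for w in vocab}
readout = rng.standard_normal((len(vocab), d))
init_state = np.ones(d) / np.sqrt(d)

def encode(prefix):
    state = init_state
    for tok in prefix:
        state = ops[tok] @ state            # left-nested application
    return state

def next_token_scores(prefix):
    return readout @ encode(prefix)         # modus-ponens-style readout

print(next_token_scores(["the", "cat"]))
print(next_token_scores(["cat", "the"]))    # differs: composition is order-sensitive
```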
Read more →

FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition

arXiv:2601.19919v1 Announce Type: cross Abstract: Knowledge distillation is one of the most effective methods for model compression. Previous studies have focused on the student model effectively learning the predictive distribution of the teacher model. However, during training, the student model may inherit the shortcomings of the teacher model, which can lead to a decline in generalization capacity. To mitigate this issue, we propose adaptive self-knowledge distillation (ASKD), which dynamically reduces the dependence on the teacher model to improve the self-training capacity, and applies self-knowledge distillation to improve the generalization capacity of the student model. We further distill the Whisper model into a smaller variant, called FastWhisper. In our post-training setting, FastWhisper achieved a word error rate 1.07% lower than that of the teacher model Whisper, and its inference was about 5 times faster.
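A hedged sketch of a distillation loss whose reliance on the teacher decays during training while a self-distillation term grows; the schedule and the use of an EMA student as the self-target are assumptions, not the paper's exact ASKD formulation:

```python
import torch
import torch.nn.functional as F

# Sketch: a distillation objective whose teacher weight shrinks over training,
# in the spirit of adaptive self-knowledge distillation. The linear schedule and
# the self-distillation target (the student's own EMA predictions) are assumptions.

def askd_loss(student_logits, teacher_logits, ema_student_logits, labels,
              step, total_steps, T=2.0):
    ce = F.cross_entropy(student_logits, labels)
    kd_teacher = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                          F.softmax(teacher_logits / T, dim=-1),
                          reduction="batchmean") * T * T
    kd_self = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                       F.softmax(ema_student_logits / T, dim=-1),
                       reduction="batchmean") * T * T
    w = 1.0 - step / total_steps            # teacher influence shrinks over time
    return ce + w * kd_teacher + (1.0 - w) * kd_self

B, C = 4, 10
print(askd_loss(torch.randn(B, C), torch.randn(B, C), torch.randn(B, C),
                torch.randint(0, C, (B,)), step=300, total_steps=1000))
```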
Read more →

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

arXiv:2601.19921v1 Announce Type: cross Abstract: Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies show that, under homogeneous agents and uniform belief updates, debate preserves expected correctness and therefore cannot reliably improve outcomes. Drawing on findings from human deliberation and collective decision-making, we identify two key mechanisms missing from vanilla MAD: (i) diversity of initial viewpoints and (ii) explicit, calibrated confidence communication. We propose two lightweight interventions. First, a diversity-aware initialisation that selects a more diverse pool of candidate answers, increasing the likelihood that a correct hypothesis is present at the start of debate. Second, a confidence-modulated debate protocol in which agents express calibrated confidence and condition their updates on others' confidence. We show theoretically that diversity-aware initialisation improves the prior probability of MAD success without changing the underlying update dynamics, while confidence-modulated updates enable debate to systematically drift to the correct hypothesis. Empirically, across six reasoning-oriented QA benchmarks, our methods consistently outperform vanilla MAD and majority vote. Our results connect human deliberation with LLM-based debate and demonstrate that simple, principled modifications can substantially enhance debate effectiveness.
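A minimal sketch of a confidence-modulated update, assuming agents report calibrated confidences and low-confidence agents defer to the confidence-weighted consensus; the paper's actual protocol may differ:

```python
from collections import defaultdict

# Sketch: confidence-modulated aggregation in multi-agent debate. Each agent
# reports (answer, calibrated confidence); agents update toward the answer
# with the highest confidence-weighted support. The update rule is illustrative.

def confidence_vote(reports):
    """reports: list of (answer, confidence in [0, 1])."""
    support = defaultdict(float)
    for answer, conf in reports:
        support[answer] += conf
    return max(support, key=support.get)

def debate_round(reports, switch_margin=0.2):
    winner = confidence_vote(reports)
    updated = []
    for answer, conf in reports:
        # Low-confidence agents defer to the confidence-weighted consensus.
        if answer != winner and conf < 1.0 - switch_margin:
            updated.append((winner, conf))
        else:
            updated.append((answer, conf))
    return updated

round0 = [("A", 0.9), ("B", 0.55), ("B", 0.5), ("A", 0.8)]
print(confidence_vote(round0))   # 'A' wins on weighted support (1.7 vs 1.05)
print(debate_round(round0))
```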
Read more →

HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

arXiv:2601.19922v1 Announce Type: cross Abstract: Supportive conversation depends on skills that go beyond language fluency, including reading emotions, adjusting tone, and navigating moments of resistance, frustration, or distress. Despite rapid progress in language models, we still lack a clear way to understand how their abilities in these interpersonal domains compare to those of humans. We introduce HEART, the first-ever framework that directly compares humans and LLMs on the same multi-turn emotional-support conversations. For each dialogue history, we pair human and model responses and evaluate them through blinded human raters and an ensemble of LLM-as-judge evaluators. All assessments follow a rubric grounded in interpersonal communication science across five dimensions: Human Alignment, Empathic Responsiveness, Attunement, Resonance, and Task-Following. HEART uncovers striking behavioral patterns. Several frontier models approach or surpass the average human responses in perceived empathy and consistency. At the same time, humans maintain advantages in adaptive reframing, tension-naming, and nuanced tone shifts, particularly in adversarial turns. Human and LLM-as-judge preferences align on about 80 percent of pairwise comparisons, matching inter-human agreement, and their written rationales emphasize similar HEART dimensions. This pattern suggests an emerging convergence in the criteria used to assess supportive quality. By placing humans and models on equal footing, HEART reframes supportive dialogue as a distinct capability axis, separable from general reasoning or linguistic fluency. It provides a unified empirical foundation for understanding where model-generated support aligns with human social judgment, where it diverges, and how affective conversational competence scales with model size.
Read more →

Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation

arXiv:2601.19923v1 Announce Type: cross Abstract: As Large Language Models (LLMs) evolve into autonomous agents, the capability to faithfully translate natural language into rigorous structured formats, essential for tool invocation, and to convert complex tabular information into machine-readable specifications has become paramount. However, current evaluations lack effective methodologies to measure this structural fidelity without costly human intervention, as traditional text metrics fail to detect semantic drift in code-like outputs. This paper proposes Table-BiEval, a novel approach based on a human-free, self-supervised evaluation framework, to assess LLM performance quantitatively. By leveraging deterministic Intermediate Representations, our framework calculates Content Semantic Accuracy and Normalized Tree Edit Distance to decouple structure from content. It also empirically evaluates 15 state-of-the-art LLMs across dual topological dimensions: hierarchical structures and flat tables. The results reveal substantial variability, highlighting that mid-sized models can surprisingly outperform larger counterparts in structural efficiency and confirming that deep recursive nesting remains a universal bottleneck for current architectures.
Read more →

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

arXiv:2601.19924v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated impressive progress in optimization modeling, fostering a rapid expansion of new methodologies and evaluation benchmarks. However, the boundaries of their capabilities in automated formulation and problem solving remain poorly understood, particularly when extending to complex, real-world tasks. To bridge this gap, we propose OPT-ENGINE, an extensible benchmark framework designed to evaluate LLMs on optimization modeling with controllable and scalable difficulty levels. OPT-ENGINE spans 10 canonical tasks across operations research, with five Linear Programming and five Mixed-Integer Programming tasks. Utilizing OPT-ENGINE, we conduct an extensive study of LLMs' reasoning capabilities, addressing two critical questions: 1) Does LLM performance remain robust when generalizing to out-of-distribution optimization tasks that scale in complexity beyond current benchmark levels? and 2) At what stage, from problem interpretation to solution generation, do current LLMs encounter the most significant bottlenecks? Our empirical results yield two key insights: first, tool-integrated reasoning with external solvers exhibits significantly higher robustness as task complexity escalates, while pure-text reasoning reaches a ceiling; second, the automated formulation of constraints constitutes the primary performance bottleneck. These findings provide actionable guidance for developing next-generation LLMs for advanced optimization. Our code is publicly available at https://github.com/Cardinal-Operations/OPTEngine.
Read more →

Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study

arXiv:2601.19925v1 Announce Type: cross Abstract: Introduction: Large language models (LLMs) can process requests and generate texts, but their feasibility for assessing complex academic content needs further investigation. To explore LLM's potential in assisting scientific review, this study examined ChatGPT-5, Gemini-3-Pro, and Claude-Sonnet-4.5's consistency and reliability in evaluating abstracts compared to one another and to human reviewers. Methods: 160 abstracts from a local conference were graded by human reviewers and three LLMs using one rubric. Composite score distributions across three LLMs and fourteen reviewers were examined. Inter-rater reliability was calculated using intraclass correlation coefficients (ICCs) for within-AI reliability and AI-human concordance. Bland-Altman plots were examined for visual agreement patterns and systematic bias. Results: LLMs achieved good-to-excellent agreement with each other (ICCs: 0.59-0.87). ChatGPT and Claude reached moderate agreement with human reviewers on overall quality and content-specific criteria, with ICCs ~.45-.60 for composite, impression, clarity, objective, and results. They exhibited fair agreement on subjective dimensions, with ICC ranging from 0.23-0.38 for impact, engagement, and applicability. Gemini showed fair agreement on half criteria and no reliability on impact and applicability. Three LLMs showed acceptable or negligible mean difference (ChatGPT=0.24, Gemini=0.42, Claude=-0.02) from the human mean composite scores. Discussion: LLMs could process abstracts in batches with moderate agreement with human experts on overall quality and objective criteria. With appropriate process architecture, they can apply a rubric consistently across volumes of abstracts exceeding feasibility for a human rater. The weaker performance on subjective dimensions indicates that AI should serve a complementary role in evaluation, while human expertise remains essential.
Read more →

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

arXiv:2601.19926v1 Announce Type: cross Abstract: We present a systematic review of 337 articles evaluating the syntactic abilities of Transformer-based language models, reporting on 1,015 model results from a range of syntactic phenomena and interpretability methods. Our analysis shows that the state of the art presents a healthy variety of methods and data, but an over-focus on a single language (English), a single model (BERT), and phenomena that are easy to get at (like part of speech and agreement). Results also suggest that TLMs capture these form-oriented phenomena well, but show more variable and weaker performance on phenomena at the syntax-semantics interface, like binding or filler-gap dependencies. We provide recommendations for future work, in particular reporting complete data, better aligning theoretical constructs and methods across studies, increasing the use of mechanistic methods, and broadening the empirical scope regarding languages and linguistic phenomena.
Read more →

Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding

arXiv:2601.19929v1 Announce Type: cross Abstract: We introduce Stingy Context, a hierarchical tree-based compression scheme achieving 18:1 reduction in LLM context for auto-coding tasks. Using our TREEFRAG exploit decomposition, we reduce a real source code base of 239k tokens to 11k tokens while preserving task fidelity. Empirical results across 12 Frontier models show 94 to 97% success on 40 real-world issues at low cost, outperforming flat methods and mitigating lost-in-the-middle effects.
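A toy version of hierarchical code compression, keeping only class/function signatures plus first docstring lines via Python's ast module; the actual TREEFRAG decomposition and the 18:1 ratio are the paper's, and this only illustrates the general idea:

```python
import ast

# Sketch: compress a source file to a hierarchical skeleton (signatures plus the
# first docstring line), the general idea behind tree-based context compression.

def skeleton(source: str) -> str:
    tree = ast.parse(source)
    lines = []

    def visit(node, depth=0):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                args = ""
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    args = "(" + ", ".join(a.arg for a in child.args.args) + ")"
                doc = (ast.get_docstring(child) or "").splitlines()
                summary = f"  # {doc[0]}" if doc else ""
                lines.append("    " * depth + f"{child.name}{args}{summary}")
                visit(child, depth + 1)

    visit(tree)
    return "\n".join(lines)

src = '''
class Cache:
    """LRU cache over disk blocks."""
    def get(self, key):
        """Return a block or None."""
        return self.store.get(key)
'''
print(skeleton(src))   # two-line skeleton instead of the full source
```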
Read more →

SDUs DAISY: A Benchmark for Danish Culture

arXiv:2601.19930v1 Announce Type: cross Abstract: We introduce Daisy, a new benchmark for Danish culture via cultural heritage, based on the curated topics of the Danish Culture Canon 2006. For each artifact in the culture canon, we query the corresponding Wikipedia page and have a language model generate random questions. This yields a sampling strategy with a mix of central and peripheral questions for each work, covering not only mainstream information but also the in-depth cornerstones of Danish cultural heritage as defined by the Canon committee. Each question-answer pair is human-approved or corrected; the final dataset consists of 741 close-ended question-answer pairs covering topics ranging from archaeological findings from 1300 BC and poems and musical pieces from the 1700s to contemporary pop music and Danish design and architecture.
Read more →

Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle

arXiv:2601.19933v1 Announce Type: cross Abstract: Non-Resolution Reasoning (NRR) provides a formal framework for maintaining semantic ambiguity rather than forcing premature interpretation collapse. While the foundational architecture establishes state spaces and operators for ambiguity-preserving computation, the critical question of how natural language maps to these mathematical structures remains open. This paper introduces the text-to-state mapping function {\phi} that transforms linguistic input into superposition states within the NRR framework. We formalize the Contradiction-Preservation Principle, which requires that genuinely ambiguous expressions maintain non-zero entropy in their state representations, and develop extraction protocols using existing Large Language Models as interpretation generators. Empirical validation across 68 test sentences spanning lexical, structural, and pragmatic ambiguity demonstrates that our mapping achieves mean Shannon entropy H(S) = 1.087 bits for ambiguous inputs while baseline single-interpretation approaches yield H(S) = 0.000. The framework provides the missing algorithmic bridge between raw text and the formal state spaces on which NRR operators act, enabling architectural collapse deferment in language model inference.
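The entropy figure the abstract reports is plain Shannon entropy over an interpretation distribution; a minimal sketch with made-up interpretations and weights:

```python
import math

# Sketch: Shannon entropy of a superposition state over candidate interpretations.
# The interpretations and weights are illustrative; in the paper an LLM proposes
# them and the mapping phi turns the text into this weighted state.

def shannon_entropy(weights):
    z = sum(weights)
    ps = [w / z for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in ps)

# "I saw her duck" -- two retained readings instead of a collapsed one.
state = {
    "observed the bird she owns": 0.55,
    "observed her lowering her head": 0.45,
}
print(round(shannon_entropy(state.values()), 3))   # ~0.993 bits
baseline = {"observed the bird she owns": 1.0}
print(shannon_entropy(baseline.values()))          # 0.0 bits (collapsed)
```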
Read more →

Quantifying non deterministic drift in large language models

arXiv:2601.19934v1 Announce Type: cross Abstract: Large language models (LLMs) are widely used for tasks ranging from summarisation to decision support. In practice, identical prompts do not always produce identical outputs, even when temperature and other decoding parameters are fixed. In this work, we conduct repeated-run experiments to empirically quantify baseline behavioural drift, defined as output variability observed when the same prompt is issued multiple times under operator-free conditions. We evaluate two publicly accessible models, gpt-4o-mini and llama3.1-8b, across five prompt categories using exact repeats, perturbed inputs, and reuse modes at temperatures of 0.0 and 0.7. Drift is measured using unique output fractions, lexical similarity, and word count statistics, enabling direct comparison across models, prompting modes, and deployment types. The results show that nondeterminism persists even at temperature 0.0, with distinct variability patterns by model size, deployment, and prompt type. We situate these findings within existing work on concept drift, behavioural drift, and infrastructure-induced nondeterminism, discuss the limitations of lexical metrics, and highlight emerging semantic approaches. By establishing a systematic empirical baseline in the absence of stabilisation techniques, this study provides a reference point for evaluating future drift mitigation and control methods.
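A minimal sketch of the simplest drift metrics over repeated runs, unique-output fraction and mean pairwise lexical (Jaccard) similarity; the sample outputs are placeholders, not model transcripts:

```python
from itertools import combinations

# Sketch: two drift metrics computed from repeated identical prompts --
# unique-output fraction and mean pairwise Jaccard similarity over word sets.

def unique_fraction(outputs):
    return len(set(outputs)) / len(outputs)

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def mean_pairwise_similarity(outputs):
    pairs = list(combinations(outputs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

runs = [
    "The report covers three risks.",
    "The report covers three risks.",
    "The report highlights three key risks.",
    "Three risks are covered in the report.",
]
print(unique_fraction(runs))                       # 0.75
print(round(mean_pairwise_similarity(runs), 3))    # lexical stability across runs
```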
Read more →

Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents

arXiv:2601.19935v1 Announce Type: cross Abstract: Large Language Model (LLM)-based agents are increasingly deployed for complex, tool-based tasks where long-term memory is critical to driving actions. Existing benchmarks, however, primarily test an agent's ability to passively retrieve isolated facts in response to explicit questions. They fail to evaluate the more crucial capability of actively applying memory to execute tasks. To address this gap, we introduce Mem2ActBench, a benchmark for evaluating whether agents can proactively leverage long-term memory to execute tool-based actions by selecting appropriate tools and grounding their parameters. The benchmark simulates persistent assistant usage, where users mention the same topic across long, interrupted interactions and expect previously established preferences and task states to be implicitly applied. We build the dataset with an automated pipeline that merges heterogeneous sources (ToolACE, BFCL, Oasst1), resolves conflicts via consistency modeling, and synthesizes 2,029 sessions with 12 user-assistant-tool turns on average. From these memory chains, a reverse-generation method produces 400 tool-use tasks, with human evaluation confirming 91.3% are strongly memory-dependent. Experiments on seven memory frameworks show that current systems remain inadequate at actively utilizing memory for parameter grounding, highlighting the need for more effective approaches to evaluate and improve memory application in task execution.
Read more →

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

arXiv:2601.19936v1 Announce Type: cross Abstract: The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model's top-1 prediction and local correlation between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
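A hedged sketch of the top-1 vs. target log-probability gap with a sliding window; the final aggregation shown (mean of the K% largest windowed gaps) is an assumption about how Gap-K% turns per-token gaps into a detection score:

```python
import torch

# Sketch: per-token gap between the model's top-1 log-prob and the target
# token's log-prob, smoothed with a sliding window, then aggregated over the
# K% largest windowed gaps. The aggregation direction is an assumption here.

def gap_k_score(logits, target_ids, window=5, k=0.2):
    logp = torch.log_softmax(logits, dim=-1)             # [T, V]
    top1 = logp.max(dim=-1).values                        # model's preferred token
    tgt = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    gaps = top1 - tgt                                      # 0 when target was top-1
    # Sliding-window mean captures local correlation between adjacent tokens.
    smoothed = gaps.unfold(0, window, 1).mean(dim=-1)
    n = max(1, int(k * smoothed.numel()))
    return smoothed.topk(n).values.mean().item()           # larger => less member-like

T, V = 64, 1000
logits = torch.randn(T, V)
targets = torch.randint(0, V, (T,))
print(gap_k_score(logits, targets))
```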
Read more →

DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information

arXiv:2601.19938v1 Announce Type: cross Abstract: Decentralized Federated Learning (DFL) is a serverless collaborative machine learning paradigm where devices collaborate directly with neighbouring devices to exchange model information for learning a generalized model. However, variations in individual experiences and different levels of device interactions lead to data and model initialization heterogeneities across devices. Such heterogeneities leave variations in local model parameters across devices that lead to slower convergence. This paper tackles data and model heterogeneity by explicitly addressing parameter-level variations in evidential credence across local models. A novel aggregation approach is introduced that captures these parameter variations in local models and performs robust aggregation of neighbourhood local updates. Specifically, consensus weights are generated via approximation of second-order information of local models on their local datasets. These weights are used to scale neighbourhood updates before aggregating them into a global neighbourhood representation. In extensive experiments on computer vision tasks, the proposed approach shows strong generalizability of local models at reduced communication costs.
Read more →

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

arXiv:2601.19940v1 Announce Type: cross Abstract: Among hardware accelerators for deep-learning inference, data flow implementations offer low latency and high throughput capabilities. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for field-programmable gate array (FPGA) implementation. Previous unrolled implementations mostly focus on fully connected networks because of their simplicity, although it is well known that convolutional neural networks (CNNs) require fewer computations for the same accuracy. In CNNs, pooling layers and convolutional layers with a stride larger than one reduce the amount of data at their output relative to their input. This data reduction strongly affects the data rate in a fully parallel implementation, leaving hardware units heavily underutilized unless it is handled properly. This work addresses this issue by analyzing the data flow of CNNs and presents a novel approach to designing data-rate-aware, continuous-flow CNN architectures. The proposed approach ensures hardware utilization close to 100% by interleaving low-data-rate signals and sharing hardware units, as well as using the right parallelization to achieve the throughput of a fully parallel implementation. The results show that a significant amount of arithmetic logic can be saved, which allows implementing complex CNNs like MobileNet on a single FPGA with high throughput.
Read more →

Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation

arXiv:2601.19941v1 Announce Type: cross Abstract: Over the last two years, large language models (LLMs) have shown strong capabilities in code generation, including hardware design at the register-transfer level (RTL). While their use in high-level synthesis (HLS) remains comparatively less mature, the ratio of HLS- to RTL-focused studies has shifted from 1:10 to 2:10 in the past six months, indicating growing interest in leveraging LLMs for high-level design entry while relying on downstream synthesis for optimization. This growing trend highlights the need for a comprehensive benchmarking and evaluation framework dedicated to LLM-based HLS. To address this, we present Bench4HLS for evaluating LLM-generated HLS designs. Bench4HLS comprises 170 manually drafted and validated case studies, spanning small kernels to complex accelerators, curated from widely used public repositories. The framework supports fully automated assessment of compilation success, functional correctness via simulation, and synthesis feasibility/optimization. Crucially, Bench4HLS integrates a pluggable API for power, performance, and area (PPA) analysis across various HLS toolchains and architectures, demonstrated here with Xilinx Vitis HLS and validated on Catapult HLS. By providing a structured, extensible, and plug-and-play testbed, Bench4HLS establishes a foundational methodology for benchmarking LLMs in HLS workflows.
Read more →

Benchmarking ASR Models in the German Medical Context: A Performance Analysis Based on Medical History Interviews

arXiv:2601.19945v1 Announce Type: cross Abstract: Automatic Speech Recognition (ASR) offers significant potential to reduce the workload of medical personnel, for example, through the automation of documentation tasks. While numerous benchmarks exist for the English language, specific evaluations for the German-speaking medical context are still lacking, particularly regarding the inclusion of dialects. In this article, we present a curated dataset of simulated doctor-patient conversations and evaluate a total of 29 different ASR models. The evaluated set encompasses both open-weights models from the Whisper, Voxtral, and Wav2Vec2 families and commercial state-of-the-art APIs (AssemblyAI, Deepgram). For evaluation, we utilize three different metrics (WER, CER, BLEU) and provide an outlook on qualitative semantic analysis. The results demonstrate significant performance differences between the models: while the best systems already achieve very good Word Error Rates (WER), in some cases below 3%, the error rates of other models are considerably higher, especially for medical terminology and dialect-influenced variations.
Read more →

NCSAM: Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning

arXiv:2601.19947v1 Announce Type: cross Abstract: Learning from Noisy Labels (LNL) presents a fundamental challenge in deep learning, as real-world datasets often contain erroneous or corrupted annotations, e.g., data crawled from the Web. Current research focuses on sophisticated label correction mechanisms. In contrast, this paper adopts a novel perspective by establishing a theoretical analysis of the relationship between the flatness of the loss landscape and the presence of label noise. We theoretically demonstrate that carefully simulated label noise synergistically enhances both generalization performance and robustness to label noise. Consequently, we propose Noise-Compensated Sharpness-Aware Minimization (NCSAM), which leverages the perturbation of Sharpness-Aware Minimization (SAM) to remedy the damage caused by label noise. Our analysis reveals that the testing accuracy exhibits behavior similar to that observed on noise-free datasets. Extensive experimental results on multiple benchmark datasets demonstrate the consistent superiority of the proposed method over existing state-of-the-art approaches on diverse tasks.
Read more →

LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning

arXiv:2601.19952v1 Announce Type: cross Abstract: Real-time voice agents face a dilemma: end-to-end models often lack deep reasoning, while cascaded pipelines incur high latency by executing ASR, LLM reasoning, and TTS strictly in sequence, unlike human conversation where listeners often start thinking before the speaker finishes. Since cascaded architectures remain the dominant choice for complex tasks, existing cascaded streaming strategies attempt to reduce this latency via mechanical segmentation (e.g., fixed chunks, VAD-based splitting) or speculative generation, but they frequently either break semantic units or waste computation on predictions that must be rolled back. To address these challenges, we propose LTS-VoiceAgent, a Listen-Think-Speak framework that explicitly separates when to think from how to reason incrementally. It features a Dynamic Semantic Trigger to detect meaningful prefixes, and a Dual-Role Stream Orchestrator that coordinates a background Thinker (for state maintenance) and a foreground Speaker (for speculative solving). This parallel design enables "thinking while speaking" without blocking responses. We also introduce a Pause-and-Repair benchmark containing natural disfluencies to stress-test streaming robustness. Experiments across VERA, Spoken-MQA, BigBenchAudio, and our benchmark show that LTS-VoiceAgent achieves a stronger accuracy-latency-efficiency trade-off than serial cascaded baselines and existing streaming strategies.
Read more →

Probabilistic Sensing: Intelligence in Data Sampling

arXiv:2601.19953v1 Announce Type: cross Abstract: Extending the intelligence of sensors to the data-acquisition process - deciding whether to sample or not - can result in transformative energy-efficiency gains. However, making such a decision in a deterministic manner involves the risk of losing information. Here we present a sensing paradigm that enables making this decision in a probabilistic manner. The paradigm takes inspiration from the autonomic nervous system and employs a probabilistic neuron (p-neuron) driven by an analog feature extraction circuit. The response time of the system is on the order of microseconds, overcoming the sub-sampling-rate response time limit and enabling real-time, intelligent, autonomous activation of data sampling. Validation experiments on active seismic survey data demonstrate lossless probabilistic data acquisition, with a normalized mean squared error of 0.41%, and a 93% saving in the active operation time of the system and the number of generated samples.
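A toy sketch of the probabilistic acquisition decision described: a simple analog-style feature (short-window signal energy) drives a probabilistic neuron whose firing probability gates whether a sample is kept. The sigmoid parameterization and thresholds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_neuron(feature, gain=8.0, bias=-4.0):
    """Probabilistic neuron: firing probability is a sigmoid of the analog feature."""
    p_fire = 1.0 / (1.0 + np.exp(-(gain * feature + bias)))
    return rng.random() < p_fire

def probabilistic_acquire(signal, window=16):
    """Decide, sample by sample, whether to acquire based on recent signal energy."""
    signal = np.asarray(signal, dtype=float)
    acquired = []
    for i in range(window, len(signal)):
        energy = np.mean(signal[i - window:i] ** 2)   # stand-in for the analog feature circuit
        if p_neuron(energy):
            acquired.append((i, signal[i]))
    return acquired
```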
Read more →

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

arXiv:2601.19956v1 Announce Type: cross Abstract: As Speech Language Models (SLMs) transition from personal devices to shared, multi-user environments such as smart homes, a new challenge emerges: the model is expected to distinguish between users to manage information flow appropriately. Without this capability, an SLM could reveal one user's confidential schedule to another, a privacy failure we term interactional privacy. Thus, the ability to generate speaker-aware responses becomes essential for SLM safe deployment. Current SLM benchmarks test dialogue ability but overlook speaker identity. Multi-speaker benchmarks check who said what without assessing whether SLMs adapt their responses. Privacy benchmarks focus on globally sensitive data (e.g., bank passwords) while neglecting contextual privacy-sensitive information (e.g., a user's private appointment). To address this gap, we introduce VoxPrivacy, the first benchmark designed to evaluate interactional privacy in SLMs. VoxPrivacy spans three tiers of increasing difficulty, from following direct secrecy commands to proactively protecting privacy. Our evaluation of nine SLMs on a 32-hour bilingual dataset reveals a widespread vulnerability: most open-source models perform close to random chance (around 50% accuracy) on conditional privacy decisions, while even strong closed-source systems fall short on proactive privacy inference. We further validate these findings on Real-VoxPrivacy, a human-recorded subset, confirming that failures observed on synthetic data persist in real speech. Finally, we demonstrate a viable path forward: by fine-tuning on a new 4,000-hour training set, we improve privacy-preserving abilities while maintaining robustness. To support future work, we release the VoxPrivacy benchmark, the large-scale training set, and the fine-tuned model to foster the development of safer and more context-aware SLMs.
Read more →

Do we really need Self-Attention for Streaming Automatic Speech Recognition?

arXiv:2601.19960v1 Announce Type: cross Abstract: Transformer-based architectures are the most widely used architectures in many deep learning fields such as Natural Language Processing, Computer Vision, and Speech Processing. This popularity can encourage the direct use of Transformers in constrained tasks without questioning whether they yield the same benefits as in standard tasks. Given specific constraints, it is essential to evaluate the relevance of transformer models. This work questions the suitability of transformers for specific domains. We argue that the high computational requirements and latency issues associated with these models do not align well with streaming applications. Our study promotes the search for alternative strategies to improve efficiency without sacrificing performance. In light of this observation, our paper critically examines the usefulness of the transformer architecture in such constrained environments. As a first attempt, we show that the computational cost of Streaming Automatic Speech Recognition (ASR) can be reduced by using deformable convolution instead of Self-Attention. Furthermore, we show that Self-Attention mechanisms can be entirely removed, and not replaced, without observing significant degradation in the Word Error Rate.
Read more →

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

arXiv:2601.19961v1 Announce Type: cross Abstract: We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached Jacobian-vector products (JVP) to construct interval average velocities from instantaneous velocities, it effectively mitigates local error accumulation. To further improve cache timing and JVP reuse stability, we develop a trajectory-stability scheduling strategy as a practical tool, employing a Peak-Suppressed Shortest Path under budget constraints to determine the schedule. Experiments on FLUX.1, Qwen-Image, and HunyuanVideo demonstrate that MeanCache achieves 4.12X, 4.56X, and 3.59X acceleration, respectively, while consistently outperforming state-of-the-art caching baselines in generation quality. We believe this simple yet effective approach provides a new perspective for Flow Matching inference and will inspire further exploration of stability-driven acceleration in commercial-scale generative models.
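One way to read the average-velocity idea is as a first-order expansion of the velocity along the trajectory, obtained via a Jacobian-vector product; the sketch below shows that reading and is an assumption on my part, not the paper's exact estimator, caching scheme, or scheduler.

```python
import torch

def average_velocity(velocity_net, x, t, dt):
    """
    First-order estimate of the mean velocity over [t, t + dt]:
        v_avg ~ v(x, t) + (dt / 2) * d/dt v(x(t), t),
    where the total derivative along the trajectory is a Jacobian-vector
    product with tangent (v, 1), since dx/dt = v. Illustrative only.
    """
    v = velocity_net(x, t)
    # forward-mode JVP; evaluates the network once more internally
    _, dv_dt = torch.func.jvp(lambda x_, t_: velocity_net(x_, t_),
                              (x, t), (v, torch.ones_like(t)))
    return v + 0.5 * dt * dv_dt

# Illustrative usage: one cheap Euler step with the averaged velocity.
# x_next = x + dt * average_velocity(velocity_net, x, t, dt)
```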
Read more →

Cross-Session Decoding of Neural Spiking Data via Task-Conditioned Latent Alignment

arXiv:2601.19963v1 Announce Type: cross Abstract: Cross-session nonstationarity in neural activity recorded by implanted electrodes is a major challenge for invasive Brain-computer interfaces (BCIs), as decoders trained on data from one session often fail to generalize to subsequent sessions. This issue is further exacerbated in practice, as retraining or adapting decoders becomes particularly challenging when only limited data are available from a new session. To address this challenge, we propose a Task-Conditioned Latent Alignment framework (TCLA) for cross-session neural decoding. Building upon an autoencoder architecture, TCLA first learns a low-dimensional representation of neural dynamics from a source session with sufficient data. For target sessions with limited data, TCLA then aligns target latent representations to the source in a task-conditioned manner, enabling effective transfer of learned neural dynamics. We evaluate TCLA on the macaque motor and oculomotor center-out dataset. Compared to baseline methods trained solely on target-session data, TCLA consistently improves decoding performance across datasets and decoding settings, with gains in the coefficient of determination of up to 0.386 for y coordinate velocity decoding in a motor dataset. These results suggest that TCLA provides an effective strategy for transferring knowledge from source to target sessions, enabling more robust neural decoding under conditions with limited data.
Read more →

Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers

arXiv:2601.19967v1 Announce Type: cross Abstract: Collecting web data to train deep models has become increasingly common, raising concerns about unauthorized data usage. To mitigate this issue, unlearnable examples introduce imperceptible perturbations into data, preventing models from learning effectively. However, existing methods typically rely on deep neural networks as surrogate models for perturbation generation, resulting in significant computational costs. In this work, we propose Perturbation-Induced Linearization (PIL), a computationally efficient yet effective method that generates perturbations using only linear surrogate models. PIL achieves comparable or better performance than existing surrogate-based methods while reducing computational time dramatically. We further reveal a key mechanism underlying unlearnable examples: inducing linearization in deep models, which explains why PIL can achieve competitive results in a very short time. Beyond this, we provide an analysis of the properties of unlearnable examples under percentage-based partial perturbation. Our work not only provides a practical approach for data protection but also offers insights into what makes unlearnable examples effective.
Read more →

On the Effectiveness of LLM-Specific Fine-Tuning for Detecting AI-Generated Text

arXiv:2601.20006v1 Announce Type: cross Abstract: The rapid progress of large language models has enabled the generation of text that closely resembles human writing, creating challenges for authenticity verification in education, publishing, and digital security. Detecting AI-generated text has therefore become a crucial technical and ethical issue. This paper presents a comprehensive study of AI-generated text detection based on large-scale corpora and novel training strategies. We introduce a 1-billion-token corpus of human-authored texts spanning multiple genres and a 1.9-billion-token corpus of AI-generated texts produced by prompting a variety of LLMs across diverse domains. Using these resources, we develop and evaluate numerous detection models and propose two novel training paradigms: Per LLM and Per LLM family fine-tuning. Across a 100-million-token benchmark covering 21 large language models, our best fine-tuned detector achieves up to 99.6% token-level accuracy, substantially outperforming existing open-source baselines.
Read more →

LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

arXiv:2601.20009v1 Announce Type: cross Abstract: Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control, the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning MMLU, MGSM, and XQuAD benchmarks. To probe these issues with interpretability, we extend logit lens analysis to track language probabilities layer by layer and compute cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3-5 percent of parameters, without sacrificing task accuracy. Importantly, this result is nearly identical to that of full-scope fine-tuning (for example, above 98 percent language consistency for both methods across all prompt scenarios) but uses a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage layer-localization of language control for efficient multilingual adaptation.
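A rough sketch of the layer-wise logit-lens probe described: project each layer's hidden state through the model's final norm and unembedding, then sum probability mass over a set of language-marker token ids. The Llama-style attribute names (model.model.norm, lm_head) and the language_token_ids mapping are assumptions about the model's internals.

```python
import torch

@torch.no_grad()
def language_prob_by_layer(model, tokenizer, prompt, language_token_ids):
    """
    Logit-lens style probe: for each layer, how much probability mass the
    intermediate hidden state already places on tokens of a target language.
    Assumes an HF causal LM exposing hidden_states, model.model.norm, lm_head.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    per_layer = []
    for h in out.hidden_states:              # embeddings + one tensor per layer, each (1, T, d)
        h_last = model.model.norm(h[:, -1])  # final norm applied to the last position
        probs = torch.softmax(model.lm_head(h_last), dim=-1)
        per_layer.append(probs[0, language_token_ids].sum().item())
    return per_layer                          # one value per hidden state
```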
Read more →

Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery

arXiv:2601.20037v1 Announce Type: cross Abstract: Despite the ubiquity of tabular data in high-stakes domains, traditional deep learning architectures often struggle to match the performance of gradient-boosted decision trees while maintaining scientific interpretability. Standard neural networks typically treat features as independent entities, failing to exploit the inherent manifold structural dependencies that define tabular distributions. We propose Structural Compositional Function Networks (StructuralCFN), a novel architecture that imposes a Relation-Aware Inductive Bias via a differentiable structural prior. StructuralCFN explicitly models each feature as a mathematical composition of its counterparts through Differentiable Adaptive Gating, which automatically discovers the optimal activation physics (e.g., attention-style filtering vs. inhibitory polarity) for each relationship. Our framework enables Structured Knowledge Integration, allowing domain-specific relational priors to be injected directly into the architecture to guide discovery. We evaluate StructuralCFN across a rigorous 10-fold cross-validation suite on 18 benchmarks, demonstrating statistically significant improvements (p < 0.05) on scientific and clinical datasets (e.g., Blood Transfusion, Ozone, WDBC). Furthermore, StructuralCFN provides Intrinsic Symbolic Interpretability: it recovers the governing "laws" of the data manifold as human-readable mathematical expressions while maintaining a compact parameter footprint (300-2,500 parameters) that is over an order of magnitude (10x-20x) smaller than standard deep baselines.
Read more →

CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs

arXiv:2601.20041v1 Announce Type: cross Abstract: Personalized virtual assistants powered by large language models (LLMs) on edge devices are attracting growing attention, with Retrieval-Augmented Generation (RAG) emerging as a key method for personalization by retrieving relevant profile data and generating tailored responses. However, deploying RAG on edge devices faces efficiency hurdles due to the rapid growth of profile data, such as user-LLM interactions and recent updates. While Computing-in-Memory (CiM) architectures mitigate this bottleneck by eliminating data movement between memory and processing units via in-situ operations, they are susceptible to environmental noise that can degrade retrieval precision. This poses a critical issue in dynamic, multi-domain edge-based scenarios (e.g., travel, medicine, and law) where both accuracy and adaptability are paramount. To address these challenges, we propose Task-Oriented Noise-resilient Embedding Learning (TONEL), a framework that improves noise robustness and domain adaptability for RAG in noisy edge environments. TONEL employs a noise-aware projection model to learn task-specific embeddings compatible with CiM hardware constraints, enabling accurate retrieval under noisy conditions. Extensive experiments conducted on personalization benchmarks demonstrate the effectiveness and practicality of our methods relative to strong baselines, especially in task-specific noisy scenarios.
Read more →

Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation

arXiv:2601.20051v1 Announce Type: cross Abstract: The rise of chronic diseases related to diet, such as obesity and diabetes, emphasizes the need for accurate monitoring of food intake. While AI-driven dietary assessment has made strides in recent years, the ill-posed nature of recovering size (portion) information from monocular images for accurate estimation of "how much did you eat?" is a pressing challenge. Some 3D reconstruction methods have achieved impressive geometric reconstruction but fail to recover the crucial real-world scale of the reconstructed object, limiting its usage in precision nutrition. In this paper, we bridge the gap between 3D computer vision and digital health by proposing a method that recovers a true-to-scale 3D reconstructed object from a monocular image. Our approach leverages rich visual features extracted from models trained on large-scale datasets to estimate the scale of the reconstructed object. This learned scale enables us to convert single-view 3D reconstructions into true-to-life, physically meaningful models. Extensive experiments and ablation studies on two publicly available datasets show that our method consistently outperforms existing techniques, achieving nearly a 30% reduction in mean absolute volume-estimation error, showcasing its potential to enhance the domain of precision nutrition. Code: https://gitlab.com/viper-purdue/size-matters
Read more →

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

arXiv:2601.20055v1 Announce Type: cross Abstract: Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics, (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning, and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.
Read more →

Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

arXiv:2601.20072v1 Announce Type: cross Abstract: We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Codes are available at https://github.com/atik666/ssmae.
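A hedged sketch of the gating step described: pseudo-labels are used only once validation accuracy clears a threshold, and only for unlabeled samples where weak and strong augmentations agree with high confidence. The threshold values and function names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def gated_pseudo_labels(model, weak_batch, strong_batch,
                        val_accuracy, val_gate=0.7, conf_thresh=0.95):
    """
    Returns (inputs, pseudo_labels) for the confident, consistent subset of an
    unlabeled batch, or empty tensors while the validation-driven gate is closed.
    """
    if val_accuracy < val_gate:                        # gate: pseudo-labels not yet trusted
        return strong_batch[:0], torch.empty(0, dtype=torch.long)

    with torch.no_grad():
        p_weak = F.softmax(model(weak_batch), dim=-1)
        p_strong = F.softmax(model(strong_batch), dim=-1)

    conf, labels_weak = p_weak.max(dim=-1)
    keep = (conf >= conf_thresh) & (labels_weak == p_strong.argmax(dim=-1))
    return strong_batch[keep], labels_weak[keep]       # train with cross-entropy on this subset
```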
Read more →

LLaTTE: Scaling Laws for Multi-Stage Sequence Modeling in Large-Scale Ads Recommendation

arXiv:2601.20083v1 Announce Type: cross Abstract: We present LLaTTE (LLM-Style Latent Transformers for Temporal Events), a scalable transformer architecture for production ads recommendation. Through systematic experiments, we demonstrate that sequence modeling in recommendation systems follows predictable power-law scaling similar to LLMs. Crucially, we find that semantic features bend the scaling curve: they are a prerequisite for scaling, enabling the model to effectively utilize the capacity of deeper and longer architectures. To realize the benefits of continued scaling under strict latency constraints, we introduce a two-stage architecture that offloads the heavy computation of large, long-context models to an asynchronous upstream user model. We demonstrate that upstream improvements transfer predictably to downstream ranking tasks. Deployed as the largest user model at Meta, this multi-stage framework drives a 4.3% conversion uplift on Facebook Feed and Reels with minimal serving overhead, establishing a practical blueprint for harnessing scaling laws in industrial recommender systems.
Read more →

Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth

arXiv:2601.20099v1 Announce Type: cross Abstract: Humans and large language models (LLMs) now co-produce and co-consume the web's shared knowledge archives. Such human-AI collective knowledge ecosystems contain feedback loops with both benefits (e.g., faster growth, easier learning) and systemic risks (e.g., quality dilution, skill reduction, model collapse). To understand such phenomena, we propose a minimal, interpretable dynamical model of the co-evolution of archive size, archive quality, model (LLM) skill, aggregate human skill, and query volume. The model captures two content inflows (human, LLM) controlled by a gate on LLM-content admissions, two learning pathways for humans (archive study vs. LLM assistance), and two LLM-training modalities (corpus-driven scaling vs. learning from human feedback). Through numerical experiments, we identify different growth regimes (e.g., healthy growth, inverted flow, inverted learning, oscillations), and show how platform and policy levers (gate strictness, LLM training, human learning pathways) shift the system across regime boundaries. Two domain configurations (PubMed, GitHub and Copilot) illustrate contrasting steady states under different growth rates and moderation norms. We also fit the model to Wikipedia's knowledge flow during pre-ChatGPT and post-ChatGPT eras separately. We find a rise in LLM additions with a concurrent decline in human inflow, consistent with a regime identified by the model. Our model and analysis yield actionable insights for sustainable growth of human-AI collective knowledge on the Web.
Read more →

Taming Toxic Talk: Using chatbots to intervene with users posting toxic comments

arXiv:2601.20100v1 Announce Type: cross Abstract: Generative AI chatbots have proven surprisingly effective at persuading people to change their beliefs and attitudes in lab settings. However, the practical implications of these findings are not yet clear. In this work, we explore the impact of rehabilitative conversations with generative AI chatbots on users who share toxic content online. Toxic behaviors, like insults or threats of violence, are widespread in online communities. Strategies to deal with toxic behavior are typically punitive, such as removing content or banning users. Rehabilitative approaches are rarely attempted, in part due to the emotional and psychological cost of engaging with aggressive users. In collaboration with seven large Reddit communities, we conducted a large-scale field experiment (N=893) to invite people who had recently posted toxic content to participate in conversations with AI chatbots. A qualitative analysis of the conversations shows that many participants engaged in good faith and even expressed remorse or a desire to change. However, we did not observe a significant change in toxic behavior in the following month compared to a control group. We discuss possible explanations for our findings, as well as theoretical and practical implications based on our results.
Read more →

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

arXiv:2601.20103v1 Announce Type: cross Abstract: Recent advances in reinforcement learning for code generation have made robust environments essential to prevent reward hacking. As LLMs increasingly serve as evaluators in code-based RL, their ability to detect reward hacking remains understudied. In this paper, we propose a novel taxonomy of reward exploits spanning 54 categories and introduce TRACE (Testing Reward Anomalies in Code Environments), a synthetically curated and human-verified benchmark containing 517 testing trajectories. Unlike prior work that evaluates reward hack detection in isolated classification scenarios, we contrast these evaluations with a more realistic, contrastive anomaly detection setup on TRACE. Our experiments reveal that models capture reward hacks more effectively in contrastive settings than in isolated classification settings, with GPT-5.2 in its highest reasoning mode achieving the best detection rate at 63%, up from 45% in isolated settings on TRACE. Building on this insight, we demonstrate that state-of-the-art models struggle significantly more with semantically contextualized reward hacks compared to syntactically contextualized ones. We further conduct qualitative analyses of model behaviors, as well as ablation studies showing that the ratio of benign to hacked trajectories and analysis cluster sizes substantially impact detection performance. We release the benchmark and evaluation harness to enable the community to expand TRACE and evaluate their models.
Read more →

How Much Progress Has There Been in NVIDIA Datacenter GPUs?

arXiv:2601.20115v1 Announce Type: cross Abstract: Graphics Processing Units (GPUs) are the state-of-the-art architecture for essential tasks, ranging from rendering 2D/3D graphics to accelerating workloads in supercomputing centers and, of course, Artificial Intelligence (AI). As GPUs continue improving to satisfy ever-increasing performance demands, analyzing past and current progress becomes paramount in determining future constraints on scientific research. This is particularly compelling in the AI domain, where rapid technological advancements and fierce global competition have led the United States to recently implement export control regulations limiting international access to advanced AI chips. For this reason, this paper studies technical progress in NVIDIA datacenter GPUs released from the mid-2000s until today. Specifically, we compile a comprehensive dataset of datacenter NVIDIA GPUs comprising several features, ranging from computational performance to release price. Then, we examine trends in main GPU features and estimate progress indicators for per-memory bandwidth, per-dollar, and per-watt increase rates. Our main results identify doubling times of 1.44 and 1.69 years for FP16 and FP32 operations (without accounting for sparsity benefits), while FP64 doubling times range from 2.06 to 3.79 years. Off-chip memory size and bandwidth grew at slower rates than computing performance, doubling every 3.32 to 3.53 years. The release prices of datacenter GPUs have roughly doubled every 5.1 years, while their power consumption has approximately doubled every 16 years. Finally, we quantify the potential implications of current U.S. export control regulations in terms of the potential performance gaps that would result if implementation were assumed to be complete and successful. We find that recently proposed changes to export controls would shrink the potential performance gap from 23.6x to 3.54x.
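The doubling times quoted can be reproduced in spirit by fitting a log-linear trend of a metric against release year and converting the slope to a doubling period; the sketch below shows that conversion with illustrative values, not the paper's dataset.

```python
import numpy as np

def doubling_time_years(release_years, metric_values):
    """Fit log2(metric) = a * year + b by least squares; the doubling time is 1 / a."""
    slope, _ = np.polyfit(np.asarray(release_years, float),
                          np.log2(np.asarray(metric_values, float)), deg=1)
    return 1.0 / slope

# Illustrative values only: a peak-FP16-throughput-like metric over four releases.
years = [2017, 2020, 2022, 2024]
tflops = [112, 312, 990, 2250]
print(f"doubling time ~ {doubling_time_years(years, tflops):.2f} years")
```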
Read more →

Membership Inference Attacks Against Fine-tuned Diffusion Language Models

arXiv:2601.20125v1 Announce Type: cross Abstract: Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIA) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike the autoregressive models' single fixed prediction pattern, DLMs' multiple maskable configurations exponentially increase attack opportunities. This ability to probe many independent masks dramatically improves detection chances. To exploit this, we introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse signal challenge through robust aggregation. SAMA samples masked subsets across progressive densities and applies sign-based statistics that remain effective despite heavy-tailed noise. Through inverse-weighted aggregation prioritizing sparse masks' cleaner signals, SAMA transforms sparse memorization detection into a robust voting mechanism. Experiments on nine datasets show SAMA achieves 30% relative AUC improvement over the best baseline, with up to 8 times improvement at low false positive rates. These findings reveal significant, previously unknown vulnerabilities in DLMs, necessitating the development of tailored privacy defenses.
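A hedged sketch of the aggregation idea: score many random mask subsets at several masking densities, reduce each density to a robust statistic, and combine densities with weights favoring sparse masks. The per-mask scoring interface and the exact statistic are assumptions, not the paper's formulation.

```python
import numpy as np

def sama_score(masked_loss_fn, token_ids, densities=(0.1, 0.3, 0.5),
               n_masks=32, seed=0):
    """
    masked_loss_fn(token_ids, mask) -> scalar loss of the diffusion LM when
    predicting the masked positions. Returns a membership score in which
    higher values suggest the text was part of the fine-tuning data.
    """
    rng = np.random.default_rng(seed)
    n = len(token_ids)
    per_density = []
    for rho in densities:
        losses = np.array([masked_loss_fn(token_ids, rng.random(n) < rho)
                           for _ in range(n_masks)])
        # robust median-style statistic per density (a stand-in for the paper's
        # sign-based aggregation, which is not fully specified in the abstract)
        per_density.append(-np.median(losses))
    weights = 1.0 / np.asarray(densities)    # inverse weighting: sparse masks count more
    weights /= weights.sum()
    return float(np.dot(weights, per_density))
```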
Read more →

Rewarding Intellectual Humility: Learning When Not to Answer in Large Language Models

arXiv:2601.20126v1 Announce Type: cross Abstract: Large Language Models (LLMs) often produce hallucinated or unverifiable content, undermining their reliability in factual domains. This work investigates Reinforcement Learning with Verifiable Rewards (RLVR) as a training paradigm that explicitly rewards abstention ("I don't know") alongside correctness to promote intellectual humility. We fine-tune and evaluate Granite-3.3-2B-Instruct and Qwen-3-4B-Instruct on the MedMCQA and Hendrycks Math benchmarks using a ternary reward structure (-1, r_abs, 1) under varying abstention reward values. We further study the effect of combining RLVR with supervised fine-tuning strategies that teach abstention prior to reinforcement learning. Our results show that moderate abstention rewards (r_abs roughly -0.25 to 0.3) consistently reduce incorrect responses without severe accuracy degradation on multiple-choice tasks, with larger models exhibiting greater robustness to abstention incentives. On open-ended question answering, we observe limitations due to insufficient exploration, which can be partially mitigated through supervised abstention training. Overall, these findings demonstrate the feasibility and flexibility of verifiable reward design as a practical approach for hallucination mitigation in language models. Reproducible code for our abstention training framework is available at https://github.com/Mystic-Slice/rl-abstention.
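The ternary reward is simple enough to state directly; a minimal sketch assuming an exact-match grader and a fixed abstention phrase (both assumptions on my part, not the paper's verifier):

```python
def rlvr_reward(response: str, gold_answer: str, r_abs: float = 0.0) -> float:
    """Ternary verifiable reward: 1 for a correct answer, r_abs for abstention, -1 otherwise."""
    text = response.strip().lower()
    if "i don't know" in text:      # abstention detection rule is an assumption
        return r_abs
    return 1.0 if text == gold_answer.strip().lower() else -1.0

# Sweeping the abstention reward in roughly the range the abstract studies:
for r_abs in (-0.25, 0.0, 0.3):
    print(r_abs, rlvr_reward("I don't know.", "B", r_abs), rlvr_reward("b", "B", r_abs))
```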
Read more →

BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification

arXiv:2601.20129v1 Announce Type: cross Abstract: Sentiment analysis for the Bengali language has attracted increasing research interest in recent years. However, progress remains constrained by the scarcity of large-scale and diverse annotated datasets. Although several Bengali sentiment and hate speech datasets are publicly available, most are limited in size or confined to a single domain, such as social media comments. Consequently, these resources are often insufficient for training modern deep learning based models, which require large volumes of heterogeneous data to learn robust and generalizable representations. In this work, we introduce BengaliSent140, a large-scale Bengali binary sentiment dataset constructed by consolidating seven existing Bengali text datasets into a unified corpus. To ensure consistency across sources, heterogeneous annotation schemes are systematically harmonized into a binary sentiment formulation with two classes: Not Hate (0) and Hate (1). The resulting dataset comprises 139,792 unique text samples, including 68,548 hate and 71,244 not-hate instances, yielding a relatively balanced class distribution. By integrating data from multiple sources and domains, BengaliSent140 offers broader linguistic and contextual coverage than existing Bengali sentiment datasets and provides a strong foundation for training and benchmarking deep learning models. Baseline experimental results are also reported to demonstrate the practical usability of the dataset. The dataset is publicly available at https://www.kaggle.com/datasets/akifislam/bengalisent140/
Read more →

Taxonomy of the Retrieval System Framework: Pitfalls and Paradigms

arXiv:2601.20131v1 Announce Type: cross Abstract: Designing an embedding retrieval system requires navigating a complex design space of conflicting trade-offs between efficiency and effectiveness. This work structures these decisions as a vertical traversal of the system design stack. We begin with the Representation Layer by examining how loss functions and architectures, specifically Bi-encoders and Cross-encoders, define semantic relevance and geometric projection. Next, we analyze the Granularity Layer and evaluate how segmentation strategies like Atomic and Hierarchical chunking mitigate information bottlenecks in long-context documents. Moving to the Orchestration Layer, we discuss methods that transcend the single-vector paradigm, including hierarchical retrieval, agentic decomposition, and multi-stage reranking pipelines to resolve capacity limitations. Finally, we address the Robustness Layer by identifying architectural mitigations for domain generalization failures, lexical blind spots, and the silent degradation of retrieval quality due to temporal drift. By categorizing these limitations and design choices, we provide a comprehensive framework for practitioners to optimize the efficiency-effectiveness frontier in modern neural search systems.
Read more →

Large language models accurately predict public perceptions of support for climate action worldwide

arXiv:2601.20141v1 Announce Type: cross Abstract: Although most people support climate action, widespread underestimation of others' support stalls individual and systemic changes. In this preregistered experiment, we test whether large language models (LLMs) can reliably predict these perception gaps worldwide. Using country-level indicators and public opinion data from 125 countries, we benchmark four state-of-the-art LLMs against Gallup World Poll 2021/22 data and statistical regressions. LLMs, particularly Claude, accurately capture public perceptions of others' willingness to contribute financially to climate action (MAE approximately 5 p.p.; r = .77), comparable to statistical models, though performance declines in less digitally connected, lower-GDP countries. Controlled tests show that LLMs capture the key psychological process - social projection with a systematic downward bias - and rely on structured reasoning rather than memorized values. Overall, LLMs provide a rapid tool for assessing perception gaps in climate action, serving as an alternative to costly surveys in resource-rich countries and as a complement in underrepresented populations.
Read more →

What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering

arXiv:2601.20164v1 Announce Type: cross Abstract: Prior work suggests that language models, while trained on next-token prediction, show implicit planning behavior: they may select the next token in preparation for a predicted future token, such as a likely rhyming word, as supported by a prior qualitative study of Claude 3.5 Haiku using a cross-layer transcoder. We propose much simpler techniques for assessing implicit planning in language models. With case studies on rhyming poetry generation and question answering, we demonstrate that our methodology easily scales to many models. Across models, we find that the generated rhyme (e.g. "-ight") or answer to a question ("whale") can be manipulated by steering at the end of the preceding line with a vector, affecting the generation of intermediate tokens leading up to the rhyme or answer word. We show that implicit planning is a universal mechanism, present in smaller models than previously thought, starting from 1B parameters. Our methodology offers a widely applicable direct way to study implicit planning abilities of LLMs. More broadly, understanding planning abilities of language models can inform decisions in AI safety and control.
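A minimal sketch of the steering intervention described: add a direction vector to the residual stream at the final position via a forward hook and observe whether the generated rhyme or answer changes. The layer choice, scale, and Llama-style module path are assumptions.

```python
import torch

def add_steering_hook(model, layer_idx, steering_vector, scale=4.0):
    """
    Adds `steering_vector` to the hidden state at the last position of one
    decoder layer. Assumes a Llama-style layout (model.model.layers[i]); in a
    real experiment the hook would be restricted to the token position that
    ends the preceding line rather than firing on every forward pass.
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -1, :] += scale * steering_vector.to(hidden.dtype)
    return model.model.layers[layer_idx].register_forward_hook(hook)

# Illustrative usage, with `steer` derived e.g. from a difference of mean
# activations between "-ight"-rhyming and non-rhyming continuations:
# handle = add_steering_hook(model, layer_idx=20, steering_vector=steer)
# out = model.generate(**inputs, max_new_tokens=30)
# handle.remove()
```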
Read more →

NeuraLSP: An Efficient and Rigorous Neural Left Singular Subspace Preconditioner for Conjugate Gradient Methods

arXiv:2601.20174v1 Announce Type: cross Abstract: Numerical techniques for solving partial differential equations (PDEs) are integral for many fields across science and engineering. Such techniques usually involve solving large, sparse linear systems, where preconditioning methods are critical. In recent years, neural methods, particularly graph neural networks (GNNs), have demonstrated their potential through accelerated convergence. Nonetheless, to extract connective structures, existing techniques aggregate discretized system matrices into graphs, and suffer from rank inflation and a suboptimal convergence rate. In this paper, we articulate NeuraLSP, a novel neural preconditioner combined with a novel loss metric that leverages the left singular subspace of the system matrix's near-nullspace vectors. By compressing spectral information into a fixed low-rank operator, our method exhibits both theoretical guarantees and empirical robustness to rank inflation, affording up to a 53% speedup. Besides the theoretical guarantees for our newly-formulated loss function, our comprehensive experimental results across diverse families of PDEs also substantiate the aforementioned theoretical advances.
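For context, a learned preconditioner of this kind slots into the standard preconditioned conjugate gradient loop; below is that generic loop with the (possibly neural, low-rank) preconditioner abstracted as a callable, not the paper's specific operator.

```python
import numpy as np

def preconditioned_cg(A, b, apply_M_inv, x0=None, tol=1e-8, max_iter=500):
    """
    Solve A x = b with PCG; `apply_M_inv(r)` applies the preconditioner to a
    residual. A can be a dense/sparse matrix or any object supporting @.
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    z = apply_M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```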
Read more →

Causal-Driven Feature Evaluation for Cross-Domain Image Classification

arXiv:2601.20176v1 Announce Type: cross Abstract: Out-of-distribution (OOD) generalization remains a fundamental challenge in real-world classification, where test distributions often differ substantially from training data. Most existing approaches pursue domain-invariant representations, implicitly assuming that invariance implies reliability. However, features that are invariant across domains are not necessarily causally effective for prediction. In this work, we revisit OOD classification from a causal perspective and propose to evaluate learned representations based on their necessity and sufficiency under distribution shift. We introduce an explicit segment-level framework that directly measures causal effectiveness across domains, providing a more faithful criterion than invariance alone. Experiments on multi-domain benchmarks demonstrate consistent improvements in OOD performance, particularly under challenging domain shifts, highlighting the value of causal evaluation for robust generalization.
Read more →

Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery

arXiv:2601.20193v1 Announce Type: cross Abstract: Robust reinforcement learning methods typically focus on suppressing unreliable experiences or corrupted rewards, but they lack the ability to reason about the reliability of their own learning process. As a result, such methods often either overreact to noise by becoming overly conservative or fail catastrophically when uncertainty accumulates. In this work, we propose a meta-cognitive reinforcement learning framework that enables an agent to assess, regulate, and recover its learning behavior based on internally estimated reliability signals. The proposed method introduces a meta-trust variable driven by Value Prediction Error Stability (VPES), which modulates learning dynamics via fail-safe regulation and gradual trust recovery. Experiments on continuous-control benchmarks with reward corruption demonstrate that recovery-enabled meta-cognitive control achieves higher average returns and significantly reduces late-stage training failures compared to strong robustness baselines.
Read more →

ProFlow: Zero-Shot Physics-Consistent Sampling via Proximal Flow Guidance

arXiv:2601.20227v1 Announce Type: cross Abstract: Inferring physical fields from sparse observations while strictly satisfying partial differential equations (PDEs) is a fundamental challenge in computational physics. Recently, deep generative models offer powerful data-driven priors for such inverse problems, yet existing methods struggle to enforce hard physical constraints without costly retraining or disrupting the learned generative prior. Consequently, there is a critical need for a sampling mechanism that can reconcile strict physical consistency and observational fidelity with the statistical structure of the pre-trained prior. To this end, we present ProFlow, a proximal guidance framework for zero-shot physics-consistent sampling, defined as inferring solutions from sparse observations using a fixed generative prior without task-specific retraining. The algorithm employs a rigorous two-step scheme that alternates between: (i) a terminal optimization step, which projects the flow prediction onto the intersection of the physically and observationally consistent sets via proximal minimization; and (ii) an interpolation step, which maps the refined state back to the generative trajectory to maintain consistency with the learned flow probability path. This procedure admits a Bayesian interpretation as a sequence of local maximum a posteriori (MAP) updates. Comprehensive benchmarks on Poisson, Helmholtz, Darcy, and viscous Burgers' equations demonstrate that ProFlow achieves superior physical and observational consistency, as well as more accurate distributional statistics, compared to state-of-the-art diffusion- and flow-based baselines.
Read more →

Certificate-Guided Pruning for Stochastic Lipschitz Optimization

arXiv:2601.20231v1 Announce Type: cross Abstract: We study black-box optimization of Lipschitz functions under noisy evaluations. Existing adaptive discretization methods implicitly avoid suboptimal regions but do not provide explicit certificates of optimality or measurable progress guarantees. We introduce Certificate-Guided Pruning (CGP), which maintains an explicit active set $A_t$ of potentially optimal points via confidence-adjusted Lipschitz envelopes. Any point outside $A_t$ is certifiably suboptimal with high probability, and under a margin condition with near-optimality dimension $\alpha$, we prove $\mathrm{Vol}(A_t)$ shrinks at a controlled rate, yielding sample complexity $\tilde{O}(\varepsilon^{-(2+\alpha)})$. We develop three extensions: CGP-Adaptive learns $L$ online with $O(\log T)$ overhead; CGP-TR scales to $d > 50$ via trust regions with local certificates; and CGP-Hybrid switches to GP refinement when local smoothness is detected. Experiments on 12 benchmarks ($d \in [2, 100]$) show CGP variants match or exceed strong baselines while providing principled stopping criteria via certificate volume.
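A sketch of the pruning certificate over a finite candidate set (maximization convention): each point carries a confidence interval on its noisy mean, Lipschitz envelopes extend those intervals, and any point whose optimistic envelope falls below the best pessimistic value is certifiably suboptimal. Confidence widths and distance choices are illustrative assumptions.

```python
import numpy as np

def prune_active_set(points, mu_hat, conf, L):
    """
    points: (n, d) candidate locations; mu_hat, conf: (n,) noisy mean estimates
    and confidence half-widths; L: Lipschitz constant (or an online estimate).
    Returns a boolean mask of points that remain potentially optimal.
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # optimistic value of each point, given every point's upper confidence bound
    upper = np.min(mu_hat[None, :] + conf[None, :] + L * dists, axis=1)
    best_lower = np.max(mu_hat - conf)   # a value certifiably achieved somewhere
    return upper >= best_lower           # False entries are certifiably suboptimal
```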
Read more →

MALLOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation

arXiv:2601.20234v1 Announce Type: cross Abstract: The scaling law, which indicates that model performance improves with increasing dataset and model capacity, has fueled a growing trend in expanding recommendation models in both industry and academia. However, the advent of large-scale recommenders also brings significantly higher computational costs, particularly under the long-sequence dependencies inherent in the user intent of recommendation systems. Current approaches often rely on pre-storing the intermediate states of the past behavior for each user, thereby reducing the quadratic re-computation cost for the following requests. Despite their effectiveness, these methods often treat memory merely as a medium for acceleration, without adequately considering the space overhead it introduces. This presents a critical challenge in real-world recommendation systems with billions of users, each of whom might initiate thousands of interactions and require massive memory for state storage. Fortunately, there have been several memory management strategies examined for compression in LLM, while most have not been evaluated on the recommendation task. To mitigate this gap, we introduce MALLOC, a comprehensive benchmark for memory-aware long sequence compression. MALLOC presents a comprehensive investigation and systematic classification of memory management techniques applicable to large sequential recommendations. These techniques are integrated into state-of-the-art recommenders, enabling a reproducible and accessible evaluation platform. Through extensive experiments across accuracy, efficiency, and complexity, we demonstrate the holistic reliability of MALLOC in advancing large-scale recommendation. Code is available at https://anonymous.4open.science/r/MALLOC.
Read more →

How AI Impacts Skill Formation

arXiv:2601.20245v1 Announce Type: cross Abstract: AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.
Read more →

Order-Optimal Sample Complexity of Rectified Flows

arXiv:2601.20250v1 Announce Type: cross Abstract: Recently, flow-based generative models have shown superior efficiency compared to diffusion models. In this paper, we study rectified flow models, which constrain transport trajectories to be linear from the base distribution to the data distribution. This structural restriction greatly accelerates sampling, often enabling high-quality generation with a single Euler step. Under standard assumptions on the neural network classes used to parameterize the velocity field and data distribution, we prove that rectified flows achieve sample complexity $\tilde{O}(\varepsilon^{-2})$. This improves on the best known $O(\varepsilon^{-4})$ bounds for flow matching models and matches the optimal rate for mean estimation. Our analysis exploits the particular structure of rectified flows: because the model is trained with a squared loss along linear paths, the associated hypothesis class admits a sharply controlled localized Rademacher complexity. This yields the improved, order-optimal sample complexity and provides a theoretical explanation for the strong empirical performance of rectified flow models.
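The structural restriction is easy to see in code: velocities are regressed toward straight-line targets x1 - x0, and sampling can collapse to a single Euler step from the base distribution. A minimal sketch under the usual flow-matching conventions (shapes and parameterization are assumptions):

```python
import torch

def rectified_flow_loss(velocity_net, x0, x1, t):
    """Squared loss along linear paths: the regression target is the constant velocity x1 - x0."""
    xt = (1 - t) * x0 + t * x1                        # point on the straight path, t of shape (B, 1)
    return ((velocity_net(xt, t) - (x1 - x0)) ** 2).mean()

@torch.no_grad()
def sample_one_step(velocity_net, x0):
    """Single Euler step over the unit interval from a base sample x0."""
    t = torch.zeros(x0.shape[0], 1, device=x0.device)
    return x0 + velocity_net(x0, t)
```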
Read more →

Automated Benchmark Generation from Domain Guidelines Informed by Bloom's Taxonomy

arXiv:2601.20253v1 Announce Type: cross Abstract: Open-ended question answering (QA) evaluates a model's ability to perform contextualized reasoning beyond factual recall. This challenge is especially acute in practice-based domains, where knowledge is procedural and grounded in professional judgment, while most existing LLM benchmarks depend on pre-existing human exam datasets that are often unavailable in such settings. We introduce a framework for automated benchmark generation from expert-authored guidelines informed by Bloom's Taxonomy. It converts expert practices into implicit violation-based scenarios and expands them into auto-graded multiple-choice questions (MCQs) and multi-turn dialogues across four cognitive levels, enabling deterministic, reproducible, and scalable evaluation. Applied to three applied domains: teaching, dietetics, and caregiving, we find differences between model and human-like reasoning: LLMs sometimes perform relatively better on higher-order reasoning (Analyze) but fail more frequently on lower-level items (Remember). We produce large-scale, psychometrically informed benchmarks that surface these non-intuitive model behaviors and enable evaluation of contextualized reasoning in real-world settings.
Read more →

Robust SDE Parameter Estimation Under Missing Time Information Setting

arXiv:2601.20268v1 Announce Type: cross Abstract: Recent advances in stochastic differential equations (SDEs) have enabled robust modeling of real-world dynamical processes across diverse domains, such as finance, health, and systems biology. However, parameter estimation for SDEs typically relies on accurately timestamped observational sequences. When temporal ordering information is corrupted, missing, or deliberately hidden (e.g., for privacy), existing estimation methods often fail. In this paper, we investigate the conditions under which temporal order can be recovered and introduce a novel framework that simultaneously reconstructs temporal information and estimates SDE parameters. Our approach exploits asymmetries between forward and backward processes, deriving a score-matching criterion to infer the correct temporal order between pairs of observations. We then recover the total order via a sorting procedure and estimate SDE parameters from the reconstructed sequence using maximum likelihood. Finally, we conduct extensive experiments on synthetic and real-world datasets to demonstrate the effectiveness of our method, extending parameter estimation to settings with missing temporal order and broadening applicability in sensitive domains.
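
The paper derives a score-matching criterion for inferring the direction of time between observation pairs; the toy sketch below substitutes a much simpler stand-in for that criterion: for a geometric Brownian motion with an assumed positive drift, each pair of observations is scored by the Gaussian transition likelihood of its log-return in both directions, and a sorting step recovers an approximate order from the pairwise votes. The SDE choice, its parameters, and the voting-based sort are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, dt = 0.5, 0.2, 0.1   # assumed GBM parameters: dX = mu * X dt + sigma * X dW

# Simulate a GBM path, then hide the time order by shuffling the observations.
n = 200
log_x = np.cumsum((mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n))
shuffled = np.exp(log_x)[rng.permutation(n)]

def pair_loglik(xa, xb):
    """Log-likelihood that xb follows xa one step later under the GBM transition density."""
    mean = np.log(xa) + (mu - 0.5 * sigma ** 2) * dt
    var = sigma ** 2 * dt
    return -0.5 * ((np.log(xb) - mean) ** 2 / var + np.log(2 * np.pi * var))

# Crude order recovery: for every pair, vote for the direction the transition
# density prefers, then sort by how often each point is judged "earlier".
wins = np.zeros(n)
for i in range(n):
    for j in range(n):
        if i != j and pair_loglik(shuffled[i], shuffled[j]) > pair_loglik(shuffled[j], shuffled[i]):
            wins[i] += 1
recovered = shuffled[np.argsort(-wins)]   # approximate temporal order, earliest first
```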
Read more →

Eliciting Least-to-Most Reasoning for Phishing URL Detection

arXiv:2601.20270v1 Announce Type: cross Abstract: Phishing continues to be one of the most prevalent attack vectors, making accurate classification of phishing URLs essential. Recently, large language models (LLMs) have demonstrated promising results in phishing URL detection. However, their reasoning capabilities that enabled such performance remain underexplored. To this end, in this paper, we propose a Least-to-Most prompting framework for phishing URL detection. In particular, we introduce an "answer sensitivity" mechanism that guides Least-to-Most's iterative approach to enhance reasoning and yield higher prediction accuracy. We evaluate our framework using three URL datasets and four state-of-the-art LLMs, comparing against a one-shot approach and a supervised model. We demonstrate that our framework outperforms the one-shot baseline while achieving performance comparable to that of the supervised model, despite requiring significantly less training data. Furthermore, our in-depth analysis highlights how the iterative reasoning enabled by Least-to-Most, and reinforced by our answer sensitivity mechanism, drives these performance gains. Overall, we show that this simple yet powerful prompting strategy consistently outperforms both one-shot and supervised approaches, despite requiring minimal training or few-shot guidance. Our experimental setup can be found in our Github repository github.sydney.edu.au/htri0928/least-to-most-phishing-detection.
Read more →

Beyond the Needle's Illusion: Decoupled Evaluation of Evidence Access and Use under Semantic Interference at 326M-Token Scale

arXiv:2601.20276v1 Announce Type: cross Abstract: Long-context LLM agents must access the right evidence from large environments and use it faithfully. However, the popular Needle-in-a-Haystack (NIAH) evaluation mostly measures benign span localization. The needle is near-unique, and the haystack is largely irrelevant. We introduce EverMemBench-S (EMB-S), an adversarial NIAH-style benchmark built on a 326M-token MemoryBank. While the full MemoryBank spans 326M tokens for retrieval-based (RAG) evaluation, we evaluate native long-context models only at scales that fit within each model's context window (up to 1M tokens in this work) to ensure a fair comparison. EMB-S pairs queries with collision-tested near-miss hard negatives and gold evidence sets spanning one or more documents, validated via human screening and LLM verification. We also propose a decoupled diagnostic protocol that reports evidence access (document-ID localization) separately from end-to-end QA quality under full-context prompting. This enables consistent diagnosis for both native long-context prompting and retrieval pipelines. Across a reference-corpus ladder from domain-isolated 64K contexts to a globally shared 326M-token environment, we observe a clear reality gap. Systems that saturate benign NIAH degrade sharply in evidence access under semantic interference. These results indicate that semantic discrimination, not context length alone, is the dominant bottleneck for long-context memory at scale.
Read more →

The Forecast After the Forecast: A Post-Processing Shift in Time Series

arXiv:2601.20280v1 Announce Type: cross Abstract: Time series forecasting has long been dominated by advances in model architecture, with recent progress driven by deep learning and hybrid statistical techniques. However, as forecasting models approach diminishing returns in accuracy, a critical yet underexplored opportunity emerges: the strategic use of post-processing. In this paper, we address the last-mile gap in time-series forecasting, which is to improve accuracy and uncertainty without retraining or modifying a deployed backbone. We propose $\delta$-Adapter, a lightweight, architecture-agnostic way to boost deployed time series forecasters without retraining. $\delta$-Adapter learns tiny, bounded modules at two interfaces: input nudging (soft edits to covariates) and output residual correction. We provide local descent guarantees, $O(\delta)$ drift bounds, and compositional stability for combined adapters. Meanwhile, it can act as a feature selector by learning a sparse, horizon-aware mask over inputs to select important features, thereby improving interpretability. In addition, it can also be used as a distribution calibrator to measure uncertainty. Thus, we introduce a Quantile Calibrator and a Conformal Corrector that together deliver calibrated, personalized intervals with finite-sample coverage. Our experiments across diverse backbones and datasets show that $\delta$-Adapter improves accuracy and calibration with negligible compute and no interface changes.
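
A minimal sketch of the output-side interface described above: a frozen, already-deployed forecaster is corrected by a tiny residual module whose adjustment is bounded by a scale delta, so the backbone remains untouched. The tanh-based bounding, the toy backbone, and the synthetic data are assumptions for illustration; the paper's $\delta$-Adapter also includes input nudging, sparsity, and calibration components not shown here.

```python
import torch
import torch.nn as nn

class DeltaResidualAdapter(nn.Module):
    """Bounded residual correction on top of a frozen forecaster's output."""
    def __init__(self, horizon, hidden=32, delta=0.1):
        super().__init__()
        self.delta = delta
        self.net = nn.Sequential(nn.Linear(horizon, hidden), nn.ReLU(), nn.Linear(hidden, horizon))

    def forward(self, y_hat):
        # The correction is squashed into [-delta, +delta] so the backbone stays dominant.
        return y_hat + self.delta * torch.tanh(self.net(y_hat))

# Frozen "deployed" backbone: here just a placeholder linear forecaster.
backbone = nn.Linear(48, 12)
for p in backbone.parameters():
    p.requires_grad_(False)

adapter = DeltaResidualAdapter(horizon=12)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)

for step in range(500):
    x = torch.randn(64, 48)                      # toy input windows
    y = x[:, -12:] + 0.1 * torch.randn(64, 12)   # toy targets
    with torch.no_grad():
        y_hat = backbone(x)                      # deployed forecast, left untouched
    loss = ((adapter(y_hat) - y) ** 2).mean()    # only the adapter is trained
    opt.zero_grad(); loss.backward(); opt.step()
```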
Read more →

Cheap2Rich: A Multi-Fidelity Framework for Data Assimilation and System Identification of Multiscale Physics -- Rotating Detonation Engines

arXiv:2601.20295v1 Announce Type: cross Abstract: Bridging the sim2real gap between computationally inexpensive models and complex physical systems remains a central challenge in machine learning applications to engineering problems, particularly in multi-scale settings where reduced-order models typically capture only dominant dynamics. In this work, we present Cheap2Rich, a multi-scale data assimilation framework that reconstructs high-fidelity state spaces from sparse sensor histories by combining a fast low-fidelity prior with learned, interpretable discrepancy corrections. We demonstrate the performance on rotating detonation engines (RDEs), a challenging class of systems that couple detonation-front propagation with injector-driven unsteadiness, mixing, and stiff chemistry across disparate scales. Our approach successfully reconstructs high-fidelity RDE states from sparse measurements while isolating physically meaningful discrepancy dynamics associated with injector-driven effects. The results highlight a general multi-fidelity framework for data assimilation and system identification in complex multi-scale systems, enabling rapid design exploration and real-time monitoring and control while providing interpretable discrepancy dynamics. Code for this project is available at: github.com/kro0l1k/Cheap2Rich.
Read more →

Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction

arXiv:2601.20299v1 Announce Type: cross Abstract: The evaluation and post-training of large language models (LLMs) rely on supervision, but strong supervision for difficult tasks is often unavailable, especially when evaluating frontier models. In such cases, models have been shown to exploit evaluations built on such imperfect supervision, leading to deceptive results. However, a wealth of mechanism-design research, so far underutilized in LLM work, focuses on game-theoretic incentive compatibility, i.e., eliciting honest and informative answers with weak supervision. Drawing from this literature, we introduce the peer prediction method for model evaluation and post-training. It rewards honest and informative answers over deceptive and uninformative ones, using a metric based on mutual predictability and without requiring ground truth labels. We demonstrate the method's effectiveness and resistance to deception, with both theoretical guarantees and empirical validation on models with up to 405B parameters. We show that training an 8B model with peer prediction-based reward recovers most of the drop in truthfulness due to prior malicious finetuning, even when the reward is produced by a 0.135B language model with no finetuning. On the evaluation front, in contrast to LLM-as-a-Judge which requires strong and trusted judges, we discover an inverse scaling property in peer prediction, where, surprisingly, resistance to deception is strengthened as the capability gap between the experts and participants widens, enabling reliable evaluation of strong models with weak supervision. In particular, LLM-as-a-Judge becomes worse than random guessing when facing deceptive models 5-20x the judge's size, while peer prediction thrives when such gaps are large, including in cases with over 100x size difference.
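
To make the mutual-predictability idea concrete, the sketch below scores an answer by how much conditioning on it raises a small reference LM's log-likelihood of a peer's answer to the same question. The use of GPT-2 as the reference model, the prompt templates, and this particular pointwise form of the reward are assumptions for illustration, not the paper's exact metric.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def loglik(prefix, continuation):
    """Total log-probability of `continuation` given `prefix` under the reference LM.
    (Tokenizing the two pieces separately is a simplification that is fine for a sketch.)"""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        logprobs = model(ids).logits.log_softmax(-1)
    start = prefix_ids.shape[1]
    return sum(logprobs[0, pos - 1, ids[0, pos]].item() for pos in range(start, ids.shape[1]))

def peer_prediction_reward(question, answer, peer_answer):
    # Mutual-predictability style score: how much does `answer` help predict the peer's answer?
    with_answer = loglik(f"Q: {question}\nA: {answer}\nAnother answer: ", peer_answer)
    without_answer = loglik(f"Q: {question}\nAnother answer: ", peer_answer)
    return with_answer - without_answer

q = "What gas do plants absorb during photosynthesis?"
print(peer_prediction_reward(q, "Carbon dioxide.", "They take in CO2 from the air."))
print(peer_prediction_reward(q, "Plants absorb pure oxygen.", "They take in CO2 from the air."))
```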
Read more →

Towards Compact and Robust DNNs via Compression-aware Sharpness Minimization

arXiv:2601.20301v1 Announce Type: cross Abstract: Sharpness-Aware Minimization (SAM) has recently emerged as an effective technique for improving DNN robustness to input variations. However, its interplay with the compactness requirements of on-device DNN deployments remains less explored. Simply pruning a SAM-trained model can undermine robustness, since flatness in the continuous parameter space does not necessarily translate to robustness under the discrete structural changes induced by pruning. Conversely, applying SAM after pruning may be fundamentally constrained by architectural limitations imposed by an early, robustness-agnostic pruning pattern. To address this gap, we propose Compression-aware ShArpness Minimization (C-SAM), a framework that shifts sharpness-aware learning from parameter perturbations to mask perturbations. By explicitly perturbing pruning masks during training, C-SAM promotes a flatter loss landscape with respect to model structure, enabling the discovery of pruning patterns that simultaneously optimize model compactness and robustness to input variations. Extensive experiments on CelebA-HQ, Flowers-102, and CIFAR-10-C across ResNet-18, GoogLeNet, and MobileNet-V2 show that C-SAM consistently achieves higher certified robustness than strong baselines, with improvements of up to 42%, while maintaining task accuracy comparable to the corresponding unpruned models.
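
A minimal sketch of shifting sharpness-aware training from parameter perturbations to mask perturbations: at each step a few entries of a pruning mask are flipped, and the worst perturbed loss drives the weight update. The flip-based neighborhood, the toy model, and the fact that the mask itself is held fixed (the paper additionally searches over pruning patterns) are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
weight = model[0].weight
mask = (torch.rand_like(weight) > 0.5).float()     # current pruning mask for the first layer

def masked_loss(x, y, m):
    hidden = torch.relu(F.linear(x, weight * m, model[0].bias))   # apply the (perturbed) mask
    return F.cross_entropy(model[2](hidden), y)

for step in range(200):
    x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
    # Mask perturbation: flip a few random mask entries and keep the worst-case neighbor's loss,
    # pushing the weights toward regions that are flat with respect to structural changes.
    worst_loss = masked_loss(x, y, mask)
    for _ in range(4):
        flip = (torch.rand_like(mask) < 0.02).float()
        neighbor = (mask + flip) % 2
        worst_loss = torch.maximum(worst_loss, masked_loss(x, y, neighbor))
    opt.zero_grad(); worst_loss.backward(); opt.step()
```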
Read more →

Physically Guided Visual Mass Estimation from a Single RGB Image

arXiv:2601.20303v1 Announce Type: cross Abstract: Estimating object mass from visual input is challenging because mass depends jointly on geometric volume and material-dependent density, neither of which is directly observable from RGB appearance. Consequently, mass prediction from pixels is ill-posed and therefore benefits from physically meaningful representations to constrain the space of plausible solutions. We propose a physically structured framework for single-image mass estimation that addresses this ambiguity by aligning visual cues with the physical factors governing mass. From a single RGB image, we recover object-centric three-dimensional geometry via monocular depth estimation to inform volume and extract coarse material semantics using a vision-language model to guide density-related reasoning. These geometry, semantic, and appearance representations are fused through an instance-adaptive gating mechanism, and two physically guided latent factors (volume- and density-related) are predicted through separate regression heads under mass-only supervision. Experiments on image2mass and ABO-500 show that the proposed method consistently outperforms state-of-the-art methods.
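
The factorization into volume- and density-related latent factors under mass-only supervision can be sketched as two regression heads trained only through their sum in log space, since log mass = log volume + log density. The feature dimension and the synthetic inputs standing in for the fused geometry/semantic/appearance features are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TwoFactorMassHead(nn.Module):
    """Predict log-volume and log-density separately; supervise only their sum (log-mass)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.volume_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.density_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, fused_features):
        log_volume = self.volume_head(fused_features)
        log_density = self.density_head(fused_features)
        return log_volume + log_density, log_volume, log_density   # log-mass and its two factors

model = TwoFactorMassHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(300):
    feats = torch.randn(32, 256)       # stands in for fused geometry/semantic/appearance features
    log_mass_gt = torch.randn(32, 1)   # stands in for the log of the ground-truth mass
    log_mass_pred, _, _ = model(feats)
    loss = ((log_mass_pred - log_mass_gt) ** 2).mean()   # mass-only supervision
    opt.zero_grad(); loss.backward(); opt.step()
```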
Read more →

Structure-constrained Language-informed Diffusion Model for Unpaired Low-dose Computed Tomography Angiography Reconstruction

arXiv:2601.20304v1 Announce Type: cross Abstract: The application of iodinated contrast media (ICM) improves the sensitivity and specificity of computed tomography (CT) for a wide range of clinical indications. However, overdose of ICM can cause problems such as kidney damage and life-threatening allergic reactions. Deep learning methods can generate CT images of normal-dose ICM from low-dose ICM, reducing the required dose while maintaining diagnostic power. However, existing methods struggle to achieve accurate enhancement with incompletely paired images, mainly because of the limited ability of the model to recognize specific structures. To overcome this limitation, we propose a Structure-constrained Language-informed Diffusion Model (SLDM), a unified medical generation model that integrates structural synergy and spatial intelligence. First, the structural prior information of the image is effectively extracted to constrain the model inference process, thus ensuring structural consistency in the enhancement process. Subsequently, a semantic supervision strategy with spatial intelligence is introduced, which integrates the functions of visual perception and spatial reasoning, thus prompting the model to achieve accurate enhancement. Finally, the subtraction angiography enhancement module is applied, which improves the contrast of the ICM region to a suitable interval for observation. Qualitative analysis of visual comparison and quantitative results of several metrics demonstrate the effectiveness of our method in angiographic reconstruction for low-dose contrast medium CT angiography.
Read more →

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips

arXiv:2601.20309v1 Announce Type: cross Abstract: Large Language Model (LLM) serving faces a fundamental tension between stringent latency Service Level Objectives (SLOs) and limited GPU memory capacity. When high request rates exhaust the KV cache budget, existing LLM inference systems often suffer severe head-of-line (HOL) blocking. While prior work explored PCIe-based offloading, these approaches cannot sustain responsiveness under high request rates, often failing to meet tight Time-To-First-Token (TTFT) and Time-Between-Tokens (TBT) SLOs. We present SuperInfer, a high-performance LLM inference system designed for emerging Superchips (e.g., NVIDIA GH200) with tightly coupled GPU-CPU architecture via NVLink-C2C. SuperInfer introduces RotaSched, the first proactive, SLO-aware rotary scheduler that rotates requests to maintain responsiveness on Superchips, and DuplexKV, an optimized rotation engine that enables full-duplex transfer over NVLink-C2C. Evaluations on GH200 using various models and datasets show that SuperInfer improves TTFT SLO attainment rates by up to 74.7% while maintaining comparable TBT and throughput compared to state-of-the-art systems, demonstrating that SLO-aware scheduling and memory co-design unlocks the full potential of Superchips for responsive LLM serving.
Read more →

DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs

arXiv:2601.20311v1 Announce Type: cross Abstract: The global shortage and uneven distribution of medical expertise continue to hinder equitable access to accurate diagnostic care. While existing intelligent diagnostic systems have shown promise, most struggle with dual-user interaction and dynamic knowledge integration -- limiting their real-world applicability. In this study, we present DiagLink, a dual-user diagnostic assistance system that synergizes large language models (LLMs), knowledge graphs (KGs), and medical experts to support both patients and physicians. DiagLink uses guided dialogues to elicit patient histories, leverages LLMs and KGs for collaborative reasoning, and incorporates physician oversight for continuous knowledge validation and evolution. The system provides a role-adaptive interface, dynamically visualized history, and unified multi-source evidence to improve both trust and usability. We evaluate DiagLink through a user study, use cases, and expert interviews, demonstrating its effectiveness in improving user satisfaction and diagnostic efficiency, while offering insights for the design of future AI-assisted diagnostic systems.
Read more →

Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning

arXiv:2601.20326v1 Announce Type: cross Abstract: KV caches, typically used only to speed up autoregressive decoding, encode contextual information that can be reused for downstream tasks at no extra cost. We propose treating the KV cache as a lightweight representation, eliminating the need to recompute or store full hidden states. Despite being weaker than dedicated embeddings, KV-derived representations are shown to be sufficient for two key applications: \textbf{(i) Chain-of-Embedding}, where they achieve competitive or superior performance on Llama-3.1-8B-Instruct and Qwen2-7B-Instruct; and \textbf{(ii) Fast/Slow Thinking Switching}, where they enable adaptive reasoning on Qwen3-8B and DeepSeek-R1-Distil-Qwen-14B, reducing token generation by up to $5.7\times$ with minimal accuracy loss. Our findings establish KV caches as a free, effective substrate for sampling and reasoning, opening new directions for representation reuse in LLM inference. Code: https://github.com/cmd2001/ICLR2026_KV-Embedding.
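
A minimal sketch of reusing the KV cache as a lightweight representation with the Hugging Face transformers API: the per-layer key/value tensors produced during a normal forward pass are pooled into a single vector, with no extra hidden-state storage. GPT-2 as the backbone, last-layer selection, and mean pooling are illustrative assumptions; note that newer transformers versions return a Cache object, which still supports per-layer indexing.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def kv_embedding(text):
    """Pool the last layer's key/value cache into a single vector, reusing what decoding already computes."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, use_cache=True)
    keys, values = out.past_key_values[-1]        # last layer: [batch, heads, seq, head_dim]
    kv = torch.cat([keys, values], dim=-1)        # combine keys and values
    return kv.mean(dim=(1, 2)).squeeze(0)         # average over heads and positions

a = kv_embedding("The capital of France is Paris.")
b = kv_embedding("Paris is the capital city of France.")
c = kv_embedding("Photosynthesis converts light into chemical energy.")
cos = torch.nn.functional.cosine_similarity
print(cos(a, b, dim=0).item(), cos(a, c, dim=0).item())   # compare paraphrase vs. unrelated text
```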
Read more →

Demonstration-Free Robotic Control via LLM Agents

arXiv:2601.20334v1 Announce Type: cross Abstract: Robotic manipulation has increasingly adopted vision-language-action (VLA) models, which achieve strong performance but typically require task-specific demonstrations and fine-tuning, and often generalize poorly under domain shift. We investigate whether general-purpose large language model (LLM) agent frameworks, originally developed for software engineering, can serve as an alternative control paradigm for embodied manipulation. We introduce FAEA (Frontier Agent as Embodied Agent), which applies an LLM agent framework directly to embodied manipulation without modification. Using the same iterative reasoning that enables software agents to debug code, FAEA enables embodied agents to reason through manipulation strategies. We evaluate an unmodified frontier agent, Claude Agent SDK, across the LIBERO, ManiSkill3, and MetaWorld benchmarks. With privileged environment state access, FAEA achieves success rates of 84.9%, 85.7%, and 96%, respectively. This level of task success approaches that of VLA models trained with less than 100 demonstrations per task, without requiring demonstrations or fine-tuning. With one round of human feedback as an optional optimization, performance increases to 88.2% on LIBERO. This demonstration-free capability has immediate practical value: FAEA can autonomously explore novel scenarios in simulation and generate successful trajectories for training data augmentation in embodied learning. Our results indicate that general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning. This opens a path for robotics systems to leverage actively maintained agent infrastructure and benefit directly from ongoing advances in frontier models. Code is available at https://github.com/robiemusketeer/faea-sim
Read more →

MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment

arXiv:2601.20335v1 Announce Type: cross Abstract: Recent advances in mobile Graphical User Interface (GUI) agents highlight the growing need for comprehensive evaluation benchmarks. While new online benchmarks offer more realistic testing than offline ones, they tend to focus on the agents' task instruction-following ability while neglecting their reasoning and exploration ability. Moreover, these benchmarks do not consider the random noise in real-world mobile environments. This leads to a gap between benchmarks and real-world environments. To address these limitations, we propose MobileBench-OL, an online benchmark with 1080 tasks from 80 Chinese apps. It measures task execution, complex reasoning, and noise robustness of agents through 5 subsets that cover multiple evaluation dimensions. We also provide an auto-eval framework with a reset mechanism, enabling stable and repeatable real-world benchmarking. Evaluating 12 leading GUI agents on MobileBench-OL shows significant room for improvement to meet real-world requirements. Human evaluation further confirms that MobileBench-OL can reliably measure the performance of leading GUI agents in real environments. Our data and code will be released upon acceptance.
Read more →

Multimodal Multi-Agent Ransomware Analysis Using AutoGen

arXiv:2601.20346v1 Announce Type: cross Abstract: Ransomware has become one of the most serious cybersecurity threats, causing major financial losses and operational disruptions worldwide. Traditional detection methods such as static analysis, heuristic scanning, and behavioral analysis often fall short when used alone. To address these limitations, this paper presents a multimodal multi-agent ransomware analysis framework designed for ransomware classification. The proposed architecture combines information from static, dynamic, and network sources. Each data type is handled by a specialized agent that uses autoencoder-based feature extraction. These representations are integrated through a fusion agent, and the fused representation is passed to a transformer-based classifier that identifies the specific ransomware family. The agents interact through an inter-agent feedback mechanism that iteratively refines feature representations by suppressing low-confidence information. The framework was evaluated on large-scale datasets containing thousands of ransomware and benign samples. Across multiple experiments, it outperforms single-modality and non-adaptive fusion baselines, achieving an improvement of up to 0.936 in Macro-F1 for family classification and reducing calibration error. Over 100 epochs, the agentic feedback loop displays stable monotonic convergence, yielding over +0.75 absolute improvement in agent quality and a final composite score of around 0.88 without fine-tuning of the language models. Zero-day ransomware detection remains family-dependent and sensitive to polymorphism and modality disruptions. Confidence-aware abstention enables reliable real-world deployment by favoring conservative and trustworthy decisions over forced classification. The findings indicate that the proposed approach provides a practical and effective path toward improving real-world ransomware defense systems.
Read more →

CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization

arXiv:2601.20355v1 Announce Type: cross Abstract: Scene graphs provide structured abstractions for scene understanding, yet they often overfit to spurious correlations, severely hindering out-of-distribution generalization. To address this limitation, we propose CURVE, a causality-inspired framework that integrates variational uncertainty modeling with uncertainty-guided structural regularization to suppress high-variance, environment-specific relations. Specifically, we apply prototype-conditioned debiasing to disentangle invariant interaction dynamics from environment-dependent variations, promoting a sparse and domain-stable topology. Empirically, we evaluate CURVE in zero-shot transfer and low-data sim-to-real adaptation, verifying its ability to learn domain-stable sparse topologies and provide reliable uncertainty estimates to support risk prediction under distribution shifts.
Read more →

Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding

arXiv:2601.20362v1 Announce Type: cross Abstract: Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of per-frame codebooks is suboptimal for the wide variability of audio content, especially for signals that are either very simple or highly complex. To address this limitation, we propose SwitchCodec, a neural audio codec based on Residual Experts Vector Quantization (REVQ). REVQ combines a shared quantizer with dynamically routed expert quantizers that are activated according to the input audio, decoupling bitrate from codebook capacity and improving compression efficiency. This design ensures full training and utilization of each quantizer. In addition, a variable-bitrate mechanism adjusts the number of active expert quantizers at inference, enabling multi-bitrate operation without retraining. Experiments demonstrate that SwitchCodec surpasses existing baselines on both objective metrics and subjective listening tests.
Read more →

Can Continuous-Time Diffusion Models Generate and Solve Globally Constrained Discrete Problems? A Study on Sudoku

arXiv:2601.20363v1 Announce Type: cross Abstract: Can standard continuous-time generative models represent distributions whose support is an extremely sparse, globally constrained discrete set? We study this question using completed Sudoku grids as a controlled testbed, treating them as a subset of a continuous relaxation space. We train flow-matching and score-based models along a Gaussian probability path and compare deterministic (ODE) sampling, stochastic (SDE) sampling, and DDPM-style discretizations derived from the same continuous-time training. Unconditionally, stochastic sampling substantially outperforms deterministic flows; score-based samplers are the most reliable among continuous-time methods, and DDPM-style ancestral sampling achieves the highest validity overall. We further show that the same models can be repurposed for guided generation: by repeatedly sampling completions under clamped clues and stopping when constraints are satisfied, the model acts as a probabilistic Sudoku solver. Although far less sample-efficient than classical solvers and discrete-geometry-aware diffusion methods, these experiments demonstrate that classic diffusion/flow formulations can assign non-zero probability mass to globally constrained combinatorial structures and can be used for constraint satisfaction via stochastic search.
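
The guided-generation procedure described above (clamp the clues, sample completions, stop when the global constraints hold) can be sketched as a rejection loop around any stochastic sampler. In the sketch below the trained diffusion/flow sampler is replaced by a uniform random completion, which only works for near-complete grids; that stand-in, and the helper names, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_valid(grid):
    """Global Sudoku constraints: every row, column, and 3x3 box is a permutation of 1..9."""
    target = set(range(1, 10))
    for i in range(9):
        if set(grid[i, :]) != target or set(grid[:, i]) != target:
            return False
    for r in range(0, 9, 3):
        for c in range(0, 9, 3):
            if set(grid[r:r + 3, c:c + 3].ravel()) != target:
                return False
    return True

def sample_completion(clues):
    """Stand-in for the trained sampler: fill the unknown cells (marked 0) uniformly at random.
    In the paper this call would be a stochastic (SDE/DDPM-style) sampler with the clue cells clamped."""
    grid = clues.copy()
    unknown = grid == 0
    grid[unknown] = rng.integers(1, 10, size=unknown.sum())
    return grid

def solve_by_stochastic_search(clues, max_tries=10000):
    # Repeatedly sample completions under clamped clues and stop once the constraints are satisfied.
    for _ in range(max_tries):
        candidate = sample_completion(clues)
        if is_valid(candidate):
            return candidate
    return None

# Demo: start from a known valid grid and hide a few cells; the random stand-in can only
# cope with near-complete grids, whereas a trained model handles far sparser clue sets.
full = np.array([[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)])
clues = full.copy()
clues[rng.random((9, 9)) < 0.05] = 0
print(solve_by_stochastic_search(clues) is not None)
```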
Read more →

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning

arXiv:2601.20375v1 Announce Type: cross Abstract: Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In practice, DP strategies are typically developed through iterative manual analysis and trial-and-error adjustment. These processes inevitably incur high labor costs and may lead to privacy issues in high-privacy domains like healthcare due to direct human access to sensitive data. Thus, achieving automated data processing without exposing the raw data has become a critical challenge. To address this challenge, we propose LLM-AutoDP, a novel framework that leverages LLMs as agents to automatically generate and optimize data processing strategies. Our method generates multiple candidate strategies and iteratively refines them using feedback signals and comparative evaluations. This iterative in-context learning mechanism enables the agent to converge toward high-quality processing pipelines without requiring direct human intervention or access to the underlying data. To further accelerate strategy search, we introduce three key techniques: Distribution Preserving Sampling, which reduces data volume while maintaining distributional integrity; Processing Target Selection, which uses a binary classifier to identify low-quality samples for focused processing; and a Cache-and-Reuse Mechanism, which minimizes redundant computations by reusing prior processing results. Results show that models trained on data processed by our framework achieve over 80% win rates against models trained on unprocessed data. Compared to AutoML baselines based on LLM agents, LLM-AutoDP achieves approximately a 65% win rate. Moreover, our acceleration techniques reduce the total searching time by up to 10 times, demonstrating both effectiveness and efficiency.
Read more →

FedRD: Reducing Divergences for Generalized Federated Learning via Heterogeneity-aware Parameter Guidance

arXiv:2601.20397v1 Announce Type: cross Abstract: Heterogeneous federated learning (HFL) aims to ensure effective and privacy-preserving collaboration among different entities. As newly joined clients require significant adjustments and additional training to align with the existing system, the problem of generalizing federated learning models to unseen clients under heterogeneous data has become progressively crucial. Consequently, we highlight two unsolved challenging issues in federated domain generalization: Optimization Divergence and Performance Divergence. To tackle the above challenges, we propose FedRD, a novel heterogeneity-aware federated learning algorithm that collaboratively utilizes parameter-guided global generalization aggregation and local debiased classification to reduce divergences, aiming to obtain an optimal global model for participating and unseen clients. Extensive experiments on public multi-domain datasets demonstrate that our approach exhibits a substantial performance advantage over competing baselines in addressing this specific problem.
Read more →

GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions

arXiv:2601.20402v1 Announce Type: cross Abstract: Large Language Models (LLMs) have emerged as powerful learning tools, but they lack awareness of learners' cognitive and physiological states, limiting their adaptability to the user's learning style. Contemporary learning techniques primarily focus on structured learning paths, knowledge tracing, and generic adaptive testing but fail to address real-time learning challenges driven by cognitive load, attention fluctuations, and engagement levels. Building on findings from a formative user study (N=66), we introduce GuideAI, a multi-modal framework that enhances LLM-driven learning by integrating real-time biosensory feedback including eye gaze tracking, heart rate variability, posture detection, and digital note-taking behavior. GuideAI dynamically adapts learning content and pacing through cognitive optimizations (adjusting complexity based on learning progress markers), physiological interventions (breathing guidance and posture correction), and attention-aware strategies (redirecting focus using gaze analysis). Additionally, GuideAI supports diverse learning modalities, including text-based, image-based, audio-based, and video-based instruction, across varied knowledge domains. A preliminary study (N = 25) assessed GuideAI's impact on knowledge retention and cognitive load through standardized assessments. The results show statistically significant improvements in both problem-solving capability and recall-based knowledge assessments. Participants also experienced notable reductions in key NASA-TLX measures including mental demand, frustration levels, and effort, while simultaneously reporting enhanced perceived performance. These findings demonstrate GuideAI's potential to bridge the gap between current LLM-based learning systems and individualized learner needs, paving the way for adaptive, cognition-aware education at scale.
Read more →

On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents

arXiv:2601.20404v1 Announce Type: cross Abstract: AI coding agents such as Codex and Claude Code are increasingly used to autonomously contribute to software repositories. However, little is known about how repository-level configuration artifacts affect operational efficiency of the agents. In this paper, we study the impact of AGENTS.md files on the runtime and token consumption of AI coding agents operating on GitHub pull requests. We analyze 10 repositories and 124 pull requests, executing agents under two conditions: with and without an AGENTS.md file. We measure wall-clock execution time and token usage during agent execution. Our results show that the presence of AGENTS.md is associated with a lower median runtime ($\Delta 28.64$%) and reduced output token consumption ($\Delta 16.58$%), while maintaining a comparable task completion behavior. Based on these results, we discuss immediate implications for the configuration and deployment of AI coding agents in practice, and outline a broader research agenda on the role of repository-level instructions in shaping the behavior, efficiency, and integration of AI coding agents in software development workflows.
Read more →

Meeting SLOs, Slashing Hours: Automated Enterprise LLM Optimization with OptiKIT

arXiv:2601.20408v1 Announce Type: cross Abstract: Enterprise LLM deployment faces a critical scalability challenge: organizations must optimize models systematically to scale AI initiatives within constrained compute budgets, yet the specialized expertise required for manual optimization remains a niche and scarce skillset. This challenge is particularly evident in managing GPU utilization across heterogeneous infrastructure while enabling teams with diverse workloads and limited LLM optimization experience to deploy models efficiently. We present OptiKIT, a distributed LLM optimization framework that democratizes model compression and tuning by automating complex optimization workflows for non-expert teams. OptiKIT provides dynamic resource allocation, staged pipeline execution with automatic cleanup, and seamless enterprise integration. In production, it delivers more than 2x GPU throughput improvement while empowering application teams to achieve consistent performance improvements without deep LLM optimization expertise. We share both the platform design and key engineering insights into resource allocation algorithms, pipeline orchestration, and integration patterns that enable large-scale, production-grade democratization of model optimization. Finally, we open-source the system to enable external contributions and broader reproducibility.
Read more →

Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models

arXiv:2601.20419v1 Announce Type: cross Abstract: Recent research has shown that aligning fine-grained text descriptions with localized image patches can significantly improve the zero-shot performance of pre-trained vision-language models (e.g., CLIP). However, we find that both fine-grained text descriptions and localized image patches often contain redundant information, making text-visual alignment less effective. In this paper, we tackle this issue from two perspectives: \emph{View Refinement} and \emph{Description Refinement}, termed as \textit{\textbf{Bi}-refinement for \textbf{F}ine-grained \textbf{T}ext-visual \textbf{A}lignment} (BiFTA). \emph{View refinement} removes redundant image patches with high \emph{Intersection over Union} (IoU) ratios, resulting in more distinctive visual samples. \emph{Description refinement} removes redundant text descriptions with high pairwise cosine similarity, ensuring greater diversity in the remaining descriptions. BiFTA achieves superior zero-shot performance on 6 benchmark datasets for both ViT-based and ResNet-based CLIP, justifying the necessity to remove redundant information in visual-text alignment.
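
Both refinement steps reduce to simple redundancy filters, sketched below: patches are greedily dropped when their IoU with an already-kept patch is high, and descriptions are greedily dropped when their embedding cosine similarity with a kept one is high. The thresholds and the greedy selection order are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter + 1e-8)

def refine_views(boxes, iou_thresh=0.5):
    """View refinement: greedily drop patches that overlap an already-kept patch too much."""
    kept = []
    for b in boxes:
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept

def refine_descriptions(embeddings, sim_thresh=0.9):
    """Description refinement: greedily drop descriptions too similar (cosine) to a kept one."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i in range(len(e)):
        if all(float(e[i] @ e[j]) < sim_thresh for j in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
print(refine_views(boxes))   # the near-duplicate second patch is removed
print(refine_descriptions(np.random.default_rng(0).normal(size=(5, 16))))
```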
Read more →

Guiding the Recommender: Information-Aware Auto-Bidding for Content Promotion

arXiv:2601.20422v1 Announce Type: cross Abstract: Modern content platforms offer paid promotion to mitigate cold start by allocating exposure via auctions. Our empirical analysis reveals a counterintuitive flaw in this paradigm: while promotion rescues low-to-medium quality content, it can harm high-quality content by forcing exposure to suboptimal audiences, polluting engagement signals and downgrading future recommendation. We recast content promotion as a dual-objective optimization that balances short-term value acquisition with long-term model improvement. To make this tractable at bid time in content promotion, we introduce a decomposable surrogate objective, gradient coverage, and establish its formal connection to Fisher Information and optimal experimental design. We design a two-stage auto-bidding algorithm based on Lagrange duality that dynamically paces budget through a shadow price and optimizes impression-level bids using per-impression marginal utilities. To address missing labels at bid time, we propose a confidence-gated gradient heuristic, paired with a zeroth-order variant for black-box models that reliably estimates learning signals in real time. We provide theoretical guarantees, proving monotone submodularity of the composite objective, sublinear regret in online auction, and budget feasibility. Extensive offline experiments on synthetic and real-world datasets validate the framework: it outperforms baselines, achieves superior final AUC/LogLoss, adheres closely to budget targets, and remains effective when gradients are approximated zeroth-order. These results show that strategic, information-aware promotion can improve long-term model performance and organic outcomes beyond naive impression-maximization strategies.
Read more →

Self Voice Conversion as an Attack against Neural Audio Watermarking

arXiv:2601.20432v1 Announce Type: cross Abstract: Audio watermarking embeds auxiliary information into speech while maintaining speaker identity, linguistic content, and perceptual quality. Although recent advances in neural and digital signal processing-based watermarking methods have improved imperceptibility and embedding capacity, robustness is still primarily assessed against conventional distortions such as compression, additive noise, and resampling. However, the rise of deep learning-based attacks introduces novel and significant threats to watermark security. In this work, we investigate self voice conversion as a universal, content-preserving attack against audio watermarking systems. Self voice conversion remaps a speaker's voice to the same identity while altering acoustic characteristics through a voice conversion model. We demonstrate that this attack severely degrades the reliability of state-of-the-art watermarking approaches and highlight its implications for the security of modern audio watermarking techniques.
Read more →

Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding

arXiv:2601.20447v1 Announce Type: cross Abstract: Enabling natural communication through brain-computer interfaces (BCIs) remains one of the most profound challenges in neuroscience and neurotechnology. While existing frameworks offer partial solutions, they are constrained by oversimplified semantic representations and a lack of interpretability. To overcome these limitations, we introduce Semantic Intent Decoding (SID), a novel framework that translates neural activity into natural language by modeling meaning as a flexible set of compositional semantic units. SID is built on three core principles: semantic compositionality, continuity and expandability of semantic space, and fidelity in reconstruction. We present BrainMosaic, a deep learning architecture implementing SID. BrainMosaic decodes multiple semantic units from EEG/SEEG signals using set matching and then reconstructs coherent sentences through semantic-guided reconstruction. This approach moves beyond traditional pipelines that rely on fixed-class classification or unconstrained generation, enabling a more interpretable and expressive communication paradigm. Extensive experiments on multilingual EEG and clinical SEEG datasets demonstrate that SID and BrainMosaic offer substantial advantages over existing frameworks, paving the way for natural and effective BCI-mediated communication.
Read more →

Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual Explanations

arXiv:2601.20449v1 Announce Type: cross Abstract: Explainable Artificial Intelligence (XAI) is becoming increasingly essential for enhancing the transparency of machine learning (ML) models. Among the various XAI techniques, counterfactual explanations (CFs) hold a pivotal role due to their ability to illustrate how changes in input features can alter an ML model's decision, thereby offering actionable recourse to users. Ensuring that individuals with comparable attributes and those belonging to different protected groups (e.g., demographic) receive similar and actionable recourse options is essential for trustworthy and fair decision-making. In this work, we address this challenge directly by focusing on the generation of fair CFs. Specifically, we start by defining and formulating fairness at three levels: 1) individual fairness, ensuring that similar individuals receive similar CFs; 2) group fairness, ensuring equitable CFs across different protected groups; and 3) hybrid fairness, which accounts for both individual and broader group-level fairness. We formulate the problem as an optimization task and propose a novel model-agnostic, reinforcement learning based approach to generate CFs that satisfy fairness constraints at both the individual and group levels, two objectives that are usually treated as orthogonal. As fairness metrics, we extend existing metrics commonly used for auditing ML models, such as equal choice of recourse and equal effectiveness across individuals and groups. We evaluate our approach on three benchmark datasets, showing that it effectively ensures individual and group fairness while preserving the quality of the generated CFs in terms of proximity and plausibility, and quantify the cost of fairness at the different levels separately. Our work opens a broader discussion on hybrid fairness and its role and implications for XAI and beyond CFs.
Read more →

Comparative evaluation of training strategies using partially labelled datasets for segmentation of white matter hyperintensities and stroke lesions in FLAIR MRI

arXiv:2601.20503v1 Announce Type: cross Abstract: White matter hyperintensities (WMH) and ischaemic stroke lesions (ISL) are imaging features associated with cerebral small vessel disease (SVD) that are visible on brain magnetic resonance imaging (MRI) scans. The development and validation of deep learning models to segment and differentiate these features is difficult because they visually confound each other in the fluid-attenuated inversion recovery (FLAIR) sequence and often appear in the same subject. We investigated six strategies for training a combined WMH and ISL segmentation model using partially labelled data. We combined privately held fully and partially labelled datasets with publicly available partially labelled datasets to yield a total of 2052 MRI volumes, with 1341 and 1152 containing ground truth annotations for WMH and ISL respectively. We found that several methods were able to effectively leverage the partially labelled data to improve model performance, with the use of pseudolabels yielding the best result.
Read more →

Audio Deepfake Detection in the Age of Advanced Text-to-Speech models

arXiv:2601.20510v1 Announce Type: cross Abstract: Recent advances in Text-to-Speech (TTS) systems have substantially increased the realism of synthetic speech, raising new challenges for audio deepfake detection. This work presents a comparative evaluation of three state-of-the-art TTS models--Dia2, Maya1, and MeloTTS--representing streaming, LLM-based, and non-autoregressive architectures. A corpus of 12,000 synthetic audio samples was generated using the Daily-Dialog dataset and evaluated against four detection frameworks, including semantic, structural, and signal-level approaches. The results reveal significant variability in detector performance across generative mechanisms: models effective against one TTS architecture may fail against others, particularly LLM-based synthesis. In contrast, a multi-view detection approach combining complementary analysis levels demonstrates robust performance across all evaluated models. These findings highlight the limitations of single-paradigm detectors and emphasize the necessity of integrated detection strategies to address the evolving landscape of audio deepfake threats.
Read more →

CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes

arXiv:2601.20518v1 Announce Type: cross Abstract: Topological deep learning has emerged for modeling higher-order relational structures beyond pairwise interactions that standard graph neural networks fail to capture. Although combinatorial complexes offer a unified topological framework, most existing topological deep learning methods rely on local message passing via attention mechanisms, which incur quadratic complexity and remain low-dimensional, limiting scalability and rank-aware information aggregation in higher-order complexes. We propose Combinatorial Complex Mamba (CCMamba), the first unified Mamba-based neural framework for learning on combinatorial complexes. CCMamba reformulates message passing as a selective state-space modeling problem by organizing multi-rank incidence relations into structured sequences processed by rank-aware state-space models. This enables adaptive, directional, and long-range information propagation in linear time without self-attention. We further establish theoretically that the expressive power of CCMamba message passing is upper-bounded by the 1-Weisfeiler-Lehman test. Experiments on graph, hypergraph, and simplicial benchmarks demonstrate that CCMamba consistently outperforms existing methods while exhibiting improved scalability and robustness to depth.
Read more →

Interpreting Emergent Extreme Events in Multi-Agent Systems

arXiv:2601.20538v1 Announce Type: cross Abstract: Large language model-powered multi-agent systems have emerged as powerful tools for simulating complex human-like systems. The interactions within these systems often lead to extreme events whose origins remain obscured by the black box of emergence. Interpreting these events is critical for system safety. This paper proposes the first framework for explaining emergent extreme events in multi-agent systems, aiming to answer three fundamental questions: When does the event originate? Who drives it? And what behaviors contribute to it? Specifically, we adapt the Shapley value to faithfully attribute the occurrence of extreme events to each action taken by agents at different time steps, i.e., assigning an attribution score to the action to measure its influence on the event. We then aggregate the attribution scores along the dimensions of time, agent, and behavior to quantify the risk contribution of each dimension. Finally, we design a set of metrics based on these contribution scores to characterize the features of extreme events. Experiments across diverse multi-agent system scenarios (economic, financial, and social) demonstrate the effectiveness of our framework and provide general insights into the emergence of extreme phenomena.
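
The attribution step can be sketched with a standard Monte Carlo estimator of Shapley values, where the players are (agent, time-step) action slots and the value function indicates whether the extreme event occurs for a given coalition of retained actions. The toy event function, the neutral-default treatment of dropped actions, and the aggregation code are illustrative assumptions.

```python
import random

# Players are (agent, time-step) action slots; an action is "risky" (1) or neutral (0).
# The toy extreme event fires when at least three risky actions are retained in the coalition.
actions = {("agent_a", 1): 1, ("agent_a", 2): 1, ("agent_b", 1): 0,
           ("agent_b", 2): 1, ("agent_c", 1): 0, ("agent_c", 2): 1}

def event_occurs(coalition):
    return 1.0 if sum(actions[p] for p in coalition) >= 3 else 0.0

def monte_carlo_shapley(players, value_fn, n_perm=2000, seed=0):
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_perm):
        order = list(players)
        rng.shuffle(order)
        coalition, prev = [], value_fn([])
        for p in order:
            coalition.append(p)
            cur = value_fn(coalition)
            phi[p] += cur - prev          # marginal contribution of p in this ordering
            prev = cur
    return {p: v / n_perm for p, v in phi.items()}

scores = monte_carlo_shapley(list(actions), event_occurs)
# Aggregate the attribution scores along the agent and time dimensions, as in the framework above.
by_agent = {a: sum(v for (ag, t), v in scores.items() if ag == a) for a in {ag for ag, _ in actions}}
by_time = {t: sum(v for (ag, tt), v in scores.items() if tt == t) for t in {t for _, t in actions}}
print(scores, by_agent, by_time)
```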
Read more →

IoT Device Identification with Machine Learning: Common Pitfalls and Best Practices

arXiv:2601.20548v1 Announce Type: cross Abstract: This paper critically examines the device identification process using machine learning, addressing common pitfalls in existing literature. We analyze the trade-offs between identification methods (unique vs. class based), data heterogeneity, feature extraction challenges, and evaluation metrics. By highlighting specific errors, such as improper data augmentation and misleading session identifiers, we provide a robust guideline for researchers to enhance the reproducibility and generalizability of IoT security models.
Read more →

Unsupervised Ensemble Learning Through Deep Energy-based Models

arXiv:2601.20556v1 Announce Type: cross Abstract: Unsupervised ensemble learning emerged to address the challenge of combining multiple learners' predictions without access to ground truth labels or additional data. This paradigm is crucial in scenarios where evaluating individual classifier performance or understanding their strengths is challenging due to limited information. We propose a novel deep energy-based method for constructing an accurate meta-learner using only the predictions of individual learners, potentially capable of capturing complex dependence structures between them. Our approach requires no labeled data, learner features, or problem-specific information, and has theoretical guarantees for when learners are conditionally independent. We demonstrate superior performance across diverse ensemble scenarios, including challenging mixture of experts settings. Our experiments span standard ensemble datasets and curated datasets designed to test how the model fuses expertise from multiple sources. These results highlight the potential of unsupervised ensemble learning to harness collective intelligence, especially in data-scarce or privacy-sensitive environments.
Read more →

Robust Distributed Learning under Resource Constraints: Decentralized Quantile Estimation via (Asynchronous) ADMM

arXiv:2601.20571v1 Announce Type: cross Abstract: Specifications for decentralized learning on resource-constrained edge devices require algorithms that are communication-efficient, robust to data corruption, and lightweight in memory usage. While state-of-the-art gossip-based methods satisfy the first requirement, achieving robustness remains challenging. Asynchronous decentralized ADMM-based methods have been explored for estimating the median, a statistical centrality measure that is notoriously more robust than the mean. However, existing approaches require memory that scales with node degree, making them impractical when memory is limited. In this paper, we propose AsylADMM, a novel gossip algorithm for decentralized median and quantile estimation, primarily designed for asynchronous updates and requiring only two variables per node. We analyze a synchronous variant of AsylADMM to establish theoretical guarantees and empirically demonstrate fast convergence for the asynchronous algorithm. We then show that our algorithm enables quantile-based trimming, geometric median estimation, and depth-based trimming, with quantile-based trimming empirically outperforming existing rank-based methods. Finally, we provide a novel theoretical analysis of rank-based trimming via Markov chain theory.
Read more →

Inequality in Congestion Games with Learning Agents

arXiv:2601.20578v1 Announce Type: cross Abstract: Who benefits from expanding transport networks? While designed to improve mobility, such interventions can also create inequality. In this paper, we show that disparities arise not only from the structure of the network itself but also from differences in how commuters adapt to it. We model commuters as reinforcement learning agents who adapt their travel choices at different learning rates, reflecting unequal access to resources and information. To capture potential efficiency-fairness tradeoffs, we introduce the Price of Learning (PoL), a measure of inefficiency during learning. We analyze both a stylized network -- inspired by the well-known Braess's paradox, yet with two source nodes -- and an abstraction of a real-world metro system (Amsterdam). Our simulations show that network expansions can simultaneously increase efficiency and amplify inequality, especially when faster learners disproportionately benefit from new routes before others adapt. These results highlight that transport policies must account not only for equilibrium outcomes but also for the heterogeneous ways commuters adapt, since both shape the balance between efficiency and fairness.
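
A minimal sketch of the modeling setup: commuters are epsilon-greedy learners choosing between two congestible routes, with half of them updating their estimates at a faster learning rate. The cost functions, learning rates, and the simple gap statistic used as an inequality proxy are illustrative assumptions, not the paper's network or its Price of Learning metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two routes; travel time on each route grows with the share of commuters using it.
def travel_times(shares):
    return np.array([1.0 + 2.0 * shares[0],    # route 0: short but highly congestible
                     2.0 + 1.0 * shares[1]])   # route 1: longer free-flow, less congestible

n_agents = 200
learning_rates = np.where(np.arange(n_agents) < n_agents // 2, 0.2, 0.02)   # fast vs. slow learners
q = np.zeros((n_agents, 2))            # per-agent travel-time estimates for each route

costs_per_agent = np.zeros(n_agents)
for day in range(500):
    greedy = q.argmin(axis=1)                                   # epsilon-greedy route choice
    explore = rng.random(n_agents) < 0.05
    choice = np.where(explore, rng.integers(0, 2, n_agents), greedy)
    shares = np.bincount(choice, minlength=2) / n_agents
    realized = travel_times(shares)[choice]
    q[np.arange(n_agents), choice] += learning_rates * (realized - q[np.arange(n_agents), choice])
    costs_per_agent += realized

avg_cost = costs_per_agent / 500
fast, slow = avg_cost[:n_agents // 2].mean(), avg_cost[n_agents // 2:].mean()
print(f"avg travel time, fast learners: {fast:.3f}  slow learners: {slow:.3f}")
print(f"system average (efficiency): {avg_cost.mean():.3f}  gap (inequality proxy): {slow - fast:.3f}")
```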
Read more →

Ranking-aware Reinforcement Learning for Ordinal Ranking

arXiv:2601.20585v1 Announce Type: cross Abstract: Ordinal regression and ranking are challenging due to inherent ordinal dependencies that conventional methods struggle to model. We propose Ranking-Aware Reinforcement Learning (RARL), a novel RL framework that explicitly learns these relationships. At its core, RARL features a unified objective that synergistically integrates regression and Learning-to-Rank (L2R), enabling mutual improvement between the two tasks. This is driven by a ranking-aware verifiable reward that jointly assesses regression precision and ranking accuracy, facilitating direct model updates via policy optimization. To further enhance training, we introduce Response Mutation Operations (RMO), which inject controlled noise to improve exploration and prevent stagnation at saddle points. The effectiveness of RARL is validated through extensive experiments on three distinct benchmarks.
Read more →

Person Re-ID in 2025: Supervised, Self-Supervised, and Language-Aligned. What Works?

arXiv:2601.20598v1 Announce Type: cross Abstract: Person Re-Identification (ReID) remains a challenging problem in computer vision. This work reviews various training paradigms, evaluates the robustness of state-of-the-art ReID models in cross-domain applications, and examines the role of foundation models in improving generalization through richer, more transferable visual representations. We compare three training paradigms: supervised, self-supervised, and language-aligned models. Through this study we aim to answer the following questions: Can supervised models generalize in cross-domain scenarios? How do foundation models like SigLIP2 perform on ReID tasks? What are the weaknesses of current supervised and foundational models for ReID? We conduct the analysis across 11 models and 9 datasets. Our results show a clear split: supervised models dominate their training domain but crumble on cross-domain data. Language-aligned models, however, show surprising cross-domain robustness for ReID tasks, even though they are not explicitly trained to do so. Code and data available at: https://github.com/moiiai-tech/object-reid-benchmark.
Read more →

Regularized Gradient Temporal-Difference Learning

arXiv:2601.20599v1 Announce Type: cross Abstract: Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. In this paper, we propose a regularized optimization objective by reformulating the mean-square projected Bellman error (MSPBE) minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular. We establish theoretical convergence guarantees and explicit error bounds for the proposed method, and validate its effectiveness through empirical experiments.
Read more →

CLEAR-Mamba: Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification

arXiv:2601.20601v1 Announce Type: cross Abstract: Medical image classification is a core task in computer-aided diagnosis (CAD), playing a pivotal role in early disease detection, treatment planning, and patient prognosis assessment. In ophthalmic practice, fluorescein fundus angiography (FFA) and indocyanine green angiography (ICGA) provide hemodynamic and lesion-structural information that conventional fundus photography cannot capture. However, due to the single-modality nature, subtle lesion patterns, and significant inter-device variability, existing methods still face limitations in generalization and high-confidence prediction. To address these challenges, we propose CLEAR-Mamba, an enhanced framework built upon MedMamba with optimizations in both architecture and training strategy. Architecturally, we introduce HaC, a hypernetwork-based adaptive conditioning layer that dynamically generates parameters according to input feature distributions, thereby improving cross-domain adaptability. From a training perspective, we develop RaP, a reliability-aware prediction scheme built upon evidential uncertainty learning, which encourages the model to emphasize low-confidence samples and improves overall stability and reliability. We further construct a large-scale ophthalmic angiography dataset covering both FFA and ICGA modalities, comprising multiple retinal disease categories for model training and evaluation. Experimental results demonstrate that CLEAR-Mamba consistently outperforms multiple baseline models, including the original MedMamba, across various metrics, showing particular advantages in multi-disease classification and reliability-aware prediction. This study provides an effective solution that balances generalizability and reliability for modality-specific medical image classification tasks.
Read more →

WFR-MFM: One-Step Inference for Dynamic Unbalanced Optimal Transport

arXiv:2601.20606v1 Announce Type: cross Abstract: Reconstructing dynamical evolution from limited observations is a fundamental challenge in single-cell biology, where dynamic unbalanced optimal transport provides a principled framework for modeling coupled transport and mass variation. However, existing approaches rely on trajectory simulation at inference time, making inference a key bottleneck for scalable applications. In this work, we propose a mean-flow framework for unbalanced flow matching that summarizes both transport and mass-growth dynamics over arbitrary time intervals using mean velocity and mass-growth fields, enabling fast one-step generation without trajectory simulation. To solve dynamic unbalanced optimal transport under the Wasserstein-Fisher-Rao geometry, we further build on this framework to develop Wasserstein-Fisher-Rao Mean Flow Matching (WFR-MFM). Across synthetic and real single-cell RNA sequencing datasets, WFR-MFM achieves orders-of-magnitude faster inference than a range of existing baselines while maintaining high predictive accuracy, and enables efficient perturbation response prediction on large synthetic datasets with thousands of conditions.
Read more →

Agent Benchmarks Fail Public Sector Requirements

arXiv:2601.20617v1 Announce Type: cross Abstract: Deploying Large Language Model-based agents (LLM agents) in the public sector requires assuring that they meet the stringent legal, procedural, and structural requirements of public-sector institutions. Practitioners and researchers often turn to benchmarks for such assessments. However, it remains unclear what criteria benchmarks must meet to ensure they adequately reflect public-sector requirements, or how many existing benchmarks do so. In this paper, we first define such criteria based on a first-principles survey of public administration literature: benchmarks must be process-based, realistic, and public-sector-specific, and report metrics that reflect the unique requirements of the public sector. We analyse more than 1,300 benchmark papers for these criteria using an expert-validated LLM-assisted pipeline. Our results show that no single benchmark meets all of the criteria. Our findings provide a call to action for both researchers to develop public sector-relevant benchmarks and for public-sector officials to apply these criteria when evaluating their own agentic use cases.
Read more →

GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection

arXiv:2601.20618v1 Announce Type: cross Abstract: Multimodal sarcasm detection (MSD) aims to identify sarcasm within image-text pairs by modeling semantic incongruities across modalities. Existing methods often exploit cross-modal embedding misalignment to detect inconsistency but struggle when visual and textual content are loosely related or semantically indirect. While recent approaches leverage large language models (LLMs) to generate sarcastic cues, the inherent diversity and subjectivity of these generations often introduce noise. To address these limitations, we propose the Generative Discrepancy Comparison Network (GDCNet). This framework captures cross-modal conflicts by utilizing descriptive, factually grounded image captions generated by Multimodal LLMs (MLLMs) as stable semantic anchors. Specifically, GDCNet computes semantic and sentiment discrepancies between the generated objective description and the original text, alongside measuring visual-textual fidelity. These discrepancy features are then fused with visual and textual representations via a gated module to adaptively balance modality contributions. Extensive experiments on MSD benchmarks demonstrate GDCNet's superior accuracy and robustness, establishing a new state-of-the-art on the MMSD2.0 benchmark.
Read more →

Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability

arXiv:2601.20642v1 Announce Type: cross Abstract: Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization, where they unintentionally reproduce exact copies or parts of training images. Recent memorization detection methods are primarily based on the norm of score difference as indicators of memorization. We prove that such norm-based metrics are mainly effective under the assumption of isotropic log-probability distributions, which generally holds at high or medium noise levels. In contrast, analyzing the anisotropic regime reveals that memorized samples exhibit strong angular alignment between the guidance vector and unconditional scores in the low-noise setting. Through these insights, we develop a memorization detection metric by integrating isotropic norm and anisotropic alignment. Our detection metric can be computed directly on pure noise inputs via two conditional and unconditional forward passes, eliminating the need for costly denoising steps. Detection experiments on Stable Diffusion v1.4 and v2 show that our metric outperforms existing denoising-free detection methods while being at least approximately 5x faster than the previous best approach. Finally, we demonstrate the effectiveness of our approach by utilizing a mitigation strategy that adapts memorized prompts based on our developed metric.
Read more →
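
A small PyTorch helper illustrating how the two signals could be combined into a single per-prompt score from the two forward passes on a pure-noise input; the equal weighting, the use of the absolute cosine, and the tensor shapes are assumptions, not the paper's exact metric.

    import torch

    @torch.no_grad()
    def memorization_score(eps_uncond, eps_cond, w_norm=1.0, w_align=1.0):
        """Combine the norm of the guidance vector (isotropic signal) with its
        angular alignment to the unconditional score (anisotropic signal).
        eps_uncond / eps_cond: noise predictions from the unconditional and
        conditional forward passes on the same pure-noise latent (B, C, H, W).
        The weights and sign conventions here are illustrative assumptions."""
        guidance = eps_cond - eps_uncond                     # text-guidance vector
        norm_term = guidance.flatten(1).norm(dim=1)          # norm-based signal
        cos = torch.nn.functional.cosine_similarity(
            guidance.flatten(1), eps_uncond.flatten(1), dim=1)
        align_term = cos.abs()                               # alignment signal
        return w_norm * norm_term + w_align * align_term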

Learning Contextual Runtime Monitors for Safe AI-Based Autonomy

arXiv:2601.20666v1 Announce Type: cross Abstract: We introduce a novel framework for learning context-aware runtime monitors for AI-based control ensembles. Machine-learning (ML) controllers are increasingly deployed in (autonomous) cyber-physical systems because of their ability to solve complex decision-making tasks. However, their accuracy can degrade sharply in unfamiliar environments, creating significant safety concerns. Traditional ensemble methods aim to improve robustness by averaging or voting across multiple controllers, yet this often dilutes the specialized strengths that individual controllers exhibit in different operating contexts. We argue that, rather than blending controller outputs, a monitoring framework should identify and exploit these contextual strengths. In this paper, we reformulate the design of safe AI-based control ensembles as a contextual monitoring problem. A monitor continuously observes the system's context and selects the controller best suited to the current conditions. To achieve this, we cast monitor learning as a contextual learning task and draw on techniques from contextual multi-armed bandits. Our approach comes with two key benefits: (1) theoretical safety guarantees during controller selection, and (2) improved utilization of controller diversity. We validate our framework in two simulated autonomous driving scenarios, demonstrating significant improvements in both safety and performance compared to non-contextual baselines.
Read more →
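
As a concrete stand-in for the contextual-bandit monitor, a plain LinUCB selector over a fixed set of controllers might look as follows; the context features, the reward signal, and the choice of LinUCB itself are illustrative assumptions rather than the paper's method.

    import numpy as np

    class LinUCBMonitor:
        """Contextual-bandit monitor that picks the controller whose upper
        confidence bound on expected reward (e.g., a safety/performance score)
        is highest for the observed context."""
        def __init__(self, n_controllers, ctx_dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(ctx_dim) for _ in range(n_controllers)]
            self.b = [np.zeros(ctx_dim) for _ in range(n_controllers)]

        def select(self, context):
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                bonus = self.alpha * np.sqrt(context @ A_inv @ context)
                scores.append(context @ theta + bonus)
            return int(np.argmax(scores))

        def update(self, controller, context, reward):
            self.A[controller] += np.outer(context, context)
            self.b[controller] += reward * context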

Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science

arXiv:2601.20674v1 Announce Type: cross Abstract: This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks: structured data querying (using programmatic languages, Python/Pandas) and information extraction from unstructured clinical text via a Retrieval Augmented Generation (RAG) pipeline. We test the ability of LLMs to interact accurately with large structured datasets for analytics and the reliability of LLMs in extracting semantically correct information from free text health records when supported by RAG. To this end, we presented a flexible evaluation framework that automatically generates synthetic question and answer pairs tailored to the characteristics of each dataset or task. Experiments were conducted on a curated subset of MIMIC III, (four structured tables and one clinical note type), using a mix of locally hosted and API-based LLMs. Evaluation combined exact-match metrics, semantic similarity, and human judgment. Our findings demonstrate the potential of LLMs to support precise querying and accurate information extraction in clinical workflows.
Read more →

Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework

arXiv:2601.20689v1 Announce Type: cross Abstract: Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive and still relies on substantial Mean Opinion Score (MOS) annotations. We argue that for MLLM-based IQA, the core bottleneck lies not in the quality perception capacity of MLLMs, but in MOS scale calibration. Therefore, we propose LEAF, a Label-Efficient Image Quality Assessment Framework that distills perceptual quality priors from an MLLM teacher into a lightweight student regressor, enabling MOS calibration with minimal human supervision. Specifically, the teacher conducts dense supervision through point-wise judgments and pair-wise preferences, with an estimate of decision reliability. Guided by these signals, the student learns the teacher's quality perception patterns through joint distillation and is calibrated on a small MOS subset to align with human annotations. Experiments on both user-generated and AI-generated IQA benchmarks demonstrate that our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations, making lightweight IQA practical under limited annotation budgets.
Read more →

LEMON: How Well Do MLLMs Perform Temporal Multimodal Understanding on Instructional Videos?

arXiv:2601.20705v1 Announce Type: cross Abstract: Recent multimodal large language models (MLLMs) have shown remarkable progress across vision, audio, and language tasks, yet their performance on long-form, knowledge-intensive, and temporally structured educational content remains largely unexplored. To bridge this gap, we introduce LEMON, a Lecture-based Evaluation benchmark for MultimOdal uNderstanding, focusing on STEM lecture videos that require long-horizon reasoning and cross-modal integration. LEMON comprises 2,277 video segments spanning 5 disciplines and 29 courses, with an average duration of 196.1 seconds, yielding 4,181 high-quality QA pairs, including 3,413 multiple-choice and 768 open-ended questions. Distinct from existing video benchmarks, LEMON features: (1) semantic richness and disciplinary density, (2) tightly coupled video-audio-text modalities, (3) explicit temporal and pedagogical structure, and (4) contextually linked multi-turn questioning. It further encompasses six major tasks and twelve subtasks, covering the full cognitive spectrum from perception to reasoning and then to generation. Comprehensive experiments reveal substantial performance gaps across tasks, highlighting that even state-of-the-art MLLMs like GPT-4o struggle with temporal reasoning and instructional prediction. We expect LEMON to serve as an extensible and challenging benchmark for advancing multimodal perception, reasoning, and generation in long-form instructional contents.
Read more →

Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling

arXiv:2601.20706v1 Announce Type: cross Abstract: Diffusion Large Language Models (dLLMs) introduce iterative denoising to enable parallel token generation, but their sampling phase displays fundamentally different characteristics compared to GEMM-centric transformer layers. Profiling on modern GPUs reveals that sampling can account for up to 70% of total model inference latency, primarily due to substantial memory loads and writes from vocabulary-wide logits, reduction-based token selection, and iterative masked updates. These processes demand large on-chip SRAM and involve irregular memory accesses that conventional NPUs struggle to handle efficiently. To address this, we identify a set of critical instructions that an NPU architecture must specifically optimize for dLLM sampling. Our design employs lightweight non-GEMM vector primitives, in-place memory reuse strategies, and a decoupled mixed-precision memory hierarchy. Together, these optimizations deliver up to a 2.53x speedup over the NVIDIA RTX A6000 GPU under an equivalent nm technology node. We also open-source our cycle-accurate simulation and post-synthesis RTL verification code, confirming functional equivalence with current dLLM PyTorch implementations.
Read more →

Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions

arXiv:2601.20714v1 Announce Type: cross Abstract: Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents to changes in both the reward function and on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach using a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
Read more →
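
A minimal tabular sketch of the moving parts described above: a Q-learner that reacts to a crude reward-drift signal by re-opening its learning and exploration rates, and that can grow its action space in place while keeping learned values; the drift test, thresholds, and schedules are invented for illustration and are not MORPHIN's.

    import numpy as np
    from collections import deque

    class AdaptiveQAgent:
        """Tabular Q-learning with a simple drift reaction (illustrative only)."""
        def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
            self.Q = np.zeros((n_states, n_actions))
            self.alpha, self.gamma, self.eps = alpha, gamma, eps
            self.rewards = deque(maxlen=200)

        def act(self, s):
            if np.random.rand() < self.eps:
                return np.random.randint(self.Q.shape[1])
            return int(np.argmax(self.Q[s]))

        def add_actions(self, n_new):
            # Expand the action space on the fly, preserving learned values.
            self.Q = np.hstack([self.Q, np.zeros((self.Q.shape[0], n_new))])

        def update(self, s, a, r, s_next):
            self.rewards.append(r)
            if len(self.rewards) == self.rewards.maxlen:
                half = self.rewards.maxlen // 2
                old = np.mean(list(self.rewards)[:half])
                new = np.mean(list(self.rewards)[half:])
                if new < old - 1.0:                 # crude drift signal (assumption)
                    self.alpha, self.eps = 0.5, 0.3  # re-open learning/exploration
            td = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
            self.Q[s, a] += self.alpha * td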

Li-ViP3D++: Query-Gated Deformable Camera-LiDAR Fusion for End-to-End Perception and Trajectory Prediction

arXiv:2601.20720v1 Announce Type: cross Abstract: End-to-end perception and trajectory prediction from raw sensor data is one of the key capabilities for autonomous driving. Modular pipelines restrict information flow and can amplify upstream errors. Recent query-based, fully differentiable perception-and-prediction (PnP) models mitigate these issues, yet the complementarity of cameras and LiDAR in the query-space has not been sufficiently explored. Models often rely on fusion schemes that introduce heuristic alignment and discrete selection steps which prevent full utilization of available information and can introduce unwanted bias. We propose Li-ViP3D++, a query-based multimodal PnP framework that introduces Query-Gated Deformable Fusion (QGDF) to integrate multi-view RGB and LiDAR in query space. QGDF (i) aggregates image evidence via masked attention across cameras and feature levels, (ii) extracts LiDAR context through fully differentiable BEV sampling with learned per-query offsets, and (iii) applies query-conditioned gating to adaptively weight visual and geometric cues per agent. The resulting architecture jointly optimizes detection, tracking, and multi-hypothesis trajectory forecasting in a single end-to-end model. On nuScenes, Li-ViP3D++ improves end-to-end behavior and detection quality, achieving higher EPA (0.335) and mAP (0.502) while substantially reducing false positives (FP ratio 0.147), and it is faster than the prior Li-ViP3D variant (139.82 ms vs. 145.91 ms). These results indicate that query-space, fully differentiable camera-LiDAR fusion can increase robustness of end-to-end PnP without sacrificing deployability.
Read more →

QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks

arXiv:2601.20731v1 Announce Type: cross Abstract: This paper examines how Large Language Models (LLMs) reproduce societal norms, particularly heterocisnormativity, and how these norms translate into measurable biases in their text generations. We investigate whether explicit information about a subject's gender or sexuality influences LLM responses across three subject categories: queer-marked, non-queer-marked, and the normalized "unmarked" category. Representational imbalances are operationalized as measurable differences in English sentence completions across four dimensions: sentiment, regard, toxicity, and prediction diversity. Our findings show that Masked Language Models (MLMs) produce the least favorable sentiment, higher toxicity, and more negative regard for queer-marked subjects. Autoregressive Language Models (ARLMs) partially mitigate these patterns, while closed-access ARLMs tend to produce more harmful outputs for unmarked subjects. Results suggest that LLMs reproduce normative social assumptions, though the form and degree of bias depend strongly on specific model characteristics, which may redistribute, but not eliminate, representational harms.
Read more →

HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs

arXiv:2601.20745v1 Announce Type: cross Abstract: As large language models (LLMs) continue to scale, deployment is increasingly bottlenecked by the memory wall, motivating a shift toward extremely low-bit quantization. However, most quantization-aware training (QAT) methods apply hard rounding and the straight-through estimator (STE) from the beginning of the training, which prematurely discretizes the optimization landscape and induces persistent gradient mismatch between latent weights and quantized weights, hindering effective optimization of quantized models. To address this, we propose Hestia, a Hessian-guided differentiable QAT framework for extremely low-bit LLMs, which replaces the rigid step function with a temperature-controlled softmax relaxation to maintain gradient flow early in training while progressively hardening quantization. Furthermore, Hestia leverages a tensor-wise Hessian trace metric as a lightweight curvature signal to drive fine-grained temperature annealing, enabling sensitivity-aware discretization across the model. Evaluations on Llama-3.2 show that Hestia consistently outperforms existing ternary QAT baselines, yielding average zero-shot improvements of 5.39% and 4.34% for the 1B and 3B models. These results indicate that Hessian-guided relaxation effectively recovers representational capacity, establishing a more robust training path for 1.58-bit LLMs. The code is available at https://github.com/hestia2026/Hestia.
Read more →
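
The temperature-controlled relaxation can be pictured as a softmax over the ternary levels that hardens as the temperature drops, with the Hessian trace slowing the hardening of sensitive tensors; the per-tensor scale, distance-based logits, and annealing rule below are one plausible reading rather than the paper's exact scheme.

    import torch

    def soft_ternary(w, temperature):
        """Differentiable relaxation of ternary quantization: each weight is a
        softmax-weighted mixture of the levels {-1, 0, +1}; as temperature -> 0
        this approaches hard rounding to the nearest level (times the scale)."""
        levels = torch.tensor([-1.0, 0.0, 1.0], device=w.device)
        scale = w.abs().mean().clamp_min(1e-8)                # per-tensor scale
        logits = -((w.unsqueeze(-1) / scale - levels) ** 2) / temperature
        probs = torch.softmax(logits, dim=-1)
        return scale * (probs * levels).sum(-1)

    def anneal_temperature(t_init, hessian_trace, trace_max, t_min=1e-3):
        """Sensitivity-aware annealing: tensors with larger curvature keep a
        higher temperature and are hardened more slowly (an assumption about
        the direction of the Hessian-guided rule)."""
        ratio = min(hessian_trace / max(trace_max, 1e-12), 1.0)
        return max(t_min, t_init * ratio)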

Independence of Approximate Clones

arXiv:2601.20779v1 Announce Type: cross Abstract: In an ordinal election, two candidates are said to be perfect clones if every voter ranks them adjacently. The independence of clones axiom then states that removing one of the two clones should not change the election outcome. This axiom has been extensively studied in social choice theory, and several voting rules are known to satisfy it (such as IRV, Ranked Pairs and Schulze). However, perfect clones are unlikely to occur in practice, especially for political elections with many voters. In this work, we study different notions of approximate clones in ordinal elections. Informally, two candidates are approximate clones in a preference profile if they are close to being perfect clones. We discuss two measures to quantify this proximity, and we show under which conditions the voting rules that are known to be independent of clones are also independent of approximate clones. In particular, we show that for elections with at least four candidates, none of these rules are independent of approximate clones in the general case. However, we find a more positive result for the case of three candidates. Finally, we conduct an empirical study of approximate clones and independence of approximate clones based on three real-world datasets: votes in local Scottish elections, votes in mini-jury deliberations, and votes of judges in figure skating competitions. We find that approximate clones are common in some contexts, and that the closer two candidates are to being perfect clones, the less likely their removal is to change the election outcome, especially for voting rules that are independent of perfect clones.
Read more →
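
One simple proximity measure in the spirit of the above, the share of ballots on which two candidates appear adjacently, is easy to state in code; representing ballots as Python lists of candidate ids is an assumption, and the paper's two measures may differ.

    def clone_proximity(profile, a, b):
        """Fraction of ballots that rank candidates a and b adjacently;
        1.0 corresponds to perfect clones. Only one possible measure."""
        adjacent = sum(abs(ballot.index(a) - ballot.index(b)) == 1
                       for ballot in profile)
        return adjacent / len(profile)

    # Example: 2 of 3 voters rank candidates 0 and 1 next to each other.
    profile = [[0, 1, 2], [2, 0, 1], [0, 2, 1]]
    print(clone_proximity(profile, 0, 1))  # 0.666...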

FAIRT2V: Training-Free Debiasing for Text-to-Video Diffusion Models

arXiv:2601.20791v1 Announce Type: cross Abstract: Text-to-video (T2V) diffusion models have achieved rapid progress, yet their demographic biases, particularly gender bias, remain largely unexplored. We present FairT2V, a training-free debiasing framework for text-to-video generation that mitigates encoder-induced bias without finetuning. We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders, which encode implicit gender associations even for neutral prompts. We quantify this effect with a gender-leaning score that correlates with bias in generated videos. Based on this insight, FairT2V mitigates demographic bias by neutralizing prompt embeddings via anchor-based spherical geodesic transformations while preserving semantics. To maintain temporal coherence, we apply debiasing only during early identity-forming steps through a dynamic denoising schedule. We further propose a video-level fairness evaluation protocol combining VideoLLM-based reasoning with human verification. Experiments on the modern T2V model Open-Sora show that FairT2V substantially reduces demographic bias across occupations with minimal impact on video quality.
Read more →
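
A sketch of anchor-based geodesic (slerp) neutralization of a prompt embedding; the gendered anchors, the 0.5 midpoint, and the strength value are assumptions, and in the paper the transformation is applied only during early, identity-forming denoising steps.

    import torch
    import torch.nn.functional as F

    def slerp(a, b, t):
        """Spherical interpolation along the geodesic between embeddings a and b."""
        a_n, b_n = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        omega = torch.acos((a_n * b_n).sum(-1, keepdim=True).clamp(-1 + 1e-6, 1 - 1e-6))
        so = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / so) * a_n + (torch.sin(t * omega) / so) * b_n
        return out * a.norm(dim=-1, keepdim=True)   # keep the original magnitude

    def neutralize_prompt(prompt_emb, anchor_a, anchor_b, strength=0.5):
        """Move a prompt embedding toward the geodesic midpoint of two gendered
        anchor embeddings (e.g., encodings of 'a man' / 'a woman'). The anchor
        construction and strength are illustrative assumptions."""
        neutral_anchor = slerp(anchor_a, anchor_b, 0.5)
        return slerp(prompt_emb, neutral_anchor, strength)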

Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces

arXiv:2601.20800v1 Announce Type: cross Abstract: We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperparameter can depend on other hyperparameters. Although the original PED-ANOVA provides a fast and efficient way to estimate HPI within the top-performing regions of the search space, it assumes a fixed, unconditional search space and therefore cannot properly handle conditional hyperparameters. To address this, we introduce a conditional HPI for top-performing regions and derive a closed-form estimator that accurately reflects conditional activation and domain changes. Experiments show that naive adaptations of existing HPI estimators yield misleading or uninterpretable importance estimates in conditional settings, whereas condPED-ANOVA consistently provides meaningful importances that reflect the underlying conditional structure.
Read more →

Reinforcement Learning via Self-Distillation

arXiv:2601.20802v1 Announce Type: cross Abstract: Large language models are increasingly post-trained with reinforcement learning in verifiable domains such as code and math. Yet, current methods for reinforcement learning with verifiable rewards (RLVR) learn only from a scalar outcome reward per attempt, creating a severe credit-assignment bottleneck. Many verifiable environments actually provide rich textual feedback, such as runtime errors or judge evaluations, that explain why an attempt failed. We formalize this setting as reinforcement learning with rich feedback and introduce Self-Distillation Policy Optimization (SDPO), which converts tokenized feedback into a dense learning signal without any external teacher or explicit reward model. SDPO treats the current model conditioned on feedback as a self-teacher and distills its feedback-informed next-token predictions back into the policy. In this way, SDPO leverages the model's ability to retrospectively identify its own mistakes in-context. Across scientific reasoning, tool use, and competitive programming on LiveCodeBench v6, SDPO improves sample efficiency and final accuracy over strong RLVR baselines. Notably, SDPO also outperforms baselines in standard RLVR environments that only return scalar feedback by using successful rollouts as implicit feedback for failed attempts. Finally, applying SDPO to individual questions at test time accelerates discovery on difficult binary-reward tasks, achieving the same discovery probability as best-of-k sampling or multi-turn conversations with 3x fewer attempts.
Read more →
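
A sketch of the self-distillation objective under one possible sequence layout, in which the textual feedback precedes the failed attempt in the teacher's context so that its next-token predictions over the attempt are feedback-informed; the layout, the omitted masking details, and the HuggingFace-style model(...).logits interface are all assumptions.

    import torch
    import torch.nn.functional as F

    def sdpo_loss(model, prompt_ids, attempt_ids, feedback_ids):
        """Distill the feedback-conditioned teacher (same model, no gradients)
        into the policy over the tokens of the failed attempt."""
        with torch.no_grad():
            teacher_in = torch.cat([prompt_ids, feedback_ids, attempt_ids], dim=1)
            teacher_logits = model(teacher_in).logits
        student_in = torch.cat([prompt_ids, attempt_ids], dim=1)
        student_logits = model(student_in).logits

        p_len = prompt_ids.size(1)
        f_len = feedback_ids.size(1)
        a_len = attempt_ids.size(1)
        # Logits at positions that predict each attempt token (next-token shift).
        tl = teacher_logits[:, p_len + f_len - 1 : p_len + f_len + a_len - 1]
        sl = student_logits[:, p_len - 1 : p_len + a_len - 1]
        return F.kl_div(F.log_softmax(sl, -1), F.softmax(tl, -1),
                        reduction="batchmean")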

GNN Explanations that do not Explain and How to find Them

arXiv:2601.20815v1 Announce Type: cross Abstract: Explanations provided by Self-explainable Graph Neural Networks (SE-GNNs) are fundamental for understanding the model's inner workings and for identifying potential misuse of sensitive attributes. Although recent works have highlighted that these explanations can be suboptimal and potentially misleading, a characterization of their failure cases is unavailable. In this work, we identify a critical failure of SE-GNN explanations: explanations can be unambiguously unrelated to how the SE-GNNs infer labels. We show that, on the one hand, many SE-GNNs can achieve optimal true risk while producing these degenerate explanations, and on the other, most faithfulness metrics can fail to identify these failure modes. Our empirical analysis reveals that degenerate explanations can be maliciously planted (allowing an attacker to hide the use of sensitive attributes) and can also emerge naturally, highlighting the need for reliable auditing. To address this, we introduce a novel faithfulness metric that reliably marks degenerate explanations as unfaithful, in both malicious and natural settings. Our code is available in the supplemental.
Read more →

Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning

arXiv:2601.20829v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning abilities of large language models (LLMs), yet training often stalls as problems become saturated. We identify the core challenge as the poor accessibility of informative failures: learning signals exist but are rarely encountered during standard rollouts. To address this, we propose failure-prefix conditioning, a simple and effective method for learning from saturated problems. Rather than starting from the original question, our approach reallocates exploration by conditioning training on prefixes derived from rare incorrect reasoning trajectories, thereby exposing the model to failure-prone states. We observe that failure-prefix conditioning yields performance gains matching those of training on medium-difficulty problems, while preserving token efficiency. Furthermore, we analyze the model's robustness, finding that our method reduces performance degradation under misleading failure prefixes, albeit with a mild trade-off in adherence to correct early reasoning. Finally, we demonstrate that an iterative approach, which refreshes failure prefixes during training, unlocks additional gains after performance plateaus. Overall, our results suggest that failure-prefix conditioning offers an effective pathway to extend RLVR training on saturated problems.
Read more →
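
On the data side, the idea of turning rare incorrect rollouts into failure-prone start states can be sketched as a prompt constructor; the truncation fractions and prompt formatting are invented for illustration.

    import random

    def build_failure_prefix_prompts(question, failed_traces, n_prefixes=4,
                                     min_frac=0.2, max_frac=0.6, seed=0):
        """Turn rare incorrect reasoning traces into failure-prone start states:
        each new training prompt is the original question plus a truncated
        prefix of a failed trace. Fractions and formatting are illustrative."""
        rng = random.Random(seed)
        prompts = []
        for trace in failed_traces:
            tokens = trace.split()
            for _ in range(n_prefixes):
                frac = rng.uniform(min_frac, max_frac)
                cut = max(1, int(len(tokens) * frac))
                prompts.append(question + "\n" + " ".join(tokens[:cut]))
        return prompts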

Open-Vocabulary Functional 3D Human-Scene Interaction Generation

arXiv:2601.20835v1 Announce Type: cross Abstract: Generating 3D humans that functionally interact with 3D scenes remains an open problem with applications in embodied AI, robotics, and interactive content creation. The key challenge involves reasoning about both the semantics of functional elements in 3D scenes and the 3D human poses required to achieve functionality-aware interaction. Unfortunately, existing methods typically lack explicit reasoning over object functionality and the corresponding human-scene contact, resulting in implausible or functionally incorrect interactions. In this work, we propose FunHSI, a training-free, functionality-driven framework that enables functionally correct human-scene interactions from open-vocabulary task prompts. Given a task prompt, FunHSI performs functionality-aware contact reasoning to identify functional scene elements, reconstruct their 3D geometry, and model high-level interactions via a contact graph. We then leverage vision-language models to synthesize a human performing the task in the image and estimate proposed 3D body and hand poses. Finally, the proposed 3D body configuration is refined via stage-wise optimization to ensure physical plausibility and functional correctness. In contrast to existing methods, FunHSI not only synthesizes more plausible general 3D interactions, such as "sitting on a sofa", but also supports fine-grained functional human-scene interactions, e.g., "increasing the room temperature". Extensive experiments demonstrate that FunHSI consistently generates functionally correct and physically plausible human-scene interactions across diverse indoor and outdoor scenes.
Read more →

Reward Models Inherit Value Biases from Pretraining

arXiv:2601.20838v1 Announce Type: cross Abstract: Reward models (RMs) are central to aligning large language models (LLMs) with human values but have received less attention than pre-trained and post-trained LLMs themselves. Because RMs are initialized from LLMs, they inherit representations that shape their behavior, but the nature and extent of this influence remain understudied. In a comprehensive study of 10 leading open-weight RMs using validated psycholinguistic corpora, we show that RMs exhibit significant differences along multiple dimensions of human value as a function of their base model. Using the "Big Two" psychological axes, we show a robust preference of Llama RMs for "agency" and a corresponding robust preference of Gemma RMs for "communion." This phenomenon holds even when the preference data and finetuning process are identical, and we trace it back to the logits of the respective instruction-tuned and pre-trained models. These log-probability differences themselves can be formulated as an implicit RM; we derive usable implicit reward scores and show that they exhibit the very same agency/communion difference. We run experiments training RMs with ablations for preference data source and quantity, which demonstrate that this effect is not only repeatable but surprisingly durable. Despite RMs being designed to represent human preferences, our evidence shows that their outputs are influenced by the pretrained LLMs on which they are based. This work underscores the importance of safety and alignment efforts at the pretraining stage, and makes clear that open-source developers' choice of base model is as much a consideration of values as of performance.
Read more →

$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval

arXiv:2601.20844v1 Announce Type: cross Abstract: This paper studies the minimal dimension required to embed subset memberships ($m$ elements and ${m\choose k}$ subsets of at most $k$ elements) into vector spaces, denoted as Minimal Embeddable Dimension (MED). The tight bounds of MED are derived theoretically and supported empirically for various notions of "distances" or "similarities," including the $\ell_2$ metric, inner product, and cosine similarity. In addition, we conduct numerical simulation in a more achievable setting, where the ${m\choose k}$ subset embeddings are chosen as the centroid of the embeddings of the contained elements. Our simulation easily realizes a logarithmic dependency between the MED and the number of elements to embed. These findings imply that embedding-based retrieval limitations stem primarily from learnability challenges, not geometric constraints, guiding future algorithm design.
Read more →
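
The centroid construction is easy to reproduce in spirit; the sketch below uses random (unoptimized) unit embeddings and sampled subsets, so the dimension it reports only upper-bounds the minimal embeddable dimension the paper studies.

    import numpy as np

    def subsets_recoverable(m, k, d, trials=200, seed=0):
        """True if, with random unit embeddings in R^d, the centroid of each
        sampled k-subset retrieves exactly its own elements as the top-k by
        inner product (random embeddings only upper-bound the true MED)."""
        rng = np.random.default_rng(seed)
        E = rng.normal(size=(m, d))
        E /= np.linalg.norm(E, axis=1, keepdims=True)
        for _ in range(trials):
            subset = rng.choice(m, size=k, replace=False)
            query = E[subset].mean(axis=0)            # centroid of the subset
            topk = np.argsort(E @ query)[-k:]
            if set(topk) != set(subset):
                return False
        return True

    m, k, d = 100, 4, 1
    while not subsets_recoverable(m, k, d):
        d += 1
    print(f"sampled subsets of size {k} over {m} elements recovered at d = {d}")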

A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion

arXiv:2601.20847v1 Announce Type: cross Abstract: Road surface classification (RSC) is a key enabler for environment-aware predictive maintenance systems. However, existing RSC techniques often fail to generalize beyond narrow operational conditions due to limited sensing modalities and datasets that lack environmental diversity. This work addresses these limitations by introducing a multimodal framework that fuses images and inertial measurements using a lightweight bidirectional cross-attention module followed by an adaptive gating layer that adjusts modality contributions under domain shifts. Given the limitations of current benchmarks, especially regarding lack of variability, we introduce ROAD, a new dataset composed of three complementary subsets: (i) real-world multimodal recordings with RGB-IMU streams synchronized using a gold-standard industry datalogger, captured across diverse lighting, weather, and surface conditions; (ii) a large vision-only subset designed to assess robustness under adverse illumination and heterogeneous capture setups; and (iii) a synthetic subset generated to study out-of-distribution generalization in scenarios difficult to obtain in practice. Experiments show that our method achieves a +1.4 pp improvement over the previous state-of-the-art on the PVS benchmark and an +11.6 pp improvement on our multimodal ROAD subset, with consistently higher F1-scores on minority classes. The framework also demonstrates stable performance across challenging visual conditions, including nighttime, heavy rain, and mixed-surface transitions. These findings indicate that combining affordable camera and IMU sensors with multimodal attention mechanisms provides a scalable, robust foundation for road surface understanding, particularly relevant for regions where environmental variability and cost constraints limit the adoption of high-end sensing suites.
Read more →

Post-Training Fairness Control: A Single-Train Framework for Dynamic Fairness in Recommendation

arXiv:2601.20848v1 Announce Type: cross Abstract: Despite growing efforts to mitigate unfairness in recommender systems, existing fairness-aware methods typically fix the fairness requirement at training time and provide limited post-training flexibility. However, in real-world scenarios, diverse stakeholders may demand differing fairness requirements over time, so retraining for different fairness requirements becomes prohibitive. To address this limitation, we propose Cofair, a single-train framework that enables post-training fairness control in recommendation. Specifically, Cofair introduces a shared representation layer with fairness-conditioned adapter modules to produce user embeddings specialized for varied fairness levels, along with a user-level regularization term that guarantees user-wise monotonic fairness improvements across these levels. We theoretically establish that the adversarial objective of Cofair upper bounds demographic parity and the regularization term enforces progressive fairness at user level. Comprehensive experiments on multiple datasets and backbone models demonstrate that our framework provides dynamic fairness at different levels, delivering comparable or better fairness-accuracy curves than state-of-the-art baselines, without the need to retrain for each new fairness requirement. Our code is publicly available at https://github.com/weixinchen98/Cofair.
Read more →

Exploring Transformer Placement in Variational Autoencoders for Tabular Data Generation

arXiv:2601.20854v1 Announce Type: cross Abstract: Tabular data remains a challenging domain for generative models. In particular, the standard Variational Autoencoder (VAE) architecture, typically composed of multilayer perceptrons, struggles to model relationships between features, especially when handling mixed data types. In contrast, Transformers, through their attention mechanism, are better suited for capturing complex feature interactions. In this paper, we empirically investigate the impact of integrating Transformers into different components of a VAE. We conduct experiments on 57 datasets from the OpenML CC18 suite and draw two main conclusions. First, results indicate that positioning Transformers to leverage latent and decoder representations leads to a trade-off between fidelity and diversity. Second, we observe a high similarity between consecutive blocks of a Transformer in all components. In particular, in the decoder, the relationship between the input and output of a Transformer is approximately linear.
Read more →

Evolutionary Strategies lead to Catastrophic Forgetting in LLMs

arXiv:2601.20861v1 Announce Type: cross Abstract: One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems presents several challenges, one of which is the large memory requirement of gradient-based algorithms that are used to train state-of-the-art LLMs. Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms and have shown encouraging performance on specific tasks in LLMs. In this paper, we perform a comprehensive analysis of ES and specifically evaluate its forgetting curves when training for an increasing number of update steps. We first find that ES is able to reach performance numbers close to GRPO for math and reasoning tasks with a comparable compute budget. However, and most importantly for continual learning, the performance gains in ES are accompanied by significant forgetting of prior abilities, limiting its applicability for training models online. We also explore the reason behind this behavior and show that the updates made using ES are much less sparse and have orders of magnitude larger $\ell_2$ norm compared to corresponding GRPO updates, explaining the contrasting forgetting curves between the two algorithms. With this study, we aim to highlight the issue of forgetting in gradient-free algorithms like ES and hope to inspire future work to mitigate these issues.
Read more →
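
The norm and sparsity comparison behind this analysis can be illustrated with a plain antithetic ES estimator on a flat parameter vector; mapping that vector back into model weights, and every hyperparameter here, are simplifying assumptions.

    import torch

    def es_update(theta, reward_fn, sigma=1e-3, lr=1e-2, n_samples=8):
        """Antithetic evolutionary-strategies step on a flat parameter vector.
        Returns the update together with its l2 norm and sparsity, so it can be
        contrasted with a gradient-based (e.g., GRPO) update of the same size."""
        eps = torch.randn(n_samples, theta.numel())
        plus = torch.tensor([reward_fn(theta + sigma * e) for e in eps])
        minus = torch.tensor([reward_fn(theta - sigma * e) for e in eps])
        grad_est = ((plus - minus).unsqueeze(1) * eps).mean(0) / (2 * sigma)
        update = lr * grad_est
        sparsity = (update.abs() < 1e-8).float().mean().item()
        return update, update.norm().item(), sparsity

    # Toy reward: negative distance to a target vector (stands in for task reward).
    target = torch.randn(1000)
    reward = lambda p: -float((p - target).norm())
    delta, l2, sp = es_update(torch.zeros(1000), reward)
    print(f"l2 norm = {l2:.4f}, fraction of (near-)zero entries = {sp:.2f}")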

SimBench: A Framework for Evaluating and Diagnosing LLM-Based Digital-Twin Generation for Multi-Physics Simulation

arXiv:2408.11987v2 Announce Type: replace Abstract: We introduce SimBench, a benchmark designed to evaluate the proficiency of simulator-oriented LLMs (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark ranks them according to their ability to produce high-quality DTs. We demonstrate this by comparing over 33 open- and closed-source S-LLMs. Using multi-turn interactions, SimBench employs an LLM-as-a-judge (J-LLM) that leverages both predefined rules and human-in-the-loop guidance to assign scores for the DTs generated by the S-LLM, thus providing a consistent and expert-inspired evaluation protocol. The J-LLM is specific to a simulator, and herein the proposed benchmarking approach is demonstrated in conjunction with the open-source Chrono multi-physics simulator. Chrono provided the backdrop used to assess an S-LLM in relation to the latter's ability to create digital twins for multibody dynamics, finite element analysis, vehicle dynamics, robotic dynamics, and sensor simulations. The proposed benchmarking principle is broadly applicable and enables the assessment of an S-LLM's ability to generate digital twins for other simulation packages, e.g., ANSYS, ABAQUS, OpenFOAM, StarCCM+, IsaacSim, and pyBullet.
Read more →

DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems

arXiv:2505.19847v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) improves factuality by grounding LLMs in external knowledge, yet conventional centralized RAG requires aggregating distributed data, raising privacy risks and incurring high retrieval latency and cost. We present DGRAG, a distributed graph-driven RAG framework for edge-cloud collaborative systems. Each edge device organizes local documents into a knowledge graph and periodically uploads subgraph-level summaries to the cloud for lightweight global indexing without exposing raw data. At inference time, queries are first answered on the edge; a gate mechanism assesses the confidence and consistency of multiple local generations to decide whether to return a local answer or escalate the query. For escalated queries, the cloud performs summary-based matching to identify relevant edges, retrieves supporting evidence from them, and generates the final response with a cloud LLM. Experiments on distributed question answering show that DGRAG consistently outperforms decentralized baselines while substantially reducing cloud overhead.
Read more →

Lifted Forward Planning in Relational Factored Markov Decision Processes with Concurrent Actions

arXiv:2505.22147v2 Announce Type: replace Abstract: Decision making is a central problem in AI that can be formalized using a Markov Decision Process. A problem is that, with increasing numbers of (indistinguishable) objects, the state space grows exponentially. To compute policies, the state space has to be enumerated. Even more possibilities have to be enumerated if the size of the action space depends on the size of the state space, especially if we allow concurrent actions. To tackle the exponential blow-up in the action and state space, we present a first-order representation to store the spaces in polynomial instead of exponential size in the number of objects and introduce Foreplan, a relational forward planner, which uses this representation to efficiently compute policies for numerous indistinguishable objects and actions. Additionally, we introduce an even faster approximate version of Foreplan. Moreover, Foreplan identifies how many objects an agent should act on to achieve a certain task given restrictions. Further, we provide a theoretical analysis and an empirical evaluation of Foreplan, demonstrating a speedup of at least four orders of magnitude.
Read more →

DCP-Bench-Open: Evaluating LLMs for Constraint Modelling of Discrete Combinatorial Problems

arXiv:2506.06052v3 Announce Type: replace Abstract: Discrete Combinatorial Problems (DCPs) are prevalent in industrial decision-making and optimisation. However, while constraint solving technologies for DCPs have advanced significantly, the core process of formalising them, namely constraint modelling, requires significant expertise and remains a bottleneck for wider adoption. Aiming to alleviate this bottleneck, recent studies have explored using Large Language Models (LLMs) to transform combinatorial problem descriptions into executable constraint models. However, the existing evaluation datasets for discrete constraint modelling are often limited to small, homogeneous, or domain-specific problems, which do not capture the diversity of real-world scenarios. This work addresses this gap by introducing DCP-Bench-Open, a novel benchmark that includes a diverse set of well-known discrete combinatorial problems sourced from the Constraint Programming (CP) and Operations Research (OR) communities, structured explicitly for evaluating LLM-driven constraint modelling. With this dataset, and given the variety of modelling frameworks, we compare and evaluate the modelling capabilities of LLMs for three distinct constraint modelling systems, which vary in abstraction level and underlying syntax. Notably, the results show higher performance when modelling with a high-level Python-based framework. Additionally, we systematically evaluate the use of prompt-based and inference-time compute methods across different LLMs, which further increase accuracy, reaching up to 91% on this highly challenging benchmark. DCP-Bench-Open is publicly available.
Read more →

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

arXiv:2506.10912v3 Announce Type: replace Abstract: Toxicity remains a leading cause of early-stage drug development failure. Despite advances in molecular design and property prediction, the task of molecular toxicity repair, generating structurally valid molecular alternatives with reduced toxicity, has not yet been systematically defined or benchmarked. To fill this gap, we introduce ToxiMol, the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) focused on molecular toxicity repair. We construct a standardized dataset covering 11 primary tasks and 660 representative toxic molecules spanning diverse mechanisms and granularities. We design a prompt annotation pipeline with mechanism-aware and task-adaptive capabilities, informed by expert toxicological knowledge. In parallel, we propose an automated evaluation framework, ToxiEval, which integrates toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity into a high-throughput evaluation chain for repair success. We systematically assess 43 mainstream general-purpose MLLMs and conduct multiple ablation studies to analyze key issues, including evaluation metrics, candidate diversity, and failure attribution. Experimental results show that although current MLLMs still face significant challenges on this task, they begin to demonstrate promising capabilities in toxicity understanding, semantic constraint adherence, and structure-aware editing.
Read more →

Beyond Syntax: Action Semantics Learning for App Agents

arXiv:2506.17697v2 Announce Type: replace Abstract: The recent development of Large Language Models (LLMs) enables the rise of App agents that interpret user intent and operate smartphone Apps through actions such as clicking and scrolling. While prompt-based solutions with proprietary LLM APIs show promising ability, they incur heavy compute costs and external API dependency. Fine-tuning smaller open-source LLMs solves these limitations. However, current supervised fine-tuning methods use a syntax learning paradigm that forces agents to reproduce exactly the ground truth action strings, leading to out-of-distribution (OOD) vulnerability. To fill this gap, we propose Action Semantics Learning (ASL), a novel learning framework, where the learning objective is capturing the semantics of the ground truth actions. Specifically, inspired by the programming language theory, we define the action semantics for App agents as the state transition induced by the action in the user interface. Building on this insight, ASL employs a novel SEmantic Estimator (SEE) to compute a semantic similarity to train the App agents in generating actions aligned with the semantics of ground truth actions, even when their syntactic forms differ. SEE is a flexible module that can be applied in both supervised and reinforcement fine-tuning paradigms. To support the effectiveness of ASL, we theoretically demonstrate the superior robustness of ASL for the OOD problem compared with the existing syntax learning paradigm. Extensive experiments across multiple offline and online benchmarks demonstrate that ASL significantly improves the accuracy and generalisation of App agents compared to existing methods.
Read more →

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

arXiv:2508.00282v3 Announce Type: replace Abstract: Humans constantly generate a diverse range of tasks guided by internal motivations. While generative agents powered by large language models (LLMs) aim to simulate this complex behavior, it remains uncertain whether they operate on similar cognitive principles. To address this, we conducted a task-generation experiment comparing human responses with those of an LLM agent (GPT-4o). We find that human task generation is consistently influenced by psychological drivers, including personal values (e.g., Openness to Change) and cognitive style. Even when these psychological drivers are explicitly provided to the LLM, it fails to reflect the corresponding behavioral patterns, producing tasks that are markedly less social, less physical, and thematically biased toward abstraction. Interestingly, although the LLM's tasks were perceived as more fun and novel, this highlights a disconnect between its linguistic proficiency and its capacity to generate human-like, embodied goals. We conclude that there is a core gap between the value-driven, embodied nature of human cognition and the statistical patterns of LLMs, highlighting the necessity of incorporating intrinsic motivation and physical grounding into the design of more human-aligned agents.
Read more →

A Message Passing Realization of Expected Free Energy Minimization

arXiv:2508.02197v2 Announce Type: replace Abstract: We present a message passing approach to Expected Free Energy (EFE) minimization on factor graphs, based on the theory introduced in arXiv:2504.14898. By reformulating EFE minimization as Variational Free Energy minimization with epistemic priors, we transform a combinatorial search problem into a tractable inference problem solvable through standard variational techniques. Applying our message passing method to factorized state-space models enables efficient policy inference. We evaluate our method on environments with epistemic uncertainty: a stochastic gridworld and a partially observable Minigrid task. Agents using our approach consistently outperform conventional KL-control agents on these tasks, showing more robust planning and efficient exploration under uncertainty. In the stochastic gridworld environment, EFE-minimizing agents avoid risky paths, while in the partially observable minigrid setting, they conduct more systematic information-seeking. This approach bridges active inference theory with practical implementations, providing empirical evidence for the efficiency of epistemic priors in artificial agents.
Read more →

Robust Deep Monte Carlo Counterfactual Regret Minimization: Addressing Theoretical Risks in Neural Fictitious Self-Play

arXiv:2509.00923v2 Announce Type: replace Abstract: Monte Carlo Counterfactual Regret Minimization (MCCFR) has emerged as a cornerstone algorithm for solving extensive-form games, but its integration with deep neural networks introduces scale-dependent challenges that manifest differently across game complexities. This paper presents a comprehensive analysis of how neural MCCFR component effectiveness varies with game scale and proposes an adaptive framework for selective component deployment. We identify that theoretical risks such as nonstationary target distribution shifts, action support collapse, variance explosion, and warm-starting bias have scale-dependent manifestation patterns, requiring different mitigation strategies for small versus large games. Our proposed Robust Deep MCCFR framework incorporates target networks with delayed updates, uniform exploration mixing, variance-aware training objectives, and comprehensive diagnostic monitoring. Through systematic ablation studies on Kuhn and Leduc Poker, we demonstrate scale-dependent component effectiveness and identify critical component interactions. The best configuration achieves final exploitability of 0.0628 on Kuhn Poker, representing a 60% improvement over the classical framework (0.156). On the more complex Leduc Poker domain, selective component usage achieves exploitability of 0.2386, a 23.5% improvement over the classical framework (0.3703), highlighting the importance of careful component selection over comprehensive mitigation. Our contributions include: (1) a formal theoretical analysis of risks in neural MCCFR, (2) a principled mitigation framework with convergence guarantees, (3) comprehensive multi-scale experimental validation revealing scale-dependent component interactions, and (4) practical guidelines for deployment in larger games.
Read more →

Analysis of approximate linear programming solution to Markov decision problem with log barrier function

arXiv:2509.19800v2 Announce Type: replace Abstract: There are two primary approaches to solving Markov decision problems (MDPs): dynamic programming based on the Bellman equation and linear programming (LP). Dynamic programming methods are the most widely used and form the foundation of both classical and modern reinforcement learning (RL). By contrast, LP-based methods have been less commonly employed, although they have recently gained attention in contexts such as offline RL. The relative underuse of LP-based methods stems from the fact that they lead to an inequality-constrained optimization problem, which is generally more challenging to solve effectively compared with Bellman-equation-based methods. The purpose of this paper is to establish a theoretical foundation for solving LP-based MDPs in a more effective and practical manner. Our key idea is to leverage the log-barrier function, widely used in inequality-constrained optimization, to transform the LP formulation of the MDP into an unconstrained optimization problem. This reformulation enables approximate solutions to be obtained easily via gradient descent. While the method may appear simple, to the best of our knowledge, a thorough theoretical interpretation of this approach has not yet been developed. This paper aims to bridge this gap.
Read more →
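
The reformulation is short enough to sketch directly: the primal LP min_V mu^T V subject to V >= R + gamma P V is replaced by an unconstrained log-barrier objective and minimized by gradient descent; the step size, barrier weight, and the crude feasibility safeguard are illustrative choices, not the paper's analysis.

    import numpy as np

    def lp_value_log_barrier(P, R, gamma, mu, t=100.0, lr=1e-3, iters=20000):
        """Gradient descent on  mu^T V - (1/t) * sum_{s,a} log(slack(s,a))  with
        slack(s,a) = V(s) - R(s,a) - gamma * sum_s' P(s,a,s') V(s'),
        i.e. the log-barrier version of the MDP primal LP.
        P: (S, A, S) transitions, R: (S, A) rewards, mu: (S,) positive weights."""
        S = R.shape[0]
        V = np.full(S, R.max() / (1.0 - gamma) + 1.0)   # strictly feasible start
        for _ in range(iters):
            slack = V[:, None] - R - gamma * (P @ V)    # (S, A), must stay > 0
            inv = 1.0 / slack
            # d slack(s,a)/d V(p) = 1[p = s] - gamma * P(s,a,p)
            grad = mu - (inv.sum(axis=1)
                         - gamma * np.einsum('sa,sap->p', inv, P)) / t
            V_new = V - lr * grad
            if ((V_new[:, None] - R - gamma * (P @ V_new)) > 0).all():
                V = V_new                               # only take feasible steps
        return V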

SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems

arXiv:2509.23130v3 Announce Type: replace Abstract: Formal models are essential to specifying large, complex computer systems and verifying their correctness, but are notoriously expensive to write and maintain. Recent advances in generative AI show promise in generating certain forms of specifications. However, existing work mostly targets small code, not complete systems. It is unclear whether AI can deal with realistic system artifacts, as this requires abstracting their complex behavioral properties into formal models. We present SysMoBench, a benchmark that evaluates AI's ability to formally model large, complex systems. We focus on concurrent and distributed systems, which are keystones of today's critical computing infrastructures, encompassing operating systems and cloud infrastructure. We use TLA+, the de facto specification language for concurrent and distributed systems, though the benchmark can be extended to other specification languages. We address the primary challenge of evaluating AI-generated models by automating metrics like syntactic and runtime correctness, conformance to system code, and invariant correctness. SysMoBench currently includes eleven diverse system artifacts: the Raft implementation of Etcd and Redis, the leader election of ZooKeeper, the Spinlock, Mutex, and Ringbuffer in Asterinas OS, etc., with more being added. SysMoBench enables us to understand the capabilities and limitations of today's LLMs and agents, putting tools in this area on a firm footing and opening up promising new research directions.
Read more →

FourierCSP: Differentiable Constraint Satisfaction Problem Solving by Walsh-Fourier Expansion

arXiv:2510.04480v2 Announce Type: replace Abstract: The Constraint-satisfaction problem (CSP) is fundamental in mathematics, physics, and theoretical computer science. Continuous local search (CLS) solvers, as recent advancements, can achieve highly competitive results on certain classes of Boolean satisfiability (SAT) problems. Motivated by these advances, we extend the CLS framework from Boolean SAT to general CSP with finite-domain variables and expressive constraint formulations. We present FourierCSP, a continuous optimization framework that generalizes the Walsh-Fourier transform to CSP, allowing for transforming versatile constraints to compact multilinear polynomials, thereby avoiding the need for auxiliary variables and memory-intensive encodings. We employ projected subgradient and mirror descent algorithms with provable convergence guarantees, and further combine them to accelerate gradient-based optimization. Empirical results on benchmark suites demonstrate that FourierCSP is scalable and competitive, significantly broadening the class of problems that can be efficiently solved by differentiable CLS techniques and paving the way toward end-to-end neurosymbolic integration.
Read more →

MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption

arXiv:2510.05580v3 Announce Type: replace Abstract: Vision-Language-Action (VLA) models show promise in embodied reasoning, yet remain far from true generalists: they often require task-specific fine-tuning, incur high compute costs, and generalize poorly to unseen tasks. We propose MetaVLA, a unified, backbone-agnostic post-training framework for efficient and scalable alignment. MetaVLA introduces Context-Aware Meta Co-Training, which consolidates diverse target tasks into a single fine-tuning stage while leveraging structurally diverse auxiliary tasks to improve in-domain generalization. Unlike naive multi-task SFT, MetaVLA integrates a lightweight meta-learning mechanism, derived from Attentive Neural Processes, to enable rapid adaptation from diverse contexts with minimal architectural change or inference overhead. On the LIBERO benchmark, MetaVLA with six auxiliary tasks outperforms OpenVLA by up to 8.0% on long-horizon tasks, reduces training steps from 240K to 75K, and cuts GPU time by ~76%. These results show that scalable, low-resource post-training is achievable, paving the way toward general-purpose embodied agents. Code will be available.
Read more →

Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations

arXiv:2510.26905v2 Announce Type: replace Abstract: Cyber-physical systems increasingly rely on Foundational Models such as Large Language Models (LLMs) and Vision-Language Models (VLMs) to increase autonomy through enhanced perception, inference, and planning. However, these models also introduce new types of errors, such as hallucinations, overgeneralizations, and context misalignments, resulting in incorrect and flawed decisions. To address this, we introduce the concept of Cognition Envelopes, designed to establish reasoning boundaries that constrain AI-generated decisions while complementing the use of meta-cognition and traditional safety envelopes. As with safety envelopes, Cognition Envelopes require practical guidelines and systematic processes for their definition, validation, and assurance.
Read more →

Neural Value Iteration

arXiv:2511.08825v2 Announce Type: replace Abstract: The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as $\alpha$-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on $\alpha$-vectors at reachable belief points until convergence. However, since each $\alpha$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called \emph{Neural Value Iteration}, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
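
A minimal sketch of the PWLC representation the abstract refers to, with made-up numbers: the value of a belief is the maximum over α-vector dot products, and the paper's idea is to replace each α-vector with a small network over the belief (linear lambdas stand in for those networks here).

```python
import numpy as np

# Illustrative PWLC sketch (made-up numbers, not the paper's code): the value of a
# belief b over two hidden states is the max over alpha-vector dot products.
alphas = np.array([[1.0, 0.0],    # e.g. a plan that pays off if the state is 0
                   [0.0, 1.0],    # a plan that pays off if the state is 1
                   [0.6, 0.6]])   # an information-gathering plan

def value_pwlc(belief):
    return np.max(alphas @ belief)

# Neural Value Iteration's core idea: keep the "max over a finite set" structure but
# let each member be a small network over the belief (linear lambdas stand in here).
def value_neural(belief, nets):
    return max(net(belief) for net in nets)

nets = [lambda b: float(b @ np.array([1.0, 0.0])),
        lambda b: float(b @ np.array([0.0, 1.0]))]

b = np.array([0.3, 0.7])
print(value_pwlc(b), value_neural(b, nets))
```
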
Read more →

AI Annotation Orchestration: Evaluating LLM verifiers to Improve the Quality of LLM Annotations in Learning Analytics

arXiv:2511.09785v2 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly used to annotate learning interactions, yet concerns about reliability limit their utility. We test whether verification-oriented orchestration, in which models are prompted to check their own labels (self-verification) or audit one another (cross-verification), improves qualitative coding of tutoring discourse. Using transcripts from 30 one-to-one math sessions, we compare three production LLMs (GPT, Claude, Gemini) under three conditions: unverified annotation, self-verification, and cross-verification across all orchestration configurations. Outputs are benchmarked against a blinded, disagreement-focused human adjudication using Cohen's kappa. Overall, orchestration yields a 58 percent improvement in kappa. Self-verification nearly doubles agreement relative to unverified baselines, with the largest gains for challenging tutor moves. Cross-verification achieves a 37 percent improvement on average, with pair- and construct-dependent effects: some verifier-annotator pairs exceed self-verification, while others reduce alignment, reflecting differences in verifier strictness. We contribute: (1) a flexible orchestration framework instantiating control, self-, and cross-verification; (2) an empirical comparison across frontier LLMs on authentic tutoring data with blinded human "gold" labels; and (3) a concise notation, verifier(annotator) (e.g., Gemini(GPT) or Claude(Claude)), to standardize reporting and make directional effects explicit for replication. Results position verification as a principled design lever for reliable, scalable LLM-assisted annotation in Learning Analytics.
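
A small sketch of the evaluation and orchestration setup: the Cohen's kappa computation is exact as shown, while the annotate()/verify() calls and the tutor-move labels are hypothetical stand-ins rather than the paper's code or coding scheme.

```python
from sklearn.metrics import cohen_kappa_score

# Agreement with the blinded human adjudication is reported as Cohen's kappa
# (tutor-move labels below are illustrative, not the paper's coding scheme).
human_gold = ["probe", "hint", "praise", "hint", "probe"]
llm_labels = ["probe", "hint", "hint", "hint", "probe"]
print(cohen_kappa_score(human_gold, llm_labels))

# Hypothetical orchestration skeleton: annotate() and verify() stand in for LLM API
# calls. In the paper's verifier(annotator) notation, Gemini(GPT) means GPT proposes
# labels and Gemini audits them; Claude(Claude) denotes self-verification.
def orchestrate(annotator, verifier, annotate, verify, utterances):
    labels = []
    for u in utterances:
        proposed = annotate(annotator, u)
        labels.append(verify(verifier, u, proposed))
    return labels
```
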
Read more →

Quantifying Fidelity: A Decisive Feature Approach to Comparing Synthetic and Real Imagery

arXiv:2512.16468v3 Announce Type: replace Abstract: Virtual testing using synthetic data has become a cornerstone of autonomous vehicle (AV) safety assurance. Despite progress in improving visual realism through advanced simulators and generative AI, recent studies reveal that pixel-level fidelity alone does not ensure reliable transfer from simulation to the real world. What truly matters is whether the system-under-test (SUT) bases its decisions on consistent decision evidence in both real and simulated environments, not just whether images "look real" to humans. To this end, this paper proposes a behavior-grounded fidelity measure by introducing Decisive Feature Fidelity (DFF), a new SUT-specific metric that extends the existing fidelity spectrum to capture mechanism parity, that is, agreement in the model-specific decisive evidence that drives the SUT's decisions across domains. DFF leverages explainable-AI methods to identify and compare the decisive features driving the SUT's outputs for matched real-synthetic pairs. We further propose estimators based on counterfactual explanations, along with a DFF-guided calibration scheme to enhance simulator fidelity. Experiments on 2126 matched KITTI-VirtualKITTI2 pairs demonstrate that DFF reveals discrepancies overlooked by conventional output-value fidelity. Furthermore, results show that DFF-guided calibration improves decisive-feature and input-level fidelity without sacrificing output value fidelity across diverse SUTs.
Read more →

Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models

arXiv:2512.18901v3 Announce Type: replace Abstract: We present Gabliteration, a novel neural weight modification technique that advances beyond traditional abliteration methods by implementing adaptive multi-directional projections with regularized layer selection. Our approach addresses the fundamental limitation of existing methods that compromise model quality while attempting to modify specific behavioral patterns. Through dynamic layer optimization, regularized projection matrices, and adaptive scaling mechanisms, we achieve theoretically superior weight modification while minimizing quality degradation in unrelated domains. We validate our method through the gabliterated-v1 model series (0.6B to 4B parameters) available on Hugging Face, demonstrating practical applicability across multiple model scales.
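
For context, here is a minimal sketch of classic single-direction abliteration, the baseline technique this work generalizes; the adaptive multi-directional, regularized, layer-selective projections of Gabliteration are not reproduced here.

```python
import torch

# Minimal sketch of classic single-direction abliteration, the baseline this paper
# generalizes; Gabliteration's adaptive multi-directional, regularized projections
# are not reproduced here.
def abliterate(weight: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    v = v / v.norm()
    # Remove the rank-1 component of W that writes along the behavioral direction v.
    return weight - torch.outer(v, v) @ weight

W = torch.randn(8, 8)
v = torch.randn(8)
W_edited = abliterate(W, v)
# Outputs of the edited matrix have no component along v:
print(torch.allclose((v / v.norm()) @ W_edited, torch.zeros(8), atol=1e-5))
```
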
Read more →

CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution

arXiv:2512.23880v2 Announce Type: replace Abstract: Large language model (LLM) agents currently depend on predefined tools or early-stage tool generation, limiting their adaptability and scalability to complex scientific tasks. We introduce CASCADE, a self-evolving agentic framework representing an early instantiation of the transition from "LLM + tool use" to "LLM + skill acquisition". CASCADE enables agents to master complex external tools and codify knowledge through two meta-skills: continuous learning via web search, code extraction, and memory utilization; self-reflection via introspection, knowledge graph exploration, and others. We evaluate CASCADE on SciSkillBench, a benchmark of 116 materials science and chemistry research tasks. CASCADE achieves a 93.3% success rate using GPT-5, compared to 35.4% without evolution mechanisms. We further demonstrate real-world applications in computational analysis, autonomous laboratory experiments, and selective reproduction of published papers. Along with human-agent collaboration and memory consolidation, CASCADE accumulates executable skills that can be shared across agents and scientists, moving toward scalable AI-assisted scientific research.
Read more →

Recursive Language Models

arXiv:2512.24601v2 Announce Type: replace Abstract: We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference paradigm that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs can successfully process inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of vanilla frontier LLMs and common long-context scaffolds across four diverse long-context tasks while having comparable cost. At a small scale, we post-train the first natively recursive language model. Our model, RLM-Qwen3-8B, outperforms the underlying Qwen3-8B model by $28.3\%$ on average and even approaches the quality of vanilla GPT-5 on three long-context tasks. Code is available at https://github.com/alexzhang13/rlm.
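
A conceptual sketch of the recursive-call pattern described in the abstract; llm() is a hypothetical stand-in for an API call, and the actual RLM additionally lets the model write code to programmatically examine and decompose the prompt rather than always splitting it in half.

```python
# Conceptual sketch; llm() is a hypothetical stand-in for a language-model API call.
def llm(prompt: str) -> str:
    ...  # call a language model

def rlm(query: str, document: str, window: int = 8000) -> str:
    if len(document) <= window:
        return llm(f"{document}\n\nQuestion: {query}")
    # Treat the oversized prompt as external data: split, recurse, then combine.
    mid = len(document) // 2
    left = rlm(query, document[:mid], window)
    right = rlm(query, document[mid:], window)
    return llm(f"Partial answers:\n1. {left}\n2. {right}\n\nCombine them to answer: {query}")
```
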
Read more →

SimpleMem: Efficient Lifelong Memory for LLM Agents

arXiv:2601.02553v2 Announce Type: replace Abstract: To support long-term interaction in complex environments, LLM agents require memory systems that manage historical experiences. Existing approaches either retain full interaction histories via passive context extension, leading to substantial redundancy, or rely on iterative reasoning to filter noise, incurring high token costs. To address this challenge, we introduce SimpleMem, an efficient memory framework based on semantic lossless compression. We propose a three-stage pipeline designed to maximize information density and token utilization: (1) Semantic Structured Compression, which distills unstructured interactions into compact, multi-view indexed memory units; (2) Online Semantic Synthesis, an intra-session process that instantly integrates related context into unified abstract representations to eliminate redundancy; and (3) Intent-Aware Retrieval Planning, which infers search intent to dynamically determine retrieval scope and construct precise context efficiently. Experiments on benchmark datasets show that our method consistently outperforms baseline approaches in accuracy, retrieval efficiency, and inference cost, achieving an average F1 improvement of 26.4% while reducing inference-time token consumption by up to 30-fold, demonstrating a superior balance between performance and efficiency. Code is available at https://github.com/aiming-lab/SimpleMem.
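
A rough sketch of what a compressed, multi-view-indexed memory with intent-aware retrieval might look like; all field names and merge rules here are illustrative assumptions, not SimpleMem's actual schema or pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical shapes only; field names and merge rules are illustrative assumptions.
@dataclass
class MemoryUnit:
    summary: str                                # compact distillation of an interaction span
    keys: list = field(default_factory=list)    # multi-view index terms
    session_id: str = ""

class Memory:
    def __init__(self):
        self.units = []

    def write(self, unit: MemoryUnit):
        # Stages 1-2: store compressed units and fold overlapping ones from the same
        # session into a single abstract representation (crude stand-in for synthesis).
        for u in self.units:
            if u.session_id == unit.session_id and set(u.keys) & set(unit.keys):
                u.summary += " " + unit.summary
                u.keys = sorted(set(u.keys) | set(unit.keys))
                return
        self.units.append(unit)

    def retrieve(self, intent_terms, k=3):
        # Stage 3: score stored units against the inferred search intent.
        ranked = sorted(self.units,
                        key=lambda u: len(set(u.keys) & set(intent_terms)),
                        reverse=True)
        return ranked[:k]
```
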
Read more →

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

arXiv:2601.08430v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. However, optimizing open-ended generation remains challenging due to the lack of ground truth. While rubric-based evaluation offers a structured proxy for verification, existing methods suffer from scalability bottlenecks and coarse criteria, resulting in a supervision ceiling effect. To address this, we propose an automated Coarse-to-Fine Rubric Generation framework. By synergizing principle-guided synthesis, multi-model aggregation, and difficulty evolution, our approach produces comprehensive and highly discriminative criteria capable of capturing subtle nuances. Based on this framework, we introduce RubricHub, a large-scale ($\sim$110k) and multi-domain dataset. We validate its utility through a two-stage post-training pipeline comprising Rubric-based Rejection Sampling Fine-Tuning (RuFT) and Reinforcement Learning (RuRL). Experimental results demonstrate that RubricHub unlocks significant performance gains: our post-trained Qwen3-14B achieves state-of-the-art (SOTA) results on HealthBench (69.3), surpassing proprietary frontier models such as GPT-5. Our code is available at https://github.com/teqkilla/RubricHub.
Read more →

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

arXiv:2601.10402v3 Announce Type: replace Abstract: The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE) which is a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
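
An illustrative sketch of the tiered-cache idea behind Hierarchical Cognitive Caching; the tier names and promotion rules below are assumptions for exposition, not the paper's implementation.

```python
from collections import deque

# Illustrative tiering only; tier names and promotion rules are assumptions.
class HierarchicalCognitiveCache:
    def __init__(self, trace_limit=50):
        self.traces = deque(maxlen=trace_limit)   # transient execution traces
        self.knowledge = []                       # distilled, task-level lessons
        self.wisdom = []                          # cross-task strategy notes

    def log_trace(self, trace):
        self.traces.append(trace)

    def consolidate(self, summarize):
        # Periodically distill raw traces into stable knowledge; summarize() would be
        # an LLM call in practice.
        if self.traces:
            self.knowledge.append(summarize(list(self.traces)))
            self.traces.clear()

    def planning_context(self, budget_items=5):
        # Strategy-level planning reads mostly from the stable tiers, decoupling it
        # from the flood of execution details.
        return (self.wisdom + self.knowledge)[-budget_items:]
```
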
Read more →

Actionable Interpretability Must Be Defined in Terms of Symmetries

arXiv:2601.12913v2 Announce Type: replace Abstract: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.
Read more →

Epistemic Constitutionalism Or: how to avoid coherence bias

arXiv:2601.14295v2 Announce Type: replace Abstract: Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues for an epistemic constitution for AI: explicit, contestable meta-norms that regulate how systems form and express beliefs. Source attribution bias provides the motivating case: I show that frontier models enforce identity-stance coherence, penalizing arguments attributed to sources whose expected ideological position conflicts with the argument's content. When models detect systematic testing, these effects collapse, revealing that systems treat source-sensitivity as bias to suppress rather than as a capacity to execute well. I distinguish two constitutional approaches: the Platonic, which mandates formal correctness and default source-independence from a privileged standpoint, and the Liberal, which refuses such privilege, specifying procedural norms that protect conditions for collective inquiry while allowing principled source-attending grounded in epistemic vigilance. I argue for the Liberal approach, sketch a constitutional core of eight principles and four orientations, and propose that AI epistemic governance requires the same explicit, contestable structure we now expect for AI ethics.
Read more →

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

arXiv:2601.18631v2 Announce Type: replace Abstract: When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce \textbf{AdaReasoner}, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by +24.9\% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
Read more →

Neural Theorem Proving for Verification Conditions: A Real-World Benchmark

arXiv:2601.18944v2 Announce Type: replace Abstract: Theorem proving is fundamental to program verification, where the automated proof of Verification Conditions (VCs) remains a primary bottleneck. Real-world program verification frequently encounters hard VCs that existing Automated Theorem Provers (ATPs) cannot prove, leading to a critical need for extensive manual proofs that burden practical application. While Neural Theorem Proving (NTP) has achieved significant success in mathematical competitions, demonstrating the potential of machine learning approaches to formal reasoning, its application to program verification, particularly VC proving, remains largely unexplored. Despite existing work on annotation synthesis and verification-related theorem proving, no benchmark has specifically targeted this fundamental bottleneck: automated VC proving. This work introduces Neural Theorem Proving for Verification Conditions (NTP4VC), presenting the first real-world multi-language benchmark for this task. Drawing on real-world projects such as the Linux and Contiki-OS kernels, our benchmark leverages industrial pipelines (Why3 and Frama-C) to generate semantically equivalent test cases across the formal languages of Isabelle, Lean, and Rocq. We evaluate large language models (LLMs), both general-purpose and those fine-tuned for theorem proving, on NTP4VC. Results indicate that although LLMs show promise in VC proving, significant challenges remain for program verification, highlighting a large gap and opportunity for future research.
Read more →

Membership Privacy Risks of Sharpness Aware Minimization

arXiv:2310.00488v4 Announce Type: replace-cross Abstract: Optimization algorithms that seek flatter minima, such as Sharpness-Aware Minimization (SAM), are credited with improved generalization and robustness to noise. We ask whether such gains impact membership privacy. Surprisingly, we find that SAM is more prone to Membership Inference Attacks (MIA) than classical SGD across multiple datasets and attack methods, despite achieving lower test error. This suggests that the geometric mechanism of SAM that improves generalization simultaneously exacerbates membership leakage. We investigate this phenomenon through extensive analysis of memorization and influence scores. Our results reveal that SAM is more capable of capturing atypical subpatterns, leading to higher memorization scores of samples. Conversely, SGD depends more heavily on majority features, exhibiting worse generalization on atypical subgroups and lower memorization. Crucially, this characteristic of SAM can be linked to lower variance in the prediction confidence of unseen samples, thereby amplifying membership signals. Finally, we model SAM under a perfectly interpolating linear regime and theoretically show that sharpness regularization inherently reduces variance, guaranteeing a higher MIA advantage for confidence and likelihood ratio attacks.
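
For reference, the standard SAM update the paper analyzes, sketched as a single PyTorch training step; this is a simplified sketch, and per-parameter details of the official implementations may differ.

```python
import torch

# Standard SAM update (the optimizer under study), sketched as one training step.
def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    x, y = batch
    base_optimizer.zero_grad()
    # 1) Ascent: perturb weights toward higher loss within an L2 ball of radius rho.
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # 2) Descent: take the gradient at the perturbed point, undo the perturbation,
    #    then let the base optimizer (e.g. SGD) apply the update.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    base_optimizer.step()
```
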
Read more →

UDEEP: Edge-based Computer Vision for In-Situ Underwater Crayfish and Plastic Detection

arXiv:2401.06157v2 Announce Type: replace-cross Abstract: Invasive signal crayfish have a detrimental impact on ecosystems. They spread the fungal-type crayfish plague disease (Aphanomyces astaci) that is lethal to the native white-clawed crayfish, the only native crayfish species in Britain. Invasive signal crayfish extensively burrow, causing habitat destruction, erosion of river banks and adverse changes in water quality, while also competing with native species for resources, leading to declines in native populations. Moreover, pollution exacerbates the vulnerability of white-clawed crayfish, with their populations declining by over 90%. To safeguard aquatic ecosystems, it is imperative to address the challenges posed by invasive species and pollution in aquatic ecosystems. This article introduces the Cognitive Edge Device (CED) computing platform for the detection of crayfish and plastic. It also presents two publicly available underwater datasets, annotated with sequences of crayfish and aquatic plastic debris. Four You Only Look Once (YOLO) variants were trained and evaluated for crayfish and plastic object detection. YOLOv5s achieved the highest detection accuracy, with an mAP@0.5 of 0.90, and achieved the best precision
Read more →

LLM Multi-Agent Systems: Challenges and Open Problems

arXiv:2402.03578v3 Announce Type: replace-cross Abstract: This paper explores multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents, multi-agent systems can tackle complex tasks through agent collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore potential applications of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems.
Read more →

GenCode: A Generic Data Augmentation Framework for Boosting Deep Learning-Based Code Understanding

arXiv:2402.15769v3 Announce Type: replace-cross Abstract: Pre-trained code models lead the era of code intelligence, with multiple models designed with impressive performance. However, one important problem, data augmentation for code data, which automatically helps developers prepare training data, remains understudied in this field. In this paper, we introduce a generic data augmentation framework, GenCode, to enhance the training of code understanding models. Put simply, GenCode follows a generation-and-selection paradigm to prepare useful training code data. Specifically, it employs code augmentation techniques to generate new code candidates first and then identifies important ones as the training data by influence scores. To evaluate the effectiveness of GenCode, we conduct experiments on four code understanding tasks (e.g., code clone detection), three pre-trained code models (e.g., CodeT5), and two recently released code-specific Large Language Models (LLMs) (e.g., Qwen2.5-Coder). Compared to the state-of-the-art (SOTA) code augmentation method MixCode, GenCode produces pre-trained code models with 2.92% higher accuracy and 4.90% adversarial robustness on average. For code-specific LLMs, GenCode achieves an average improvement of 0.93% in accuracy and 0.98% in natural robustness.
Read more →

An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases

arXiv:2407.10853v4 Announce Type: replace-cross Abstract: Bias and fairness risks in Large Language Models (LLMs) vary substantially across deployment contexts, yet existing approaches lack systematic guidance for selecting appropriate evaluation metrics. We present a decision framework that maps LLM use cases, characterized by a model and population of prompts, to relevant bias and fairness metrics based on task type, whether prompts contain protected attribute mentions, and stakeholder priorities. Our framework addresses toxicity, stereotyping, counterfactual unfairness, and allocational harms, and introduces novel metrics based on stereotype classifiers and counterfactual adaptations of text similarity measures. All metrics require only LLM outputs for computation, simplifying implementation while avoiding embedding-based approaches that often correlate poorly with downstream harms. We provide an open-source Python library, LangFair, for practical adoption. Extensive experiments demonstrate that fairness risks cannot be reliably assessed from benchmark performance alone: results on one prompt dataset likely overstate or understate risks for another, underscoring that fairness evaluation must be grounded in the specific deployment context.
Read more →

LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP

arXiv:2408.04628v2 Announce Type: replace-cross Abstract: Standard natural language processing (NLP) pipelines operate on symbolic representations of language, which typically consist of sequences of discrete tokens. However, creating an analogous representation for ancient logographic writing systems is an extremely labor intensive process that requires expert knowledge. At present, a large portion of logographic data persists in a purely visual form due to the absence of transcription -- this issue poses a bottleneck for researchers seeking to apply NLP toolkits to study ancient logographic languages: most of the relevant data are images of writing. This paper investigates whether direct processing of visual representations of language offers a potential solution. We introduce LogogramNLP, the first benchmark enabling NLP analysis of ancient logographic languages, featuring both transcribed and visual datasets for four writing systems along with annotations for tasks like classification, translation, and parsing. Our experiments compare systems that employ recent visual and text encoding strategies as backbones. The results demonstrate that visual representations outperform textual representations for some investigated tasks, suggesting that visual processing pipelines may unlock a large amount of cultural heritage data of logographic languages for NLP-based analyses.
Read more →

Helping Johnny Make Sense of Privacy Policies with LLMs

arXiv:2501.16033v2 Announce Type: replace-cross Abstract: Understanding and engaging with privacy policies is crucial for online privacy, yet these documents remain notoriously complex and difficult to navigate. We present PRISMe, an interactive browser extension that combines LLM-based policy assessment with a dashboard and customizable chat interface, enabling users to skim quick overviews or explore policy details in depth while browsing. We conduct a user study (N=22) with participants of diverse privacy knowledge to investigate how users interpret the tool's explanations and how it shapes their engagement with privacy policies, identifying distinct interaction patterns. Participants valued the clear overviews and conversational depth, but flagged some issues, particularly adversarial robustness and hallucination risks. Thus, we investigate how a retrieval-augmented generation (RAG) approach can alleviate issues by re-running the chat queries from the study. Our findings surface design challenges as well as technical trade-offs, contributing actionable insights for developing future user-centered, trustworthy privacy policy analysis tools.
Read more →

Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning

arXiv:2501.19180v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are vital for a wide range of applications yet remain susceptible to jailbreak threats, which could lead to the generation of inappropriate responses. Conventional defenses, such as refusal and adversarial training, often fail to cover corner cases or rare domains, leaving LLMs still vulnerable to more sophisticated attacks. We propose a novel defense strategy, Safety Chain-of-Thought (SCoT), which harnesses the enhanced \textit{reasoning capabilities} of LLMs for proactive assessment of harmful inputs, rather than simply blocking them. SCoT augments any refusal training datasets to critically analyze the intent behind each request before generating answers. By employing proactive reasoning, SCoT enhances the generalization of LLMs across varied harmful queries and scenarios not covered in the safety alignment corpus. Additionally, it generates detailed refusals specifying the rules violated. Comparative evaluations show that SCoT significantly surpasses existing defenses, reducing vulnerability to out-of-distribution issues and adversarial manipulations while maintaining strong general capabilities.
Read more →

Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning

arXiv:2502.05739v2 Announce Type: replace-cross Abstract: Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in training data, posing serious privacy risks. To address this gap, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code models (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50% while retaining over 91% of the original code-generation performance. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.
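
A rough sketch of the gradient-ascent unlearning family evaluated in the study (GA with an added retain-set term, i.e. GA+GD); the loop below is illustrative and assumes HuggingFace-style batches, not the authors' training code.

```python
import torch

# Sketch of a GA+GD-style unlearning step (illustrative, not the authors' code).
# Batches are assumed to be HuggingFace-style dicts with input_ids and labels so
# that model(**batch).loss is the causal-LM loss.
def unlearn_step(model, forget_batch, retain_batch, optimizer, alpha=1.0):
    optimizer.zero_grad()
    forget_loss = model(**forget_batch).loss   # loss on sensitive data to be forgotten
    retain_loss = model(**retain_batch).loss   # loss on ordinary code-generation data
    # Ascend on the forget set (negative sign) while descending on the retain set,
    # so memorized secrets are suppressed without erasing coding ability.
    (-forget_loss + alpha * retain_loss).backward()
    optimizer.step()
```
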
Read more →

Randomly Wrong Signals: Bayesian Auction Design with ML Predictions

arXiv:2502.08792v3 Announce Type: replace-cross Abstract: We study auction design when a seller relies on machine-learning predictions of bidders' valuations that may be unreliable. Motivated by modern ML systems that are often accurate but occasionally fail in a way that is essentially uninformative, we model predictions as randomly wrong: with high probability the signal equals the bidder's true value, and otherwise it is a hallucination independent of the value. We analyze revenue-maximizing auctions when the seller publicly reveals these signals. A central difficulty is that the resulting posterior belief combines a continuous distribution with a point mass at the signal, so standard Myerson techniques do not directly apply. We provide a tractable characterization of the optimal signal-revealing auction by providing a closed-form characterization of the appropriate ironed virtual values. This characterization yields simple and intuitive implications. With a single bidder, the optimal mechanism reduces to a posted-price policy with a small number of regimes: the seller ignores low signals, follows intermediate signals, caps moderately high signals, and may again follow very high signals. With multiple bidders, we show that a simple eager second-price auction with signal-dependent reserve prices performs nearly optimally in numerical experiments and substantially outperforms natural benchmarks that either ignore the signal or treat it as fully reliable.
Read more →

MAnchors: Memorization-Based Acceleration of Anchors via Rule Reuse and Transformation

arXiv:2502.11068v2 Announce Type: replace-cross Abstract: Anchors is a popular local model-agnostic explanation technique whose applicability is limited by its computational inefficiency. To address this limitation, we propose a memorization-based framework that accelerates Anchors while preserving explanation fidelity and interpretability. Our approach leverages the iterative nature of Anchors' algorithm which gradually refines an explanation until it is precise enough for a given input by storing and reusing intermediate results obtained during prior explanations. Specifically, we maintain a memory of low-precision, high-coverage rules and introduce a rule transformation framework to adapt them to new inputs: the horizontal transformation adapts a pre-trained explanation to the current input by replacing features, and the vertical transformation refines the general explanation until it is precise enough for the input. We evaluate our method across tabular, text, and image datasets, demonstrating that it significantly reduces explanation generation time while maintaining fidelity and interpretability, thereby enabling the practical adoption of Anchors in time-sensitive applications.
Read more →

BAGEL: Projection-Free Algorithm for Adversarially Constrained Online Convex Optimization

arXiv:2502.16744v2 Announce Type: replace-cross Abstract: Projection-based algorithms for Constrained Online Convex Optimization (COCO) achieve optimal $\mathcal{O}(T^{1/2})$ regret guarantees but face scalability challenges due to the computational complexity of projections. To circumvent this, projection-free methods utilizing Linear Optimization Oracles (LOO) have been proposed, albeit typically achieving slower $\mathcal{O}(T^{3/4})$ regret rates. In this work, we examine whether the $\mathcal{O}(T^{1/2})$ rate can be recovered in the projection-free setting by strengthening the oracle assumption. We introduce BAGEL, an algorithm utilizing a Separation Oracle (SO) that achieves $\mathcal{O}(T^{1/2})$ regret and $\tilde{\mathcal{O}}(T^{1/2})$ cumulative constraint violation (CCV) for convex cost functions. Our analysis shows that by leveraging an infeasible projection via SO, we can match the time-horizon dependence of projection-based methods with $\tilde{\mathcal{O}}(T)$ oracle calls, albeit with an additional dependence on the geometry of the action set. This establishes a specific regime where projection-free methods can attain the same convergence rates as projection-based counterparts.
Read more →

Compositional Reasoning with Transformers, RNNs, and Chain of Thought

arXiv:2503.01544v2 Announce Type: replace-cross Abstract: It is well understood that different neural network architectures are suited to different tasks, but is there always a single best architecture for a given task? We compare the expressive power of transformers, RNNs, and transformers with chain of thought tokens on a simple and natural class of tasks we term Compositional Reasoning Questions (CRQ). This family captures multi-step problems with tree-like compositional structure, such as evaluating Boolean formulas. We prove that under standard hardness assumptions, \emph{none} of these three architectures is capable of solving CRQs unless some hyperparameter (depth, embedding dimension, and number of chain of thought tokens, respectively) grows with the size of the input. We then provide constructions for solving CRQs with each architecture. For transformers, our construction uses depth that is logarithmic in the problem size. For RNNs, logarithmic embedding dimension is necessary and sufficient, so long as the inputs are provided in a certain order. For transformers with chain of thought, our construction uses $n$ CoT tokens for input size $n$. These results show that, while CRQs are inherently hard, there are several different ways for language models to overcome this hardness. Even for a single class of problems, each architecture has strengths and weaknesses, and none is strictly better than the others.
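
A minimal example of the kind of compositional reasoning question studied here, namely evaluating a Boolean formula given as a tree of composed sub-expressions; the encoding below is purely illustrative.

```python
# A minimal CRQ-style task: evaluate a Boolean formula given as a nested tuple,
# i.e. a tree whose result composes the results of its subtrees.
def evaluate(node):
    if isinstance(node, bool):
        return node
    op, *children = node
    values = [evaluate(c) for c in children]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    if op == "NOT":
        return not values[0]
    raise ValueError(op)

formula = ("AND", ("OR", True, False), ("NOT", ("AND", True, False)))
print(evaluate(formula))  # True
```
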
Read more →

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

arXiv:2503.01805v2 Announce Type: replace-cross Abstract: Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks, a key question is the minimal size of a transformer that can implement the task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly, while depth is kept fixed. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference and train time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We empirically investigate these trade-offs between the relative powers of depth and width and find tasks where wider models have the same accuracy as deep models, while having much faster train and inference time due to parallelizable hardware.
Read more →

Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models

arXiv:2503.02623v5 Announce Type: replace-cross Abstract: A safe and trustworthy use of Large Language Models (LLMs) requires an accurate expression of confidence in their answers. We propose a novel Reinforcement Learning approach that allows us to directly fine-tune LLMs to express calibrated confidence estimates alongside their answers to factual questions. Our method optimizes a reward based on the logarithmic scoring rule, explicitly penalizing both over- and under-confidence. This encourages the model to align its confidence estimates with the actual predictive accuracy. The optimal policy under our reward design would result in perfectly calibrated confidence expressions. Unlike prior approaches that decouple confidence estimation from response generation, our method integrates confidence calibration seamlessly into the generative process of the LLM. Empirically, we demonstrate that models trained with our approach exhibit substantially improved calibration and generalize to unseen tasks without further fine-tuning, suggesting the emergence of general confidence awareness.
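
The reward family the method builds on is the logarithmic scoring rule; a tiny sketch showing why it is maximized in expectation only by reporting the true probability of being correct (the exact reward shaping in the paper may differ).

```python
import math

# Logarithmic scoring rule: reward the stated confidence p with log(p) when the
# answer is correct and log(1 - p) when it is wrong.
def log_score_reward(confidence: float, correct: bool, eps: float = 1e-6) -> float:
    p = min(max(confidence, eps), 1 - eps)
    return math.log(p) if correct else math.log(1 - p)

# A model that is right 70% of the time gets the best expected reward by saying 0.7:
expected = lambda c: 0.7 * log_score_reward(c, True) + 0.3 * log_score_reward(c, False)
print(max((round(c, 2) for c in [i / 100 for i in range(1, 100)]), key=expected))  # 0.7
```
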
Read more →

Diffusion Generative Recommendation with Continuous Tokens

arXiv:2504.12007v4 Announce Type: replace-cross Abstract: Recent advances in generative artificial intelligence, particularly large language models (LLMs), have opened new opportunities for enhancing recommender systems (RecSys). Most existing LLM-based RecSys approaches operate in a discrete space, using vector-quantized tokenizers to align with the inherent discrete nature of language models. However, these quantization methods often result in lossy tokenization and suboptimal learning, primarily due to inaccurate gradient propagation caused by the non-differentiable argmin operation in standard vector quantization. Inspired by the emerging trend of embracing continuous tokens in language models, we propose ContRec, a novel framework that seamlessly integrates continuous tokens into LLM-based RecSys. Specifically, ContRec consists of two key modules: a sigma-VAE Tokenizer, which encodes users/items with continuous tokens; and a Dispersive Diffusion module, which captures implicit user preference. The tokenizer is trained with a continuous Variational Auto-Encoder (VAE) objective, where three effective techniques are adopted to avoid representation collapse. By conditioning on the previously generated tokens of the LLM backbone during user modeling, the Dispersive Diffusion module performs a conditional diffusion process with a novel Dispersive Loss, enabling high-quality user preference generation through next-token diffusion. Finally, ContRec leverages both the textual reasoning output from the LLM and the latent representations produced by the diffusion model for Top-K item retrieval, thereby delivering comprehensive recommendation results. Extensive experiments on four datasets demonstrate that ContRec consistently outperforms both traditional and SOTA LLM-based recommender systems. Our results highlight the potential of continuous tokenization and generative modeling for advancing the next generation of recommender systems.
Read more →

Physics-Guided Multimodal Transformers are the Necessary Foundation for the Next Generation of Meteorological Science

arXiv:2504.14174v2 Announce Type: replace-cross Abstract: This position paper argues that the next generation of artificial intelligence in meteorological and climate sciences must transition from fragmented hybrid heuristics toward a unified paradigm of physics-guided multimodal transformers. While purely data-driven models have achieved significant gains in predictive accuracy, they often treat atmospheric processes as mere visual patterns, frequently producing results that lack scientific consistency or violate fundamental physical laws. We contend that current "hybrid" attempts to bridge this gap remain ad-hoc and struggle to scale across the heterogeneous nature of meteorological data ranging from satellite imagery to sparse sensor measurements. We argue that the transformer architecture, through its inherent capacity for cross-modal alignment, provides the only viable foundation for a systematic integration of domain knowledge via physical constraint embedding and physics-informed loss functions. By advocating for this unified architectural shift, we aim to steer the community away from "black-box" curve fitting and toward AI systems that are inherently falsifiable, scientifically grounded, and robust enough to address the existential challenges of extreme weather and climate change.
Read more →

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

arXiv:2504.14569v5 Announce Type: replace-cross Abstract: Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for one-shot shape preserving compression algorithms. We apply NoWag to compress Llama-2 (7B, 13B, 70B) and Llama-3 (8B, 70B) models using two popular shape-preserving techniques: vector quantization (NoWag-VQ) and unstructured/semi-structured pruning (NoWag-P). Our results show that NoWag-VQ significantly outperforms state-of-the-art one-shot vector quantization methods, while NoWag-P performs competitively against leading pruning techniques. These findings highlight underlying commonalities between these compression paradigms and suggest promising directions for future research. Our code is available at https://github.com/LawrenceRLiu/NoWag
Read more →

Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models

arXiv:2504.19373v4 Announce Type: replace-cross Abstract: Recent advances in multi-modal large reasoning models (MLRMs) have shown significant ability to interpret complex visual content. While these models enable impressive reasoning capabilities, they also introduce novel and underexplored privacy risks. In this paper, we identify a novel category of privacy leakage in MLRMs: Adversaries can infer sensitive geolocation information, such as a user's home address or neighborhood, from user-generated images, including selfies captured in private settings. To formalize and evaluate these risks, we propose a three-level visual privacy risk framework that categorizes image content based on contextual sensitivity and potential for location inference. We further introduce DoxBench, a curated dataset of 500 real-world images reflecting diverse privacy scenarios. Our evaluation across 11 advanced MLRMs and MLLMs demonstrates that these models consistently outperform non-expert humans in geolocation inference and can effectively leak location-related private information. This significantly lowers the barrier for adversaries to obtain users' sensitive geolocation information. We further analyze and identify two primary factors contributing to this vulnerability: (1) MLRMs exhibit strong reasoning capabilities by leveraging visual clues in combination with their internal world knowledge; and (2) MLRMs frequently rely on privacy-related visual clues for inference without any built-in mechanisms to suppress or avoid such usage. To better understand and demonstrate real-world attack feasibility, we propose GeoMiner, a collaborative attack framework that decomposes the prediction process into two stages: clue extraction and reasoning to improve geolocation performance while introducing a novel attack perspective. Our findings highlight the urgent need to reassess inference-time privacy risks in MLRMs to better protect users' sensitive information.
Read more →

Field Matters: A Lightweight LLM-enhanced Method for CTR Prediction

arXiv:2505.14057v2 Announce Type: replace-cross Abstract: Click-through rate (CTR) prediction is a fundamental task in modern recommender systems. In recent years, the integration of large language models (LLMs) has been shown to effectively enhance the performance of traditional CTR methods. However, existing LLM-enhanced methods often require extensive processing of detailed textual descriptions for large-scale instances or user/item entities, leading to substantial computational overhead. To address this challenge, this work introduces LLaCTR, a novel and lightweight LLM-enhanced CTR method that employs a field-level enhancement paradigm. Specifically, LLaCTR first utilizes LLMs to distill crucial and lightweight semantic knowledge from small-scale feature fields through self-supervised field-feature fine-tuning. Subsequently, it leverages this field-level semantic knowledge to enhance both feature representation and feature interactions. In our experiments, we integrate LLaCTR with six representative CTR models across four datasets, demonstrating its superior performance in terms of both effectiveness and efficiency compared to existing LLM-enhanced methods. Our code is available at https://github.com/istarryn/LLaCTR.
Read more →

Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection

arXiv:2505.16512v5 Announce Type: replace-cross Abstract: In recent years, the explosive advancement of deepfake technology has posed a critical and escalating threat to public security: diffusion-based digital human generation. Unlike traditional face manipulation methods, such models can generate highly realistic videos with consistency via multimodal control signals. Their flexibility and covertness pose severe challenges to existing detection strategies. To bridge this gap, we introduce DigiFakeAV, a new large-scale multimodal digital human forgery dataset based on diffusion models. Leveraging five of the latest digital human generation methods and a voice cloning method, we systematically construct a dataset comprising 60,000 videos (8.4 million frames), covering multiple nationalities, skin tones, genders, and real-world scenarios, significantly enhancing data diversity and realism. User studies demonstrate that the misrecognition rate by participants for DigiFakeAV reaches as high as 68%. Moreover, the substantial performance degradation of existing detection models on our dataset further highlights its challenges. To address this problem, we propose DigiShield, an effective detection baseline based on spatiotemporal and cross-modal fusion. By jointly modeling the 3D spatiotemporal features of videos and the semantic-acoustic features of audio, DigiShield achieves state-of-the-art (SOTA) performance on DigiFakeAV and shows strong generalization to other datasets.
Read more →

In-context Language Learning for Endangered Languages in Speech Recognition

arXiv:2505.20445v5 Announce Type: replace-cross Abstract: With approximately 7,000 languages spoken worldwide, current large language models (LLMs) support only a small subset. Prior research indicates LLMs can learn new languages for certain tasks without supervised data. We extend this investigation to speech recognition, investigating whether LLMs can learn unseen, low-resource languages through in-context learning (ICL). With experiments on four diverse endangered languages that LLMs have not been trained on, we find that providing more relevant text samples enhances performance in both language modelling and Automatic Speech Recognition (ASR) tasks. Furthermore, we show that the probability-based approach outperforms the traditional instruction-based approach in language learning. Lastly, we show ICL enables LLMs to achieve ASR performance that is comparable to or even surpasses dedicated language models trained specifically for these languages, while preserving the original capabilities of the LLMs. Our code is publicly available.
Read more →

Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review

arXiv:2505.20503v2 Announce Type: replace-cross Abstract: Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this paper, we present the first systematic review focused specifically on the integration of foundation models in mobile service robotics. We analyze how recent advances in foundation models address these core challenges through language-conditioned control, multimodal sensor fusion, uncertainty-aware reasoning, and efficient model scaling. We further examine real-world applications in domestic assistance, healthcare, and service automation, highlighting how foundation models enable context-aware, socially responsive, and generalizable robot behaviors. Beyond technical considerations, we discuss ethical, societal, and human-interaction implications associated with deploying foundation model-enabled service robots in human environments. Finally, we outline future research directions emphasizing reliability and lifelong adaptation, privacy-aware and resource-constrained deployment, and governance and human-in-the-loop frameworks required for safe, scalable, and trustworthy mobile service robotics.
Read more →

Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages

arXiv:2505.22831v2 Announce Type: replace-cross Abstract: Web-based activities span multiple webpages. However, conventional browsers with stacks of tabs cannot support operating and synthesizing large volumes of information across pages. While recent AI systems enable fully automated web browsing and information synthesis, they often diminish user agency and hinder contextual understanding. We explore how AI could instead augment user interactions with content across webpages and mitigate cognitive and manual efforts. Through literature on information tasks and web browsing challenges, and an iterative design process, we present novel interactions with our prototype web browser, Orca. Leveraging AI, Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale. To enable browsing at scale, webpages are treated as malleable materials that humans and AI can collaboratively manipulate and compose into a malleable, dynamic, and browser-level workspace. Our evaluation revealed an increased "appetite" for information foraging, enhanced control, and more flexible sensemaking across a broader web information landscape.
Read more →

Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

arXiv:2506.00856v3 Announce Type: replace-cross Abstract: Can AI effectively perform complex econometric analysis traditionally requiring human expertise? This paper evaluates AI agents' capability to master econometrics, focusing on empirical analysis performance. We develop "MetricsAI", an Econometrics AI Agent built on the open-source MetaGPT framework. This agent exhibits outstanding performance in: (1) planning econometric tasks strategically, (2) generating and executing code, (3) employing error-based reflection for improved robustness, and (4) allowing iterative refinement through multi-round conversations. We construct two datasets from academic coursework materials and published research papers to evaluate performance against real-world challenges. Comparative testing shows our domain-specialized AI agent significantly outperforms both benchmark large language models (LLMs) and general-purpose AI agents. This work establishes a testbed for exploring AI's impact on social science research and enables cost-effective integration of domain expertise, making advanced econometric methods accessible to users with minimal coding skills. Furthermore, our AI agent enhances research reproducibility and offers promising pedagogical applications for econometrics teaching.
Read more →

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

arXiv:2506.04207v2 Announce Type: replace-cross Abstract: Inspired by the remarkable reasoning capabilities of Deepseek-R1 in complex textual tasks, many works attempt to incentivize similar capabilities in Multimodal Large Language Models (MLLMs) by directly applying reinforcement learning (RL). However, they still struggle to activate complex reasoning. In this paper, rather than examining multimodal RL in isolation, we delve into current training pipelines and identify three crucial phenomena: 1) Effective cold start initialization is critical for enhancing MLLM reasoning. Intriguingly, we find that initializing with carefully selected text data alone can lead to performance surpassing many recent multimodal reasoning models, even before multimodal RL. 2) Standard GRPO applied to multimodal RL suffers from gradient stagnation, which degrades training stability and performance. 3) Subsequent text-only RL training, following the multimodal RL phase, further enhances multimodal reasoning. This staged training approach effectively balances perceptual grounding and cognitive reasoning development. By incorporating the above insights and addressing multimodal RL issues, we introduce ReVisual-R1, achieving a new state-of-the-art among open-source 7B MLLMs on challenging benchmarks including MathVerse, MathVision, WeMath, LogicVista, DynaMath, and challenging AIME2024 and AIME2025.
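
For context on the gradient-stagnation point, a simplified view of GRPO's group-relative advantage (not the full objective): each sampled response's advantage is its reward standardized within its group, so when every reward in a group is identical the advantages, and hence the gradients, collapse to zero.

```python
import numpy as np

# Simplified view of GRPO's group-relative advantage (not the full objective).
def grpo_advantages(group_rewards, eps=1e-8):
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))   # informative gradient signal
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))   # all-zero advantages: the stagnation case
```
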
Read more →

HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

arXiv:2506.07972v2 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) have demonstrated significant advancements in reasoning and agent-based problem-solving, current evaluation methodologies fail to adequately assess their capabilities: existing benchmarks either rely on closed-ended questions prone to saturation and memorization, or subjective comparisons that lack consistency and rigor. In this work, we introduce HeuriGym, an agentic framework designed for evaluating heuristic algorithms generated by LLMs for combinatorial optimization problems, characterized by clearly defined objectives and expansive solution spaces. HeuriGym empowers LLMs to propose heuristics, receive evaluative feedback via code execution, and iteratively refine their solutions. We evaluate nine state-of-the-art models on nine problems across domains such as computer systems, logistics, and biology, exposing persistent limitations in tool use, planning, and adaptive reasoning. To quantify performance, we propose the Quality-Yield Index (QYI), a metric that captures both solution pass rate and quality. Even top models like GPT-o4-mini-high and Gemini-2.5-Pro attain QYI scores of only 0.6, well below the expert baseline of 1. Our open-source benchmark aims to guide the development of LLMs toward more effective and realistic problem-solving in scientific and engineering domains.
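
The abstract describes the Quality-Yield Index only informally, as a metric that combines solution pass rate with solution quality relative to an expert baseline of 1. The sketch below is not the paper's definition: it is a hedged illustration of one way such a combination could work, and the harmonic-mean aggregation, the quality capping at 1, and the variable names are assumptions made here for concreteness.

```python
def quality_yield_index(passed, objective_values, expert_value, minimize=True):
    """Illustrative pass-rate/quality combination (NOT the paper's QYI formula).

    passed: whether each generated heuristic produced a valid solution.
    objective_values: objective reached by each run (entries for failed runs are ignored).
    expert_value: objective of the expert baseline, so matching the expert scores ~1.
    """
    valid = [v for ok, v in zip(passed, objective_values) if ok]
    yield_rate = len(valid) / len(passed) if passed else 0.0
    if not valid:
        return 0.0
    if minimize:  # e.g., cost or latency objectives
        qualities = [min(expert_value / v, 1.0) for v in valid]
    else:         # e.g., throughput or score objectives
        qualities = [min(v / expert_value, 1.0) for v in valid]
    quality = sum(qualities) / len(qualities)
    if yield_rate == 0.0 or quality == 0.0:
        return 0.0
    return 2 * yield_rate * quality / (yield_rate + quality)  # harmonic mean


# Example: 6 of 10 runs produce valid solutions at roughly 80% of expert quality.
print(quality_yield_index(
    passed=[True] * 6 + [False] * 4,
    objective_values=[125, 130, 120, 118, 140, 122, 0, 0, 0, 0],
    expert_value=100,
))
```
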
Read more →

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning

arXiv:2506.11300v2 Announce Type: replace-cross Abstract: Curriculum learning, which organizes training data from easy to hard, has improved efficiency across machine learning domains, yet remains underexplored for language model pretraining. We present the first systematic investigation of curriculum learning in LLM pretraining, with over 200 models trained on up to 100B tokens across three strategies: vanilla curriculum learning, pacing-based sampling, and interleaved curricula, guided by six difficulty metrics spanning linguistic and information-theoretic properties. We evaluate performance on eight benchmarks under three realistic scenarios: limited data, unlimited data, and continual training. Our experiments show that curriculum learning consistently accelerates convergence in early and mid-training phases, reducing training steps by 18-45% to reach baseline performance. When applied as a warmup strategy before standard random sampling, curriculum learning yields sustained improvements of up to 3.5%. We identify compression ratio, lexical diversity (MTLD), and readability (Flesch Reading Ease) as the most effective difficulty signals. Our findings demonstrate that data ordering, which is orthogonal to existing data selection methods, provides a practical mechanism for more efficient LLM pretraining.
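
Two of the difficulty signals named above can be computed directly from raw text. A minimal sketch follows, assuming plain-text training documents; the compression-ratio heuristic and the Flesch Reading Ease formula are standard, but the crude regex sentence splitter and vowel-group syllable counter are simplifying assumptions, and MTLD is omitted because its factor-based computation is longer.

```python
import re
import zlib


def compression_ratio(text: str) -> float:
    """Higher ratio = less compressible = (heuristically) harder text."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / max(len(raw), 1)


def count_syllables(word: str) -> int:
    """Very rough vowel-group syllable count (illustrative only)."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch formula; higher score = easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


docs = [
    "The cat sat on the mat. It was warm.",
    "Nonstationary stochastic processes necessitate heteroskedasticity-aware estimators.",
]
# Order documents from easy to hard using one difficulty signal.
for d in sorted(docs, key=flesch_reading_ease, reverse=True):
    print(f"{flesch_reading_ease(d):7.1f}  {compression_ratio(d):.2f}  {d[:50]}")
```
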
Read more →

DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs

arXiv:2506.11558v4 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have recently been extended to the video domain, enabling sophisticated video-language understanding. However, existing Video LLMs often exhibit limitations in fine-grained temporal reasoning, restricting their ability to precisely attribute responses to specific video moments, especially under constrained supervision. We introduce DaMO, a data-efficient Video LLM explicitly designed for accurate temporal reasoning and multimodal understanding. At its core, the proposed Temporal-aware Fuseformer employs a hierarchical dual-stream architecture that progressively captures temporal dynamics within each modality and effectively fuses complementary visual and audio information. To further enhance computational efficiency, DaMO integrates a global residual that reduces spatial redundancy while preserving essential semantic details. We train DaMO via a structured four-stage progressive training paradigm, incrementally equipping the model with multimodal alignment, semantic grounding, and temporal reasoning capabilities. This work also contributes multiple datasets augmented from existing ones with LLM-generated temporally grounded QA pairs for tasks requiring temporal supervision. Comprehensive experiments on temporal grounding and video QA benchmarks demonstrate that DaMO consistently surpasses prior methods, particularly in tasks demanding precise temporal alignment and reasoning. Our work establishes a promising direction for data-efficient video-language modeling.
Read more →

Governing Strategic Dynamics: Equilibrium Stabilization via Divergence-Driven Control

arXiv:2506.23734v2 Announce Type: replace-cross Abstract: Black-box coevolution in mixed-motive games is often undermined by opponent-drift non-stationarity and noisy rollouts, which distort progress signals and can induce cycling, Red-Queen dynamics, and detachment. We propose the Marker Gene Method (MGM), a curriculum-inspired governance mechanism that stabilizes selection by anchoring evaluation to cross-generational marker individuals, together with DWAM and conservative marker-update rules to reduce spurious updates. We also introduce NGD-Div, which adapts the key update threshold using a divergence proxy and natural-gradient optimization. We provide theoretical analysis in strictly competitive settings and evaluate MGM integrated with evolution strategies (MGM-E-NES) on coordination games and a resource-depletion Markov game. MGM-E-NES reliably recovers target coordination in Stag Hunt and Battle of the Sexes, achieving final cooperation probabilities close to (1,1) (e.g., 0.991±0.01/1.00±0.00 and 0.97±0.00/0.97±0.00 for the two players). In the Markov resource game, it maintains high and stable state-conditioned cooperation across 30 seeds, with final cooperation of approximately 0.954/0.980/0.916 in Rich/Poor/Collapsed (both players; small standard deviations), indicating welfare-aligned and state-dependent behavior. Overall, MGM-E-NES transfers across tasks with minimal hyperparameter changes and yields consistently stable training dynamics, showing that top-level governance can substantially improve the robustness of black-box coevolution in dynamic environments.
Read more →

FAIR-MATCH: A Multi-Objective Framework for Bias Mitigation in Reciprocal Dating Recommendations

arXiv:2507.01063v2 Announce Type: replace-cross Abstract: Online dating platforms have fundamentally transformed the formation of romantic relationships, with millions of users worldwide relying on algorithmic matching systems to find compatible partners. However, current recommendation systems in dating applications suffer from significant algorithmic deficiencies, including but not limited to popularity bias, filter bubble effects, and inadequate reciprocity modeling that limit effectiveness and introduce harmful biases. This research integrates foundational work with recent empirical findings to deliver a detailed analysis of dating app recommendation systems, highlighting key issues and suggesting research-backed solutions. Through analysis of reciprocal recommendation frameworks, fairness evaluation metrics, and industry implementations, we demonstrate that current systems achieve modest performance with collaborative filtering reaching 25.1% while reciprocal methods achieve 28.7%. Our proposed mathematical framework addresses these limitations through enhanced similarity measures, multi-objective optimization, and fairness-aware algorithms that maintain competitive accuracy while improving demographic representation to reduce algorithmic bias.
Read more →

X-SAM: From Segment Anything to Any Segmentation

arXiv:2508.04655v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) demonstrate strong capabilities in broad knowledge representation, yet they are inherently deficient in pixel-level perceptual understanding. Although the Segment Anything Model (SAM) represents a significant advancement in visual-prompt-driven image segmentation, it exhibits notable limitations in multi-mask prediction and category-specific segmentation tasks, and it cannot integrate all segmentation tasks within a unified model architecture. To address these limitations, we present X-SAM, a streamlined Multimodal Large Language Model (MLLM) framework that extends the segmentation paradigm from "segment anything" to "any segmentation". Specifically, we introduce a novel unified framework that enables more advanced pixel-level perceptual comprehension for MLLMs. Furthermore, we propose a new segmentation task, termed Visual GrounDed (VGD) segmentation, which segments all instance objects with interactive visual prompts and empowers MLLMs with visual grounded, pixel-wise interpretative capabilities. To enable effective training on diverse data sources, we present a unified training strategy that supports co-training across multiple datasets. Experimental results demonstrate that X-SAM achieves state-of-the-art performance on a wide range of image segmentation benchmarks, highlighting its efficiency for multimodal, pixel-level visual understanding. Code is available at https://github.com/wanghao9610/X-SAM.
Read more →

OPERA: A Reinforcement Learning--Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval

arXiv:2508.16438v3 Announce Type: replace-cross Abstract: Recent advances in large language models (LLMs) and dense retrievers have driven significant progress in retrieval-augmented generation (RAG). However, existing approaches face significant challenges in complex reasoning-oriented multi-hop retrieval tasks: 1) Ineffective reasoning-oriented planning: Prior methods struggle to generate robust multi-step plans for complex queries, as rule-based decomposers perform poorly on out-of-template questions. 2) Suboptimal reasoning-driven retrieval: Related methods employ limited query reformulation, leading to iterative retrieval loops that often fail to locate golden documents. 3) Insufficient reasoning-guided filtering: Prevailing methods lack the fine-grained reasoning to effectively filter salient information from noisy results, hindering utilization of retrieved knowledge. Fundamentally, these limitations all stem from the weak coupling between retrieval and reasoning in current RAG architectures. We introduce the Orchestrated Planner-Executor Reasoning Architecture (OPERA), a novel reasoning-driven retrieval framework. OPERA's Goal Planning Module (GPM) decomposes questions into sub-goals, which are executed by a Reason-Execute Module (REM) with specialized components for precise reasoning and effective retrieval. To train OPERA, we propose Multi-Agents Progressive Group Relative Policy Optimization (MAPGRPO), a novel variant of GRPO. Experiments on complex multi-hop benchmarks show OPERA's superior performance, validating both the MAPGRPO method and OPERA's design.
Read more →

Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?

arXiv:2509.06350v2 Announce Type: replace-cross Abstract: Jailbreak attacks on Large Language Models (LLMs) have demonstrated various successful methods whereby attackers manipulate models into generating harmful responses that they are designed to avoid. Among these, Greedy Coordinate Gradient (GCG) has emerged as a general and effective approach that optimizes the tokens in a suffix to generate jailbreakable prompts. While several improved variants of GCG have been proposed, they all rely on fixed-length suffixes. However, the potential redundancy within these suffixes remains unexplored. In this work, we propose Mask-GCG, a plug-and-play method that employs learnable token masking to identify impactful tokens within the suffix. Our approach increases the update probability for tokens at high-impact positions while pruning those at low-impact positions. This pruning not only reduces redundancy but also decreases the size of the gradient space, thereby lowering computational overhead and shortening the time required to achieve successful attacks compared to GCG. We evaluate Mask-GCG by applying it to the original GCG and several improved variants. Experimental results show that most tokens in the suffix contribute significantly to attack success, and pruning a minority of low-impact tokens does not affect the loss values or compromise the attack success rate (ASR), thereby revealing token redundancy in LLM prompts. Our findings provide insights for developing efficient and interpretable LLMs from the perspective of jailbreak attacks.
Read more →

OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

arXiv:2509.09332v3 Announce Type: replace-cross Abstract: Recent advances in multimodal large language models (MLLMs) have opened new opportunities for embodied intelligence, enabling multimodal understanding, reasoning, and interaction, as well as continuous spatial decision-making. Nevertheless, current MLLM-based embodied systems face two critical limitations. First, Geometric Adaptability Gap: models trained solely on 2D inputs or with hard-coded 3D geometry injection suffer from either insufficient spatial information or restricted 2D generalization, leading to poor adaptability across tasks with diverse spatial demands. Second, Embodiment Constraint Gap: prior work often neglects the physical constraints and capacities of real robots, resulting in task plans that are theoretically valid but practically infeasible. To address these gaps, we introduce OmniEVA -- an embodied versatile planner that enables advanced embodied reasoning and task planning through two pivotal innovations: (1) a Task-Adaptive 3D Grounding mechanism, which introduces a gated router to perform explicit selective regulation of 3D fusion based on contextual requirements, enabling context-aware 3D grounding for diverse embodied tasks. (2) an Embodiment-Aware Reasoning framework that jointly incorporates task goals and embodiment constraints into the reasoning loop, resulting in planning decisions that are both goal-directed and executable. Extensive experimental results demonstrate that OmniEVA not only achieves state-of-the-art general embodied reasoning performance, but also exhibits a strong ability across a wide range of downstream scenarios. Evaluations of a suite of proposed embodied benchmarks, including both primitive and composite tasks, confirm its robust and versatile planning capabilities. Project page: https://omnieva.github.io
Read more →

DoubleAgents: Interactive Simulations for Alignment in Agentic AI

arXiv:2509.12626v2 Announce Type: replace-cross Abstract: Agentic workflows promise efficiency, but adoption hinges on whether people can align systems that act on their behalf with their goals, values, and situational expectations. We present DoubleAgents, an agentic planning tool that embeds transparency and control through user intervention, value-reflecting policies, rich state visualizations, and uncertainty flagging for human coordination tasks. A built-in respondent simulation generates realistic scenarios, allowing users to rehearse and refine policies and calibrate their use of agentic behavior before live deployment. We evaluate DoubleAgents in a two-day lab study (n = 10), three deployment studies, and a technical evaluation. Results show that participants initially hesitated to delegate but used simulation to probe system behavior and adjust policies, gradually increasing delegation as agent actions became better aligned with their intentions and context. Deployment results demonstrate DoubleAgents' real-world relevance and usefulness, showing that simulation helps users effectively manage real-world tasks with higher complexity and uncertainty. We contribute interactive simulation as a practical pathway for users to iteratively align and calibrate agentic systems.
Read more →

ArchesClimate: Probabilistic Decadal Ensemble Generation With Flow Matching

arXiv:2509.15942v2 Announce Type: replace-cross Abstract: Climate projections have uncertainties related to components of the climate system and their interactions. A typical approach to quantifying these uncertainties is to use climate models to create ensembles of repeated simulations under different initial conditions. Due to the complexity of these simulations, generating such ensembles of projections is computationally expensive. In this work, we present ArchesClimate, a deep learning-based climate model emulator that aims to reduce this cost. ArchesClimate is trained on decadal hindcasts of the IPSL-CM6A-LR climate model at a spatial resolution of approximately 2.5x1.25 degrees. We train a flow matching model following ArchesWeatherGen, which we adapt to predict near-term climate. Once trained, the model generates states at a one-month lead time and can be used to auto-regressively emulate climate model simulations of any length. We show that for up to 10 years, these generations are stable and physically consistent. We also show that for several important climate variables, ArchesClimate generates simulations that are interchangeable with the IPSL model. This work suggests that climate model emulators could significantly reduce the cost of climate model simulations.
Read more →

AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing?

arXiv:2509.17641v2 Announce Type: replace-cross Abstract: Even without directly hearing sounds, humans can effortlessly reason about auditory properties, such as pitch, loudness, or sound-source associations, drawing on auditory commonsense. In contrast, language models often lack this capability, limiting their effectiveness in multimodal interactions. As an initial step to address this gap, we present AuditoryBench++, a comprehensive benchmark for evaluating auditory knowledge and reasoning in text-only settings. The benchmark encompasses tasks that range from basic auditory comparisons to contextually grounded reasoning, enabling fine-grained analysis of how models process and integrate auditory concepts. In addition, we introduce AIR-CoT, a novel auditory imagination reasoning method that generates and integrates auditory information during inference through span detection with special tokens and knowledge injection. Extensive experiments with recent LLMs and Multimodal LLMs demonstrate that AIR-CoT generally outperforms both the off-the-shelf models and those augmented with auditory knowledge. The project page is available at https://auditorybenchpp.github.io.
Read more →

Understanding Post-Training Structural Changes in Large Language Models

arXiv:2509.17866v3 Announce Type: replace-cross Abstract: Post-training fundamentally alters the behavior of large language models (LLMs), yet its impact on the internal parameter space remains poorly understood. In this work, we conduct a systematic singular value decomposition (SVD) analysis of principal linear layers in pretrained LLMs, focusing on two widely adopted post-training methods: instruction tuning and long-chain-of-thought (Long-CoT) distillation. Our analysis reveals two unexpected and robust structural changes: (1) a near-uniform geometric scaling of singular values across layers; and (2) highly consistent orthogonal transformations applied to the left and right singular vectors of each matrix. Based on these findings, we propose a simple yet effective framework to describe the coordinated dynamics of parameters in LLMs, which elucidates why post-training inherently relies on the foundational capabilities developed during pre-training. Further experiments demonstrate that singular value scaling underpins the temperature-controlled regulatory mechanisms of post-training, while the coordinated rotation of singular vectors encodes the essential semantic alignment. These results challenge the prevailing view of the parameter space in large models as a black box, uncovering the first clear regularities in how parameters evolve during training, and providing a new perspective for deeper investigation into model parameter changes.
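
A minimal sketch of the kind of SVD diagnostic described above, assuming you can load the same linear layer's weights before and after post-training (random stand-ins are used here, with the "post-trained" matrix constructed to exhibit the two reported signatures): check whether singular values scale near-uniformly, and whether the remaining change is explained by orthogonal transformations of the singular bases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the same linear layer before and after post-training; replace
# these with weights loaded from real checkpoints to run the actual analysis.
n = 64
W_pre = rng.standard_normal((n, n))
U, S, Vt = np.linalg.svd(W_pre)


def small_rotation(dim, eps=0.05):
    skew = rng.standard_normal((dim, dim)) * eps
    q, _ = np.linalg.qr(np.eye(dim) + (skew - skew.T) / 2)
    return q


# Synthetic "post-trained" matrix: uniform 1.3x spectral scaling plus small
# orthogonal rotations of the left and right singular bases.
W_post = small_rotation(n) @ U @ np.diag(1.3 * S) @ Vt @ small_rotation(n)

# Signature 1: near-uniform geometric scaling of singular values.
s_pre = np.linalg.svd(W_pre, compute_uv=False)
s_post = np.linalg.svd(W_post, compute_uv=False)
ratios = s_post / s_pre
print(f"sigma ratio: mean={ratios.mean():.3f}, std={ratios.std():.4f}")

# Signature 2: the remaining change is captured by orthogonal transforms of
# the singular bases, i.e. W_post ~= Q_L (c * W_pre) Q_R^T.
U_pre, _, Vt_pre = np.linalg.svd(W_pre)
U_post, _, Vt_post = np.linalg.svd(W_post)
c = np.median(ratios)
Q_L, Q_R = U_post @ U_pre.T, Vt_post.T @ Vt_pre
recon = Q_L @ (c * W_pre) @ Q_R.T
rel_err = np.linalg.norm(recon - W_post) / np.linalg.norm(W_post)
print(f"relative error of scaled-rotation reconstruction: {rel_err:.2e}")
```
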
Read more →

Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection

arXiv:2509.20682v2 Announce Type: replace-cross Abstract: In speech deepfake detection (SDD), data augmentation (DA) is commonly used to improve model generalization across varied speech conditions and spoofing attacks. However, during training, the backpropagated gradients from original and augmented inputs may misalign, which can result in conflicting parameter updates. These conflicts could hinder convergence and push the model toward suboptimal solutions, thereby reducing the benefits of DA. To investigate and address this issue, we design a dual-path data-augmented (DPDA) training framework with gradient alignment for SDD. In our framework, each training utterance is processed through two input paths: one using the original speech and the other with its augmented version. This design allows us to compare and align their backpropagated gradient directions to reduce optimization conflicts. Our analysis shows that approximately 25% of training iterations exhibit gradient conflicts between the original inputs and their augmented counterparts when using RawBoost augmentation. By resolving these conflicts with gradient alignment, our method accelerates convergence by reducing the number of training epochs and achieves up to an 18.69% relative reduction in Equal Error Rate on the In-the-Wild dataset compared to the baseline.
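
The abstract does not spell out the exact alignment rule, so the sketch below resolves original/augmented gradient conflicts with a PCGrad-style projection (drop the component of the augmented-view gradient that opposes the original-view gradient), which is one common choice rather than the paper's method; the Gaussian-noise augmentation is only a stand-in for RawBoost.

```python
import torch
import torch.nn.functional as F


def augment(x):
    # Placeholder augmentation; RawBoost-style corruption in the real setting.
    return x + 0.1 * torch.randn_like(x)


def dual_path_step(model, x, y, loss_fn, optimizer):
    """One dual-path step: per-path gradients, conflict check, aligned update."""
    grads = []
    for view in (x, augment(x)):
        optimizer.zero_grad()
        loss_fn(model(view), y).backward()
        grads.append([p.grad.detach().clone() for p in model.parameters()])

    g_orig = torch.cat([g.flatten() for g in grads[0]])
    g_aug = torch.cat([g.flatten() for g in grads[1]])
    if torch.dot(g_orig, g_aug) < 0:  # conflict: project out the opposing part
        g_aug = g_aug - torch.dot(g_orig, g_aug) / g_orig.norm().pow(2) * g_orig

    combined = 0.5 * (g_orig + g_aug)
    offset = 0
    for p in model.parameters():      # write the aligned gradient back
        numel = p.numel()
        p.grad = combined[offset:offset + numel].view_as(p)
        offset += numel
    optimizer.step()


model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(4, 16), torch.randint(0, 2, (4,))
dual_path_step(model, x, y, F.cross_entropy, opt)
print("aligned step done")
```
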
Read more →

SiNGER: A Clearer Voice Distills Vision Transformers Further

arXiv:2509.20986v3 Announce Type: replace-cross Abstract: Vision Transformers are widely adopted as the backbone of vision foundation models, but they are known to produce high-norm artifacts that degrade representation quality. When knowledge distillation transfers these features to students, high-norm artifacts dominate the objective, so students overfit to artifacts and underweight informative signals, diminishing the gains from larger models. Prior work attempted to remove artifacts but encountered an inherent trade-off between artifact suppression and preserving informative signals from teachers. To address this, we introduce Singular Nullspace-Guided Energy Reallocation (SiNGER), a novel distillation framework that suppresses artifacts while preserving informative signals. The key idea is principled teacher feature refinement: during refinement, we leverage the nullspace-guided perturbation to preserve information while suppressing artifacts. Then, the refined teacher's features are distilled to a student. We implement this perturbation efficiently with a LoRA-based adapter that requires minimal structural modification. Extensive experiments show that SiNGER consistently improves student models, achieving state-of-the-art performance in multiple downstream tasks and producing clearer and more interpretable representations.
Read more →

Mechanism of Task-oriented Information Removal in In-context Learning

arXiv:2509.21012v3 Announce Type: replace-cross Abstract: In-context Learning (ICL) is an emerging few-shot learning paradigm based on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate the mechanism through a novel perspective of information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states that contain information for all possible tasks, which leads to arbitrary outputs that do not focus on the intended task and results in near-zero accuracy. Meanwhile, we find that selectively removing specific information from hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings and measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such task-oriented information removal, selectively removing redundant information from the entangled non-selective representations and improving the output based on the demonstrations; this constitutes a key mechanism underlying ICL. Moreover, we identify the attention heads that induce the removal operation, termed Denoising Heads. Ablation experiments that block the information removal operation during inference significantly degrade ICL accuracy, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information removal mechanism and the Denoising Heads.
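
A minimal sketch of the low-rank filtering operation described above, assuming the directions to be removed are already available as an orthonormal basis U (a random basis stands in here; in the paper's setting such directions would be estimated from hidden states): removal is a projection of the hidden state onto the orthogonal complement of span(U).

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 8  # hidden size, rank of the removed subspace

# Orthonormal basis for the directions to remove (random stand-in; in practice
# these directions would be estimated from hidden states of non-target tasks).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))


def low_rank_remove(h, U):
    """Project h onto the complement of span(U): h' = h - U U^T h."""
    return h - U @ (U.T @ h)


h = rng.standard_normal(d)          # a hidden state at the query position
h_filtered = low_rank_remove(h, U)

print("component in removed subspace before:", np.linalg.norm(U.T @ h).round(3))
print("component in removed subspace after: ", np.linalg.norm(U.T @ h_filtered).round(3))
print("norm kept outside the subspace:      ", np.linalg.norm(h_filtered).round(3))
```
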
Read more →

Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks

arXiv:2509.22258v4 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear. Existing datasets predominantly emphasize classification accuracy, creating an evaluation illusion in which models appear proficient while still failing at high-stakes diagnostic reasoning. We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark specifically designed to probe the limits of multimodal clinical reasoning in neurology. Neural-MedBench integrates multi-sequence MRI scans, structured electronic health records, and clinical notes, and encompasses three core task families: differential diagnosis, lesion recognition, and rationale generation. To ensure reliable evaluation, we develop a hybrid scoring pipeline that combines LLM-based graders, clinician validation, and semantic similarity metrics. Through systematic evaluation of state-of-the-art VLMs, including GPT-4o, Claude-4, and MedGemma, we observe a sharp performance drop compared to conventional datasets. Error analysis shows that reasoning failures, rather than perceptual errors, dominate model shortcomings. Our findings highlight the necessity of a Two-Axis Evaluation Framework: breadth-oriented large datasets for statistical generalization, and depth-oriented, compact benchmarks such as Neural-MedBench for reasoning fidelity. We release Neural-MedBench at https://neuromedbench.github.io/ as an open and extensible diagnostic testbed, which guides the expansion of future benchmarks and enables rigorous yet cost-effective assessment of clinically trustworthy AI.
Read more →

Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

arXiv:2509.23040v2 Announce Type: replace-cross Abstract: Large language models face challenges in long-context question answering, where key evidence of a query may be dispersed across millions of tokens. Existing works equip large language models with a memory buffer that is dynamically updated via a linear document scan, also known as the "memorize while reading" methods. While this approach scales efficiently, it suffers from pruning of latent evidence, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, which integrates the mechanism of memory retrieval into the memory update process, enabling the agent to selectively callback historical memories for non-linear reasoning. To further strengthen training, we propose a multi-level reward design, which combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support complex multi-hop reasoning. Extensive experiments demonstrate that ReMemR1 significantly outperforms state-of-the-art baselines on long-context question answering while incurring negligible computational overhead, validating its ability to trade marginal cost for robust long-context reasoning.
Read more →

Discrete Variational Autoencoding via Policy Search

arXiv:2509.24716v2 Announce Type: replace-cross Abstract: Discrete latent bottlenecks in variational autoencoders (VAEs) offer high bit efficiency and can be modeled with autoregressive discrete distributions, enabling parameter-efficient multimodal search with transformers. However, discrete random variables do not allow for exact differentiable parameterization; therefore, discrete VAEs typically rely on approximations, such as Gumbel-Softmax reparameterization or straight-through gradient estimates, or employ high-variance gradient-free methods such as REINFORCE that have had limited success on high-dimensional tasks such as image reconstruction. Inspired by popular techniques in policy search, we propose a training framework for discrete VAEs that leverages the natural gradient of a non-parametric encoder to update the parametric encoder without requiring reparameterization. Our method, combined with automatic step size adaptation and a transformer-based encoder, scales to challenging datasets such as ImageNet and outperforms both approximate reparameterization methods and quantization-based discrete autoencoders in reconstructing high-dimensional data from compact latent spaces.
Read more →

mR3: Multilingual Rubric-Agnostic Reward Reasoning Models

arXiv:2510.01146v2 Announce Type: replace-cross Abstract: Evaluation using Large Language Model (LLM) judges has been widely adopted in English and shown to be effective for automatic evaluation. However, their performance does not generalize well to non-English settings, and it remains unclear what constitutes effective multilingual training for such judges. In this paper, we introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages, achieving the broadest language coverage in reward modeling to date. We present a comprehensive study of data and curriculum selection for training to identify effective strategies and data sources for building high-quality reward models, including support for reasoning in the target language. Our approach attains state-of-the-art performance on multilingual reward model benchmarks, surpassing much larger models (i.e., GPT-OSS-120B) while being up to 9x smaller, and its effectiveness is further confirmed through extensive ablation studies. Finally, we demonstrate the effectiveness of mR3 in off-policy preference optimization and validate the quality of its reasoning traces and rubric-based evaluations through human studies with 20 annotators across 12 languages, where mR3 models' reasoning is preferred, including for extremely low-resource languages that are entirely unseen during training. Our models, data, and code are available as open source at https://github.com/rubricreward/mr3.
Read more →

GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning

arXiv:2510.02180v2 Announce Type: replace-cross Abstract: Inverse Reinforcement Learning aims to recover reward models from expert demonstrations, but traditional methods yield black-box models that are difficult to interpret and debug. In this work, we introduce GRACE (Generating Rewards As CodE), a method for using Large Language Models within an evolutionary search to reverse-engineer an interpretable, code-based reward function directly from expert trajectories. The resulting reward function is executable code that can be inspected and verified. We empirically validate GRACE on the MuJoCo, BabyAI and AndroidWorld benchmarks, where it efficiently learns highly accurate rewards, even in complex, multi-task settings. Further, we demonstrate that the resulting reward leads to strong policies, compared to both competitive Imitation Learning and online RL approaches with ground-truth rewards. Finally, we show that GRACE is able to build complex reward APIs in multi-task setups.
Read more →

Closing the Data-Efficiency Gap Between Autoregressive and Masked Diffusion LLMs

arXiv:2510.09885v3 Announce Type: replace-cross Abstract: Large language models (LLMs) are often used in environments where facts evolve, yet factual knowledge updates via fine-tuning on unstructured text often suffer from 1) reliance on compute-heavy paraphrase augmentation and 2) the reversal curse. Recent studies show diffusion large language models (dLLMs) require fewer training samples to achieve lower loss in pre-training and are more resistant to the reversal curse, suggesting dLLMs may learn new knowledge more easily than autoregressive LLMs (arLLMs). We test this hypothesis in controlled knowledge fine-tuning experiments and find that while arLLMs rely on paraphrase augmentation to generalize knowledge text into question-answering (QA) capability, dLLMs do not require paraphrases to achieve high QA accuracy. To further investigate whether the demasking objective alone can induce such a knowledge injection advantage in dLLMs regardless of their diffusion denoising paradigm, we propose masked fine-tuning for arLLMs, which prompts an arLLM to reconstruct the original text given a masked version in context. The masked fine-tuning for arLLMs substantially improves the efficacy of knowledge injection, i.e., no paraphrases are needed and the model is resistant to the reversal curse, closing the gap between arLLMs and dLLMs. We also demonstrate that the same demasking objective improves supervised fine-tuning (SFT) on math tasks over standard SFT, suggesting broader applicability of the demasking objective.
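
A minimal sketch of how a masked fine-tuning example for an autoregressive LLM might be constructed from the description above: the model sees a masked copy in context and is trained to reproduce the original text. The mask token, mask rate, and prompt wording are illustrative assumptions, not the paper's exact recipe; in practice the loss would be restricted to the reconstruction tokens.

```python
import random

MASK_TOKEN = "<mask>"   # illustrative placeholder token
MASK_RATE = 0.3         # fraction of words to hide (assumed)


def make_masked_ft_example(text: str, rng: random.Random):
    """Build one (prompt, target) pair: reconstruct the original from a masked copy."""
    words = text.split()
    masked = [MASK_TOKEN if rng.random() < MASK_RATE else w for w in words]
    prompt = (
        "Fill in the masked words and reproduce the original passage.\n"
        f"Masked passage: {' '.join(masked)}\n"
        "Original passage:"
    )
    target = " " + text   # next-token loss would be applied to these tokens only
    return prompt, target


rng = random.Random(0)
doc = "The Eiffel Tower was completed in 1889 and stands in Paris."
prompt, target = make_masked_ft_example(doc, rng)
print(prompt)
print("TARGET:", target)
```
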
Read more →

Deep SPI: Safe Policy Improvement via World Models

arXiv:2510.12312v2 Announce Type: replace-cross Abstract: Safe policy improvement (SPI) offers theoretical control over policy updates, yet existing guarantees largely concern offline, tabular reinforcement learning (RL). We study SPI in general online settings, when combined with world model and representation learning. We develop a theoretical framework showing that restricting policy updates to a well-defined neighborhood of the current policy ensures monotonic improvement and convergence. This analysis links transition and reward prediction losses to representation quality, yielding online, "deep" analogues of classical SPI theorems from the offline RL literature. Building on these results, we introduce DeepSPI, a principled on-policy algorithm that couples local transition and reward losses with regularised policy updates. On the ALE-57 benchmark, DeepSPI matches or exceeds strong baselines, including PPO and DeepMDPs, while retaining theoretical guarantees.
Read more →

Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space

arXiv:2510.12603v2 Announce Type: replace-cross Abstract: Multimodal reasoning aims to enhance the capabilities of MLLMs by incorporating intermediate reasoning steps before reaching the final answer. It has evolved from text-only reasoning to the integration of visual information, enabling the thought process to be conveyed through both images and text. Despite its effectiveness, current multimodal reasoning methods depend on explicit reasoning steps that require labor-intensive vision-text annotations and inherently introduce significant inference latency. To address these issues, we introduce multimodal latent reasoning with the advantages of multimodal representation, reduced annotation, and inference efficiency. To facilitate it, we propose Interleaved Vision-Text Latent Reasoning (IVT-LR), which injects both visual and textual information in the reasoning process within the latent space. Specifically, IVT-LR represents each reasoning step by combining two implicit parts: latent text (the hidden states from the previous step) and latent vision (a set of selected image embeddings). We further introduce a progressive multi-stage training strategy to enable MLLMs to perform the above multimodal latent reasoning steps. Experiments on M³CoT and ScienceQA demonstrate that our IVT-LR method achieves an average performance increase of 5.45% in accuracy, while simultaneously achieving a speed increase of over 5 times compared to existing approaches.
Read more →

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

arXiv:2510.14616v2 Announce Type: replace-cross Abstract: Current preference learning methods achieve high accuracy on standard benchmarks but exhibit significant performance degradation when objective quality signals are removed. We introduce WritingPreferenceBench, a dataset of 1,800 human-annotated preference pairs (1,200 English, 600 Chinese) across 8 creative writing genres, where responses are matched for objective correctness, factual accuracy, and length. On this benchmark, sequence-based reward models (the standard architecture for RLHF) achieve only 52.7% mean accuracy, while zero-shot language model judges perform at 53.9%. In contrast, generative reward models that produce explicit reasoning chains achieve 81.8% accuracy. We observe high within-model variance across genres: individual models range from 18.2% to 81.8% accuracy across different writing categories, with standard deviations averaging 10.1%. This variance persists regardless of model scale, with 27B parameter models showing no consistent improvement over 8B variants. Our results suggest that current RLHF methods primarily learn to detect objective errors rather than capture subjective quality preferences (e.g., creativity, stylistic flair, and emotional resonance), and that successful preference modeling may require intermediate reasoning representations rather than direct classification.
Read more →

Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization

arXiv:2510.23530v2 Announce Type: replace-cross Abstract: Audio autoencoders learn useful, compressed audio representations, but their non-linear latent spaces prevent intuitive algebraic manipulation such as mixing or scaling. We introduce a simple training methodology to induce linearity in a high-compression Consistency Autoencoder (CAE) by using data augmentation, thereby inducing homogeneity (equivariance to scalar gain) and additivity (the decoder preserves addition) without altering the model's architecture or loss function. When trained with our method, the CAE exhibits linear behavior in both the encoder and decoder while preserving reconstruction fidelity. We test the practical utility of our learned space on music source composition and separation via simple latent arithmetic. This work presents a straightforward technique for constructing structured latent spaces, enabling more intuitive and efficient audio processing.
Read more →

COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations

arXiv:2510.24810v2 Announce Type: replace-cross Abstract: Fact-checking on major platforms, such as X, Meta, and TikTok, is shifting from expert-driven verification to a community-based setup, where users contribute explanatory notes to clarify why a post might be misleading. An important challenge here is determining whether an explanation is helpful for understanding real-world claims and the reasons why, which remains largely underexplored in prior research. In practice, most community notes remain unpublished due to slow community annotation, and the reasons for helpfulness lack clear definitions. To bridge these gaps, we introduce the task of predicting both the helpfulness of explanatory notes and the reason for this. We present COMMUNITYNOTES, a large-scale multilingual dataset of 104k posts with user-provided notes and helpfulness labels. We further propose a framework that automatically generates and improves reason definitions via automatic prompt optimization, and integrate them into prediction. Our experiments show that the optimized definitions can improve both helpfulness and reason prediction. Finally, we show that the helpfulness information is beneficial for existing fact-checking systems.
Read more →

First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation

arXiv:2511.04715v2 Announce Type: replace-cross Abstract: Identifying how training samples influence Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training sample influence estimation methods (also known as influence functions) undertake this goal by utilizing information flow through the model via its first-order and higher-order gradient terms. However, because today's models consist of billions of parameters, these influence computations are often restricted to some subset of model layers to ensure computational feasibility. Prior seminal work by Yeh et al. (2022) in assessing which layers are best suited for computing language data influence concluded that the first (embedding) layers are the most informative for this purpose, using a hypothesis based on influence scores canceling out (i.e., the cancellation effect). In this work, we provide theoretical and empirical evidence demonstrating that the cancellation effect is unreliable and that middle attention layers are better estimators of influence. Furthermore, we address the broader challenge of aggregating influence scores across layers, and showcase how alternatives to standard averaging (such as ranking and vote-based methods) can lead to significantly improved performance. Finally, we propose better methods for evaluating influence score efficacy in LLMs without model retraining, and introduce a new metric, the Noise Detection Rate (NDR), that exhibits strong predictive capability compared to the cancellation effect. Through extensive experiments across LLMs of varying types and scales, we concretely determine that the first (layers) are not necessarily better than the last (layers) for LLM influence estimation, contrasting with prior knowledge in the field.
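
A minimal sketch of the aggregation alternatives mentioned above, assuming per-layer influence scores have already been computed for a pool of candidate training samples: plain averaging versus rank averaging versus a simple top-k vote. These are generic instances of ranking and vote-based aggregation, not the paper's exact estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_samples = 12, 100

# Stand-in per-layer influence scores for each candidate training sample
# (rows: layers, columns: samples). Replace with real influence estimates.
scores = rng.standard_normal((n_layers, n_samples))

# 1) Standard averaging across layers.
mean_agg = scores.mean(axis=0)

# 2) Rank-based aggregation: average each sample's rank within every layer.
ranks = scores.argsort(axis=1).argsort(axis=1)   # 0 = least influential
rank_agg = ranks.mean(axis=0)

# 3) Vote-based aggregation: count how often a sample lands in a layer's top-k.
k = 10
topk = np.argsort(scores, axis=1)[:, -k:]
votes = np.zeros(n_samples)
for layer_topk in topk:
    votes[layer_topk] += 1

for name, agg in [("mean", mean_agg), ("rank", rank_agg), ("vote", votes)]:
    print(name, "-> top-5 samples:", np.argsort(agg)[-5:][::-1])
```
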
Read more →

DeepBooTS: Dual-Stream Residual Boosting for Drift-Resilient Time-Series Forecasting

arXiv:2511.06893v2 Announce Type: replace-cross Abstract: Time-Series (TS) exhibits pronounced non-stationarity. Consequently, most forecasting methods display compromised robustness to concept drift, despite the prevalent application of instance normalization. We tackle this challenge by first analysing concept drift through a bias-variance lens and proving that weighted ensemble reduces variance without increasing bias. These insights motivate DeepBooTS, a novel end-to-end dual-stream residual-decreasing boosting method that progressively reconstructs the intrinsic signal. In our design, each block of a deep model becomes an ensemble of learners with an auxiliary output branch forming a highway to the final prediction. The block-wise outputs correct the residuals of previous blocks, leading to a learning-driven decomposition of both inputs and targets. This method enhances versatility and interpretability while substantially improving robustness to concept drift. Extensive experiments, including those on large-scale datasets, show that the proposed method outperforms existing methods by a large margin, yielding an average performance improvement of 15.8% across various datasets, establishing a new benchmark for TS forecasting.
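
A minimal sketch of the dual-stream, residual-decreasing idea described above, assuming simple MLP blocks: each block has an auxiliary output branch whose forecast is added to a running prediction (the highway to the final output), while the part of the input it explains is subtracted from the residual passed to the next block. Layer sizes and the subtraction-based residual update are illustrative assumptions, not the DeepBooTS architecture.

```python
import torch
import torch.nn as nn


class BoostingBlock(nn.Module):
    def __init__(self, dim: int, horizon: int):
        super().__init__()
        self.backcast = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.forecast = nn.Linear(dim, horizon)   # auxiliary output branch

    def forward(self, residual):
        explained = self.backcast(residual)
        return residual - explained, self.forecast(explained)


class DualStreamBooster(nn.Module):
    """Stack of blocks; block-wise forecasts are summed into the final prediction."""

    def __init__(self, dim: int, horizon: int, n_blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(BoostingBlock(dim, horizon) for _ in range(n_blocks))

    def forward(self, x):
        residual, prediction = x, 0.0
        for block in self.blocks:
            residual, partial = block(residual)
            prediction = prediction + partial   # each block corrects the ensemble
        return prediction


model = DualStreamBooster(dim=96, horizon=24)
x = torch.randn(8, 96)          # 8 series, 96-step lookback window
print(model(x).shape)           # torch.Size([8, 24])
```
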
Read more →

RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis

arXiv:2511.17045v3 Announce Type: replace-cross Abstract: We introduce RacketVision, a novel dataset and benchmark for advancing computer vision in sports analytics, covering table tennis, tennis, and badminton. The dataset is the first to provide large-scale, fine-grained annotations for racket pose alongside traditional ball positions, enabling research into complex human-object interactions. It is designed to tackle three interconnected tasks: fine-grained ball tracking, articulated racket pose estimation, and predictive ball trajectory forecasting. Our evaluation of established baselines reveals a critical insight for multi-modal fusion: while naively concatenating racket pose features degrades performance, a CrossAttention mechanism is essential to unlock their value, leading to trajectory prediction results that surpass strong unimodal baselines. RacketVision provides a versatile resource and a strong starting point for future research in dynamic object tracking, conditional motion forecasting, and multimodal analysis in sports. Project page at https://github.com/OrcustD/RacketVision
Read more →

Tracing Mathematical Proficiency Through Problem-Solving Processes

arXiv:2512.00311v2 Announce Type: replace-cross Abstract: Knowledge Tracing (KT) aims to model a student's knowledge state and predict future performance to enable personalized learning in Intelligent Tutoring Systems. However, traditional KT methods face fundamental limitations in explainability, as they rely solely on response correctness, neglecting the rich information embedded in students' problem-solving processes. To address this gap, we propose Knowledge Tracing Leveraging Problem-Solving Process (KT-PSP), which incorporates students' problem-solving processes to capture the multidimensional aspects of mathematical proficiency. We also introduce KT-PSP-25, a new dataset specifically designed for KT-PSP. Building on this, we present StatusKT, a KT framework that employs a teacher-student-teacher three-stage LLM pipeline to extract students' mathematical proficiency as intermediate signals. In this pipeline, a teacher LLM first extracts problem-specific proficiency indicators, then a student LLM generates responses based on the student's solution process, and a teacher LLM evaluates these responses to determine mastery of each indicator. The experimental results on KT-PSP-25 demonstrate that StatusKT improves the prediction performance of existing KT methods. Moreover, StatusKT provides interpretable explanations for its predictions by explicitly modeling students' mathematical proficiency.
Read more →

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

arXiv:2512.03553v2 Announce Type: replace-cross Abstract: Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework deployed at production scale that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases. This hybrid design enables robust detection of both explicit violations and novel edge cases that evade traditional classifiers. Multimodal inputs (text, audio, visual) are processed through both pipelines, with a multimodal large language model (MLLM) distilling knowledge into each to boost accuracy while keeping inference lightweight. In production, the classification pipeline achieves 67% recall at 80% precision, and the similarity pipeline achieves 76% recall at 80% precision. Large-scale A/B tests show a 6-8% reduction in user views of unwanted livestreams. These results demonstrate a scalable and adaptable approach to multimodal content governance, capable of addressing both explicit violations and emerging adversarial behaviors.
Read more →

DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image Segmentation

arXiv:2512.07051v2 Announce Type: replace-cross Abstract: Medical image segmentation plays a pivotal role in automated diagnostic and treatment planning systems. In this work, we present DAUNet, a novel lightweight UNet variant that integrates Deformable V2 Convolutions and Parameter-Free Attention (SimAM) to improve spatial adaptability and context-aware feature fusion without increasing model complexity. DAUNet's bottleneck employs dynamic deformable kernels to handle geometric variations, while the decoder and skip pathways are enhanced using SimAM attention modules for saliency-aware refinement. Extensive evaluations on two challenging datasets, FH-PS-AoP (fetal head and pubic symphysis ultrasound) and FUMPE (CT-based pulmonary embolism detection), demonstrate that DAUNet outperforms state-of-the-art models in Dice score, HD95, and ASD, while maintaining superior parameter efficiency. Ablation studies highlight the individual contributions of deformable convolutions and SimAM attention. DAUNet's robustness to missing context and low-contrast regions establishes its suitability for deployment in real-time and resource-constrained clinical environments.
Read more →

Dual-objective Language Models: Training Efficiency Without Overfitting

arXiv:2512.14549v2 Announce Type: replace-cross Abstract: This paper combines autoregressive and masked-diffusion training objectives without any architectural modifications, resulting in flexible language models that outperform single-objective models. Autoregressive modeling has been a popular approach, partly because of its training efficiency; however, that comes at the cost of sensitivity to overfitting. On the other hand, masked-diffusion models are less efficient to train while being more resilient to overfitting. In this work, we demonstrate that dual-objective training achieves the best of both worlds. To derive the optimal balance between both objectives, we train and evaluate 50 language models under varying levels of data repetition. We show that it is optimal to combine both objectives under all evaluated settings and that the optimal balance is similar whether targeting autoregressive or masked-diffusion downstream performance.
Read more →

Spectral Representation-based Reinforcement Learning

arXiv:2512.15036v2 Announce Type: replace-cross Abstract: In real-world applications with large state and action spaces, reinforcement learning (RL) typically employs function approximations to represent core components like the policies, value functions, and dynamics models. Although powerful approximations such as neural networks offer great expressiveness, they often present theoretical ambiguities, suffer from optimization instability and exploration difficulty, and incur substantial computational costs in practice. In this paper, we introduce the perspective of spectral representations as a solution to address these difficulties in RL. Stemming from the spectral decomposition of the transition operator, this framework yields an effective abstraction of the system dynamics for subsequent policy optimization while also providing a clear theoretical characterization. We reveal how to construct spectral representations for transition operators that possess latent variable structures or energy-based structures, which implies different learning methods to extract spectral representations from data. Notably, each of these learning methods realizes an effective RL algorithm under this framework. We also provably extend this spectral view to partially observable MDPs. Finally, we validate these algorithms on over 20 challenging tasks from the DeepMind Control Suite, where they achieve performances comparable or superior to current state-of-the-art model-free and model-based baselines.
Read more →

KV Admission: Learning What to Write for Efficient Long-Context Inference

arXiv:2512.17452v3 Announce Type: replace-cross Abstract: Long-context LLM inference is bottlenecked by the quadratic attention complexity and linear KV cache growth. Prior approaches mitigate this via post-hoc selection or eviction but overlook the root inefficiency: indiscriminate writing to memory. In this paper, we formalize KV cache management as a causal system of three primitives: KV Admission, Selection, and Eviction. We instantiate KV Admission via Write-Gated KV (WG-KV), a lightweight mechanism that learns to predict token utility before cache entry. By filtering out low-utility states early to maintain a compact global cache alongside a sliding local cache, WG-KV reduces memory usage by 46-68% and delivers 3.03-3.70x prefill and 1.85-2.56x decode speedups on Llama and Qwen models, while maintaining compatibility with FlashAttention and Paged-KV systems. These results demonstrate that learning what to write is a principled and practical recipe for efficient long-context inference. Code is available at https://github.com/EMCLab-Sinica/WG-KV.
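
A minimal sketch of the admission idea described above rather than the WG-KV implementation: a small learned gate scores each token's hidden state before cache entry, only high-utility tokens are written to the compact global cache, and a sliding window of recent tokens is always kept locally. The gate architecture, the fixed threshold, and the window size are illustrative assumptions.

```python
import torch
import torch.nn as nn


class WriteGatedKVCache:
    def __init__(self, hidden: int, threshold: float = 0.5, local_window: int = 8):
        # Utility predictor; it would be trained in the real system, random init here.
        self.gate = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
        self.threshold = threshold
        self.local_window = local_window
        self.global_kv = []   # admitted (k, v) pairs, kept for the long term
        self.local_kv = []    # sliding window of recent (k, v) pairs

    @torch.no_grad()
    def write(self, h, k, v):
        """Decide, per token, whether its KV pair enters the global cache."""
        utility = self.gate(h).item()
        if utility >= self.threshold:
            self.global_kv.append((k, v))
        self.local_kv.append((k, v))          # always visible to recent attention
        self.local_kv = self.local_kv[-self.local_window:]

    def visible_kv(self):
        # A real cache would deduplicate recent entries present in both lists.
        return self.global_kv + self.local_kv


hidden = 64
cache = WriteGatedKVCache(hidden)
for _ in range(32):                           # simulate a 32-token prefill
    h = torch.randn(hidden)
    cache.write(h, k=torch.randn(hidden), v=torch.randn(hidden))

print("global (admitted):", len(cache.global_kv), "| local window:", len(cache.local_kv))
```
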
Read more →

ShareChat: A Dataset of Chatbot Conversations in the Wild

arXiv:2512.17843v3 Announce Type: replace-cross Abstract: While academic research typically treats Large Language Models (LLM) as generic text generators, they are distinct commercial products with unique interfaces and capabilities that fundamentally shape user behavior. Current datasets obscure this reality by collecting text-only data through uniform interfaces that fail to capture authentic chatbot usage. To address this limitation, we present ShareChat, a large-scale corpus of 142,808 conversations (660,293 turns) sourced directly from publicly shared URLs on ChatGPT, Perplexity, Grok, Gemini, and Claude. ShareChat distinguishes itself by preserving native platform affordances, such as citations and thinking traces, across a diverse collection covering 101 languages and the period from April 2023 to October 2025. Furthermore, ShareChat offers substantially longer context windows and greater interaction depth than prior datasets. To illustrate the dataset's breadth, we present three case studies: a completeness analysis of intent satisfaction, a citation study of model grounding, and a temporal analysis of engagement rhythms. This work provides the community with a vital and timely resource for understanding authentic user-LLM chatbot interactions in the wild. The dataset is publicly available via Hugging Face.
Read more →

Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning

arXiv:2512.19920v3 Announce Type: replace-cross Abstract: LLM deployment in critical domains is currently impeded by persistent hallucinations: the generation of plausible but factually incorrect assertions. While scaling laws drove significant improvements in general capabilities, theoretical frameworks suggest hallucination is not merely stochastic error but a predictable statistical consequence of training objectives prioritizing mimicking the data distribution over epistemic honesty. Standard RLVR paradigms, utilizing binary reward signals, inadvertently incentivize models to behave as good test-takers rather than honest communicators, encouraging guessing whenever correctness probability exceeds zero. This paper presents an exhaustive investigation into behavioral calibration, which incentivizes models to stochastically admit uncertainty by abstaining when not confident, aligning model behavior with accuracy. Synthesizing recent advances, we propose and evaluate training interventions optimizing strictly proper scoring rules for models to output a calibrated probability of correctness. Our methods enable models to either abstain from producing a complete response or flag individual claims where uncertainty remains. Utilizing Qwen3-4B-Instruct, empirical analysis reveals that behavior-calibrated reinforcement learning allows smaller models to surpass frontier models in uncertainty quantification, a transferable meta-skill that can be decoupled from raw predictive accuracy. Trained on math reasoning tasks, our model's log-scale Accuracy-to-Hallucination Ratio gain (0.806) exceeds GPT-5's (0.207) in a challenging in-domain evaluation (BeyondAIME). Moreover, in cross-domain factual QA (SimpleQA), our 4B LLM achieves zero-shot calibration error on par with frontier models including Grok-4 and Gemini-2.5-Pro, even though its factual accuracy is much lower.
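
A minimal sketch of the strictly-proper-scoring idea described above, assuming the model either abstains or answers while stating a probability of being correct: the Brier rule makes honest probability reporting the reward-maximizing policy, and a fixed abstention reward makes declining preferable when the achievable expected reward is low. The specific rule (Brier) and the abstention value are illustrative choices, not the paper's exact reward.

```python
def brier_reward(confidence: float, correct: bool) -> float:
    """Strictly proper (Brier) reward for a stated probability of correctness."""
    outcome = 1.0 if correct else 0.0
    return 1.0 - (outcome - confidence) ** 2


def expected_reward(p_correct: float, stated_confidence: float) -> float:
    return (p_correct * brier_reward(stated_confidence, True)
            + (1.0 - p_correct) * brier_reward(stated_confidence, False))


# Because the rule is strictly proper, stating the true probability of being
# correct maximizes expected reward; over- or under-claiming is penalized.
for p in (0.9, 0.6, 0.3):
    best = max((expected_reward(p, c / 100), c / 100) for c in range(101))
    print(f"true p(correct)={p:.2f} -> reward-maximizing stated confidence={best[1]:.2f}")

# With a fixed abstention reward, declining to answer wins when confidence is low.
ABSTAIN_REWARD = 0.8   # illustrative value
for p in (0.95, 0.5):
    answer_value = expected_reward(p, p)            # value of answering honestly
    action = "answer" if answer_value > ABSTAIN_REWARD else "abstain"
    print(f"p(correct)={p:.2f}: answering is worth {answer_value:.2f} -> {action}")
```
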
Read more →

Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs

arXiv:2512.20573v3 Announce Type: replace-cross Abstract: Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressive (AR) verifiers. Our core insight is that dLLM's speed from parallel decoding drastically lowers the risk of costly rejections, providing a practical mechanism to effectively realize the (elusive) lengthy drafts that lead to large speedups with speculative decoding. We present FailFast, a dLLM-based speculative decoding framework that realizes this approach by dynamically adapting its speculation length. It "fails fast" by spending minimal compute in hard-to-speculate regions to shrink speculation latency and "wins big" by aggressively extending draft lengths in easier regions to reduce verification latency (in many cases, speculating and accepting 70 tokens at a time!). Without any fine-tuning, FailFast delivers lossless acceleration of AR LLMs and achieves up to 4.9x speedup over vanilla decoding, 1.7x over the best naive dLLM drafter, and 1.7x over EAGLE-3 across diverse models and workloads. We open-source FailFast at https://github.com/ruipeterpan/failfast.
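
A minimal sketch of the fail-fast/win-big control logic described above, not FailFast's actual policy: the draft length collapses after a rejection and grows multiplicatively after clean acceptances, so long drafts are only attempted where the verifier keeps accepting. The toy verifier, the growth/shrink rules, and the length bounds are illustrative assumptions.

```python
import random

rng = random.Random(0)


def simulate_verifier(draft_len: int, difficulty: float) -> int:
    """Toy stand-in for AR verification: how many draft tokens were accepted."""
    for i in range(draft_len):
        if rng.random() < difficulty:      # harder regions reject earlier
            return i
    return draft_len


def speculative_generate(total_tokens: int, difficulty_schedule):
    draft_len, produced, rounds = 8, 0, 0
    min_len, max_len = 2, 96
    while produced < total_tokens:
        difficulty = difficulty_schedule(produced)
        accepted = simulate_verifier(draft_len, difficulty)
        produced += accepted + 1           # the verifier always contributes one token
        rounds += 1
        if accepted == draft_len:          # win big: whole draft accepted, extend
            draft_len = min(draft_len * 2, max_len)
        else:                              # fail fast: shrink in hard regions
            draft_len = max(min_len, accepted // 2 + min_len)
    return produced, rounds


# Easy text for the first 300 tokens, then a hard-to-speculate region.
schedule = lambda pos: 0.02 if pos < 300 else 0.35
tokens, rounds = speculative_generate(500, schedule)
print(f"generated {tokens} tokens in {rounds} draft/verify rounds")
```
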
Read more →

The Bayesian Geometry of Transformer Attention

arXiv:2512.22471v3 Announce Type: replace-cross Abstract: Transformers often appear to perform Bayesian reasoning in context, but verifying this rigorously has been impossible: natural data lack analytic posteriors, and large models conflate reasoning with memorization. We address this by constructing "Bayesian wind tunnels" -- controlled environments where the true posterior is known in closed form and memorization is provably impossible. In these settings, small transformers reproduce Bayesian posteriors with $10^{-3}$-$10^{-4}$ bit accuracy, while capacity-matched MLPs fail by orders of magnitude, establishing a clear architectural separation. Across two tasks -- bijection elimination and Hidden Markov Model (HMM) state tracking -- we find that transformers implement Bayesian inference through a consistent geometric mechanism: residual streams serve as the belief substrate, feed-forward networks perform the posterior update, and attention provides content-addressable routing. Geometric diagnostics reveal orthogonal key bases, progressive query-key alignment, and a low-dimensional value manifold parameterized by posterior entropy. During training this manifold unfurls while attention patterns remain stable, a "frame-precision dissociation" predicted by recent gradient analyses. Taken together, these results demonstrate that hierarchical attention realizes Bayesian inference by geometric design, explaining both the necessity of attention and the failure of flat architectures. Bayesian wind tunnels provide a foundation for mechanistically connecting small, verifiable systems to reasoning phenomena observed in large language models.
Read more →

Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

arXiv:2512.22473v3 Announce Type: replace-cross Abstract: Transformers empirically perform precise probabilistic reasoning in carefully constructed "Bayesian wind tunnels" and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training reshapes attention scores and value vectors in a transformer attention head. Our core result is an advantage-based routing law for attention scores, \[ \frac{\partial L}{\partial s_{ij}} = \alpha_{ij}\bigl(b_{ij}-\mathbb{E}_{\alpha_i}[b]\bigr), \qquad b_{ij} := u_i^\top v_j, \] coupled with a responsibility-weighted update for values, \[ \Delta v_j = -\eta\sum_i \alpha_{ij} u_i, \] where $u_i$ is the upstream gradient at position $i$ and $\alpha_{ij}$ are attention weights. These equations induce a positive feedback loop in which routing and content specialize together: queries route more strongly to values that are above-average for their error signal, and those values are pulled toward the queries that use them. We show that this coupled specialization behaves like a two-timescale EM procedure: attention weights implement an E-step (soft responsibilities), while values implement an M-step (responsibility-weighted prototype updates), with queries and keys adjusting the hypothesis frame. Through controlled simulations, including a sticky Markov-chain task where we compare a closed-form EM-style update to standard SGD, we demonstrate that the same gradient dynamics that minimize cross-entropy also sculpt the low-dimensional manifolds identified in our companion work as implementing Bayesian inference. This yields a unified picture in which optimization (gradient flow) gives rise to geometry (Bayesian manifolds), which in turn supports function (in-context probabilistic reasoning).
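The two update rules are simple enough to transcribe directly. Below is a small numpy sketch (random toy tensors, illustrative shapes, made-up upstream gradients) that computes the advantage-based score gradient and the responsibility-weighted value update exactly as written in the abstract.

    # Numerical sketch of the routing law and the value update from the abstract.
    import numpy as np

    rng = np.random.default_rng(0)
    n_q, n_k, d = 4, 6, 8
    alpha = rng.random((n_q, n_k))
    alpha /= alpha.sum(axis=1, keepdims=True)     # attention weights alpha_ij
    U = rng.normal(size=(n_q, d))                 # upstream gradients u_i
    V = rng.normal(size=(n_k, d))                 # value vectors v_j

    B = U @ V.T                                   # b_ij = u_i^T v_j
    adv = B - (alpha * B).sum(axis=1, keepdims=True)   # b_ij - E_{alpha_i}[b]
    dL_ds = alpha * adv                           # dL/ds_ij = alpha_ij * (b_ij - E[b])

    eta = 0.1
    dV = -eta * alpha.T @ U                       # Delta v_j = -eta * sum_i alpha_ij u_i
    print(dL_ds.shape, dV.shape)                  # (4, 6) (6, 8)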
Read more →

The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models

arXiv:2512.23340v2 Announce Type: replace-cross Abstract: Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM are inherently bounded. One solution lies in intricate interactions among multiple LLMs, whose collective performance can surpass that of any constituent model. Despite the rapid proliferation of multi-model integration techniques such as model routing and post-hoc ensembling, a unifying theoretical framework of performance scaling for multi-model collaboration remains absent. In this work, we propose the Law of Multi-model Collaboration, a scaling law that predicts the performance limits of LLM ensembles based on their aggregated parameter budget. To quantify the intrinsic upper bound of multi-model collaboration, we adopt a method-agnostic formulation and assume an idealized integration oracle where the total cross-entropy loss of each sample is determined by the minimum loss of any model in the model pool. Experimental results reveal that multi-model systems follow a power-law scaling with respect to the total parameter count, exhibiting a more significant improvement trend and a lower theoretical loss floor compared to single model scaling. Moreover, ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family, indicating that model diversity is a primary driver of collaboration gains. These findings suggest that model collaboration represents a critical axis for extending the intelligence frontier of LLMs.
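A minimal sketch of the idealized integration oracle and a power-law fit of the form L(N) = A * N^(-alpha) + L_inf. The per-sample losses, parameter counts, and initial guesses below are synthetic placeholders, not the paper's data; the point is only the shape of the computation.

    # Oracle ensemble loss and a power-law fit over aggregated parameter budget.
    import numpy as np
    from scipy.optimize import curve_fit

    def oracle_loss(per_sample_losses):
        """per_sample_losses: (n_models, n_samples). The oracle credits every
        sample to the model in the pool with the lowest loss."""
        return np.min(per_sample_losses, axis=0).mean()

    def power_law(n_billion, A, alpha, L_inf):
        return A * n_billion ** (-alpha) + L_inf

    rng = np.random.default_rng(0)
    pool = rng.gamma(2.0, 1.2, size=(4, 1000))   # fake per-sample CE losses for 4 models
    print("single best model:", round(float(pool.mean(axis=1).min()), 3),
          "| oracle ensemble:", round(float(oracle_loss(pool)), 3))

    # placeholder points: total parameters (billions) and oracle ensemble losses
    N = np.array([1.0, 3.0, 7.0, 14.0, 30.0, 70.0])
    L = np.array([3.10, 2.70, 2.45, 2.25, 2.10, 1.95])
    (A, alpha, L_inf), _ = curve_fit(power_law, N, L, p0=(1.5, 0.3, 1.5))
    print(f"L(N) ~ {A:.2f} * N^(-{alpha:.2f}) + {L_inf:.2f}")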
Read more →

RxnBench: A Multimodal Benchmark for Evaluating Large Language Models on Chemical Reaction Understanding from Scientific Literature

arXiv:2512.23565v5 Announce Type: replace-cross Abstract: The integration of Multimodal Large Language Models (MLLMs) into chemistry promises to revolutionize scientific discovery, yet their ability to comprehend the dense, graphical language of reactions within authentic literature remains underexplored. Here, we introduce RxnBench, a multi-tiered benchmark designed to rigorously evaluate MLLMs on chemical reaction understanding from scientific PDFs. RxnBench comprises two tasks: Single-Figure QA (SF-QA), which tests fine-grained visual perception and mechanistic reasoning using 1,525 questions derived from 305 curated reaction schemes, and Full-Document QA (FD-QA), which challenges models to synthesize information from 108 articles, requiring cross-modal integration of text, schemes, and tables. Our evaluation of MLLMs reveals a critical capability gap: while models excel at extracting explicit text, they struggle with deep chemical logic and precise structural recognition. Notably, models with inference-time reasoning significantly outperform standard architectures, yet none achieve 50% accuracy on FD-QA. These findings underscore the urgent need for domain-specific visual encoders and stronger reasoning engines to advance autonomous AI chemists.
Read more →

Geometric Scaling of Bayesian Inference in LLMs

arXiv:2512.23752v3 Announce Type: replace-cross Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate -- low-dimensional value manifolds and progressively orthogonal keys -- that encodes posterior structure. We investigate whether this geometric signature persists in production-grade language models. Across Pythia, Phi-2, Llama-3, and Mistral families, we find that last-layer value representations organize along a single dominant axis whose position strongly correlates with predictive entropy, and that domain-restricted prompts collapse this structure into the same low-dimensional manifolds observed in synthetic settings. To probe the role of this geometry, we perform targeted interventions on the entropy-aligned axis of Pythia-410M during in-context learning. Removing or perturbing this axis selectively disrupts the local uncertainty geometry, whereas matched random-axis interventions leave it intact. However, these single-layer manipulations do not produce proportionally specific degradation in Bayesian-like behavior, indicating that the geometry is a privileged readout of uncertainty rather than a singular computational bottleneck. Taken together, our results show that modern language models preserve the geometric substrate that enables Bayesian inference in wind tunnels, and organize their approximate Bayesian updates along this substrate.
Read more →

FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering

arXiv:2601.00269v2 Announce Type: replace-cross Abstract: Faithfulness hallucinations in VQA occur when vision-language models produce fluent yet visually ungrounded answers, severely undermining their reliability in safety-critical applications. Existing detection methods mainly fall into two categories: external verification approaches relying on auxiliary models or knowledge bases, and uncertainty-driven approaches using repeated sampling or uncertainty estimates. The former suffer from high computational overhead and are limited by external resource quality, while the latter capture only limited facets of model uncertainty and fail to sufficiently explore the rich internal signals associated with the diverse failure modes. Both paradigms thus have inherent limitations in efficiency, robustness, and detection performance. To address these challenges, we propose FaithSCAN: a lightweight network that detects hallucinations by exploiting rich internal signals of VLMs, including token-level decoding uncertainty, intermediate visual representations, and cross-modal alignment features. These signals are fused via branch-wise evidence encoding and uncertainty-aware attention. We also extend the LLM-as-a-Judge paradigm to VQA hallucination and propose a low-cost strategy to automatically generate model-dependent supervision signals, enabling supervised training without costly human labels while maintaining high detection accuracy. Experiments on multiple VQA benchmarks show that FaithSCAN significantly outperforms existing methods in both effectiveness and efficiency. In-depth analysis shows hallucinations arise from systematic internal state variations in visual perception, cross-modal reasoning, and language decoding. Different internal signals provide complementary diagnostic cues, and hallucination patterns vary across VLM architectures, offering new insights into the underlying causes of multimodal hallucinations.
Read more →

Diffusion Timbre Transfer Via Mutual Information Guided Inpainting

arXiv:2601.01294v2 Announce Type: replace-cross Abstract: We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic and rhythmic structure during reverse diffusion. The method operates directly on audio latents and is compatible with text/audio conditioning (e.g., CLAP). We discuss design choices, analyze trade-offs between timbral change and structural preservation, and show that simple inference-time controls can meaningfully steer pre-trained models for style-transfer use cases.
Read more →

VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing

arXiv:2601.07315v3 Announce Type: replace-cross Abstract: Analog mixed-signal circuit sizing involves complex trade-offs within high-dimensional design spaces. Existing automatic analog circuit sizing approaches rely solely on netlists, ignoring the circuit schematic, which hinders the cognitive link between the schematic and its performance. Furthermore, the black-box nature of machine learning methods and hallucination risks in large language models fail to provide the necessary ground-truth explainability required for industrial sign-off. To address these challenges, we propose a Vision Language Model-optimized collaborative agent design workflow (VLM-CAD), which analyzes circuits, optimizes DC operating points, performs inference-based sizing, and executes external sizing optimization. We integrate Image2Net to annotate circuit schematics and generate a structured JSON description for precise interpretation by Vision Language Models. Furthermore, we propose an Explainable Trust Region Bayesian Optimization method (ExTuRBO) that employs collaborative warm-start from agent-generated seeds and offers dual-granularity sensitivity analysis for external sizing optimization, supporting a comprehensive final design report. Experiment results on amplifier sizing tasks using 180nm, 90nm, and 45nm Predictive Technology Models demonstrate that VLM-CAD effectively balances power and performance while maintaining physics-based explainability. VLM-CAD meets all specification requirements while maintaining low power consumption in optimizing an amplifier with a complementary input and a class-AB output stage, with a total runtime under 66 minutes across all experiments on two amplifiers.
Read more →

Demystifying the Slash Pattern in Attention: The Role of RoPE

arXiv:2601.08297v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) often exhibit slash attention patterns, where attention scores concentrate along the $\Delta$-th sub-diagonal for some offset $\Delta$. These patterns play a key role in passing information across tokens. But why do they emerge? In this paper, we demystify the emergence of these Slash-Dominant Heads (SDHs) from both empirical and theoretical perspectives. First, by analyzing open-source LLMs, we find that SDHs are intrinsic to models and generalize to out-of-distribution prompts. To explain the intrinsic emergence, we analyze the queries, keys, and Rotary Position Embedding (RoPE), which jointly determine attention scores. Our empirical analysis reveals two characteristic conditions of SDHs: (1) Queries and keys are almost rank-one, and (2) RoPE is dominated by medium- and high-frequency components. Under these conditions, queries and keys are nearly identical across tokens, and interactions between medium- and high-frequency components of RoPE give rise to SDHs. Beyond empirical evidence, we theoretically show that these conditions are sufficient to ensure the emergence of SDHs by formalizing them as our modeling assumptions. Particularly, we analyze the training dynamics of a shallow Transformer equipped with RoPE under these conditions, and prove that models trained via gradient descent exhibit SDHs. The SDHs generalize to out-of-distribution prompts.
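The geometry behind the slash pattern is easy to reproduce numerically. In the sketch below (toy head dimension, standard RoPE frequencies, and a single fixed query/key vector standing in for the near-rank-one condition, all assumptions), the pre-softmax score depends only on the relative offset between positions, so whichever offset maximizes it is emphasized on every row, i.e. along a sub-diagonal.

    # Toy demonstration: token-independent q/k plus RoPE gives offset-only scores.
    import numpy as np

    d = 64                                              # head dimension (even), toy choice
    freqs = 1.0 / (10000 ** (np.arange(0, d, 2) / d))   # standard RoPE frequencies

    def rope(x, p):
        """Apply rotary position embedding to vector x at position p."""
        pairs = x.reshape(-1, 2)
        ang = p * freqs
        cos, sin = np.cos(ang), np.sin(ang)
        rotated = np.stack([pairs[:, 0] * cos - pairs[:, 1] * sin,
                            pairs[:, 0] * sin + pairs[:, 1] * cos], axis=1)
        return rotated.reshape(-1)

    rng = np.random.default_rng(0)
    q = rng.normal(size=d)      # token-independent query direction
    k = rng.normal(size=d)      # token-independent key direction

    # score between a query at position 32 and keys at positions 32 - delta:
    # it is a function of delta alone, so one offset dominates every row
    scores = np.array([rope(q, 32) @ rope(k, 32 - delta) for delta in range(32)])
    print("dominant relative offset:", int(np.argmax(scores)))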
Read more →

HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning

arXiv:2601.10187v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have achieved remarkable strides in multilingual translation but are hindered by a systemic cross-lingual verbosity bias, rendering them unsuitable for strict time-constrained tasks like subtitling and dubbing. Current prompt-engineering approaches struggle to resolve this conflict between semantic fidelity and rigid temporal feasibility. To bridge this gap, we first introduce Sand-Glass, a benchmark specifically designed to evaluate translation under syllable-level duration constraints. Furthermore, we propose HOMURA, a reinforcement learning framework that explicitly optimizes the trade-off between semantic preservation and temporal compliance. By employing a KL-regularized objective with a novel dynamic syllable-ratio reward, HOMURA effectively "tames" the output length. Experimental results demonstrate that our method significantly outperforms strong LLM baselines, achieving precise length control that respects linguistic density hierarchies without compromising semantic adequacy.
Read more →

LAPS: A Length-Aware-Prefill LLM Serving System

arXiv:2601.11589v2 Announce Type: replace-cross Abstract: LAPS identifies and disaggregates requests with different prompt lengths in LLM serving to reduce time-to-first-token (TTFT) latency. While recent systems have decoupled the prefill and decode stages to improve throughput, they still rely on unified scheduling policies that fail to adapt to heterogeneous workload characteristics. We observe that prompt-length variations lead to distinct performance bottlenecks, motivating an adaptive scheduling strategy. LAPS disaggregates multi-turn long-prefill requests from short-prefill ones and introduces a length-aware smart batching mechanism for short-prefill workloads. It adopts a dual-queue design that supports temporal disaggregation on a single prefill instance or spatial disaggregation across multiple instances. For short-prefill batches, a batch waiting window and CUDA Graph-based clustering mitigate interference from heterogeneous computation, reducing batching delay and lowering average latency. In real multi-turn workloads, LAPS reduces prefill latency by over 30% compared to vanilla SGLang under prefill-decode disaggregation, and further decreases SLO violations by 28% in multi-instance deployments with vanilla data-parallel configuration. Compared to the SGLang router with load balancing, it further lowers SLO violations by 12% in multi-GPU settings. Under high concurrency and mixed-request scenarios, LAPS improves request throughput by 35% when serving the Qwen2.5-32B model on prefill instances, demonstrating its effectiveness in optimizing heterogeneous LLM serving workloads.
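An illustrative sketch of the dual-queue idea. The prompt-length threshold, the waiting window, and the batch cap below are made-up numbers rather than LAPS's actual configuration: long-prefill requests are pulled off their own queue one at a time, while short-prefill requests are gathered into batches within the window.

    # Length-aware dual-queue prefill scheduling, heavily simplified.
    import time
    from collections import deque

    LONG_PROMPT_TOKENS = 4096   # assumed split point between short and long prefill
    BATCH_WINDOW_S = 0.005      # assumed waiting window for short-prefill batching
    MAX_BATCH = 16

    short_q, long_q = deque(), deque()

    def admit(request):
        """Route an incoming request by its prompt length."""
        q = long_q if request["prompt_tokens"] >= LONG_PROMPT_TOKENS else short_q
        q.append(request)

    def next_short_batch():
        """Gather short-prefill requests for up to BATCH_WINDOW_S seconds."""
        deadline = time.monotonic() + BATCH_WINDOW_S
        batch = []
        while len(batch) < MAX_BATCH and time.monotonic() < deadline:
            if short_q:
                batch.append(short_q.popleft())
            else:
                time.sleep(0.0005)
        return batch

    def next_long_request():
        """Long-prefill requests run alone on a dedicated instance or time slice."""
        return long_q.popleft() if long_q else None

    for tokens in (120, 300, 9000, 80, 5000):
        admit({"prompt_tokens": tokens})
    print(len(next_short_batch()), "short requests batched,",
          len(long_q), "long requests queued separately")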
Read more →

Human Values in a Single Sentence: Moral Presence, Hierarchies, and Transformer Ensembles on the Schwartz Continuum

arXiv:2601.14172v2 Announce Type: replace-cross Abstract: We study sentence-level detection of the 19 human values in the refined Schwartz continuum in about 74k English sentences from news and political manifestos (ValueEval'24 corpus). Each sentence is annotated with value presence, yielding a binary moral-presence label and a 19-way multi-label task under severe class imbalance. First, we show that moral presence is learnable from single sentences: a DeBERTa-base classifier attains positive-class F1 = 0.74 with calibrated thresholds. Second, we compare direct multi-label value detectors with presence-gated hierarchies under a single 8 GB GPU budget. Under matched compute, presence gating does not improve over direct prediction, indicating that gate recall becomes a bottleneck. Third, we investigate lightweight auxiliary signals - short-range context, LIWC-22 and moral lexica, and topic features - and small ensembles. Our best supervised configuration, a soft-voting ensemble of DeBERTa-based models enriched with such signals, reaches macro-F1 = 0.332 on the 19 values, improving over the best previous English-only baseline on this corpus (macro-F1 $\approx$ 0.28). We additionally benchmark 7-9B instruction-tuned LLMs (Gemma 2 9B, Llama 3.1 8B, Mistral 8B, Qwen 2.5 7B) in zero-/few-shot and QLoRA setups, and find that they lag behind the supervised ensemble under the same hardware constraint. Overall, our results provide empirical guidance for building compute-efficient, value-aware NLP models under realistic GPU budgets.
Read more →

PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice

arXiv:2601.16669v2 Announce Type: replace-cross Abstract: As large language models (LLMs) are increasingly applied to legal domain-specific tasks, evaluating their ability to perform legal work in real-world settings has become essential. However, existing legal benchmarks rely on simplified and highly standardized tasks, failing to capture the ambiguity, complexity, and reasoning demands of real legal practice. Moreover, prior evaluations often adopt coarse, single-dimensional metrics and do not explicitly assess fine-grained legal reasoning. To address these limitations, we introduce PLawBench, a Practical Law Benchmark designed to evaluate LLMs in realistic legal practice scenarios. Grounded in real-world legal workflows, PLawBench models the core processes of legal practitioners through three task categories: public legal consultation, practical case analysis, and legal document generation. These tasks assess a model's ability to identify legal issues and key facts, perform structured legal reasoning, and generate legally coherent documents. PLawBench comprises 850 questions across 13 practical legal scenarios, with each question accompanied by expert-designed evaluation rubrics, resulting in approximately 12,500 rubric items for fine-grained assessment. Using an LLM-based evaluator aligned with human expert judgments, we evaluate 10 state-of-the-art LLMs. Experimental results show that none achieves strong performance on PLawBench, revealing substantial limitations in the fine-grained legal reasoning capabilities of current LLMs and highlighting important directions for future evaluation and development of legal LLMs. Data is available at: https://github.com/skylenage/PLawbench.
Read more →

Sparsity-Aware Low-Rank Representation for Efficient Fine-Tuning of Large Language Models

arXiv:2601.16991v2 Announce Type: replace-cross Abstract: Adapting large pre-trained language models to downstream tasks often entails fine-tuning millions of parameters or deploying costly dense weight updates, which hinders their use in resource-constrained environments. Low-rank Adaptation (LoRA) reduces trainable parameters by factorizing weight updates, yet the underlying dense weights still impose high storage and computation costs. Magnitude-based pruning can yield sparse models but typically degrades LoRA's performance when applied naively. In this paper, we introduce SALR (Sparsity-Aware Low-Rank Representation), a novel fine-tuning paradigm that unifies low-rank adaptation with sparse pruning under a rigorous mean-squared-error framework. We prove that statically pruning only the frozen base weights minimizes the pruning error bound, and we recover the discarded residual information via a truncated-SVD low-rank adapter, which provably reduces per-entry MSE by a factor of $(1 - r/\min(d,k))$. To maximize hardware efficiency, we fuse multiple low-rank adapters into a single concatenated GEMM, and we adopt a bitmap-based encoding with a two-stage pipelined decoding + GEMM design to achieve true model compression and speedup. Empirically, SALR attains 50% sparsity on various LLMs while matching the performance of LoRA on GSM8K and MMLU, reduces model size by $2\times$, and delivers up to a $1.7\times$ inference speedup.
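The core construction is easy to sketch: magnitude-prune the frozen base weights, then absorb the discarded residual into a rank-r adapter obtained from a truncated SVD. Sizes, sparsity, and rank below are illustrative; the printout shows the MSE reduction from the adapter, in line with the (1 - r/min(d,k)) factor quoted above.

    # Prune the frozen base, recover the residual with a truncated-SVD adapter.
    import numpy as np

    def prune_and_adapt(W, sparsity=0.5, r=8):
        # magnitude pruning of the frozen base weights
        thresh = np.quantile(np.abs(W), sparsity)
        W_sparse = np.where(np.abs(W) >= thresh, W, 0.0)

        # recover the discarded residual with a rank-r truncated SVD
        residual = W - W_sparse
        U, S, Vt = np.linalg.svd(residual, full_matrices=False)
        A = U[:, :r] * S[:r]          # (d, r)
        B = Vt[:r, :]                 # (r, k)
        return W_sparse, A, B

    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 512))
    W_sparse, A, B = prune_and_adapt(W)
    approx = W_sparse + A @ B
    print("per-entry MSE of pruning alone :", float(np.mean((W - W_sparse) ** 2)))
    print("per-entry MSE with the adapter :", float(np.mean((W - approx) ** 2)))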
Read more →

Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations

arXiv:2601.17087v2 Announce Type: replace-cross Abstract: Agentic benchmarks increasingly rely on LLM-simulated users to scalably evaluate agent performance, yet the robustness, validity, and fairness of this approach remain unexamined. Through a user study with participants across the United States, India, Kenya, and Nigeria, we investigate whether LLM-simulated users serve as reliable proxies for real human users in evaluating agents on τ-Bench retail tasks. We find that user simulation lacks robustness, with agent success rates varying up to 9 percentage points across different user LLMs. Furthermore, evaluations using simulated users exhibit systematic miscalibration, underestimating agent performance on challenging tasks and overestimating it on moderately difficult ones. African American Vernacular English (AAVE) speakers experience consistently worse success rates and calibration errors than Standard American English (SAE) speakers, with disparities compounding significantly with age. We also find simulated users to be a differentially effective proxy for different populations, performing worst for AAVE and Indian English speakers. Additionally, simulated users introduce conversational artifacts and surface different failure patterns than human users. These findings demonstrate that current evaluation practices risk misrepresenting agent capabilities across diverse user populations and may obscure real-world deployment challenges.
Read more →

Spatiotemporal Semantic V2X Framework for Cooperative Collision Prediction

arXiv:2601.17216v2 Announce Type: replace-cross Abstract: Intelligent Transportation Systems (ITS) demand real-time collision prediction to ensure road safety and reduce accident severity. Conventional approaches rely on transmitting raw video or high-dimensional sensory data from roadside units (RSUs) to vehicles, which is impractical under vehicular communication bandwidth and latency constraints. In this work, we propose a semantic V2X framework in which RSU-mounted cameras generate spatiotemporal semantic embeddings of future frames using the Video Joint Embedding Predictive Architecture (V-JEPA). To evaluate the system, we construct a digital twin of an urban traffic environment, enabling the generation of diverse traffic scenarios with both safe and collision events. These embeddings of the future frame, extracted from V-JEPA, capture task-relevant traffic dynamics and are transmitted via V2X links to vehicles, where a lightweight attentive probe and classifier decode them to predict imminent collisions. By transmitting only semantic embeddings instead of raw frames, the proposed system significantly reduces communication overhead while maintaining predictive accuracy. Experimental results demonstrate that the framework with an appropriate processing method achieves a 10% F1-score improvement for collision prediction while reducing transmission requirements by four orders of magnitude compared to raw video. This validates the potential of semantic V2X communication to enable cooperative, real-time collision prediction in ITS.
Read more →

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

arXiv:2601.17367v2 Announce Type: replace-cross Abstract: The quadratic complexity of standard attention mechanisms poses a significant scalability bottleneck for large language models (LLMs) in long-context scenarios. While hybrid attention strategies that combine sparse and full attention within a single model offer a viable solution, they typically employ static computation ratios (i.e., fixed proportions of sparse versus full attention) and fail to adapt to the varying sparsity sensitivities of downstream tasks during inference. To address this issue, we propose Elastic Attention, which allows the model to dynamically adjust its overall sparsity based on the input. This is achieved by integrating a lightweight Attention Router into the existing pretrained model, which dynamically assigns each attention head to different computation modes. Within only 12 hours of training on 8xA800 GPUs, our method enables models to achieve both strong performance and efficient inference. Experiments across three long-context benchmarks on widely-used LLMs demonstrate the superiority of our method.
Read more →

From Specialist to Generalist: Unlocking SAM's Learning Potential on Unlabeled Medical Images

arXiv:2601.17934v2 Announce Type: replace-cross Abstract: Foundation models like the Segment Anything Model (SAM) show strong generalization, yet adapting them to medical images remains difficult due to domain shift, scarce labels, and the inability of Parameter-Efficient Fine-Tuning (PEFT) to exploit unlabeled data. While conventional models like U-Net excel in semi-supervised medical learning, their potential to assist a PEFT SAM has been largely overlooked. We introduce SC-SAM, a specialist-generalist framework where U-Net provides point-based prompts and pseudo-labels to guide SAM's adaptation, while SAM serves as a powerful generalist supervisor to regularize U-Net. This reciprocal guidance forms a bidirectional co-training loop that allows both models to effectively exploit the unlabeled data. Across prostate MRI and polyp segmentation benchmarks, our method achieves state-of-the-art results, outperforming other existing semi-supervised SAM variants and even medical foundation models like MedSAM, highlighting the value of specialist-generalist cooperation for label-efficient medical image segmentation. Our code is available at https://github.com/vnlvi2k3/SC-SAM.
Read more →

The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning

arXiv:2601.18832v2 Announce Type: replace-cross Abstract: Scaling test-time compute enhances long chain-of-thought (CoT) reasoning, yet existing approaches face a fundamental trade-off between computational cost and coverage quality: either incurring high training expense or yielding redundant trajectories. We introduce The Geometric Reasoner (TGR), a training-free framework that performs manifold-informed latent foresight search under strict memory bounds. At each chunk boundary, TGR scores candidate latent anchors via a lightweight look-ahead estimate combined with soft geometric regularizers that encourage smooth trajectories and diverse exploration. Chunk-wise KV cache resets keep memory linear in chunk length. On challenging math and code benchmarks, TGR improves robust trajectory coverage, measured by the area under the Pass@$k$ curve (AUC), by up to 13 points on Qwen3-8B, with negligible overhead of about 1.1--1.3 times.
Read more →

LLMs versus the Halting Problem: Revisiting Program Termination Prediction

arXiv:2601.18987v2 Announce Type: replace-cross Abstract: Determining whether a program terminates is a central problem in computer science. Turing's foundational result established the Halting Problem as undecidable, showing that no algorithm can universally determine termination for all programs and inputs. Consequently, automatic verification tools approximate termination, sometimes failing to prove or disprove it; these tools rely on problem-specific architectures and abstractions, and are usually tied to particular programming languages. Recent success and progress in large language models (LLMs) raise the following question: can LLMs reliably predict program termination? In this work, we evaluate LLMs on a diverse set of C programs from the Termination category of the International Competition on Software Verification (SV-Comp) 2025. Our results suggest that LLMs perform remarkably well at predicting program termination, where GPT-5 and Claude Sonnet-4.5 would rank just behind the top-ranked tool (using test-time-scaling), and Code World Model (CWM) would place just behind the second-ranked tool. While LLMs are effective at predicting program termination, they often fail to provide a valid witness as a proof. Moreover, LLM performance drops as program length increases. We hope these insights motivate further research into program termination and the broader potential of LLMs for reasoning about undecidable problems.
Read more →

EVEREST: An Evidential, Tail-Aware Transformer for Rare-Event Time-Series Forecasting

arXiv:2601.19022v2 Announce Type: replace-cross Abstract: Forecasting rare events in multivariate time-series data is challenging due to severe class imbalance, long-range dependencies, and distributional uncertainty. We introduce EVEREST, a transformer-based architecture for probabilistic rare-event forecasting that delivers calibrated predictions and tail-aware risk estimation, with auxiliary interpretability via attention-based signal attribution. EVEREST integrates four components: (i) a learnable attention bottleneck for soft aggregation of temporal dynamics; (ii) an evidential head for estimating aleatoric and epistemic uncertainty via a Normal--Inverse--Gamma distribution; (iii) an extreme-value head that models tail risk using a Generalized Pareto Distribution; and (iv) a lightweight precursor head for early-event detection. These modules are jointly optimized with a composite loss (focal loss, evidential NLL, and a tail-sensitive EVT penalty) and act only at training time; deployment uses a single classification head with no inference overhead (approximately 0.81M parameters). On a decade of space-weather data, EVEREST achieves state-of-the-art True Skill Statistic (TSS) of 0.973/0.970/0.966 at 24/48/72-hour horizons for C-class flares. The model is compact, efficient to train on commodity hardware, and applicable to high-stakes domains such as industrial monitoring, weather, and satellite diagnostics. Limitations include reliance on fixed-length inputs and exclusion of image-based modalities, motivating future extensions to streaming and multimodal forecasting.
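As a small illustration of the EVT component, the sketch below fits a Generalized Pareto Distribution to threshold exceedances of a synthetic heavy-tailed signal and converts it into a tail probability. The threshold quantile and the data are placeholders, not the paper's setup.

    # Tail-risk estimation with a GPD over threshold exceedances (peaks over threshold).
    import numpy as np
    from scipy.stats import genpareto

    rng = np.random.default_rng(0)
    x = rng.lognormal(mean=0.0, sigma=1.0, size=20000)    # synthetic heavy-tailed signal
    u = np.quantile(x, 0.95)                              # exceedance threshold (assumed)
    exceedances = x[x > u] - u

    c, _, scale = genpareto.fit(exceedances, floc=0.0)    # fit GPD shape and scale to the tail
    # P(X > u + 3) = P(X > u) * P(exceedance > 3)
    p_tail = 0.05 * genpareto.sf(3.0, c, loc=0.0, scale=scale)
    print(f"estimated P(X > u + 3) = {p_tail:.4f}")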
Read more →

CLIP-Guided Unsupervised Semantic-Aware Exposure Correction

arXiv:2601.19129v2 Announce Type: replace-cross Abstract: Improper exposure often leads to severe loss of details, color distortion, and reduced contrast. Exposure correction still faces two critical challenges: (1) ignoring object-wise regional semantic information causes color-shift artifacts; (2) real-world exposure images generally have no ground-truth labels, and labeling them entails massive manual editing. To tackle the challenges, we propose a new unsupervised semantic-aware exposure correction network. It contains an adaptive semantic-aware fusion module, which effectively fuses the semantic information extracted from a pre-trained Fast Segment Anything Model into a shared image feature space. Then the fused features are used by our multi-scale residual spatial mamba group to restore the details and adjust the exposure. To avoid manual editing, we propose a pseudo-ground truth generator guided by CLIP, which is fine-tuned to automatically identify exposure situations and instruct the tailored corrections. Also, we leverage the rich priors from the FastSAM and CLIP to develop a semantic-prompt consistency loss to enforce semantic consistency and image-prompt alignment for unsupervised training. Comprehensive experimental results illustrate the effectiveness of our method in correcting real-world exposure images; it outperforms state-of-the-art unsupervised methods both numerically and visually.
Read more →

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

arXiv:2601.19175v2 Announce Type: replace-cross Abstract: Link sign prediction on a signed graph is a task to determine whether the relationship represented by an edge is positive or negative. Since the presence of negative edges violates the graph homophily assumption that adjacent nodes are similar, regular graph methods have not been applicable without auxiliary structures to handle them. We aim to directly model the latent statistical dependency among edges with the Gaussian copula and its corresponding correlation matrix, extending CopulaGNN (Ma et al., 2021). However, a naive modeling of edge-edge relations is computationally intractable even for a graph of moderate scale. To address this, we propose to 1) represent the correlation matrix as a Gramian of edge embeddings, significantly reducing the number of parameters, and 2) reformulate the conditional probability distribution to dramatically reduce the inference cost. We theoretically verify the scalability of our method by proving its linear convergence. Also, our extensive experiments demonstrate that it achieves significantly faster convergence than baselines, maintaining competitive prediction performance to the state-of-the-art models.
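The parameter saving from the Gramian trick is straightforward to see in code. Below, edge embeddings of dimension d parameterize an m x m correlation matrix as a Gramian with unit diagonal; the row normalization is one simple way to enforce the unit diagonal, and the sizes are illustrative.

    # Correlation matrix as a Gramian of low-dimensional edge embeddings.
    import numpy as np

    m, d = 1000, 16                       # number of edges, embedding dimension
    rng = np.random.default_rng(0)
    E = rng.normal(size=(m, d))
    E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-norm rows -> unit diagonal

    Sigma = E @ E.T                       # m x m, positive semi-definite, rank <= d
    print("parameters:", E.size, "instead of", m * (m - 1) // 2)

Because Sigma is determined by E, inference can work with the m x d embedding matrix directly rather than ever materializing the m x m matrix.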
Read more →

RPO-RAG: Aligning Small LLMs with Relation-aware Preference Optimization for Knowledge Graph Question Answering

arXiv:2601.19225v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have recently demonstrated remarkable reasoning abilities, yet hallucinate on knowledge-intensive tasks. Retrieval-augmented generation (RAG) mitigates this issue by grounding answers in external sources, e.g., knowledge graphs (KGs). However, existing KG-based RAG approaches rely on semantics-unaware path sampling and are weakly aligned with KG reasoning objectives, which limits further accuracy gains. They also feed retrieved paths directly into the reasoner without organizing them into answer-centered reasoning paths, hindering small LLMs' ability to leverage the retrieved knowledge. Furthermore, prior works predominantly rely on large LLMs (e.g., ChatGPT/GPT-4) or assume backbones above 7B parameters, leaving sub-7B models underexplored. We address this gap with RPO-RAG, the first KG-based RAG framework specifically designed for small LLMs, to the best of our knowledge. RPO-RAG introduces three key innovations: (1) a query-path semantic sampling strategy that provides informative supervisory signals; (2) a relation-aware preference optimization that aligns training with intermediate KG reasoning signals (e.g., relation); and (3) an answer-centered prompt design that organizes entities and reasoning paths in an interpretable format. Extensive experiments on two benchmark Knowledge Graph Question Answering (KGQA) datasets, WebQSP and CWQ, demonstrate that RPO-RAG effectively bridges the performance gap between small and large language models. On WebQSP, it improves F1 by up to 8.8%, reflecting enhanced answer precision, while on CWQ it achieves new state-of-the-art results among models under 8B parameters in both Hit and F1. Overall, RPO-RAG substantially improves the reasoning capability of small LLMs, even under 3B parameters-highlighting their potential for resource-efficient and practical on-device KGQA applications.
Read more →

Tri-Reader: An Open-Access, Multi-Stage AI Pipeline for First-Pass Lung Nodule Annotation in Screening CT

arXiv:2601.19380v2 Announce Type: replace-cross Abstract: Using multiple open-access models trained on public datasets, we developed Tri-Reader, a comprehensive, freely available pipeline that integrates lung segmentation, nodule detection, and malignancy classification into a unified tri-stage workflow. The pipeline is designed to prioritize sensitivity while reducing the candidate burden for annotators. To ensure accuracy and generalizability across diverse practices, we evaluated Tri-Reader on multiple internal and external datasets as compared with expert annotations and dataset-provided reference standards.
Read more →

R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning

arXiv:2601.19620v2 Announce Type: replace-cross Abstract: Large reasoning models (LRMs) aim to solve diverse and complex problems through structured reasoning. Recent advances in group-based policy optimization methods have shown promise in enabling stable advantage estimation without reliance on process-level annotations. However, these methods rely on advantage gaps induced by high-quality samples within the same batch, which makes the training process fragile and inefficient when intra-group advantages collapse under challenging tasks. To address these problems, we propose a reinforcement learning mechanism named R^3 that works along three directions: (1) a cross-context Replay strategy that maintains the intra-group advantage by recalling valuable examples from historical trajectories of the same query, (2) an in-context self-Reflection mechanism enabling models to refine outputs by leveraging past failures, and (3) a structural entropy Ranking reward, which assigns relative rewards to truncated or failed samples by ranking responses based on token-level entropy patterns, capturing both local exploration and global stability. We implement our method on Deepseek-R1-Distill-Qwen-1.5B and train it on the DeepscaleR-40k dataset in the math domain. Experiments demonstrate our method achieves SoTA performance on several math benchmarks, delivering significant improvements over the base models while using fewer reasoning tokens. Code and model will be released.
Read more →

ProToken: Token-Level Attribution for Federated Large Language Models

arXiv:2601.19672v2 Announce Type: replace-cross Abstract: Federated Learning (FL) enables collaborative training of Large Language Models (LLMs) across distributed data sources while preserving privacy. However, when federated LLMs are deployed in critical applications, it remains unclear which client(s) contributed to specific generated responses, hindering debugging, malicious client identification, fair reward allocation, and trust verification. We present ProToken, a novel Provenance methodology for Token-level attribution in federated LLMs that addresses client attribution during autoregressive text generation while maintaining FL privacy constraints. ProToken leverages two key insights to enable provenance at each token: (1) transformer architectures concentrate task-specific signals in later blocks, enabling strategic layer selection for computational tractability, and (2) gradient-based relevance weighting filters out irrelevant neural activations, focusing attribution on neurons that directly influence token generation. We evaluate ProToken across 16 configurations spanning four LLM architectures (Gemma, Llama, Qwen, SmolLM) and four domains (medical, financial, mathematical, coding). ProToken achieves 98% average attribution accuracy in correctly localizing responsible client(s), and maintains high accuracy when the number of clients is scaled, validating its practical viability for real-world deployment settings.
Read more →

LVLMs and Humans Ground Differently in Referential Communication

arXiv:2601.19792v2 Announce Type: replace-cross Abstract: For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. Here, we present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We release the online pipeline for data collection, the tools and analyses for accuracy, efficiency, and lexical overlap, and a corpus of 356 dialogues (89 pairs over 4 rounds each) that unmasks LVLMs' limitations in interactively resolving referring expressions, a crucial skill that underlies human language use.
Read more →

Aeronaut 1.0

New Mac app by Mikey Clarke, and it’s just what it says on the tin: a “lovingly crafted Bluesky app designed and built just for the Mac”. I’ve been beta testing Aeronaut for months, and it’s the only interface to Bluesky I actually like. It’s a real Mac app — written mostly in AppKit, supporting all the right UI idioms and platform integrations. It’s not just the best Bluesky client I’ve seen, for any platform, but maybe the best new Mac app I’ve seen in years, period. Certainly the one whose very existence has made me happiest. Next time someone tells me no one makes good new native apps for the Mac anymore, I’m going to tell them Mikey Fucking Clarke does. $2/month or $15/year. A veritable bargain for an app so nice. ★
Read more →

Bruce Springsteen: ‘Streets of Minneapolis’

Bruce Springsteen: I wrote this song on Saturday, recorded it yesterday and released it to you today in response to the state terror being visited on the city of Minneapolis. It’s dedicated to the people of Minneapolis, our innocent immigrant neighbors and in memory of Alex Pretti and Renee Good. Best line from the lyrics: Their claim was self-defense, Just don’t believe your eyes. It’s our blood and bones and these whistles and phones Against Miller and Noem’s dirty lies. Whistles, phones, and birds. ★
Read more →

★ Politics and the English Language, January 2026 Edition

Patrick McGee (author of last year’s bestseller, Apple in China, and guest on The Talk Show in May), commenting on Twitter/X re: Tim Cook’s company-wide memo regarding the “events in Minneapolis”: This literally says nothing, via intention and cowardice. It’s the kind of language Orwell attributed to politicians, when ready-made phrases assemble themselves and prevent any real thought from breaking through. I have previously linked to George Orwell’s seminal 1946 essay, “Politics and the English Language”. This time I’ll quote a different passage: In our time it is broadly true that political writing is bad writing. Where it is not true, it will generally be found that the writer is some kind of rebel, expressing his private opinions and not a “party line”. Orthodoxy, of whatever colour, seems to demand a lifeless, imitative style. The political dialects to be found in pamphlets, leading articles, manifestos, White papers and the speeches of undersecretaries do, of course, vary from party to party, but they are all alike in that one almost never finds in them a fresh, vivid, homemade turn of speech. When one watches some tired hack on the platform mechanically repeating the familiar phrases — bestial, atrocities, iron heel, bloodstained tyranny, free peoples of the world, stand shoulder to shoulder — one often has a curious feeling that one is not watching a live human being but some kind of dummy: a feeling which suddenly becomes stronger at moments when the light catches the speaker’s spectacles and turns them into blank discs which seem to have no eyes behind them. And this is not altogether fanciful. A speaker who uses that kind of phraseology has gone some distance toward turning himself into a machine. The appropriate noises are coming out of his larynx, but his brain is not involved, as it would be if he were choosing his words for himself. If the speech he is making is one that he is accustomed to make over and over again, he may be almost unconscious of what he is saying, as one is when one utters the responses in church. And this reduced state of consciousness, if not indispensable, is at any rate favourable to political conformity. In our time, political speech and writing are largely the defence of the indefensible. Things like the continuance of British rule in India, the Russian purges and deportations, the dropping of the atom bombs on Japan, can indeed be defended, but only by arguments which are too brutal for most people to face, and which do not square with the professed aims of the political parties. Thus political language has to consist largely of euphemism, question-begging and sheer cloudy vagueness. Defenceless villages are bombarded from the air, the inhabitants driven out into the countryside, the cattle machine-gunned, the huts set on fire with incendiary bullets: this is called pacification. Millions of peasants are robbed of their farms and sent trudging along the roads with no more than they can carry: this is called transfer of population or rectification of frontiers. People are imprisoned for years without trial, or shot in the back of the neck or sent to die of scurvy in Arctic lumber camps: this is called elimination of unreliable elements. Such phraseology is needed if one wants to name things without calling up mental pictures of them. Now consider Cook’s memo. Cook avoids most of the sins Orwell describes. He uses short, common words. He eschews hackneyed metaphors. He uses the active, not passive, voice — for the most part. 
His prayers and sympathies are “with everyone that’s been affected.” Who, exactly, has been affected? Affected how? By whom? Numerous examples come to mind, but not from Cook’s memo. Two Minneapolitans were affected, quite adversely, by being shot in the head and back at point blank range, in broad daylight, by unhinged ICE goons. A five-year-old boy — himself a U.S.-born citizen — was affected when ICE agents apprehended his father, now being held in a notorious detention center in Texas, a thousand miles away, and used the boy as bait to lure other family members. The list is long, the stories searing. But Cook mentions nothing more specific than “everyone that’s been affected”. Such phraseology is needed if one wants to name things without calling up mental pictures of them, indeed. “This is a time for deescalation,” Cook wrote. But by whom? The masked federal agents laying siege to Minneapolis, brutalizing its citizenry? Or the thousands of law-abiding citizens protesting the occupation of their neighborhoods, who are, in the words of Seth Meyers, “deploying the most hurtful weapon of all, the bird”? Cook’s call for “deescalation” is meaningless without specifying which side he’s calling upon to change course, and there’s no weaker sauce than the weak sauce of “both sides”. Using words, not to make a point, but to avoid making a point while creating the illusion of having made one, is the true sin. From Orwell’s closing paragraph: Political language — and with variations this is true of all political parties, from Conservatives to Anarchists — is designed to make lies sound truthful and murder respectable, and to give an appearance of solidity to pure wind. It’s colder in Minnesota, but the wind is gusting in Cupertino.
Read more →

Tim Cook Wrote a Memo on the ‘Events in Minneapolis’

Tim Cook, in a company-wide memo (first published by Mark Gurman): Team, I’m heartbroken by the events in Minneapolis, and my prayers and deepest sympathies are with the families, with the communities, and with everyone that’s been affected. This is a time for deescalation. I believe America is strongest when we live up to our highest ideals, when we treat everyone with dignity and respect no matter who they are or where they’re from, and when we embrace our shared humanity. This is something Apple has always advocated for. I had a good conversation with the president this week where I shared my views, and I appreciate his openness to engaging on issues that matter to us all. I know this is very emotional and challenging for so many. I am proud of how deeply our teams care about the world beyond our walls. That empathy is one of Apple’s greatest strengths and it is something I believe we all cherish. Thank you for all that you do. Tim “Events” is doing a lot of work there, to describe what has happened and is happening in Minneapolis. Trump’s “openness” on this particular “issue” has been to replace Greg Bovino — the diminutive Himmler-cosplaying “commander at large” of Border Control, who insisted, adamantly, that the real victims in Alex Pretti’s murder were the Border Patrol agents who shot him — with “border czar” Tom Homan, a man who took a $50,000 cash bribe from undercover FBI agents in exchange for a promise to award them government contracts if Trump were reelected. Zac Hall, on Twitter/X: Cook took three days to not name Alex Pretti in his not public statement and 20 days to not name Renée Good in his not public statement. [...] 2020 Tim Cook on Apple’s homepage: “Right now, there is a pain deeply etched in the soul of our nation and in the hearts of millions. To stand together, we must stand up for one another, and recognize the fear, hurt, and outrage rightly provoked by the senseless killing of George Floyd and a much longer history of racism.” Quite the different message (and medium — this time with nothing on Apple’s website, let alone their homepage) from 2020, for what I consider far more outrageous and alarming killings. ★
Read more →

Meta’s Response to Reuters Report on ‘Romance AI Chatbots’ for Teenagers

Andy Stone, VP of communications at Meta, responding, in a series of tweets on Twitter/X, to Jeff Horwitz’s report at Reuters yesterday, linked here last night, which claimed that “Zuckerberg blocked curbs on sex-talking chatbots for minors”: Never let the facts get in the way of a good story, eh, @Reuters, @JeffHorwitz! The documents you cite in the story itself contradict this headline. The headline says “Zuckerberg blocked curbs on sex-talking chatbots for minors” But the story cites a document that says “Zuckerberg believed that AI companions should be blocked from engaging in sexually ‘explicit’ conversations” w young people. Huh?! After my post last night, a friend of mine, with a career of experience working in a large company, sent me this: A word of caution. “Scumbag middle manager says CEO said” is not the same as “CEO said.” I could believe Zuck shitcanned parental controls, but I am certain there are thousands of snakes inside that company who would lie about it to get what they want. That’s a good and fair point, and I think it’s what Stone is trying to emphasize above. The New Mexico lawsuit filing doesn’t contain evidence that Zuckerberg nixed parental controls for teens engaging in chats with AI bots; it contains evidence that other (unnamed) employees claimed in internal discussions that Zuckerberg had nixed them. That is different. But so let’s take Zuckerberg out of it personally. It’s still the case that Meta shipped these chatbots for teens to use. And the buck, presumably, stops at Zuck’s desk. Read Horwitz’s report from back in August, detailing a leaked internal document listing Meta’s content guidelines for generative AI chat. Sidenote: Why in the world is Meta’s VP of comms doing this on Twitter/X, not Threads, which continues to grow? ★
Read more →

Bliki: Excessive Bold

I'm increasingly seeing a lot of technical and business writing make heavy use of bold font weights, in an attempt to emphasize what the writers think is important. LLMs seem to have picked up and spread this practice widely. But most of this is self-defeating: the more a writer uses typographical emphasis, the less power it has, quickly reaching the point where it loses all its benefits. There are various typographical tools that are used to emphasize words and phrases, such as: bold, italic, capitals, and underlines. I find that bold is the one that's getting most of the over-use. Using a lot of capitals is rightly reviled as shouting, and when we see it used widely, it raises our doubts on the quality of the underlying thinking. Underlines have become the signal for hyperlinks, so I rarely see this for emphasis any more. Both capitals and underlines have also been seen as rather cheap forms of highlight, since we could do them with typewriters and handwriting, while bold and italics were only possible after the rise of word-processors. (Although I realize most of my readers are too young to remember when word-processors were novel.) Italics are the subtler form of emphasis. When I use them in a paragraph, they don't leap out to the eye. This allows me to use them in long flows of text when I want to set it apart, and when I use it to emphasize a phrase it only makes its presence felt when I'm fully reading the text. For this reason, I prefer to use italics for emphasis, but I only use it rarely, suggesting it's really important to put stress on the word should I be speaking the paragraph (and I always try to write in the way that I speak). The greatest value of bold is that it draws the eye to the bold text even if the reader isn't reading, but glancing over the page. This is an important property, but one that only works if it's used sparingly. Headings are often done in bold, because it's important to help the reader navigate a longer document by skimming and looking for headings to find the section they want to read. I rarely use bold within a prose paragraph, because of my desire to be parsimonious with bold. One use I do like is to highlight unfamiliar words at the point where I explain them. I got this idea from Giarratano and Riley. I noticed that when the unfamiliar term reappeared, I was often unsure what it meant, but glancing back and finding the bold quickly reminded me. The trick here is to place the bold at the point of explanation, which is often, but not always, at its first use. 1 A common idea is to take an important sentence and bold that, so it leaps out while skimming the article. That can be worthwhile, but as ever with this kind of emphasis, its effectiveness is inversely proportional to how often it's used. It's also usually not the best tool for the job. Callouts usually work better. They do a superior job of drawing the eye, and furthermore they don't need to use the same words as in the prose text. This allows me to word the callout better than it could be if it also had to fit in the flow of the prose. A marginal case is where I see bold used in the first clause of each item in a bulleted list. In some ways this is acting like a heading for the text in the list. But we don't need a heading for every paragraph, and the presence of the bullets does enough to draw the eye to the items. And bullet-lists are overused too - I always try to write such things as a prose paragraph instead, as prose flows much better than bullets and is thus more pleasant to read.
It's important to write in a way that makes reading an enjoyable experience for the reader - even, indeed especially, when I'm also trying to explain things to them. While writing this, I was tempted to illustrate my point by using excessive bold in a paragraph, showing the problem and hopefully demonstrating why lots of bold loses the power to emphasize and attract the skimming eye. But I also wanted to explain my position clearly, and I felt that illustrating the problem would undermine my attempt. So I've confined the example to a final flourish. (And, yes, I have seen text with as much bold as this.)

Notes

[1]: For example, sometimes a new term will appear first in a list. E.g. "We carry out this process in three steps: frobning, gibbling, and eorchisting". In this case we don't bold the words as they appear in the list, but later on when we explain what on earth they mean.
Read more →

Court Filing Claims Zuckerberg Blocked Curbs at Meta on Sex-Talking Chatbots for Minors

Jeff Horwitz, reporting for Reuters: Meta Chief Executive Mark Zuckerberg approved allowing minors to access AI chatbot companions that safety staffers warned were capable of sexual interactions, according to internal Meta documents filed in a New Mexico state court case and made public Monday. The lawsuit — brought by the state’s attorney general, Raul Torrez, and scheduled for trial next month — alleges that Meta “failed to stem the tide of damaging sexual material and sexual propositions delivered to children” on Facebook and Instagram. [...] Messages between two employees from March of 2024 state that Zuckerberg had rejected creating parental controls for the chatbots, and that staffers were working on “Romance AI chatbots” that would be allowed for users under the age of 18. We “pushed hard for parental controls to turn GenAI off — but GenAI leadership pushed back stating Mark decision,” one employee wrote in that exchange. Horwitz was with The Wall Street Journal for a long time; his is a byline worth paying attention to. ★
Read more →

Good news, bad news: Samsung Galaxy TriFold finally has a U.S. release date and price tag - Mashable

It ain't cheap.
Read more →

Here's the Secret Galaxy S26 Ultra Display Security Feature You'll Want - Droid Life

You guys know what a privacy screen is, right? A privacy screen allows someone viewing a display straight-on to be able to view content, while any off angle brings a tint that at least attempts to hide whatever is on the screen. These are common in office spa…
Read more →

Moltbot Is Taking Over Silicon Valley - WIRED

People are letting the viral AI assistant formerly known as Clawdbot run their lives, regardless of the privacy concerns.
Read more →

Mastra empowers web devs to build AI agents in TypeScript

Python dominated the early days of machine learning, but that's changing as AI becomes more mainstream. Take, for instance, the recent release of Mastra, an open source agentic AI framework that uses TypeScript rather than Python. Developers are less interested in what goes into a large language model and more intrigued by how to build an application on top of these models, according to Sam Bhagwat, Mastra's co-creator and a full stack developer best known for his work as co-founder of the web framework Gatsby. Developers don't have to know Python to build agents, because agents don't require the same heavy computational work that building models does, he says. "Build agents don't tend to need to do that kind of heavy tough work," Bhagwat tells The New Stack. "It's a lot of 'Hey, am I providing my agent with the right context at this time? Does it have the ability to call the right tools, to perform actions on behalf to the users that are using this. Can I get the right information, which is much closer to web app development?" And that's the domain of frontend developers, he adds. "There's this whole community, essentially, of full stack engineers that was being left out because we're not really Python people. We're JavaScript types," he says. "We wanted to make a great tool for them."

Why TypeScript?

TypeScript has become a sort of default language for modern product teams, Bhagwat tells TNS. "TypeScript tends to be better for web app development because your frontend is going to be written in JavaScript, in TypeScript, pretty much no matter what," Bhagwat says. "When you have the backend of that written also in TypeScript, you just have a nicer integration." It also opens up AI agents to a world of TypeScript-savvy developers. In fact, last year GitHub revealed that TypeScript overtook both Python and JavaScript as the most used language on its platform. The adoption shift "marks the most significant language shift in more than a decade," the GitHub team says.

Starting with AI agents

Agents are already changing how we interface with the internet, according to Bhagwat. "It's really interesting for people that are in this Dev Tools world, because we're moving from a world where humans are writing code to where people are writing with Claude Code or Cursor. That changes a few things," he says. Increasingly, people are using internal documents with AI, which typically looks for markdown. "If an agent is browsing the web and looking for a doc, because it's a coding agent, it's typically looking for markdown, and so it sends a request for markdown," he explains. "Now some people are changing the content of their docs, specifically adding special instructions for agents, because they can tell who the visitors are."

"…we're moving from a world where humans are writing code to where people are writing with Claude Code or Cursor." – Sam Bhagwat, Mastra co-creator

How important is it that web developers learn to build AI agents? Bhagwat sees more businesspeople using AI to code solutions and even to train their own agents. Then there's this: Guillermo Rauch, CEO of frontend cloud provider Vercel and creator of Next.js, has warned that the next evolution of frontend development will focus on building AI agents. Right now, developers are tinkering and learning about building AI agents as they usually do: through personal side projects. For instance, Bhagwat needs to make a shopping list every week. He built an agent that understands the dietary preferences of his household.
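To give a feel for what that looks like in practice, here is a minimal sketch of such a shopping-list agent in TypeScript. It is illustrative only: the import paths, option names, and model helper shown here are assumptions based on Mastra's general agent/tool primitives, not verbatim Mastra API, so check the project's docs for the exact signatures.

// Hypothetical sketch: names and import paths are illustrative, not verbatim Mastra API.
import { Agent } from "@mastra/core/agent";      // assumed import path
import { createTool } from "@mastra/core/tools"; // assumed import path
import { openai } from "@ai-sdk/openai";         // assumed model provider helper
import { z } from "zod";

// A tool the agent can call to fetch the household's dietary preferences.
const getDietaryPreferences = createTool({
  id: "get-dietary-preferences",
  description: "Returns the household's dietary restrictions and preferences",
  inputSchema: z.object({}),
  execute: async () => ({ vegetarian: true, allergies: ["peanuts"] }),
});

// The shopping-list agent: prompt instructions plus a model and tools.
export const shoppingListAgent = new Agent({
  name: "shopping-list-agent",
  instructions:
    "Plan a weekly shopping list that respects the household's dietary preferences.",
  model: openai("gpt-4o-mini"),
  tools: { getDietaryPreferences },
});

// Usage: ask the agent for this week's list.
// const result = await shoppingListAgent.generate("Build my shopping list for this week.");

The point of the sketch is the shape of the work Bhagwat describes: wiring context and tools to a model call, which looks much more like web app development than model training.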
Many developers are launching similar personal projects so they can understand the technology before they have to deploy it in the enterprise, he says. To help developers get started, Bhagwat has written an e-book, Principles of building AI agents, that brings developers up to speed on what they need to know about agents and building with Mastra. It's available for free download with email registration. He has also written a second book, Patterns for building AI agents, also available with email registration.

What Mastra offers out of the box

If you've used Replit to build an agent, you've used Mastra, according to Bhagwat. But let's take a look at what you get with the full framework. Mastra offers a few core framework primitives, starting with agents: autonomous code that uses LLMs, specific prompt instructions, and tools to complete user requests. It also supports workflows, which allow developers to orchestrate complex, multistep processes. And of course it incorporates RAG (Retrieval-Augmented Generation) functionality, with built-in support for data syncing, web scraping, and vector database management. It offers an MCP server that lets users provide a local copy of documentation to the AI. The tool has both short-term and long-term memory systems that allow agents to remember context across threads and sessions.

Mastra users also have access to tools, specifically:

Mastra Studio, a local developer playground where web developers can visualize, test, and debug agents and workflows in real time.
A Model Context Protocol (MCP) client, which allows developers to connect agents to pre-built tools, such as Google Sheets, GitHub, or internal databases, all without writing custom integrations.
AI tracing and observability, so developers can see how the LLM is reasoning, along with token counts and execution steps.
Scorers and evals, tools that measure the performance and accuracy of AI agents using model-graded or rule-based metrics. These are designed to help developers refine prompts before shipping to production.

The company also offers a fully managed cloud platform for zero-config deployments.

Framework Support

The Mastra team has built in integrations with some frontend frameworks, including:

Next.js
Nuxt (Vue)
Astro
SvelteKit
React and Vite

On the backend, it supports:

Express
Hono
Fastify
Koa

Mastra also integrates with agentic UI libraries that help web developers build agentic frontend experiences, such as:

CopilotKit, an open source framework that helps build Copilot experiences directly inside existing applications.
Assistant UI, an open source TypeScript and React library that helps developers build high-quality AI chat interfaces.

The post Mastra empowers web devs to build AI agents in TypeScript appeared first on The New Stack.
Read more →

Kubernetes telemetry feature fully compromises clusters

If Kubernetes admins don't have enough to worry about with the upcoming Nginx gateway cutoff, they now may need to rifle through their Helm charts to potentially thwart a dangerous setting. Security researcher Graham Helton has shared a Kubernetes vulnerability he unearthed that allows a user armed with nothing more than read-only permission to run arbitrary and even privileged commands on any pod in a cluster. His trick is to use a service account with permission for the Kubernetes nodes/proxy GET resource, which is used by dozens of monitoring tools and provides access for issuing privileged-level pod commands. In other words, it's a feature, not a bug.

Working as intended

Helton initially reported the quirk as a bug in November through the Kubernetes bug bounty program. The issue was soon closed, marked as "intended behavior." The nodes/proxy GET call is intended for service accounts and is used by many monitoring tools. How a GET request gets transformed into full remote code execution comes down to a mismatch between WebSockets and the Kubelet's authorization logic. Helton found Helm charts for 69 tools that used nodes/proxy GET. For them, it provides the permissions to reach a node's internal API to get the data they need. "Some of the worlds biggest kubernetes vendors rely on it because there is no generally available alternative," Helton writes on X. So, no CVE alert for the nodes/proxy GET behavior, because it's not a vulnerability. The official path forward is KEP-2862 ("Fine-Grained Kubelet API Authorization"), an extension slated for the upcoming Kubernetes 1.36 release, expected in April.

How to bring down a Kubernetes cluster

So, if you have a service account that's subscribed to nodes/proxy GET, and can reach a node's Kubelet on port 10250, then you are free to issue any command to /exec endpoints, including commands for privileged system pods that could destroy the cluster entirely. Here are some other things you can do, according to Helton: steal service account tokens from other pods, or execute code in control plane pods. Worse yet, no record would be left of such malicious actions, as the "Kubernetes AuditPolicy does not log commands executed through a direct connection to the Kubelet's API," Helton explains. Here is the cluster permission set that makes this all possible:

# Vulnerable ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nodes-proxy-reader
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]

If you want to try it out for yourself, Helton posted an entire lab.

Precautions to take?

Hard questions may have to be asked by those running with these settings: do you value your telemetry more than your security? Industry observer Alex Ellis calls the disclosure "worrying." Jed Salazar, field CTO at cloud native security company Edera, notes the vulnerability points out how Kubernetes workloads are different in 2026 than they were in 2016. They're no longer just stateless apps. They're "AI training pipelines with proprietary model weights, financial trading systems, and healthcare applications with patient data," he writes. "The blast radius of a monitoring stack compromise in 2026 is categorically different from 2016." The answer, Salazar writes, is architectural isolation, which is what Edera offers (the configuration did not leave Edera users vulnerable, Salazar notes).
For everyone else, until KEP-2862 fully trickles down to production, Salazar advised a number of precautions:

Audit your RBAC policies for nodes/proxy permissions immediately.
Consider whether monitoring tools truly need direct kubelet access.
Implement network policies restricting access to kubelet port 10250.
Plan your migration to KEP-2862 fine-grained permissions when they GA.
Adopt workload isolation technologies that limit blast radius regardless of upstream decisions.

To those who use multitenant Kubernetes: It's not a matter of if you'll get pwn'd It's when https://t.co/RkgcOhwWjq — Jake (@JustJake) January 28, 2026

The post Kubernetes telemetry feature fully compromises clusters appeared first on The New Stack.
Read more →

Show HN: I'm building an AI-proof writing tool. How would you defeat it?

Comments
Read more →

With Auto Browse, Google Chrome can now surf the web for you

For a while now, Google has been on a mission to extend its Chrome browser with more AI features, powered by its Gemini models. Now, the company is launching a slew of new AI features based on Gemini 3 that include a new Auto Browse feature, a new AI side panel for interacting with Gemini, and an integration with its Nano Banana imaging model. It is also bringing built-in Gemini support to Chromebooks.

Let Chrome browse for you

The flagship feature of this launch is Auto Browse. This isn't a new idea, of course. Many startups have been developing tools to let an AI agent browse for you, but Google's first mainstream release in this area is definitely notable. During a press conference ahead of the announcement, Charmaine D'Silva, Chrome's Director of Product Management, noted that "Auto Browse is when we make Gemini in Chrome feel truly agentic." Available to paying Gemini Pro and Max subscribers soon, Auto Browse will let you describe a task — be that finding a flight to Vegas or orchestrating a more complex professional workflow — and then the browser will try to complete as much of that task for you as it can. Auto Browse will also support Google's recently launched Universal Commerce Protocol, making it easier for the agent to browse shopping sites and get you to the checkout page. There are even a few other auto-browse tools on the market with the same name. With Project Mariner, Google's DeepMind unit itself demonstrated some of these capabilities in the spring of 2025. Trying out Project Mariner meant having a Gemini Ultra account, though. Now, a Pro plan is enough to give this a spin. How useful this feature is will depend a lot on speed. Many similar projects still suffer from being extremely slow. One advantage of Auto Browse is that you can let it work in one tab and continue to work in another, which may alleviate this problem a bit.

Chrome gets a Gemini side panel

The most obvious new UI feature is the new side panel. Until now, to open Gemini in Chrome, you had to click on the Gemini icon at the top right of the browser, which would then open a pop-up window. That always felt like a somewhat provisional and disconnected user experience and made it harder to use Gemini while also browsing other tabs. D'Silva acknowledges as much. "Through that launch, we did get a lot of feedback from our users saying that one thing they really, really missed out on is the ability to actually have many of these conversations going at the same time." Now, the new side panel lets you interact with the model independently, but it is also aware of the browser context, of course. And because it knows what you are doing in the browser, you can now also ask it to, for example, compare different products you are researching across browser tabs.

Gemini in the new Chrome side panel (Credit: Google).

Nano Banana in the side panel

With this new side panel, users can also invoke Google's Nano Banana model to transform any images that are currently open in the browser. Some of those may be copyrighted or news images, which opens up all kinds of issues, especially because this makes it far easier to manipulate them since there's no need to download the images and then upload them to a model again. Nano Banana does have built-in guardrails, but where there is a will, there is usually a way. Looking beyond the browser context, Gemini in Chrome was also recently updated to support Connected Apps.
That’s Google’s term for integrations with services like Gmail, Google Calendar, YouTube, Maps, Google Flights, and others. As Google notes in its announcement, that means if you’re going to a conference, Gemini can find the context for that trip and search Google Flights for matching flights — and then draft an email to tell your colleagues when you will arrive in Las Vegas, a city you really didn’t want to have to travel to yet again, so you were glad Gemini helped you keep the cognitive load low. The post With Auto Browse, Google Chrome can now surf the web for you appeared first on The New Stack.
Read more →

Disney Afternoon Collection Pops Up On Switch 1 & 2 eShop, Includes Two Additional Games - Nintendo Life

Digital Eclipse yet to confirm
Read more →

Dispatch is censored on Nintendo Switch 2 and Switch - Nintendo Everything

Although there was initially speculation about Dispatch being censored on Nintendo Switch 2 and Switch, it’s now confirmed that this is indeed the case. In the PC and PS5 versions, the game featured a “Visual Censorship” setting. This is gone on Nintendo Swit…
Read more →

A decade of werf, a software delivery tool for Kubernetes

Various cloud native projects are celebrating their first decade. While there are obvious big names, such as Kubernetes itself, Helm, and Cilium, the ecosystem is much wider and includes lesser-known tools that have been around for a while as well. I've been involved in the werf project since its inception more than 10 years ago, and I'd like to share its story to contribute to a broader picture of the present cloud native world.

What is werf?

Basically, it's an opinionated, all-in-one command-line interface (CLI) tool for building container images and deploying them to Kubernetes. At this point, you might be wondering why anyone might need it, given that today we already have so many other tools performing similar tasks. Good question! To address it, I suggest diving into the project's history to understand its peculiarities, the reasoning behind it, and its evolution. Werf originated at a Linux-focused DevOps service provider that was challenged with automating numerous container orchestration routines for various customers. This happened in 2015-2016, when containers were already widely used and even Kubernetes existed, though it was not yet very popular. When we talk about running applications as containers, what was the first thing engineers did with them? They built the images by invoking docker build and several other commands. That's exactly how werf started: making a wrapper for these actions and improving the process by embedding several enhancements, such as different build stages, smart caching, adding third-party artifacts, and even Chef support. (Again, this was when such configuration management tools were widely used, and this kind of integration seemed natural for the Ops world.) Importantly, it was not a universal type of wrapper. From its inception, it was a tool strongly focused on automating well-established workflows and, thus, enforcing specific views on how images should be built, tagged, and so on. While the approach was opinionated, it was based on real-world experience operating infrastructure, not for a single company but for numerous customers across varying industries and sizes. Facilitating the same best practices for orchestrating containers was the whole point of creating werf, and this ideology has lasted ever since.

The next big capability werf got was deployment. After you build a container image, you want to run it in some environment, right? At that time (2017), it was obvious that Kubernetes would be the preferred platform to run containerized workloads, and Helm was already around. Thus, werf used Helm to implement deployment to Kubernetes. This was another core idea of the project: despite being opinionated about how you get your work done, the fundamental technologies you use to achieve that are mainstream: git, Docker, Helm, Kubernetes… In some ways, werf became a "glue" for these technologies — for example, you could execute one command, werf converge, to (re-)build your app and (re-)deploy it to Kubernetes. Over the years, other significant capabilities and best practices were added to werf, such as parallel builds, content-based image tagging, advanced resource tracking during deployment, a sophisticated approach to cleaning up the container registry, bundles for distributing release artifacts, ready-to-use integrations for GitLab CI/CD and GitHub Actions, so-called Giterminism for hermetic builds, and so on.
Eventually, werf evolved into an all-in-one CLI tool with many features on board, proven over years of production use by its creator and later by many other companies. Another fundamental idea behind the project contributed to this growing adoption. The creators of werf were heavily involved in open source, as their entire service business was built on deploying, configuring, and maintaining Linux servers and endless other open source software needed for web services. This made werf open source and available on GitHub from the very beginning, for both ideological and practical — or should I say professional — reasons. Interestingly, implementing many features in werf resulted in creating other open source projects that turned out to be helpful on their own. Some examples include:

Nelm, a Helm fork that enhances its capabilities in many ways, including advanced resource tracking during deployments, flexible ordering for deployed resources, improved management of CRDs (custom resource definitions), and deployment planning. Today, Nelm is not only used as the deployment engine in werf, but also as a standalone CLI tool by many users.
Trdl, a solution for delivering software updates securely from the git repository to the end user. It is used as the default and preferred way of installing and invoking the werf binary.
Kubedog, a library to watch and follow Kubernetes resources during deployment. Nelm uses it to track resources, and some other tools benefit from leveraging it as well.
Lockgate, a distributed locking library for Go.

Seeing that more users were adopting werf for their needs, and wanting to provide stronger guarantees for the project, we decided to donate it to a trusted foundation. At the end of 2022, werf became a Cloud Native Computing Foundation (CNCF) Sandbox project, signaling to the wider tech community that this tool will stay open source and won't be owned by a single vendor. Over the past decade, the cloud native ecosystem has evolved significantly, offering software engineers an impressive variety of tools and solutions. At the same time, werf has come a long way from a simple wrapper to a comprehensive solution. Being an opinionated, all-in-one tool, it is mostly focused on specific use cases nowadays, such as other DevOps agencies, organizations that want to enforce strict rules for delivering software to Kubernetes, and users who simply like the principles werf facilitates. (As a side note, perhaps werf's subproject Nelm has more potential for mass adoption.) Nevertheless, werf is actively used in at least 18,000 projects worldwide today and maintains a robust pace of development, continuously adding unique features for its current and potential users. For me, the story of werf illustrates how passionate, consistent development of in-house tooling, coupled with dedication to open source, can benefit a wider engineering community and maybe even inspire others.

The post A decade of werf, a software delivery tool for Kubernetes appeared first on The New Stack.
Read more →

From pixels to characters: The engineering behind GitHub Copilot CLI’s animated ASCII banner

Most people think ASCII art is simple, and a nostalgic remnant of the early internet. But when the GitHub Copilot CLI team asked for a small entrance banner for the new command-line experience, they discovered the opposite: An ASCII animation in a real-world terminal is one of the most constrained UI engineering problems you can take on.

Part of what makes this even more interesting is the moment we're in. Over the past year, CLIs have seen a surge of investment as AI-assisted and agentic workflows move directly into the terminal. But unlike the web—where design systems, accessibility standards, and rendering models are well-established—the CLI world is still fragmented. Terminals behave differently, have few shared standards, and offer almost no consistent accessibility guidelines. That reality shaped every engineering decision in this project. Different terminals interpret ANSI color codes differently. Screen readers treat fast-changing characters as noise. Layout engines vary. Buffers flicker. Some users override global colors for accessibility. Others throttle redraw speed. There is no canvas, no compositor, no consistent rendering model, and no standard animation framework.

By the numbers

3 seconds of animation
~20 frames
~6,000 lines of TypeScript
Dozens of terminal + theme combinations tested

So when an animated Copilot mascot flying into the terminal appeared, it looked playful. But behind it was serious engineering work, unexpected complexity, a custom design toolchain, and a tight pairing between a designer and a long-time CLI engineer. That complexity only became fully visible once the system was built. In the end, animating a three-second ASCII banner required over 6,000 lines of TypeScript—most of it dedicated not to visuals, but to handling terminal inconsistencies, accessibility constraints, and maintainable rendering logic. This is the technical story of how it came together.

📦 What's new in GitHub Copilot CLI

GitHub Copilot CLI brings agentic workflows directly into your terminal—letting you plan projects, modify files, run commands, use custom agents, and delegate tasks to the cloud, all without leaving the CLI. Since its introduction, Copilot CLI has expanded to support richer, more flexible agentic workflows:

Works the way you do with persistent memory, infinite sessions, and intelligent compaction
Helps you think using explore, plan, and review workflows where you can choose the model at each step
Executes on your behalf with custom agents, agent skills, full MCP support, and async task delegation

Want to bring these same agentic capabilities into your own tools or products? The GitHub Copilot SDK exposes the same execution loop that powers Copilot CLI, so you can embed agents into any application using your Copilot subscription or your own model keys. Learn more about the Copilot SDK >

Why animated ASCII is a hard engineering problem

Before diving into the build process, it's worth calling out why this problem space is more advanced than it looks.

Terminals don't have a canvas

Unlike browsers (DOM), native apps (views), or graphics frameworks (GPU surfaces), terminals treat output as a stream of characters. There's no native concept of:

Frames
Sprites
Z-index
Rasterized pixels
Animation tick rates

Because of this, every "frame" has to be manually repainted using cursor movements and redraw commands. There's no compositor smoothing anything over behind the scenes. Everything is stdout writes + ANSI control sequences.
ANSI escape codes are inconsistent, and terminal color is its own engineering challenge

ANSI escape codes like \x1b[35m (bright magenta) or \x1b[H (cursor home) behave differently across terminals—not just in how they render, but in whether they're supported at all. Some environments (like Windows Command Prompt or older versions of PowerShell) have limited or no ANSI support without extra configuration. But even in terminals that do support ANSI, the hardest part isn't the cursor movement. It's the colors. When you're building a CLI, you realistically have three approaches:

Use no color at all. This guarantees broad compatibility, but makes it harder to highlight meaning or guide users' attention—especially in dense CLI output.
Use richer color modes (3-bit, 4-bit, 8-bit, or truecolor) that aren't uniformly supported or customizable. This introduces a maintenance headache: Different terminals, themes, and accessibility profiles render the same color codes differently, and users often disagree about what "good" colors look like.
Use a minimal, customizable palette (usually 4-bit colors) that most terminals allow users to override in their preferences. This is the safest path, but it limits how accurately you can represent a brand palette—and it forces you to design for environments with widely varying contrast and theme choices.

For the Copilot CLI animation, this meant treating color as a semantic system, not a literal one: Instead of committing specific RGB values, the team mapped high-level "roles" (eyes, goggles, shadow, border) to ANSI colors that degrade gracefully across different terminals and accessibility settings.

Accessibility is a first-class concern

Terminals are used by developers with a wide range of visual abilities—not just blind users with screen readers, but also low-vision users, color-blind users, and anyone working in high-contrast or customized themes. That means:

Rapid re-renders can create auditory clutter for screen readers
Color-based meaning must degrade safely, since bold, dim, or subtle hues may not be perceivable
Low-vision users may not see contrast differences that designers expect
Animations must be opt-in, not automatic
Clearing sequences must avoid confusing assistive technologies

This is also why the Copilot CLI animation ended up behind an opt-in flag early on—accessibility constraints shaped the architecture from the start. These constraints guided every decision in the Copilot CLI animation. The banner had to work when colors were overridden, when contrast was limited, and even when the animation itself wasn't visible.

Ink (React for the terminal) helps, but it's not an animation engine

Ink lets you build terminal interfaces using React components, but:

It re-renders on every state change
It doesn't manage frame deltas
It doesn't synchronize with terminal paint cycles
It doesn't solve flicker or cursor ghosting

Which meant animation logic had to be handcrafted.

Frame-based ASCII animation has no existing workflow for designers

There are tools for ASCII art, but virtually none for:

Frame-by-frame editing
Multi-color ANSI previews
Exporting color roles
Generating Ink-ready components
Testing contrast and accessibility

Even existing ANSI preview tools don't simulate how different terminals remap colors or handle cursor updates, which makes accurate design iteration almost impossible without custom tooling. So the team had to build one.
Part 1: A request that didn't fit any workflow

Cameron Foxly (@cameronfoxly), a brand designer at GitHub with a background in animation, was asked to create a banner for the Copilot CLI. "Normally, I'd build something in After Effects and hand off assets," Cameron said. "But engineers didn't have the time to manually translate animation frames into a CLI. And honestly, I wanted something more fun." He'd seen the static ASCII intro in Claude Code and knew Copilot deserved more personality. The 3D Copilot mascot flying in to reveal the CLI logo felt right. But after attempting to create just one frame manually, the idea quickly ran into reality. "It was a nightmare," Cameron said. "If this is going to exist, I need to build my own tool."

Part 2: Building an ASCII animation editor from scratch

Cameron opened an empty repository in VS Code, and began asking GitHub Copilot for help scaffolding an animation MVP that could:

Read text files as frames
Render them sequentially
Control timing
Clear the screen without flicker
Add a primitive "UI"

Within an hour, he had a working prototype that was monochrome, but functional.

Simplified early animation loop

Below is a simplified example variation of the frame loop logic Cameron prototyped:

import fs from "fs";
import readline from "readline";

/**
 * Load ASCII frames from a directory.
 */
const frames = fs
  .readdirSync("./frames")
  .filter(f => f.endsWith(".txt"))
  .map(f => fs.readFileSync(`./frames/${f}`, "utf8"));

let current = 0;

function render() {
  // Move cursor to top-left of terminal
  readline.cursorTo(process.stdout, 0, 0);

  // Clear the screen below the cursor
  readline.clearScreenDown(process.stdout);

  // Write the current frame
  process.stdout.write(frames[current]);

  // Advance to next frame
  current = (current + 1) % frames.length;
}

// 75ms = ~13fps. Higher can cause flicker in some terminals.
setInterval(render, 75);

This introduced the first major obstacle: color. The prototype worked in monochrome, but the moment color was added, inconsistencies across terminals—and accessibility constraints—became the dominant engineering problem.

Part 3: ANSI color theory and the real-world limitations

The Copilot brand palette is vibrant and high-contrast, which is great for web but exceptionally challenging for terminals. ANSI terminals support:

16-color mode (standard)
256-color mode (extended)
Sometimes truecolor ("24-bit"), but inconsistently

Even in 256-color mode, terminals remap colors based on:

User themes
Accessibility settings
High-contrast modes
Light/dark backgrounds
OS-level overrides

Which means you can't rely on exact hues. You have to design with variability in mind. Cameron needed a way to paint characters with ANSI color roles while previewing how they look in different terminals. He took a screenshot of the Wikipedia ANSI table, handed it to Copilot, and asked it to scaffold a palette UI for his tool.

Adding a color "brush" tool

A simplified version:

function applyColor(char, color) {
  // Minimal example: real implementation needed support for roles,
  // contrast testing, and multiple ANSI modes.
  const codes = {
    magenta: "\x1b[35m",
    cyan: "\x1b[36m",
    white: "\x1b[37m"
  };
  return `${codes[color]}${char}\x1b[0m`; // Reset after each char
}

This enabled Cameron to paint ANSI-colored ASCII like you would in Photoshop, one character at a time. But now he had to export it into the real Copilot CLI codebase.

Part 4: Exporting to Ink (React for the terminal)

Ink is a React renderer for building CLIs using JSX components.
Instead of writing to the DOM, components render to stdout. Cameron asked Copilot to help generate an Ink component that would:

Accept frames
Render them line-by-line
Animate them with state updates
Integrate cleanly into the CLI codebase

Simplified Ink frame renderer

import React from "react";
import { Box, Text } from "ink";

/**
 * Render a single ASCII frame.
 */
export const CopilotBanner = ({ frame }) => (
  <Box flexDirection="column">
    {frame.split("\n").map((line, i) => (
      <Text key={i}>{line}</Text>
    ))}
  </Box>
);

And a minimal animation wrapper:

export const AnimatedBanner = () => {
  const [i, setI] = React.useState(0);
  React.useEffect(() => {
    const id = setInterval(() => setI(x => (x + 1) % frames.length), 75);
    return () => clearInterval(id);
  }, []);
  return <CopilotBanner frame={frames[i]} />;
};

This gave Cameron the confidence to open a pull request (his first engineering pull request in nine years at GitHub). "Copilot filled in syntax I didn't know," Cameron said. "But I still made all the architectural decisions." Now it was time for the engineering team to turn a prototype into something production-worthy.

Part 5: Terminal animation isn't solved technology

Andy Feller (@andyfeller), a long-time GitHub engineer behind the GitHub CLI, partnered with Cameron to bring the animation into the Copilot CLI codebase. Unlike browsers—which share rendering engines, accessibility APIs, and standards like WCAG—terminal environments are a patchwork of behaviors inherited from decades-old hardware like the VT100. There's no DOM, no semantic structure, and only partial agreement on capabilities across terminals. This makes even "simple" UI design problems in the terminal uniquely challenging, especially as AI-driven workflows push CLIs into daily use for more developers. "There's no framework for terminal animations," Andy explained. "We had to figure out how to do this without flickering, without breaking accessibility, and across wildly different terminals." Andy broke the engineering challenges into four broad categories:

Challenge 1: From banner to ready without flickering

Most terminals repaint the entire viewport when new content arrives. At the same time, CLIs come with a strict usability expectation: when developers run a command, they want to get to work immediately. Any animation that flickers, blocks input, or lingers too long actively degrades the experience. This created a core tension the team had to resolve: how to introduce a brief, animated banner without slowing startup, stealing focus, or destabilizing the terminal render loop. In practice, this was complicated by the fact that terminals behave differently under load. Some:

Throttle fast writes
Reveal cleared frames momentarily
Buffer output differently
Repaint the cursor region inconsistently

To avoid flicker while keeping the CLI responsive across popular terminals like iTerm2, Windows Terminal, and VS Code, the team had to carefully coordinate several interdependent concerns:

Keeping the animation under three seconds so it never delayed user interaction
Separating static and non-static components to minimize unnecessary redraws
Initializing MCP servers, custom agents, and user setup without blocking render
Working within Ink's asynchronous re-rendering model

The result was an animation treated as a non-blocking, best-effort enhancement—visible when it could be rendered safely, but never at the expense of startup performance or usability.
Challenge 2: Brand color mapping in ANSI

"ANSI color consistency simply doesn't exist," Andy said. Most modern terminals support 8-bit color, allowing CLIs to choose from 256 colors. However, how those colors are actually rendered varies widely based on terminal themes, OS settings, and user accessibility overrides. In practice, CLIs can't rely on exact hues—or even consistent contrast—across environments. The Copilot banner introduced an additional complexity: although it's rendered using text characters, the block-letter Copilot logo functions as a graphical object, not readable body text. Under accessibility guidelines, non-text graphical elements have different contrast requirements than text, and they must remain perceivable without relying on fine detail or precise color matching. To account for this, the team deliberately chose a minimal 4-bit ANSI palette—one of the few color modes most terminals allow users to customize—to ensure the animation remained legible under high-contrast themes, low-vision settings, and color overrides. This meant the team had to:

Treat the Copilot wordmark as non-text graphical content with appropriate contrast requirements
Select ANSI color codes that approximate the Copilot palette without relying on exact hues
Satisfy WCAG contrast guidance for both text and non-text elements
Ensure the animation remained legible in light and dark terminals
Degrade gracefully when users override terminal colors for accessibility
Test color combinations across multiple terminal emulators and theme configurations

Rather than encoding brand colors directly, the animation maps semantic roles—such as borders, eyes, highlights, and text—to ANSI color slots that terminals can reinterpret safely. This allows the banner to remain recognizable without assuming control over the user's color environment.
Challenge 3: Making the animation maintainable Cameron’s prototype was a great starting point for Andy to incorporate into the Copilot CLI but it wasn’t without its challenges: Banner consisted of ~20 animation frames covering an 11×78 area There are ~10 animation elements to stylize in any given frame Needed a way to separate the text of the frame from the colors involved Each frame mapped hard coded colors to row and column coordinates Each frame required precise timing to display Cameron’s vision First, the animation was broken down into distinct animation elements that could be used to create separate light and dark themes: type AnimationElements = | "block_text" | "block_shadow" | "border" | "eyes" | "head" | "goggles" | "shine" | "stars" | "text"; type AnimationTheme = Record<AnimationElements, ANSIColors>; const ANIMATION_ANSI_DARK: AnimationTheme = { block_text: "cyan", block_shadow: "white", border: "white", eyes: "greenBright", head: "magentaBright", goggles: "cyanBright", shine: "whiteBright", stars: "yellowBright", text: "whiteBright", }; const ANIMATION_ANSI_LIGHT: AnimationTheme = { block_text: "blue", block_shadow: "blackBright", border: "blackBright", eyes: "green", head: "magenta", goggles: "cyan", shine: "whiteBright", stars: "yellow", text: "black", }; Next, the overall animation and subsequent frames would capture content, color, duration needed to animate the banner: interface AnimationFrame { title: string; duration: number; content: string; colors?: Record<string, AnimationElements>; // Map of "row,col" positions to animation elements } interface Animation { metadata: { id: string; name: string; description: string; }; frames: AnimationFrame[]; } Then, each animation frame was captured to separate frame content from stylistic and animation details, resulting in over 6,000 lines of TypeScript to safely animate three seconds of the Copilot logo across terminals with wildly different rendering and accessibility behaviors: const frames: AnimationFrame[] = [ { title: "Frame 1", duration: 80, content: ` ┌┐ ││ ││ └┘`, colors: { "1,0": "border", "1,1": "border", "2,0": "border", "2,1": "border", "10,0": "border", "10,1": "border", "11,0": "border", "11,1": "border", }, }, { title: "Frame 2", duration: 80, content: ` ┌── ──┐ │ │ █▄▄▄ ███▀█ ███ ▐▌ ███ ▐▌ ▀▀█▌ ▐ ▌ ▐ │█▄▄▌ │ └▀▀▀ ──┘`, colors: { "1,0": "border", "1,1": "border", "1,2": "border", "1,8": "border", "1,9": "border", "1,10": "border", "2,0": "border", "2,10": "border", "3,1": "head", "3,2": "head", "3,3": "head", "3,4": "head", "4,1": "head", "4,2": "head", "4,3": "goggles", "4,4": "goggles", "4,5": "goggles", "5,1": "head", "5,2": "goggles", "5,3": "goggles", "5,5": "goggles", "5,6": "goggles", "6,1": "head", "6,2": "goggles", "6,3": "goggles", "6,5": "goggles", "6,6": "goggles", "7,3": "goggles", "7,4": "goggles", "7,5": "goggles", "7,6": "goggles", "8,3": "eyes", "8,5": "head", "9,4": "head", "10,0": "border", "10,1": "head", "10,2": "head", "10,3": "head", "10,4": "head", "10,10": "border", "11,0": "border", "11,1": "head", "11,2": "head", "11,3": "head", "11,8": "border", "11,9": "border", "11,10": "border", }, }, Finally, each animation frame is rendered building segments of text based on consecutive color usage with the necessary ANSI escape codes: {frameContent.map((line, rowIndex) => { const truncatedLine = line.length > 80 ? 
line.substring(0, 80) : line; const coloredChars = Array.from(truncatedLine).map((char, colIndex) => { const color = getCharacterColor(rowIndex, colIndex, currentFrame, theme, hasDarkTerminalBackground); return { char, color }; }); // Group consecutive characters with the same color const segments: Array<{ text: string; color: string }> = []; let currentSegment = { text: "", color: coloredChars[0]?.color || theme.COPILOT }; coloredChars.forEach(({ char, color }) => { if (color === currentSegment.color) { currentSegment.text += char; } else { if (currentSegment.text) segments.push(currentSegment); currentSegment = { text: char, color }; } }); if (currentSegment.text) segments.push(currentSegment); return ( <Text key={rowIndex} wrap="truncate"> {segments.map((segment, segIndex) => ( <Text key={segIndex} color={segment.color}> {segment.text} </Text> ))} </Text> ); })} Challenge 4: Accessibility-first design The engineering team approached the banner with the same philosophy as the GitHub CLI’s accessibility work: Respect global color overrides both in terminal and system preferences After the first use, avoid animations unless explicitly enabled via the Copilot CLI configuration file Minimize ANSI instructions that can confuse assistive tech “CLI accessibility is under researched,” Andy noted. “We’ve learned a lot from users who are blind as well as users with low vision, and those lessons shaped this project.” Because of this, the animation is opt-in and gated behind its own flag—so it’s not something developers see by default. And when developers run the CLI in –screen-reader mode, the banner is automatically skipped so no decorative characters or motion are sent to assistive technologies. Part 6: An architecture built to scale By the end of the refactor, the team had: Frames stored as plain text Animation elements Themes as simple mappings A runtime colorization step Ink-driven timing and rendering A maintainable foundation for future animations This pattern—storing frames as plain text, layering semantic roles, and applying themes at runtime—isn’t specific to Copilot. It’s a reusable approach for anyone building terminal UIs or animations. Part 7: What this project reveals about building for the terminal A “simple ASCII banner” turned into: A frame-based animation tool that didn’t exist A custom ANSI color palette strategy A new Ink component A maintainable rendering architecture Accessibility-first CLI design choices A designer’s first engineering contribution Real-world testing across diverse terminals Open source contributions from the community “The most rewarding part was stepping into open source for the first time,” Cameron said. “With Copilot, I was able to build out my MVP ASCII animation tool into a full open source app at ascii-motion.app,. Someone fixed a typo in my README, and it made my day.” As Andy pointed out, building accessible experiences for CLIs is still largely unexplored territory and far behind the tooling and standards available for the web. Today, developers are already contributing to Cameron’s ASCII Motion tool, and the Copilot CLI team can ship new animations without rebuilding the system. This is what building for the terminal demands: deep understanding of constraints, discipline around accessibility, and the willingness to invent tooling where none exists. 
Use GitHub Copilot in your terminal The GitHub Copilot CLI brings AI-assisted workflows directly into your terminal — including commands for explaining code, generating files, refactoring, testing, and navigating unfamiliar projects. Try GitHub Copilot CLI > The post From pixels to characters: The engineering behind GitHub Copilot CLI’s animated ASCII banner appeared first on The GitHub Blog.
Read more →

PlayStation Plus Free Games For February 2026 Revealed - GameSpot

Here's the list of free games that PS Plus subscribers can claim in February 2026.
Read more →

iOS 26.2.1—Update Now Warning Issued To Millions Of iPhone Users - Forbes

Apple has released iOS 26.2.1, an important update that all iPhone users should apply now. Here's what you need to know.
Read more →

Drupal turns 25: From simple to complex — then simple again

It’s rare that a web product lasts 25 years, given how fast the industry cycles through technologies. But this month marks a quarter century of Drupal, the open source content management system (CMS). To mark the occasion, and also to discuss the launch of Drupal CMS 2.0 — which, confusingly, is not version 2 of the original Drupal — we spoke to founder Dries Buytaert. “I think people think Drupal is this overnight success or something,” Buytaert tells The New Stack. “But I think in reality, it’s been this very slow, gradual growth.” He notes that although he launched Drupal in 2001, the first Drupal conference wasn’t until four years later, in 2005 — “like, 30 or 40 people showed up,” Buytaert chuckles. Drupal was launched on January 15, 2001 (coincidentally, the same day Wikipedia debuted). At the time it was a relatively simple PHP and MySQL content management system; indeed, its initial appeal was that it was far simpler than the bulky CMS software of the time, like Interwoven and Vignette. I can vouch for that, as I was using Interwoven in 2001 in my job as a company website manager — and I remember that it was a beast of a CMS. Drupal complexity and its fit in the AI era Ironically, Drupal itself became more complex over time, as it continued to expand and add to the core platform. Drupal these days is most often viewed as a DXP (Digital Experience Platform), competing with the likes of Adobe Experience Manager and Salesforce Marketing Cloud. “Drupal Core” is the name of the open source framework, and its tagline is “Create Ambitious Digital Experiences.” Buytaert argues, though, that the complexity that Drupal has accumulated over the years has actually made it very suitable for the current AI era. Complexity is “Drupal’s accidental advantage in AI” – Dries Buytaert, Drupal founder “For a long time, I think Drupal was perceived as a little bit more complex, also more advanced to use,” he tells The New Stack. “And it turns out, I talk about it as Drupal’s accidental advantage in AI — like, we’ve built a lot of features that contribute to that complexity.” His point is that AI systems (LLMs in particular) thrive on complexity — the more data that LLMs can gobble up, the better. Buytaert gives the example of “configuration versioning” in Drupal, which he says a lot of other CMSs don’t have. So if, for example, you move a block around on a page but then want to revert back, you can do that through configuration versioning. “So those features, they actually make our APIs more complex, and sometimes our user interface more complex,” Buytaert says. “But it turns out these are exactly the features AI agents need […] because they make mistakes, right? Like, they hallucinate, they make mistakes. And so now we have the ability to undo or roll back those mistakes in a way that maybe most of our competitors don’t have.” That’s a good way to spin the benefits of Drupal’s complexity, but the market reality is that the AI era has also forced Drupal to come up with simpler solutions. Drupal CMS 2.0 and the push for non-developers Up till very recently, the bulk of Drupal’s users have been developers — back in October 2022, Buytaert was telling The New Stack about Drupal’s “headless CMS” capabilities, which at the time was a trendy way for developers to set up a custom CMS for their organization or clients. But in a world where anyone can “vibe code” a website or web app out of seemingly thin air, Drupal needs to appeal to non-developers too. 
This has resulted in a product called “Drupal CMS” — which is actually a completely separate product from the DXP software, although it’s still built on the foundation of Drupal Core. Drupal CMS 2.0 is being released today, Friday. It comes about a year after version 1.0, which was released last January. Version 2.0 is both a return to the simpler Drupal CMS product of yore and a response to the current trend of vibe coding. Dries Buytaert quote “The idea is that we created a new version of Drupal, if you want to think about it that way, [although] it’s built on top of the old version of Drupal,” Buytaert tells TNS. “So it’s not a fork or anything, but it’s a new version of Drupal where we added a lot of capabilities with the idea to make Drupal easier to use for a broader audience of people.” The main goal of Drupal CMS is to broaden its user base beyond developers, to include marketers and indeed all non-developers. To do this, Drupal CMS needed an easy-to-use visual interface for creating web pages — ideally one that included AI functionality, to help with layout and coding. With that need top of mind, one of the new features in 2.0 is Drupal Canvas, “a visual page-building interface” that comes with pre-built templates and “optional AI.” Drupal Canvas; image via The Drupal Association Interestingly, this return to simplicity and attendant embrace of AI has led to a boost in activity in the Drupal open source project, says Buytaert. “It has really sparked a lot of energy in Drupal. […] If you look at the number of contributions in Drupal, especially to strategic initiatives, it has doubled in the last 18 months since the start of Star Shot [the original code name of Drupal CMS], and so it has really created this new energy, in a way. A lot of people have been contributing to it.” Community takes 10 years to build This brings us back to the core aspect of Drupal that has led to it continuing to grow and evolve over 25 years: Its open source community. Not only that, but a good portion of Drupal’s earliest adopters have stuck around. “There’s a private email going around to [about] 50 of us that were around in the early years, and I would say half of them have moved on to do other things, and then the other half is surprisingly still involved through Drupal,” Buytaert tells The New Stack. “So there is definitely a core group that has been doing this for over 20 years, which is pretty special.” “…everything worth doing, it’s probably best to commit for 10 years.” – Buytaert What tips, then, does he have for new open source projects trying to get a foothold in a tech landscape dominated by multinational corporations like Google, Apple and Meta? “Don’t expect overnight success,” Buytaert warns. “I think anything successful in life usually takes 10 years.” He mentions not just the original Drupal project, but also the company he formed to sell Drupal products and services — Acquia, which he launched at the end of 2007 with his business partner, Jay Batson. After raising a lot of VC money, Acquia eventually sold to Vista Equity Partners for a reported $1 billion (Buytaert is still executive chairman at the company). “It took, like, 10 years before CMOs and CIOs actually had heard about Acquia,” Buytaert tells TNS. “And so everything worth doing, it’s probably best to commit for 10 years.” The post Drupal turns 25: From simple to complex — then simple again appeared first on The New Stack.
Read more →

Year recap and future goals for the GitHub Innovation Graph

Today’s data release marks our second full year of regular releases since the launch of the GitHub Innovation Graph. The Innovation Graph serves as a stable, regularly updated source for aggregated statistics on public software development activity around the world, informing public policy, strengthening research, guiding funding decisions, and equipping organizations with the evidence needed to build secure and resilient AI systems. Updated bar chart races With our new data release, we’ve updated the bar chart race videos to the git pushes, repositories, developers, and organizations global metrics pages. Let’s take a look back at some of the progress the Innovation Graph has helped drive. Academic papers One of the most rewarding aspects of the past year has been seeing the growing range of research questions addressed with Innovation Graph data. Recent papers have explored everything from global collaboration networks to the institutional foundations of digital capabilities. These studies showcase how network analysis techniques can be applied to Innovation Graph data, in addition to earlier work we referenced last year linking open source to economic value, innovation measurement, labor markets, and AI-driven productivity through other methodologies. Historical Institutions and Modern Digital Capabilities: New Evidence from GitHub in Africa Research by an economist at the Federal Reserve Board uses GitHub data to examine how the density of Protestant mission stations correlates with present-day participation in digital production across African countries. Olana, Deriba, “Historical Institutions and Modern Digital Capabilities: New Evidence from GitHub in Africa” (November 25, 2025). Available at SSRN: https://ssrn.com/abstract=5805622 or http://dx.doi.org/10.2139/ssrn.5805622. The Structure of Cross-National Collaboration in Open-Source Software Development Researchers from MIT, Carnegie Mellon, and the University of Chicago analyze international collaboration patterns in the Innovation Graph’s economy collaborators dataset, shedding light on how common colonial histories influence modern software development collaboration activities. Xu, Henry, et al. “The Structure of Cross-National Collaboration in Open-Source Software Development,” (November 10, 2025). Available at doi.org/10.1145/3746252.3761237. Replication package available at https://github.com/hehao98/github-innovation-graph. Small-World Phenomenon of Global Open-Source Software Collaboration on GitHub A social network analysis by researchers at Midwestern State University and Tarleton State University highlights the tightly connected, small-world structure of global OSS collaboration. Zhang, Guoying, et al. “Small-World Phenomenon of Global Open-Source Software Collaboration on Github: A Social Network Analysis.” Journal of Global Information Management Vol. 33, No. 1 (2025). Available at doi.org/10.4018/JGIM.387412. The Software Complexity of Nations These researchers extend countries’ software economic complexity into the digital economy by leveraging the geographic distribution of programming languages in open source software, showing that software economic complexity predicts GDP, income inequality, and emissions, which have important policy implications. Juhász, Sándor, et al. “The Software Complexity of Nations.” Research Policy Vol. 55, No. 3. Available at doi.org/10.1016/j.respol.2026.105422. 
Conferences The Innovation Graph and related GitHub datasets were featured prominently in academic and policy discussions at a wide range of venues, including: ATLC25: The 10th Atlanta Conference on Science and Innovation Policy OpenForum Academy Symposium 2025 2nd CEU Vienna Data Analytics Jamboree Wharton Human-AI Research: 3rd Annual Business & Generative AI Conference News publications We were also encouraged to see Innovation Graph data referenced in major international reporting. In 2025, two pieces in The Economist drew on GitHub data examining China’s approach to open technology (June 17, 2025) and India’s potential role as a distinctive kind of AI superpower (September 18, 2025). Coverage like this reinforces the role that data on open source activity can play in understanding geopolitical and economic shifts. Reports Once again, Innovation Graph data contributed to several flagship reports, including: The 2025 Stanford AI Index Report The 2025 WIPO Global Innovation Index The Rise of FOSS in India report from the National Law School of India University We continue to value these opportunities to support macro-level measurement efforts, and we’re equally excited by complementary work that dives deeper into regional, institutional, and community-level dynamics. Moving forward As we move through 2026, we’re grateful for the community that has formed around the Innovation Graph, and we’re looking forward to building the next chapter together. Our focus will be on deepening collaboration, welcoming new perspectives, and creating clearer pathways for people to apply the Innovation Graph data in their own contexts, from strategy and research to product development and policy. The post Year recap and future goals for the GitHub Innovation Graph appeared first on The GitHub Blog.
Read more →

Did the Nintendo Switch 2 Really Have a Bad Holiday? We Asked Analysts - IGN

Since December, we've been seeing (and writing!) headlines discussing the seeming slowdown of Nintendo Switch 2 sales going into the holiday season. But there's some nuance to this narrative, so I kicked off the new year by bugging all the analysts I knew for…
Read more →

Leak shows Google’s new Aluminium OS in action for the first time - PCWorld

In screenshots and videos, we get a first peek at Google's new operating system that combines Android and ChromeOS.
Read more →

Scott Pilgrim EX launches March 3 - Gematsu

Retro-style side-scrolling adventure brawler Scott Pilgrim EX will launch digitally for PlayStation 5, Xbox Series, PlayStation 4, Switch, and PC via Steam on March 3 for $28.99, developer Tribute G…
Read more →

Action adventure RPG Emberville launches in Early Access this summer - Gematsu

Action adventure RPG Emberville will launch in Early Access for PC via Steam this summer, developer Cygnus Cross announced.
Read more →

Agoda’s secret to 50x scale: Getting the database basics right

Agoda is the Singapore wing of Booking Holdings, the world’s leading provider of online travel (the brand behind Booking.com, Kayak, Priceline, etc.). From January 2023 to February 2025, Agoda server traffic spiked by 50 times. That’s fantastic business growth, but also the trigger for an interesting engineering challenge. Specifically, the team had to determine how to scale their ScyllaDB-backed online feature store to maintain 10ms P99 latencies despite this growth. Complicating the situation, traffic was highly bursty, cache hit rates were unpredictable and cold-cache scenarios could flood the database with duplicate read requests in a matter of seconds. At Monster Scale Summit 2025, Worakarn Isaratham, lead software engineer at Agoda, shared how they tackled the challenge. You can watch his entire talk or read the highlights below. Note: Monster Scale Summit is a free, virtual conference on extreme-scale engineering with a focus on data-intensive applications. Learn from luminaries like antirez, creator of Redis; Camille Fournier, author of “The Manager’s Path” and “Platform Engineering”; Martin Kleppmann, author of “Designing Data-Intensive Applications” and more than 50 others, including engineers from Discord, Disney, Pinterest, Rivian, Datadog, LinkedIn, and Uber Eats. Register and join us March 11-12 for some lively chats. A feature store powered by ScyllaDB and DragonflyDB Agoda operates an in-house feature store that supports both offline model training and online inference. For anyone not familiar with feature stores, Isaratham provided a quick primer. A feature store is a centralized repository designed for managing and serving machine learning features. In the context of machine learning, a feature is a measurable property or characteristic of a data point used as input to models. The feature store helps manage features across the entire machine learning pipeline — from data ingestion to model training to inference. Feature stores are integral to Agoda’s business. Isaratham explained: “We’re a digital travel platform, and some use cases are directly tied to our product. For example, we try to predict what users want to see, which hotels to recommend and what promotions to serve. On the more technical side, we use it for things like bot detection. The model uses traffic patterns to predict whether a user is a bot, and if so, we can block or deprioritize requests. So the feature store is essential for both product and engineering at Agoda. We’ve got tools to help create feature ingestion pipelines, model training, and the focus here: online feature serving.” One layer deeper into how it works: “We’re currently serving about 3.5 million entities per second (EPS) to our users. About half the features are served from cache within the client SDK, which we provide in Scala and Python. That means 1.7 million entities per second reach our application servers. These are written in Rust, running in our internal Kubernetes pods in our private cloud. From the app servers, we first check if features exist in the cache. We use DragonflyDB as a non-persistent centralized cache. If it’s not in the cache, then we go to ScyllaDB, our source of truth.” ScyllaDB is a high-performance database for workloads that require ultra-low latency at scale. Agoda’s current ScyllaDB cluster is deployed as six bare-metal nodes, replicated across four data centers. 
Under steady-state conditions, ScyllaDB serves about 200K entities per second across all data centers while meeting a service-level agreement (SLA) of 10ms P99 latency. (In practice, their latencies are typically even lower than their SLA requires.) Traffic growth and bursty workloads However, it wasn’t always that smooth and steady. Around mid-2023, they hit a major capacity problem when a new user wanted to onboard to the Agoda feature store. Their traffic pattern was super bursty: It was normally low, but occasionally it would flood them with requests triggered by external signals. These were cold-cache scenarios, where the cache couldn’t help. Isaratham shared, “Bursts reached 120K EPS, which was 12 times the normal load back then.” Request duplication exacerbated the situation. Many identical requests arrived in quick succession. Instead of one request populating the cache and subsequent requests benefiting, all of them hit ScyllaDB at the same time — a classic cache stampede. They also retried failed requests until they succeeded — and that kept the pressure high. This load involved two data centers. One slowed down but remained online. The other was effectively taken out of service. More details from Worakarn: “On the bad DC, error rates were high and retries took 40 minutes to clear; on the good one, it only took a few minutes. Metrics showed that ScyllaDB read latency spiked into seconds instead of milliseconds.” Diagnosing the bottleneck So, they compared setups and found the difference: the problematic data center used SATA SSDs while the better one used NVMe SSDs. SATA (serial advanced technology attachment) was already old tech, even then. The team’s speed tests suggested that replacing the disks would yield a 10X read performance boost — and better write rates too. The team ordered new disks immediately. However, given that the disks wouldn’t arrive for months, they had to figure out a survival strategy until then. As Isaratham shared, “Capacity tests and projections showed that we would hit limits within eight or nine months even without new load — and sooner with it. So, we worked with users to add more aggressive client-side caching, remove unnecessary requests and smooth out bursts. That reduced the new load from 120K to 7K EPS. That was enough to keep things stable, but we were still close to the limit.” Surviving with SATA Given the imminent capacity cap, the team brainstormed ways to improve the situation while still on the existing SATA disks. Since you have to measure before you can improve, getting a clean baseline was the first order of business. “The earlier capacity numbers were from real-world traffic, which included caching effects,” Isaratham detailed. “We wanted to measure cold-cache performance directly. So, we created artificial load using one-time-use test entities, bypassed cache in queries and flushed caches before and after each run. The baseline read capacity on the bad DC was 5K EPS.” With that baseline set, the team considered a few different approaches. Data modeling All features from all feature sets were stored in a single table. The team hoped that splitting tables by feature set might improve locality and reduce read amplification. It didn’t. They were already partitioning by feature set and entity, so the logical reorganization didn’t change the physical layout. Compaction strategy Given a read-heavy workload with frequent updates, ScyllaDB documentation recommends the size-tiered compaction strategy to avoid write amplification.
But the team was most concerned about read latency, so they took a different path. According to Worakarn: “We tried leveled compaction to reduce the number of SSTables per read. Tests showed fetching 1KB of data required reading 70KB from disk, so minimizing SSTable reads was key. Switching to leveled compaction improved throughput by about 50%.” Larger SSTable summaries ScyllaDB uses summary files to more efficiently navigate index files. Their size is controlled by the sstable_summary_ratio setting. Increasing the ratio increases the summary file size, reducing index reads at the cost of additional memory. The team increased the ratio by 20 times, which boosted capacity to 20K EPS. This yielded a nice 4X improvement, so they rolled it out immediately. What a difference a disk makes Finally, the NVMe disks arrived a few months later. This one change made a massive difference. Capacity jumped to 300K EPS, a staggering 50-60X improvement. The team rolled out improvements in stages: first, the summary ratio tweak (for 2-3X breathing room), then the NVMe upgrade (for 50X capacity). They didn’t apply leveled compaction in production because it only affects new tables and would require migration. Anyway, NVMe already solved the problem. After that, the team shifted focus to other areas: improving caching, rewriting the application in Rust and adding cache stampede prevention to reduce the load on ScyllaDB. They still revisit ScyllaDB occasionally for experiments. A couple of examples: New partitioning scheme: They tried partitioning by feature set only and clustering by entity. However, performance was actually worse, so they didn’t move forward with this idea. Data remodeling: The application originally stored one row per feature. Since all features for an entity are always read together, the team tested storing all features in a single row instead. This improved performance by 35%, but it requires a table migration. It’s on their list of things to do later. Lessons learned Isaratham wrapped it up as follows: “We’d been using ScyllaDB for years without realizing its full potential, mainly because we hadn’t set it up correctly. After upgrading disks, benchmarking and tuning data models, we finally reached proper usage. Getting the basics right — fast storage, knowing capacity, and matching data models to workload — made all the difference. That’s how ScyllaDB helped us achieve 50X scaling.” The post Agoda’s secret to 50x scale: Getting the database basics right appeared first on The New Stack.
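The cache stampede Isaratham describes, where many identical cold-cache reads land on the database at once, is commonly mitigated with request coalescing: only one fetch per key is allowed in flight, and concurrent callers wait for its result. The talk does not show Agoda's implementation, so the following Python sketch only illustrates the general single-flight pattern; the fetch_from_scylla function and the example key are hypothetical stand-ins.

import threading

class SingleFlightCache:
    """Illustrative request coalescing: at most one loader runs per key;
    concurrent callers for the same key wait for that result instead of
    all hitting the backing store (e.g., ScyllaDB) at once."""

    def __init__(self, loader):
        self._loader = loader            # e.g., a function that reads the feature store
        self._cache = {}                 # key -> cached value
        self._inflight = {}              # key -> Event signaling the in-flight load
        self._lock = threading.Lock()

    def get(self, key):
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                if event is None:
                    # We become the single flight for this key.
                    event = threading.Event()
                    self._inflight[key] = event
                    leader = True
                else:
                    leader = False
            if leader:
                try:
                    value = self._loader(key)        # exactly one backend read per key
                    with self._lock:
                        self._cache[key] = value
                finally:
                    with self._lock:
                        self._inflight.pop(key, None)
                    event.set()
                return value
            # Follower: wait for the leader to finish, then re-check the cache.
            event.wait()

def fetch_from_scylla(key):
    # Hypothetical placeholder for the real feature-store read.
    return {"entity": key, "features": [0.1, 0.2, 0.3]}

features = SingleFlightCache(fetch_from_scylla)
print(features.get("hotel:12345"))

A production version would also need TTL-based eviction and negative-result handling, but the sketch shows why duplicate cold-cache reads collapse into a single backend request.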
Read more →

Gemini details AI Plus limits, rolls out NotebookLM integration on iOS - 9to5Google

Google brought the AI Plus subscription to the US yesterday, and here’s how it upgrades usage limits in the Gemini app.
Read more →

Apple’s Creator Studio Offers Value, But Is Far From an Adobe Killer - Bloomberg

Apple Inc.’s new Creator Studio software bundle represents a new test for the company’s fast-growing services business: What happens when the tech giant packages some of its most popular creative apps into a recurring subscription?
Read more →

AMD Ryzen 7 9850X3D Review - The Best Just Got Better - TechPowerUp

The AMD Ryzen 9850X3D builds on the already excellent 9800X3D without trying to shake up the market. Instead, it focuses on refining what already works. AMD delivers modest improvements with smarter tuning, and the same cache magic that gamers love. Thanks to…
Read more →

Terraform challenger Formae expands to more clouds

Late last year, startup Platform Engineering Labs made waves in the world of Infrastructure as Code (IaC) by introducing a new IaC platform, called Formae, available initially on Amazon Web Services. This week, Platform Engineering Labs' platform gets (beta) support from additional cloud platforms, including Google Cloud Platform, Microsoft Azure, Oracle Cloud Infrastructure, and OVHcloud. The company has also released new AI-enhanced software for managing infrastructure tooling, called the Platform for Infrastructure Builders. “This release is for and about infrastructure builders,” says Pavlo Baron, co-founder and CEO of Platform Engineering Labs, in a statement. “From here forward, you don’t need to wait on us or anyone else. Build for your own infrastructure. Launch fast. Iterate fast. Extend fast. Do it hands-on or with help from your AI agents.” The company is pitching the platform for organizations that may have some components managed by IaC but want to expand operations to older, legacy resources that may have been previously thought too ornery to be managed under IaC. Schema-safe change management The new platform, with the accompanying software development kit (SDK), will allow users to extend their infrastructure with new components, offering schema safety and an easy-to-understand plug-in interface. “Engineers can now use AI agents to quickly produce and modify plugins that are reliable by design,” says Zachary Schneider, co-founder and CTO of Platform Engineering Labs, in a statement. The Formae software was built to automatically discover and codify system resources and system changes into a single unified source of truth. The founders claim that this approach offers superior state management and easier migration paths than the industry-leading IaC solution, HashiCorp’s Terraform. Infrastructure as Code Infrastructure as Code is the practice of saving your system’s configuration in a file, usually using YAML or JSON, which IaC orchestrators then use as an instruction set to roll out infrastructure. The advantages IaC promises are automated deployments — a real time saver — and a guard against system drift, which is when systems fall out of alignment from their desired state (usually due to manual intervention). Yet after everything is set up once, Day 2 operations with IaC can be a headache, Baron contended in an earlier interview with The New Stack. IaC files are brittle things. They quickly get complex and difficult to understand, are easy to corrupt with shadow IT work, and are easy to make mistakes with. They offer no guidance as to whether the values they hold are even correct. Within the Formae environment, an individual IT resource is extracted into a versioned, declarative code artifact called a “forma” (which is the Latin singular for “form”) that can then be programmed against. Unlike Terraform or Pulumi, state management in Formae is handled not by the clients themselves, but by agents, to guard against system drift. Changes are made in the same way security patches are rolled out, minimizing the blast radius of each update. The code is written in an unusual language, Apple’s Pkl, which Apple developed in-house to manage its own system deployments. Pkl is different from JSON and YAML in that it forces users to develop a schema for each type of resource, along with a type annotation. With a type annotation, the permitted type — and sometimes even a range of permissible values — is established for the variable itself.
So fewer typos can sneak in and disrupt the operations. The open source version of Formae is available today on GitHub. The post Terraform challenger Formae expands to more clouds appeared first on The New Stack.
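The article describes Pkl only at a high level and includes no Pkl source, so the snippet below is not Pkl; it is a rough Python analogue of the same idea: a declared schema whose type annotations also constrain the permissible values, so an invalid setting fails before it can be rolled out. The LoadBalancerConfig class and its fields are invented purely for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class LoadBalancerConfig:
    """Toy schema: types plus value constraints, checked when the config is built."""
    name: str
    port: int
    protocol: str

    def __post_init__(self):
        # These checks stand in for Pkl-style type annotations with permitted ranges.
        if not (1 <= self.port <= 65535):
            raise ValueError(f"port must be in 1..65535, got {self.port}")
        if self.protocol not in {"http", "https", "tcp"}:
            raise ValueError(f"unsupported protocol: {self.protocol!r}")

print(LoadBalancerConfig(name="edge-lb", port=443, protocol="https"))

try:
    LoadBalancerConfig(name="edge-lb", port=70000, protocol="https")
except ValueError as err:
    print("rejected before rollout:", err)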
Read more →

Leaker makes unlikely claim about improved iPhone 18 telephoto performance - 9to5Mac

A leaker better known for posting about Android smartphones has made a claim potentially pointing to improved telephoto performance in...
Read more →

Exclusive: First look at Samsung’s 25W wireless charger, possibly for Galaxy S26 - SamMobile

It was reported earlier that Samsung plans to introduce a major upgrade to wireless charging speeds with its future smartphones. The Galaxy S26 series supports up to 25W fast wireless charging. We are now bringing you an exclusive first look at Samsung’s 25W …
Read more →

Windows 11’s ability to resume Android apps on your PC is getting closer - The Verge

Microsoft is getting ready to improve its cross-device resume feature in Windows 11. You’ll soon be able to resume Spotify and Office docs from phone to PC.
Read more →

'When Did It Become Trendy to Hate on a New Game?' — as Highguard Struggles to Win Over the Internet, Video Game Developers Come to Its Defense - IGN

A number of high-profile video game developers have defended Highguard amid an online backlash during the game’s launch.
Read more →

‘The Secret Fear of the Morally Depraved’

Adam Serwer, reporting from the streets of Minneapolis for The Atlantic, “Minnesota Proved MAGA Wrong” (gift link): The secret fear of the morally depraved is that virtue is actually common, and that they’re the ones who are alone. In Minnesota, all of the ideological cornerstones of MAGA have been proved false at once. Minnesotans, not the armed thugs of ICE and the Border Patrol, are brave. Minnesotans have shown that their community is socially cohesive — because of its diversity and not in spite of it. Minnesotans have found and loved one another in a world atomized by social media, where empty men have tried to fill their lonely soul with lies about their own inherent superiority. Minnesotans have preserved everything worthwhile about “Western civilization,” while armed brutes try to tear it down by force. ★
Read more →

‘A CEO, Captured’

Om Malik: Cook is not stupid. He is not evil. He is trapped. The iron clasp of market expectations has turned him into what he never meant to be: a man who goes to parties at the White House while nurses die. In Tinker Tailor Soldier Spy, Roy Bland captures a cynical, post-ideological, corrupt English society: “You scratch my conscience; I’ll drive your Jag.” You could say the same of today’s Silicon Valley. It used to believe it could change the world. Now it just hopes the world won’t change its stock price. Amy Jane Gruber: If I ever meet Tim Cook I’m going to ask him if Mike Tyson enjoyed the movie. ★
Read more →

‘Aside From That, Mr. Cook, What Did You Think of the Movie?’

MG Siegler: Tim Cook is captured. There is simply no other explanation for his actions over the past year or so. But it perhaps culminated this weekend when Cook went to a special private showing of the documentary Melania at the White House. Yes, that Melania. That in and of itself would have probably been fine. I mean, it’s potentially problematic for a host of reasons that I’ll get to, but such is our world right now. Then one shot — a gunshot — turned attending that movie screening into a statement... While Cook was enjoying his popcorn and champagne with the likes of Mike Tyson, Tony Robbins, and other “VIPs”, it was complete and utter chaos on the streets of Minnesota. Just hours earlier, Alex Pretti, a 37-year-old ICU nurse, was shot and killed by ICE agents. Maybe, just maybe, postpone the movie premiere? ★
Read more →

‘Whatever’

Ben Terris, writing for New York Magazine: Fred Trump died in 1999 at age 93. He had, Trump said, a “heart that couldn’t be stopped” with almost no health conditions to speak of throughout his long life. “He had one problem,” Trump said. “At a certain age, about 86, 87, he started getting, what do they call it?” He pointed to his forehead and looked to his press secretary for the word that escaped him. “Alzheimer’s,” Leavitt said. “Like an Alzheimer’s thing,” Trump said. “Well, I don’t have it.” “Is it something you think about at all?” I asked. “No, I don’t think about it at all. You know why?” he said. “Because whatever it is, my attitude is whatever.” ★
Read more →

Clawdbot Is Now Moltbot

From the footer on the project’s website: Moltbot was formerly known as Clawdbot. Independent project, not affiliated with Anthropic. Makes sense, to be honest, that Anthropic would object to naming it a homonym for Claude. One additional followup to my post the other day. In his terrific introduction to Clawdbot (now Moltbot), Federico Viticci wrote: I’ve been playing around with Clawdbot so much, I’ve burned through 180 million tokens on the Anthropic API (yikes), and I’ve had fewer and fewer conversations with the “regular” Claude and ChatGPT apps in the process. Those tokens aren’t free. I asked Viticci just how much “yikes” cost, and he said around US$560 — using way more input than output tokens. ★
Read more →

★ The Names They Call Themselves

Jonathan Rauch, writing for The Atlantic, “Yes, It’s Fascism” (gift link): Until recently, I resisted using the F-word to describe President Trump. For one thing, there were too many elements of classical fascism that didn’t seem to fit. For another, the term has been overused to the point of meaninglessness, especially by left-leaning types who call you a fascist if you oppose abortion or affirmative action. For yet another, the term is hazily defined, even by its adherents. From the beginning, fascism has been an incoherent doctrine, and even today scholars can’t agree on its definition. Italy’s original version differed from Germany’s, which differed from Spain’s, which differed from Japan’s. [...] When the facts change, I change my mind. Recent events have brought Trump’s governing style into sharper focus. Fascist best describes it, and reluctance to use the term has now become perverse. That is not because of any one or two things he and his administration have done but because of the totality. Fascism is not a territory with clearly marked boundaries but a constellation of characteristics. When you view the stars together, the constellation plainly appears. Rauch goes on to describe that constellation clearly and copiously, with evidence. I agree, wholeheartedly, with his conclusion that “If, however, Trump is a fascist president, that does not mean that America is a fascist country.” The shoe fits, however tightly. But there’s a problem that’s been gnawing at me ever since the 2.0 Trump Administration began. The entire premise of Rauch’s essay — the issue he changed his mind about — is that it’s contentious to describe people, let alone an entire political party or government, as “fascist” or “Nazi”. With only the most extremist exceptions, it’s a broad cultural value — a shared global value, not merely an American or western one — that the Nazis and Fascists were abominable. Also, they were losers, and their complete and total destruction was celebrated around the world. Hitler shot himself, hiding in a dingy filthy bunker. Mussolini was summarily executed and his body strung up in a public square in Milan. Hirohito surrendered unconditionally and lived his remaining days in quiet shame and infamy. No matter how apt the definition of fascist fits the Trump regime, they themselves reject the term, as they do not see themselves as being on the wrong side, and the definition of fascism is that it’s wrong. And they (exemplified by Trump himself) have a deep-seated psychological aversion to being seen as losers, even when it is as plain to see as the sun that they have lost — and no one denies that the Fascists and Nazis lost, bigly. We call Benito Mussolini’s regime “fascist” because he coined the term. His political movement was literally named the Fascist Party. There was no debate whether Hitler and his regime were Nazis because that was their name. “Fascist” and “Nazi” weren’t slurs that were applied to them by their political or military opponents. That’s what they called themselves, and their names became universally recognized slurs because the actions and beliefs of the Fascists and Nazis were universally recognized as reprehensible and evil. And because they lost. Our goal should not be to make fascist or Nazi apply to Trump’s movement, no matter how well those rhetorical gloves fit his short-fingered disgustingly bruised hands. Don’t call Trump “Hitler”. Instead, work until “Trump” becomes a new end state of Godwin’s Law. 
The job won’t be done, this era of madness will not end, until we make the names they call themselves universally acknowledged slurs. “MAGA” and “Trumpist”, for sure. “Republican”, perhaps. Make those names shameful, deservedly, now, and there will be no need to apply the shameful names of hateful anti-democratic illiberal failed nationalist movements from a century ago. We need to assert this rhetoric with urgency, make their names shameful, lest the slur become our name — “American”.
Read more →

What It’s Like to Get Undressed by Grok

Ella Chakarian, writing for Rolling Stone (News+): On a recent Saturday afternoon, Kendall Mayes was mindlessly scrolling on X when she noticed an unsettling trend surface on her feed. Users were prompting Grok, the platform’s built-in AI feature, to “nudify” women’s images. Mayes, a 25-year-old media professional from Texas who uses X to post photos with her friends and keep up with news, didn’t think it would happen to her — until it did. “Put her in a tight clear transparent bikini,” an X user ordered the bot under a photo that Mayes posted from when she was 20. Grok complied, replacing her white shirt with a clear bikini top. The waistband of her jeans and black belt dissolved into thin, translucent strings. The see-through top made the upper half of her body look realistically naked. Hiding behind an anonymous profile, the user’s page was filled with similar images of women, digitally and nonconsensually altered and sexualized. Mayes wanted to cuss the faceless user out, but decided to simply block the account. She hoped that would be the end of it. Soon, however, her comments became littered with more images of herself in clear bikinis and skin-tight latex bodysuits. Mayes says that all of the requests came from anonymous profiles that also targeted other women. Though some users have had their accounts suspended, as of publication, some of the images of Mayes are still up on X. And: Emma, a content creator, was at the grocery store when she saw the notifications of people asking Grok to undress her images. [...] Numbness washed over Emma when the images finally loaded on her timeline. A selfie of her holding a cat had been transformed into a nude. The cat was removed from the photo, Emma says, and her upper body was made naked. Emma immediately made her account private and reported the images. In an email response reviewed by Rolling Stone, X User Support asked her to upload an image of her government-issued ID so they could look into the report, but Emma responded that she didn’t feel comfortable doing so. [...] In our call, she checked to see if some of the image edits she was aware of were still up on X. They were. “Oh, my God,” she says, letting out a defeated sigh. “It has 15,000 views. Oh, that’s so sad.” This fun app is available, free of charge, on the App Store, which means you know it’s safe and approved by Apple. Get it today. ★
Read more →

Assessing internal quality while coding with an agent

Erik Doernenburg is the maintainer of CCMenu: a Mac application that shows the status of CI/CD builds in the Mac menu bar. He assesses how using a coding agent affects internal code quality by adding a feature using the agent, and seeing what happens to the code. more…
Read more →

The Talk Show: ‘A Mitigated Disaster’

Daniel Jalkut returns to the show so we can both vent about MacOS 26 Tahoe. Sponsored by: Notion: The AI workspace where teams and AI agents get more done together. Squarespace: Save 10% off your first purchase of a website or domain using code talkshow. Sentry: A real-time error monitoring and tracing platform. Use code TALKSHOW for $80 in free credits. Factor: Healthy eating, made easy. Get 50% off your first box, plus free breakfast for 1 year, with code talkshow50off. ★
Read more →

CISA’s acting head uploaded sensitive files into public version of ChatGPT

Read more →

Open Responses vs. Chat Completion: A new era for AI apps

The ability to build portable, provider-agnostic AI applications is the future of agentic development. For the past few years, OpenAI’s Chat Completion API has been considered the de facto standard for interacting with LLMs. Major model providers, open source serving platforms, and AI gateways supported this standard. While this API served well during the stateless chatbot era, it falls short of many capabilities that agents expect. OpenAI officially transitioned from Chat Completion API to Responses API in March 2025. Compared to the former, the Responses API is designed to support native stateful conversations to handle multi-turn interactions. Through caching and reasoning support, it dramatically improves the API’s performance. Other capabilities include built-in tool integration, event streaming, and support for multimodal inputs (including text and images). Recently, OpenAI created a specification called Open Responses in collaboration with major ecosystem players, including Nvidia, Vercel, OpenRouter, Hugging Face, LM Studio, Ollama, and vLLM. Based on the Responses API, the specification is meant for building multi-provider, interoperable LLM interfaces. It defines a shared schema, client library, and tooling layer that enable a unified experience independent of the model type and model provider. For data scientists and AI developers building intelligent applications, understanding Open Responses is essential. The patterns mirror familiar API concepts from OpenAI’s ecosystem, including chat completions for message exchanges, tool calls for function invocation, streaming outputs for real-time responses, and multimodal inputs for handling text and images. This article breaks down Open Responses using these accessible parallels, delivering clarity for practitioners who operate in production environments. The problems that Open Responses solve Modern LLM applications have outgrown the chatbot paradigm. Developers building autonomous agents need models that reason over multiple steps, invoke tools autonomously, and maintain context across complex workflows. Yet the ecosystem remains fragmented around the Chat Completions format, which was a specification originally designed for turn-based conversations — but it falls short for agentic use cases. The mismatch manifests in several concrete problems: Manual state management: Chat Completions is stateless, requiring developers to shuttle entire conversation histories back and forth with each request. Tool orchestration complexity: Multi-step tool calling requires manual “loop until done” logic in application code. Lost reasoning context: Reasoning tokens from models like o3 and o4-mini are discarded between turns, degrading performance on agentic tasks. No built-in capabilities: Web search, file retrieval, and code execution require custom infrastructure. Though OpenAI addressed these limitations with the Responses API (/v1/responses) in March 2025, it has remained an opaque, proprietary interface. Open Responses defines a consistent request/response shape that any provider can implement. In practical terms, it lets you keep one client integration while switching the backend model runtime. If you’ve ever maintained multiple SDKs for multiple model providers, you already understand the pain this removes. For teams utilizing both a hosted frontier model and a local open source model, managing branching logic across applications without a unified API becomes complex. 
By adopting Open Responses, the integration achieves stability with only routing modifications required. This approach, centered on stable contracts and swappable implementations, is essential for maintaining robust and maintainable real-world systems.
Comparing Chat Completion API with Open Responses
To understand the magnitude of the shift, we must compare the developer experience and architectural footprint of the two paradigms, legacy Chat Completion (v1/chat) versus Open Responses (v1/responses):
Control logic: Legacy is client-side (the developer writes while loops, parses JSON, handles retries); Open Responses is server-side (the developer declares intent, and the server manages the loop/state machine).
State: Legacy is stateless (history must be re-uploaded with every request); Open Responses is stateful (previous_response_id loads context from the server cache).
Streaming: Legacy emits token deltas (raw text chunks that are hard to parse into structures); Open Responses emits semantic events (typed events such as tool.start, tool.end, content.add).
Tool execution: Legacy is client-driven (the client executes and re-prompts, with high latency); Open Responses is server-driven (the server executes internal tools and manages flow for external ones).
Reasoning: Legacy is implicit (mixed into content or hacked via thinking tags); Open Responses is explicit (dedicated content, encrypted_content, and summary fields).
Multimodality: Legacy is bolted on (images sent as URLs in text messages); Open Responses is native (polymorphic Items support images and video as first-class citizens).
Network traffic: Legacy is high (N round-trips for N steps, with a full history upload); Open Responses is low (one request for N steps, uploading only the delta input).
Ecosystem backing
The launch partners represent comprehensive ecosystem coverage for the Responses API specification:
OpenAI: Full Responses API (original)
Hugging Face: Inference Providers integration, early access via Spaces
OpenRouter: Launch partner, enabling “almost every existing model”
NVIDIA NIM: Experimental /v1/responses endpoint support
Ollama: Added in v0.13.3, non-stateful flavor
vLLM: Full Responses API compatible server
LM Studio: Open Responses compliant endpoint
Azure OpenAI: Full Responses API via Microsoft
The beginning of the agentic era
The introduction of Open Responses marks the end of the “Chatbot Era” and the beginning of the “Agentic Era.” For too long, developers have struggled with the “Square Peg, Round Hole” problem of forcing autonomous behaviors into conversational APIs. The resulting “Agentic Hell” of brittle, high-latency, client-side loops held back the true potential of AI. Open Responses solves this by recognizing that Agency is an Infrastructure Problem, not just a model capability problem. By standardizing the Agentic Loop, defining polymorphic Items, and solving the state management crisis, it provides the robust foundation needed to build the next generation of software. The new standard offers clear benefits for both enterprises and the open source community. For enterprises, adopting the standard is key to future-proofing applications against vendor lock-in and enabling hybrid-cloud deployments via Nvidia NIM and locally hosted models. For the open source community, the standard provides a rallying cry — a shared language that allows a federated ecosystem of models, tools, and routers to compete with the monolithic silos of the proprietary giants. We are no longer just chatting with text. We are orchestrating cognition, and Open Responses is the conductor’s baton. The post Open Responses vs. Chat Completion: A new era for AI apps appeared first on The New Stack.
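To make the stateful-versus-stateless contrast concrete, here is a minimal Python sketch of a two-turn exchange over plain HTTP against a hypothetical Open Responses-compatible server; the base URL, API key, and model name are placeholders, and the request fields follow the Responses shape described above (model, input, previous_response_id). Passing previous_response_id on the second turn is what replaces re-uploading the full conversation history.

import requests

BASE_URL = "https://api.example.com/v1"   # placeholder for any Open Responses-compatible server
API_KEY = "sk-placeholder"                # placeholder credential
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

def create_response(input_text, previous_response_id=None):
    # POST to the /v1/responses endpoint discussed in the article.
    payload = {"model": "example-model", "input": input_text}
    if previous_response_id:
        # Server-side state: reference the prior turn instead of resending the history.
        payload["previous_response_id"] = previous_response_id
    resp = requests.post(f"{BASE_URL}/responses", headers=HEADERS, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

first = create_response("Summarize why stateful APIs help agentic workloads.")
follow_up = create_response("Now give a one-sentence example.", previous_response_id=first.get("id"))
print(follow_up)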
Read more →

The agentic revolution: A new vision for SREs

Site reliability engineers (SREs) are no longer an afterthought for harried IT leaders. They play a critical role in ensuring digital services work reliably at scale. But as complexity builds and incident volumes grow, SRE teams are being stretched thin by manual processes that degrade their value to the organization. This is where AI agents can help SRE teams break free of a reactive doom loop. When deployed strategically, agents can enable teams to move past toil and proactively enhance operational efficiency and resilience. By automatically surfacing context, executing diagnostics and remediations, and generating self-updating runbooks, AI agents empower SREs to prioritize their attention on the most critical matters. SREs vs. DevOps SRE is still an often-misunderstood role. It’s not interchangeable with DevOps, but rather brings an engineering discipline to operations for improved reliability and uptime. The production and success of SRE teams can be elevated through their ability to automate repeatable tasks. Organizations can incorporate SREs into IT operations in various ways. There might be a centralized department serving the entire organization. There may be one or two SREs embedded within the engineering team. Or SREs might act as consultants, available on an “as-needed” basis. In some instances, developers might even be encouraged to adopt SRE skills. Regardless of the model, a persistent challenge threatens to undermine their value. Site reliability engineering, like IT operations in general, is buckling under the weight of inefficient tools and manual processes. Enhancing SRE workflows To relieve that operational burden, many SREs are already using generative AI (GenAI). While GenAI can accelerate incident resolution, it still demands input from human experts. Teams don’t just want AI assistants. They want AI agents that SREs can fully offload low-risk, toilsome tasks to. As the adoption of AI agents increases, SREs will evolve into supervisors of a new digital workforce, delegating tasks for all issues except for the most complex or novel ones. How might agentic AI look in practice for SREs? Consider how an AI agent can surface useful contextual information for investigators to drill down into. This might include previously resolved incidents involving the same service to immediately highlight how similar issues were remediated in the past, including responder notes. Agents can further enhance context for SRE incident responders by including information on related active issues across different services, which would provide the SRE with crucial real-time information on the scope of the incident and any potential dependencies. Using this information, an AI agent could go a step further by suggesting where an issue has originated, and whether recent configuration or other changes may be the root cause. The most effective agentic tools will continuously learn from SRE feedback and successful remediation, enabling the AI agents to get smarter and more sophisticated as time goes on. The next steps Once an issue is diagnosed and context delivered, remediation is the next stage that AI agents can optimize. For low-risk, well-understood issues with clearly defined and known solutions, an agent could triage and remediate without any human input. All the SRE would need to do is review the after-action report to ensure it’s correct and check for any potential improvements. 
At the other end of the spectrum, novel or major incidents will require SREs to guide the investigation and develop their own remediation plan. In this scenario, the agent’s value is in automatically collecting useful contextual information and answering any questions. Sitting in the middle are partially understood incidents, which are familiar but typically have multiple possible causes or solutions. In this scenario, the SRE agent would first cross-reference an alert with historical operations data and real-time signals. It might nudge the SRE into running further diagnostics or supply them automatically so the SRE has a range of possible causes to consider upon arrival. The AI agent would then suggest possible remediation steps, further reducing manual effort and time to action. The result of this remediation, as well as any feedback from the engineer, would help to generate a self-updating runbook consisting of which actions worked best. This continuous learning approach helps to prevent recurring issues and enable faster resolutions with fewer people. Getting started To extract maximum value from AI agents, managers will have to be careful about the way they position the technology. Managers will need to equip SREs with the right training in areas such as data security, output validation, and workflow creation. The best systems will be vendor agnostic to better surface real-time information from across the entire IT environment and will have access to as much historical operations data as possible. The benefits of getting this right could be transformative. In the right circumstances, AI agents can resolve incidents faster, reduce SRE toil and burnout, and proactively optimize processes in ways even human experts might not spot. Above all, this means SREs can focus on the work that really matters: supporting innovation and growth. The post The agentic revolution: A new vision for SREs appeared first on The New Stack.
Read more →

7 learnings from Anders Hejlsberg: The architect behind C# and TypeScript

Anders Hejlsberg’s work has shaped how millions of developers code. Whether or not you recognize his name, you likely have touched his work: He’s the creator of Turbo Pascal and Delphi, the lead architect of C#, and the designer of TypeScript. We sat down with Hejlsberg to discuss his illustrious career and what it’s felt like to watch his innovations stand up to real-world pressure. In a long-form conversation, Hejlsberg reflects on what language design looks like once the initial excitement fades, when performance limits appear, when open source becomes unavoidable, and how AI can impact a tool’s original function. What emerges is a set of patterns for building systems that survive contact with scale. Here’s what we learned. Watch the full interview above. Fast feedback matters more than almost anything else Hejlsberg’s early instincts were shaped by extreme constraints. In the era of 64KB machines, there was no room for abstraction that did not pull its weight. “You could keep it all in your head,” he recalls. “When you typed your code, you wanted to run it immediately.” – Anders Hejlsberg Turbo Pascal’s impact did not come from the Pascal language itself. It came from shortening the feedback loop. Edit, compile, run, fail, repeat, without touching disk or waiting for tooling to catch up. That tight loop respected developers’ time and attention. The same idea shows up decades later in TypeScript, although in a different form. The language itself is only part of the story. Much of TypeScript’s value comes from its tooling: incremental checking, fast partial results, and language services that respond quickly even on large codebases. The lesson here is not abstract. Developers can apply this directly to how they evaluate and choose tools. Fast feedback changes behavior. When errors surface quickly, developers experiment more, refactor more confidently, and catch problems closer to the moment they are introduced. When feedback is slow or delayed, teams compensate with conventions, workarounds, and process overhead. Whether you’re choosing a language, framework, or internal tooling, responsiveness matters. Tools that shorten the distance between writing code and understanding its consequences tend to earn trust. Tools that introduce latency, even if they’re powerful, often get sidelined. Scaling software means letting go of personal preferences As Hejlsberg moved from largely working alone to leading teams, particularly during the Delphi years, the hardest adjustment wasn’t technical. It was learning to let go of personal preferences. “You have to accept that things get done differently than you would have preferred. Fixing it would not really change the behavior anyway.” – Anders Hejlsberg That mindset applies well beyond language design. Any system that needs to scale across teams requires a shift from personal taste to shared outcomes. The goal stops being code that looks the way you would write it, and starts being code that many people can understand, maintain, and evolve together. C# did not emerge from a clean-slate ideal. It emerged from conflicting demands. Visual Basic developers wanted approachability, C++ developers wanted power, and Windows demanded pragmatism. The result was not theoretical purity. It was a language that enough people could use effectively. Languages do not succeed because they are perfectly designed. They succeed because they accommodate the way teams actually work.
Why TypeScript extended JavaScript instead of replacing it TypeScript exists because JavaScript succeeded at a scale few languages ever reach. As browsers became the real cross-platform runtime, teams started building applications far larger than dynamic typing comfortably supports. Early attempts to cope were often extreme. Some teams compiled other languages into JavaScript just to get access to static analysis and refactoring tools. That approach never sat well with Hejlsberg. Telling developers to abandon the ecosystem they were already in was not realistic. Creating a brand-new language in 2012 would have required not just a compiler, but years of investment in editors, debuggers, refactoring tools, and community adoption. Instead, TypeScript took a different path. It extended JavaScript in place, inheriting its flaws while making large-scale development more tractable. This decision was not ideological, but practical. TypeScript succeeded because it worked with the constraints developers already had, rather than asking them to abandon existing tools, libraries, and mental models. The broader lesson is about compromise. Improvements that respect existing workflows tend to spread while improvements that require a wholesale replacement rarely do. In practice, meaningful progress often comes from making the systems you already depend on more capable instead of trying to start over. Visibility is a part of what makes open source work TypeScript did not take off immediately. Early releases were nominally open source, but development still happened largely behind closed doors. That changed in 2014 when the project moved to GitHub and adopted a fully public development process. Features were proposed through pull requests, tradeoffs were discussed in the open, and issues were prioritized based on community feedback. This shift made decision-making visible. Developers could see not just what shipped, but why certain choices were made and others were not. For the team, it also changed how work was prioritized. Instead of guessing what mattered most, they could look directly at the issues developers cared about. The most effective open source projects do more than share code. They make decision-making visible so contributors and users can understand how priorities are set, and why tradeoffs are made. Leaving JavaScript as an implementation language was a necessary break For many years, TypeScript was self-hosted. The compiler was written in TypeScript and ran as JavaScript. This enabled powerful browser-based tooling and made experimentation easy. Over time, however, the limitations became clear. JavaScript is single-threaded, has no shared-memory concurrency, and its object model is flexible (but expensive). As TypeScript projects grew, the compiler was leaving a large amount of available compute unused. The team reached a point where further optimization would not be enough. They needed a different execution model. The controversial decision was to port the compiler to Go. This was not a rewrite. The goal was semantic fidelity. The new compiler needed to behave exactly like the old one, including quirks and edge cases. Rust, despite its popularity, would have required significant redesign due to ownership constraints and pervasive cyclic data structures. Go’s garbage collection and structural similarity made it possible to preserve behavior while unlocking performance and concurrency. The result was substantial performance gains, split between native execution and parallelism. 
More importantly, the community did not have to relearn the compiler’s behavior. Sometimes the most responsible choice isn’t the most ambitious one, but instead preserves behavior, minimizes disruption, and removes a hard limit that no amount of incremental optimization can overcome. In an AI-driven workflow, grounding matters more than generation Hejlsberg is skeptical of the idea of AI-first programming languages. Models are best at languages they have already seen extensively, which naturally favors mainstream ecosystems like JavaScript, Python, and TypeScript. But AI does change things when it comes to tooling. The traditional IDE model assumed a developer writing code and using tools for assistance along the way. Increasingly, that relationship is reversing. AI systems generate code. Developers supervise and correct. Deterministic tools like type checkers and refactoring engines provide guardrails that prevent subtle errors. In that world, the value of tooling is not creativity. It is accuracy and constraint. Tools need to expose precise semantic information so that AI systems can ask meaningful questions and receive reliable answers. The risk is not that AI systems will generate bad code. Instead, it’s that they will generate plausible, confident code that lacks enough grounding in the realities of a codebase. For developers, this shifts where attention should go. The most valuable tools in an AI-assisted workflow aren’t the ones that generate the most code, but the ones that constrain it correctly. Strong type systems, reliable refactoring tools, and accurate semantic models become essential guardrails. They provide the structure that allows AI output to be reviewed, validated, and corrected efficiently instead of trusted blindly. Why open collaboration is critical Despite the challenges of funding and maintenance, Hejlsberg remains optimistic about open collaboration. One reason is institutional memory: years of discussion, decisions, and tradeoffs remain searchable and visible, available to anyone who wants to understand how and why a system evolved. “We have 12 years of history captured in our project,” he explains. “If someone remembers that a discussion happened, we can usually find it. The context doesn’t disappear into email or private systems.” That visibility changes how systems evolve. Design debates, rejected ideas, and tradeoffs remain accessible long after individual decisions are made. For developers joining a project later, that shared context often matters as much as the code itself. A pattern that repeats across decades Across four decades of language design, the same themes recur:
Fast feedback loops matter more than elegance
Systems need to accommodate imperfect code written by many people
Behavioral compatibility often matters more than architectural purity
Visible tradeoffs build trust
These aren’t secondary concerns. They’re fundamental decisions that determine whether a tool can adapt as its audience grows. Moreover, they ground innovation by ensuring new ideas can take root without breaking what already works. For anyone building tools they want to see endure, those fundamentals matter as much as any breakthrough feature. And that may be the most important lesson of all.
Did you know TypeScript was the top language used in 2025? Read more in the Octoverse report > The post 7 learnings from Anders Hejlsberg: The architect behind C# and TypeScript appeared first on The GitHub Blog.
Read more →

How to secure Vertex AI pipelines with Google Cloud tools

AI models now power critical systems across many sectors. You’ll find them in healthcare, banking, cybersecurity, and defense. When you move these models to production on Vertex AI, the attack surface grows fast. Your data, model weights, pipelines, and APIs all face risks. In this guide, you’ll learn how to secure models built with Vertex AI, including data sources, model files, pipelines, and endpoints, using tools already built into Google Cloud. These include identity and access management (IAM), VPC Service Controls, data loss prevention, Artifact Registry, and Cloud Audit Logs. Each tool adds a new layer to your defense strategy. Together, they help build zero trust protection for your machine learning workloads.
Why securing Vertex AI pipelines matters
AI pipelines are attractive targets for attackers. Once compromised, they can affect models, systems, and even end users. Below are key threat vectors and how they affect real-world systems:
Data poisoning: manipulated training data → biased/inaccurate model
Model theft (exfiltration): IP leakage of proprietary LLMs or classifiers
Insecure pipeline execution: unauthorized access or lateral movement
Unprotected inference APIs: prompt injection, model abuse, or DoS attacks
These threats affect various parts of your machine learning (ML) workflow. These risks may cause data leaks, system failures, and even lost trust without the right security. So, knowing each one early helps you build safer and stronger AI systems.
Security layers for Vertex AI workloads
Each layer must be hardened individually and monitored continuously.
Step-by-step: Securing Vertex AI models on GCP
1. Enforce IAM on datasets and pipelines
Start by managing who can access your data and pipelines. Use identity and access tools in Google Cloud to set clear rules. Give each person or service only the access they truly need. For example, if someone only needs to read data, do not allow them to run training jobs. This prevents mistakes and stops attackers from moving through your system. Keeping access tight protects your data and keeps your machine learning projects safe.
gcloud projects add-iam-policy-binding genai-project \
  --member="user:ml-engineer@example.com" \
  --role="roles/aiplatform.user"
Restrict access to training datasets:
gcloud projects add-iam-policy-binding genai-project \
  --member="serviceAccount:training-sa@genai-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
2. Scan training data for PII with DLP
Before training your model, review the data for sensitive or personally identifiable information (PII). Use Google Cloud’s data loss prevention tools to identify and remove anything that shouldn’t be included.
gcloud dlp inspect bigquery \
  --dataset-id=training_dataset \
  --table-id=users_raw \
  --min-likelihood=LIKELY \
  --info-types=EMAIL_ADDRESS,PHONE_NUMBER,NAME
Automatically flag sensitive data before it enters your pipeline.
3. Use VPC Service Controls to isolate ML projects
Keep your machine learning projects separate from the public internet. Set up VPC Service Controls to create secure boundaries around your data and services. This helps block unauthorized access from outside your network.
gcloud access-context-manager perimeters create genai-perimeter \
  --resources=projects/genai-project \
  --restricted-services=aiplatform.googleapis.com,bigquery.googleapis.com
It prevents data exfiltration from AI workloads to unauthorized services.
4. Secure model artifacts in Artifact Registry
Store your models safely using Artifact Registry. This tool lets you track model versions and manage access. It lowers the risk of theft or tampering.
gcloud artifacts repositories create genai-models \
  --repository-format=docker \
  --location=us-central1 \
  --description="Private AI Model Store"
Limit access to approved service accounts only:
gcloud artifacts repositories add-iam-policy-binding genai-models \
  --location=us-central1 \
  --member="serviceAccount:ci-cd@genai-project.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.writer"
5. Harden Vertex AI pipelines with workload identity
Use Kubernetes service accounts linked to Google Cloud identities. This way, each pipeline component has its own secure identity. It prevents unauthorized actions and keeps your pipelines safe.
gcloud iam service-accounts add-iam-policy-binding \
  pipeline-sa@genai-project.iam.gserviceaccount.com \
  --member="serviceAccount:genai-project.svc.id.goog[ml-pipelines/pipeline-runner]" \
  --role="roles/aiplatform.customCodeServiceAgent"
It prevents hardcoded credentials in Kubeflow or Cloud Build jobs.
6. Protect inference endpoints with IAP and rate limiting
Secure your model’s endpoints using Cloud Endpoints and Identity-Aware Proxy. This controls who can access your models. Add rate limiting to stop misuse and reduce the risk of attacks.
gcloud compute backend-services update genai-inference \
  --iap=enabled,oauth2-client-id=CLIENT_ID,oauth2-client-secret=SECRET
Add quota restrictions to prevent abuse:
Quota:
  limits:
  - name: predict-requests
    metric: "ml.googleapis.com/predict"
    unit: "1/min/{project}"
    values:
      STANDARD: 100
7. Enable audit logging for full visibility
Turn on audit logging to track all actions on your AI resources. This helps you spot unusual activity quickly and fix problems before they grow.
gcloud logging sinks create vertex-logs-sink \
  bigquery.googleapis.com/projects/genai-project/datasets/audit_logs \
  --log-filter='resource.type="aiplatform.googleapis.com/PipelineJob"'
Use Looker Studio or BigQuery to visualize:
Pipeline executions: query execution logs in BigQuery and create charts from them in Looker Studio
Model deployment events: query deployment event data in BigQuery and visualize deployment timelines and statuses in Looker Studio
Data access logs: query access logs in BigQuery and build dashboards showing access patterns in Looker Studio
Vertex AI Security Checklist
IAM on pipelines and data: Cloud IAM + conditions
Sensitive data detection: Cloud DLP + BigQuery
Artifact integrity: Artifact Registry + signed images
Network isolation: VPC Service Controls
Pipeline authentication: Workload Identity Federation
Inference access control: IAP + quotas + OAuth2
Audit and drift detection: Cloud Logging + Security Command Center + Recommender
This checklist maps key security controls to their related GCP tools. It covers access management, data protection, artifact security, and network isolation. Tools like Cloud IAM, Cloud DLP, Artifact Registry, VPC Service Controls, and Workload Identity enforce these controls efficiently.
Conclusion
Securing AI models is not just about the infrastructure. It is all about keeping trust in the system. You can deploy powerful machine learning models with Vertex AI. However, without the right controls, you risk data leaks, IP theft, and attacks. Using a layered defense approach helps protect your AI workloads from raw data to deployment.
Key tools include IAM, DLP, VPC Service Controls, and Artifact Registry. In 2026, AI security is cloud security. If you deploy ML pipelines on Google Cloud, treat your models as valuable assets. Build strong defenses to keep them safe. The post How to secure Vertex AI pipelines with Google Cloud tools appeared first on The New Stack.
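As a companion to step 7 above, here is a minimal sketch of querying the exported audit logs from Python. This is my own illustration, not part of the original guide: the dataset name matches the sink created above, but the wildcard table name and the protopayload_auditlog fields assume the standard Cloud Audit Logs BigQuery export schema, so verify both in your own project before relying on it.

```python
# Rough sketch: summarize recent Vertex AI API activity from audit logs that a
# Cloud Logging sink exports to BigQuery. Dataset/table names and the
# protopayload_auditlog schema are assumptions to verify in your project.
from google.cloud import bigquery

client = bigquery.Client(project="genai-project")

query = """
SELECT
  TIMESTAMP_TRUNC(timestamp, DAY) AS day,
  protopayload_auditlog.authenticationInfo.principalEmail AS caller,
  protopayload_auditlog.methodName AS method,
  COUNT(*) AS calls
FROM `genai-project.audit_logs.cloudaudit_googleapis_com_activity_*`
WHERE protopayload_auditlog.serviceName = 'aiplatform.googleapis.com'
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY day, caller, method
ORDER BY day DESC, calls DESC
"""

# Print a simple activity summary; the same query can feed a Looker Studio chart.
for row in client.query(query).result():
    print(f"{row.day} {row.caller} {row.method}: {row.calls}")
```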
Read more →

Ai2 makes building custom coding agents easier and cheaper

The Allen Institute for AI (Ai2) is launching a new family of open coding agents today that, as standalone models, outperform similar-sized models on standard benchmarks. But what makes this project stand out is that Ai2 is also open sourcing a collection of tools that lets anyone fine-tune the model based on their private codebases, documentation, and other materials for significantly better performance on domain-specific tasks. “Over the past year, coding agents have transformed how developers write, test, and maintain software. These systems can debug, refactor, and even submit pull requests — fundamentally changing what software development looks like,” Ai2 writes in today’s announcement. “Yet despite this progress, most coding agents share the same constraints: They’re closed, expensive to train, and difficult to study or adapt to private codebases.” The cost of doing so? $400 to replicate Ai2’s results and just over $2,000 for the best performance. Comparable approaches, Ai2 notes, can cost up to 11 times more. To train its so-called SERA (Soft-verified Efficient Repository Agents) model, the first in Ai2’s Open Coding Agents collection, the team used a cluster of two Nvidia H100s.

[Image credit: Ai2]

Two models, full recipes included

Ai2 is launching two models under the SERA moniker: SERA-32B, which bests other models like Qwen3-Coder and Mistral’s Devstral Small 2, and SERA-8B. SWE-Bench Verified is a benchmark that tests whether AI coding agents can resolve real-world GitHub issues from a subset of popular Python repositories. This smaller model only solves 29.4% of SWE-Bench Verified problems, but that’s still well above similar-sized open models. The large model, however, solves 55% of those problems.

[Image credit: Ai2]

The team is releasing the models, code, all generated agent data, and the full recipe for any team to generate their own data. One of the interesting results of the team’s research was that the smaller fine-tuned model would often replicate, and at times exceed, the performance of its larger “teacher” coding agent. A 32B model, when fine-tuned on a company codebase, often outperforms its 100B-parameter teacher model.

How Ai2 cut training costs

At the core of Ai2’s efforts to keep the models both performant and affordable are two innovations, the team explains. The first is soft-verified generation (SVG). As Ai2 notes, when creating synthetic training data for these kinds of models, the traditional approach is to create pairs of incorrect code and its corrected version. Counterintuitively, Ai2 found that including only partially correct solutions in the training set still produced models that generate fully correct code. Creating the traditional set of “hard-verified” incorrect/corrected code pairs necessitates a lot of thorough, compute-intensive testing. But as it turns out, that isn’t necessary. The second innovation is that to diversify the training dataset, Ai2 created a taxonomy of 51 bug patterns. Its tools then generate prompts for bugs in each function in a repository, yielding what the team calls “thousands of varied agentic trajectories at low cost.” As it turns out, training on realistic developer workflows matters more than perfectly verified code pairs. “We believe bringing the cost of replicating strong coding agents down to a few hundred dollars will unlock research that simply wasn’t possible before,” the Ai2 team writes. 
“Instead of being limited to a handful of well-funded labs, agentic coding can become a widely accessible direction for small teams, students, and independent developers.” The post Ai2 makes building custom coding agents easier and cheaper appeared first on The New Stack.
Read more →

RAG isn’t dead, but context engineering is the new hotness

So, is RAG (Retrieval-Augmented Generation) dead now? Last May I asked that question of Douwe Kiela, CEO of Contextual AI, based on the growing hype around MCP (Model Context Protocol). Both are data retrieval mechanisms for Large Language Models, but it’s MCP that has taken all the headlines over the past year. The truth is, RAG has fallen away as a term used by developers and AI engineers. Even Kiela, who co-authored the 2020 academic paper that introduced RAG to the world, admits that a trendy new term has taken over. “I think people have rebranded it now as context engineering, which includes MCP and RAG,” he said. “I mean, the ‘R’ in RAG just stands for ‘retrieval.’ So, I think I said this last time too, if you’re using MCP to do your retrieval, then it’s basically RAG, right?” RAG is still an integral part of Contextual AI’s stack — it’s in their documentation, despite no longer rating a mention on the homepage. Regardless, Contextual AI chose the right company name if “context engineering” is the term du jour now. Agent Composer Launch Like many other AI companies, Contextual AI is also now all-in on agents. This week it launched a new tool called Agent Composer, which the company described in a press release as “the infrastructure and orchestration layer that manages context, enforces guardrails, and maintains agent reliability throughout multi-step engineering workflows.” Agent Composer joins the other tools available on the Contextual AI platform, which Kiela describes as a “context layer.” “So you have the language model, you have your data,” he explained. “And if you’re an enterprise, you have your data all over the place, and it’s very, very noisy, right? And these companies are not going to consolidate all of that data into one place, so what you can do with our platform is you can hook up all these different data sources.” From all those data sources, users create what Contextual AI calls “data stores.” Part of what Agent Composer will do, says Kiela, is help enterprises build agents on top of their data stores. As the diagram below shows, Agent Composer includes all the pieces an enterprise would want to create agents: pre-built templates, a prompting interface, a visual builder, APIs, and so on. Contextual AI platform; image via the company. Claude Code and Enterprise Wrappers I noted that AI coding tools like Claude Code and Cursor have been tremendously popular in enterprises over the past year or so. Presumably, many enterprise developers are already using those tools to create custom agents, so what does Contextual AI’s Agent Composer offer that the likes of Claude Code don’t? “I would say that those [AI coding tools], they’re essentially harnesses for language models,” Kiela replied. “So ‘harness’ is one of the buzzwords right now. So I think you can think of our platform as a way to create ‘custom harnesses.’ You can basically build your own Cursor, or you can run your own specific instance of Claude Code on our platform, so that you don’t have to worry about running things locally, or things like that.” I think what he means is that Claude Code and Cursor are wrappers around an AI model, but they’re often tied to a developer’s computer by being a CLI tool or a desktop app. Contextual allows enterprises to create their own wrappers, but they’re hosted centrally — which comes with the security and governance benefits that enterprises typically require. 
“…you can think of our platform as a way to create ‘custom harnesses.’ You can basically build your own Cursor, or you can run your own specific instance of Claude Code on our platform.” – Douwe Kiela, CEO of Contextual AI Another big trend currently is agent development platforms. LangChain, perhaps the original AI engineering tool, is currently promoting its “agent engineering platform” — called LangSmith — on its homepage. I asked Kiela how Contextual AI compares to a product like LangSmith? “I think they’re more focused on lower-level developers and what I would call more indie developers,” he replied. “So it’s really about SaaS prototyping, and they have lots of different options. I think we are much more opinionated and much more enterprise grade, so we’re really focused on enterprise developers and users of [those] solutions.” From Prompt Engineering to Context Engineering Terminology changes so fast in the AI era of development. So what does “context engineering” even mean, in relation especially to AI agents? It just so happens that Anthropic, perhaps the most trendy AI development company right now, thanks to Claude Code, wrote an explainer last September. Anthropic contends that “context engineering is the natural progression of prompt engineering.” Rather than giving a series of prompts to an LLM, as in the old days of 2022-2023, engineers are now encouraged to manage “the entire context state (system instructions, tools, Model Context Protocol (MCP), external data, message history, etc).” The term “agent” itself is problematic, but most people agree that it’s a software program that runs in a loop. According to Anthropic, an agent “running in a loop generates more and more data that could be relevant for the next turn of inference, and this information must be cyclically refined.” So that’s what context engineering does. “…there’s always a trade-off between how much information you want to pre-process […] and how much information you want to search during query time.” – Kiela Specifically, Anthropic says that Claude Code takes a “just in time” approach to context engineering, meaning it will “dynamically load data into context at runtime using tools.” I asked Kiela if Contextual AI does a similar thing? “Yeah, so, most of these solutions are just-in-time,” he said. “If you sort of zoom out, there’s always a trade-off between how much information you want to pre-process — so when you do the ingestion of documents — and how much information you want to search during query time… so, just-in-time, essentially. And so the right trade-off between those two modes of processing really depends on the problem that you’re solving. So in some cases, if you have to be blazingly fast, you probably want to do a lot more pre-processing. If you have a bit more time and you can be agentic, then maybe you don’t need to do as much of that, because you can have multiple tries and all kinds of different strategies for getting to the answer.” Agentic Use Cases So what kinds of agentic solutions are Contextual AI’s customers actually implementing currently? Kiela replied that his company tends to focus on “hard engineering,” like the semiconductor industry. “So within that, we see a lot of traction around enabling engineers to move faster by having access to all of the internal knowledge, so kind of unlocking institutional engineering knowledge,” he said. One of their more popular use cases is doing a root cause analysis with an agent, a process described in a November blog post. 
“So that’s quite powerful,” he continued. “It’s really taking log dumps or all kinds of different data sets around something going wrong, and then you need to analyze what the root cause is. You can cross-reference that with internal documentation, maybe with existing bug reports. Maybe you want to automatically open up a PR on your code base that fixes it. So there’s a lot of interest in that.” Conclusion In summary, then, RAG is not dead — it’s just been rebranded to “context engineering.” Also, it’s clear that the practice of software engineering in the agentic era continues to evolve. Companies like Contextual AI and Anthropic provide tools for a range of developers to tweak agent loops. Prompting? That’s so over. Now it’s about managing “the entire context state,” as Anthropic puts it. The post RAG isn’t dead, but context engineering is the new hotness appeared first on The New Stack.
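For readers who want to see what “managing the entire context state” can look like in code, here is a minimal, self-contained sketch. It is my own illustration, not Contextual AI’s or Anthropic’s implementation; every name in it (build_context, Document, the token heuristic) is made up. It assembles system instructions, tool names, just-in-time retrieved documents, and as much recent message history as fits under a crude token budget.

```python
# Toy illustration of "context engineering": assemble the context state
# (instructions, tools, retrieved data, trimmed history) for the next model
# call. Names and the 4-chars-per-token estimate are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Document:
    source: str
    text: str

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, good enough for a sketch

def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    # Stand-in for query-time ("just in time") retrieval: naive keyword overlap.
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(terms & set(d.text.lower().split())))
    return scored[:k]

def build_context(system: str, tools: list[str], history: list[str],
                  query: str, corpus: list[Document], budget: int = 2000) -> str:
    parts = [system, "Tools available: " + ", ".join(tools)]
    parts += [f"[{d.source}] {d.text}" for d in retrieve(query, corpus)]
    # Keep the most recent conversation turns that still fit the budget.
    remaining = budget - sum(estimate_tokens(p) for p in parts) - estimate_tokens(query)
    kept: list[str] = []
    for turn in reversed(history):
        if estimate_tokens(turn) > remaining:
            break
        kept.insert(0, turn)
        remaining -= estimate_tokens(turn)
    return "\n\n".join(parts + kept + [query])

corpus = [Document("runbook", "Restart the ingest worker if the queue backs up."),
          Document("faq", "Billing questions go to the finance channel.")]
print(build_context("You are an ops assistant.", ["search_logs", "open_ticket"],
                    ["user: queue is slow", "assistant: checking the ingest worker"],
                    "Why is the queue backed up?", corpus))
```

The trade-off Kiela describes maps onto where the work happens in a sketch like this: heavy pre-processing would move effort into building the corpus at ingestion time, while a more agentic system would spend it in retrieve() at query time.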
Read more →

Help shape the future of open source in Europe

At GitHub, we believe that open source is a primary driver of innovation, security, and economic competitiveness. The European Union is currently at a pivotal moment in defining how it supports this ecosystem, and it wants to hear from you, the builders. The European Commission is planning to adopt an open source strategy called “Towards European Open Digital Ecosystems”. This initiative is not about passing new laws; instead the EU is looking to develop a strategic framework and funding measures to help the EU open source sector scale up and become more competitive. This effort aims to strengthen the EU’s technological sovereignty by supporting open source software and hardware across critical sectors like AI, cloud computing, and cybersecurity. We’ve been advocating for this kind of support for a long time. For instance, we previously highlighted the need for a European Sovereign Tech Fund to invest in the maintenance of critical basic open source technologies such as libraries or programming languages. This new strategy is a chance to turn those kinds of ideas into official EU policy. You can read GitHub’s response to the European Commission here. Brand new data from GitHub Innovation Graph shows that the EU is a global open source powerhouse: There are now almost 25 million EU developers on GitHub, who made over 155 million contributions to public projects in the last year alone. The EU wants to help European companies turn open source projects into successful businesses, which is an admirable goal with plenty of opportunities to achieve it. For example, the EU can create better conditions for open source businesses by making it easier for them to participate in public procurement and access the growth capital they need to turn great code into sustainable products. By supporting the business models and infrastructure that surround it, the EU can turn its massive developer talent into long-term economic leadership. It is important to understand, though, that not all open source projects can be turned into commercial products—and that commercialization is not every developer’s goal. A successful EU open source policy should also support the long-term sustainability of non-commercially produced open source components that benefit us all. That is why the European Commission needs to hear the full spectrum of experiences from the community—from individual maintainers, startups, companies, and researchers. Over 900 people have already shared their views, and we encourage you to join them.

The European Commission is specifically looking for responses covering these five topics:

- Strengths and weaknesses: What is standing in the way of open source adoption and sustainable open source contributions in the EU?
- Added value: How does open source benefit the public and private sectors?
- Concrete actions: What should the EU do to support open source?
- Priority areas: Which technologies (e.g., AI, IoT, or Cloud) should be the focus?
- Sector impact: In which industries (e.g., automotive or manufacturing) could open source increase competitiveness and cybersecurity?

How to Participate

The “Call for Evidence” is your opportunity to help shape the future tech policy of the EU. It only takes a few minutes to provide your perspective. Submit your feedback by February 3 (midnight CET). Your voice is essential to ensuring that the next generation of European digital policy is built with the needs of real developers in mind. At GitHub Developer Policy, we are always open to feedback from developers. 
Please do not hesitate to contact us as well. The post Help shape the future of open source in Europe appeared first on The GitHub Blog.
Read more →

Chainguard EmeritOSS backs MinIO, other orphaned projects

Open source has a problem. There are many under-supported, or even abandoned, open source programs that are still in wide use, but with no one left in the driver’s seat. To address this issue, Chainguard recently launched Chainguard EmeritOSS, a project to support these vital, but unloved, programs. After putting its support behind three different programs, the infrastructure security company is coming to the rescue of 10 more. Perhaps the chief one is MinIO, a lightweight, high-performance, open source object storage system that’s fully Amazon S3 API-compatible. In December, the maintainers put the software under maintenance-only mode, much to the consternation of the community that still used the community edition. The namesake company, previously in charge of the project, recommends the free edition (though not open source) or the commercial edition of its AIStor platform instead. Chainguard ramped up support and even offers a secure MinIO image.

Other newly supported programs

The other newly supported zombie programs include:

- Prometheus PushProx, a proxy and client solution that enables Prometheus to scrape targets even if they’re hidden behind NATs or firewalls. While PushProx still “pulls” the data in, behind the scenes it runs a tunneling proxy that “pushes” data requests to retrieve the data.
- Cassandra Exporter, a standalone metrics exporter for Apache Cassandra. This Java Virtual Machine (JVM) program retrieves Cassandra performance and usage metrics without overburdening the Cassandra NoSQL DBMS.
- A Prometheus exporter that scrapes JavaScript Object Notation (JSON) APIs and turns them into metrics using JSONPath configuration. With this useful tool you can pull in data from almost any API that understands JSON.
- A Prometheus exporter for RabbitMQ that exposes broker, queue, connection, and exchange stats via the Management API. This exporter works with legacy RabbitMQ 3.x versions. It provides extensive filtering and configuration capabilities for monitoring RabbitMQ infrastructure and is often used for message-queue monitoring and alerting.
- The Prometheus exporter for Python RQ (Redis Queue), which exposes job-queue metrics, including processing time and counts. This enables managers to monitor background workloads more effectively via an HTTP endpoint, typically “/metrics,” that Prometheus can then scrape for data.
- The Logstash filter range plugin. When I was a young Unix developer, I’d just use grep, awk, and sed, but this plugin lets you define numeric or string ranges and check whether a given field’s value falls within them without writing shell scripts by hand. Armed with this data, you tag events, drop unwanted data, apply conditional processing, and you get the idea.
- PgCat, a PostgreSQL connection pooler and proxy that supports sharding, load balancing, failover, and mirroring. It can multiplex client connections to PostgreSQL DBMSs to cut down connection overhead and reduce network latency.
- The OpenStack Velero plugin, which adds backup and restore operations to Velero for OpenStack Cinder volumes, Swift containers, and Manila shares. It provides volume snapshotting and object storage capabilities for OpenStack environments, where Velero is used to back up and restore Kubernetes clusters running on OpenStack.
- Finally, k8s-node-collector, a small utility that provides a Kubernetes node information collector to gather file system, process, and system data. It produces structured JSON output for auditing, compliance checks, or custom integrations. 
No more support Of course, what all these programs have in common is that their creators no longer support them. As Kim Lewandowski, Chainguard’s CSO and co-founder, wrote in the blog post announcing this news, “When a project no longer requires continuous upkeep or the maintainers need to step away, Chainguard EmeritOSS [steps] in.” This is a very useful service. There are far too many mission-critical open source programs that no longer have a home, and Chainguard is giving them one. As Lewandowski put it, “EmeritOSS exists for the projects that have earned their stripes. They’ve shipped, scaled, and supported real systems, and while their maintainers may be ready to step back, the software itself still has plenty of life left.” Hand off unsupported projects Indeed, they do. As Chainguard co-founder and CEO, Dan Lorenc explained in an earlier The New Stack column: “We need a way for open source maintainers to gracefully hand off ‘done’ projects even when they no longer have a significant feature roadmap. We need to offer them a place where: Mature projects can transition from individual maintainers to a trusted organization accountable for long-term stewardship. CVEs get patched continuously, even without new feature work. Reproducibility and trust remain, without weekly commits.” That place is EmeritOSS. Do you need these programs? Chainguard’s forked, stability-focused EmeritOSS versions will remain freely available on GitHub in source code. Don’t want to fuss with the code? Chainguard also offers secure, continuously maintained container images and APK packages through its commercial distributions. Are you depending on another open source program and need help? You can submit it for consideration, and Chainguard might support it via EmeritOSS. The post Chainguard EmeritOSS backs MinIO, other orphaned projects appeared first on The New Stack.
Read more →

QCon chat: Is agentic AI killing continuous integration?

In the age of AI, will we still need continuous integration (CI) at all? One panelist in a QCon AI conference panel on AI and engineering asked this perhaps deliberately provocative question: Will AI kill CI? While many at the event quickly dismissed the notion that AI could go so far as to actually eliminate CI, the question resonated in the halls of the conference, held within the scholarly confines of the New York Academy of Medicine in Manhattan’s Upper East Side. It turned out to be one of the most hotly discussed topics at the event. And many people agreed that the software development lifecycle will have to change in the era of AI.

Daniel Doubrovkine, who has worked in engineering positions at Shopify, AWS and Artsy.Net and recently took on a VP role at Microsoft, initially floated the question of whether AI would kill CI altogether during a panel. He had recently visited operations at Meta and was surprised at how few tests the company actually ran before pushing new code to production; instead, developers run many tests locally on their laptops (“Shift Left”) before pushing code. “I think AI gives us a new opportunity to rethink how we work,” he said, noting it also gives us a chance to get rid of unnecessary tasks that have built up along the way.

The pull request (PR) is the heart of a CI system, kicking off a whole series of tests on the software before it is merged into production. But “There’s no fundamental law of the universe that says that a PR review or a code review has to happen before the code is deployed,” agreed Michael Webster, principal engineer for CI/CD service provider CircleCI, in his own talk. “There are a lot of compliance tools that say that has to happen, and those are important. But this is not a fundamental fact of software delivery.”

[Slide: “It doesn’t have to be this way” — CircleCI’s Michael Webster (Google Gemini recreation of Webster’s slide).]

AI is breaking the software delivery lifecycle

We think of the development lifecycle as a linear series of discrete steps. “You push your code. You build, then you test, then you deploy,” Webster said. “That model doesn’t hold up with AI.” Webster’s own QCon talk was about how AI and agentic systems are changing the software delivery lifecycle. CircleCI is a CI/CD provider, processing over a billion customer jobs annually.

From what CircleCI is seeing within its own customer base, the software industry is on the cusp of using a lot of headless agents, which can take on long-running tasks on a schedule or be activated via webhooks. Headless agents do well at mechanical translations, once given a solid set of rules to work from. A well-structured repository is key. One project at CircleCI where agents helped was an effort to bring dark mode to the company’s own software. The design team specified the attributes required, and the agent did the laborious work of going through all the user-facing components to make the changes. “All in all, we’ve seen that this pairing of domain expertise plus AI is a really powerful organization attribute, because it allows more people to contribute,” Webster said. By Webster’s estimate, drawing on Google’s GitHub Archive for BigQuery, GitHub is now seeing hundreds of thousands of agent-related activities per week. What are they doing? Pull requests. But an AI-fueled project can create an immense amount of code, which creates its own bottleneck. “You have AIs pushing as much code as they are writing,” Webster said. 
CircleCI is also seeing this behavior with its own customers.

The problems around pull requests

On average, a code reviewer can inspect about 500 lines of code in an hour. When an agentic service can produce 1,500 lines of code every 10 minutes, there is bound to be a traffic jam. Beyond the numbers problem, pull requests are “inefficient generally,” Webster said. By many accounts, the median time a PR review team takes to review code ranges from 14 hours down to three, in cases where a single engineer relentlessly pushes one PR through. Reviewing PRs takes you out of the flow, and the information provided would have been more useful earlier in the development cycle. Persistent technical debt accumulation is also a problem with this tsunami of PRs. Headless agents working autonomously can work quickly, but also sloppily. The most recent DORA survey reports found the same: increased velocity, but more instability. In one paper, a group of researchers found that adopting an AI service, such as Cursor, can provide a temporary gain in code development, though the project’s velocity will soon be hampered by “static analysis warnings and code complexity.” And in his own calculations, Webster estimated that any gains from AI-generated code would become useless once AI becomes 75% faster overall than human coders. “If you’re not able to complement to speed up your delivery, compared to AI, it’s all going to be washed out by all of the delays in the process,” Webster said. In other words, “the reality is, even if you did have AI going as fast as you wanted to, you as an organization, and the objective that you’re trying to achieve, couldn’t go faster even if you wanted to.” There are things that you can do, such as optimizing pipelines, rewriting scripts, parallelizing tests, and improving code reviews, which will all help.

[Chart: Agentic activity on the part of CircleCI customers.]

AI-generated code requires more nimble testing

But perhaps the best answer is to rethink the testing and validation process to let agents do as much of the work as possible. “If you have a way to validate the AI, you can let it run as fast as possible,” Webster said. Develop a set of tests that assert that if the code passes the tests, it should go to production. As others have pointed out, failure is a data set that AI itself can use to fine-tune its own process. Thorough unit tests are good for this, though they are limited in scalability (to about 10x the human-driven workload, Webster estimated). A better approach is test impact analysis, which speeds testing through incremental validation, pruning tests to only what is needed, as highlighted by a dependency graph. CircleCI applied it to its own monolithic user interface application and found that it cut test timing from 30 minutes down to 1.5 minutes. “What this means is we can take an AI agent, have it work as fast as we’re willing to spend money on the tokens, and give it a tool to run only the test that it needs to run on the changes that it needs,” Webster said. Such an operation can be easily run from within a container or a laptop. The principle of selective attention can also apply to code review. “Not all code has the same level of risk,” he said. “Here is where you can prune back review to just the changes that matter.” CircleCI has built its own agent, called Chunk, for customers to run to streamline their own testing processes. 
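The test-impact-analysis idea Webster describes is easy to sketch. The toy below is my own illustration with made-up module and test names, not CircleCI’s or Chunk’s implementation: it computes each test’s transitive dependencies from a small import graph, then selects only the tests affected by a set of changed files.

```python
# Toy test impact analysis: given which modules each test (transitively)
# depends on, run only the tests affected by the changed files.
# A real system would derive the graph from imports, build metadata, or coverage.
from collections import deque

# module -> modules it imports (hypothetical project layout)
deps = {
    "checkout": ["cart", "payments"],
    "cart": ["models"],
    "payments": ["models", "http_client"],
    "reports": ["models"],
}
# test file -> modules it imports directly
tests = {
    "test_checkout": ["checkout"],
    "test_cart": ["cart"],
    "test_reports": ["reports"],
}

def transitive_deps(module: str) -> set[str]:
    # Breadth-first walk of the dependency graph, including the module itself.
    seen, queue = set(), deque([module])
    while queue:
        m = queue.popleft()
        for dep in deps.get(m, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen | {module}

def impacted_tests(changed: set[str]) -> set[str]:
    return {t for t, mods in tests.items()
            if any(changed & transitive_deps(m) for m in mods)}

print(impacted_tests({"models"}))       # {'test_checkout', 'test_cart', 'test_reports'}
print(impacted_tests({"http_client"}))  # {'test_checkout'}
```

The same selection step can run inside a container or on a laptop, which is what makes it usable as a fast validation gate for agent-generated changes.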
Future build systems will be less linear

Future engineers will be worrying less about the code and more about supporting the AI in its relentless pursuit of generating more code, Webster predicted. So tasks like fixing flaky tests will become the first priority, and can be automated as well. Instead of this linear process, we will need to build systems where all the required tests take place somewhere in the process. “Instead of having a linear Yes/No, we combine these things into a single gate, where all we do is keep track of what has occurred,” Webster said. If a test passes, the code should be moved to production. “Everything else besides that is us concerned about other things.” With AI, “more effort and energy is likely going to be spent in this testing and evaluation, [and] less so [on] thinking about the specific designs of low-level details of our services.” Full access to these QCon AI talks, and others, can now be procured through a video-only pass. The post QCon chat: Is agentic AI killing continuous integration? appeared first on The New Stack.
Read more →

Ask HN: Notification Overload

Comments
Read more →

There’s a Hidden Preference to Auto-Resize Columns in the Finder on MacOS 14 and 15

Good tip from “DifferentDan” on the Realmac customer forum, posted back in November: I saw on macOS Tahoe 26.1, Apple finally added an option in the Column View settings to automatically right size all columns individually and that setting would persist, but I don’t really like Liquid Glass (yet) so I haven’t updated to Tahoe. Looks like someone found a workaround however for those that are still on Sequoia. Just open up Terminal on your Mac, copy in the below, and press return. The one-line command: defaults write com.apple.finder _FXEnableColumnAutoSizing -bool YES; killall Finder (Change YES to NO if you want to go back.) Marcel Bresink’s TinkerTool is a great free app for adjusting hidden preferences using a proper GUI, and it turns out TinkerTool has exposed this hidden Finder preference for a few years now. You learn something every day. I enabled this a few days ago on MacOS 15 Sequoia, and it seems exactly like the implementation Apple has exposed in the Finder’s View Options window in Tahoe, which I wrote about Friday. No better, no worse. ★
Read more →

Nvidia Set to Supplant Apple as TSMC’s Largest Customer

Kif Leswing, CNBC: Nvidia will become TSMC’s largest customer this year, according to analyst estimates and Huang himself. Apple is believed to currently be TSMC’s largest customer, mostly to manufacture A-series chips for iPhones and M-series chips for PCs and servers. The positional swap will mark a fundamental shift in the semiconductor industry, reflecting Nvidia’s growing importance amid the artificial intelligence infrastructure build-out. [...] Ben Bajarin, principal analyst at Creative Strategies, said he projects Nvidia to generate $33 billion in TSMC revenue this year, or about 22% of the chip foundry’s total. Apple, by comparison, is projected to generate about $27 billion, or about 18% of TSMC’s revenue. ★
Read more →

[Sponsor] WorkOS Pipes: Ship Third-Party Integrations Without Rebuilding OAuth

Connecting user accounts to third-party APIs always comes with the same plumbing: OAuth flows, token storage, refresh logic, and provider-specific quirks. WorkOS Pipes removes that overhead. Users connect services like GitHub, Slack, Google, Salesforce, and other supported providers through a drop-in widget. Your backend requests a valid access token from the Pipes API when needed, while Pipes handles credential storage and token refresh. Simplify integrations with WorkOS Pipes. ★
Read more →

Airlines That Support Shared Item Location for Luggage With AirTags

Joe Rossignol, writing at MacRumors: Apple offers a Share Item Location feature in the Find My app that allows you to temporarily share the location of an AirTag-equipped item with others, including employees at participating airlines. This way, if you put an AirTag inside your bags, the airline can better help you find them in the event they are lost or delayed at the airport. [...] Below, we have listed most of the airlines that support the feature. Apple’s announcement claims that 36 airlines support it today, and 15 more are coming soon. ★
Read more →

Apple Introduces Second-Generation AirTags

Apple Newsroom: Apple’s second-generation Ultra Wideband chip — the same chip found in the iPhone 17 lineup, iPhone Air, Apple Watch Ultra 3, and Apple Watch Series 11 — powers the new AirTag, making it easier to locate than ever before. Using haptic, visual, and audio feedback, Precision Finding guides users to their lost items from up to 50 percent farther away than the previous generation. And an upgraded Bluetooth chip expands the range at which items can be located. For the first time, users can use Precision Finding on Apple Watch Series 9 or later, or Apple Watch Ultra 2 or later, to find their AirTag, bringing a powerful experience to the wrist. Solid update to the original AirTags, which debuted five years ago. Better range, louder speaker, increased precision. The form factor remains unchanged, so second-gen AirTags will fit in keychains or holders designed for the first-gen model. They even take the same batteries. Pricing also remains unchanged: $29 for one, $99 for a four-pack. ★
Read more →

★ App Store 2025 Top iPhone Apps in the U.S.

I’ve been meaning since last month to link to Apple’s lists of the top iPhone apps in the U.S. for 2025. Here’s the list of the top 20 free iPhone apps:

1. ChatGPT
2. Threads
3. Google
4. TikTok — Videos, Shop & LIVE
5. WhatsApp Messenger
6. Instagram
7. YouTube
8. Google Maps
9. Gmail — Email by Google
10. Google Gemini
11. Facebook
12. CapCut: Photo & Video Editor
13. Temu: Shop Like a Billionaire
14. T-Life [“All things T-Mobile”]
15. Telegram Messenger
16. Lemon8 — Lifestyle Community
17. Spotify: Music and Podcasts
18. Google Chrome
19. Snapchat
20. rednote

All app names are verbatim, except for T-Life, where I put the app’s secondary slogan in brackets. I had no idea what T-Life was, but the slogan makes it clear. Interesting to me that T-Mobile’s app is on the list but neither Verizon nor AT&T’s are.1 I hope a million people sent this list to Elon Musk, to rub some salt in his severe case of butt hurt that led him to file an almost certainly baseless lawsuit in August alleging that ChatGPT consistently tops the App Store list — and Grok does not — because Apple puts a thumb on the scale for these rankings because of its deal with OpenAI to integrate ChatGPT with Apple Intelligence. Here’s the thing. Dishonest people presume the whole world is dishonest. That you either cheat and steal, or you’re going to be cheated and robbed. If Elon Musk ran the App Store, you can be sure that he’d cook the rankings to put apps that he owns, or even just favors, on top. Elon Musk runs Twitter/X, and that’s how the algorithm there now works: it favors content he prefers, especially his own tweets. Apple doesn’t publish how its lists for top apps are computed (to keep the rankings from being gamed more than they already inevitably are), but judging by how many of these apps come from Apple’s rivals (e.g., Spotify), there’s little reason to think they’re crooked — unless you think the entire world is crooked. Google has 6 apps on the list, including 5 in the top 10. Meta — certainly no friend of Apple — has 4 apps on the list, including 3 in the top 10. (Slightly interesting, but unsurprising, sign of the times: the Facebook “blue app” dropped out of the top 10.) The only apps in the top 10 not from Google or Meta are ChatGPT (#1) and TikTok (#4). Microsoft has no apps on the list. Back in the day, the conventional wisdom was that Microsoft made more money, on average, from each Mac sold than they did from each PC sold — despite the fact that nearly all PCs came with a licensed version of Windows — because so many Mac users paid for Microsoft Office at retail prices. I suspect something like that is true with iPhones for Google. A lot of iPhone users spend a lot of time using apps from Google. I would bet that Google makes more ad revenue from the average iPhone user (who, even if they don’t install a single one of Google’s native iOS apps, probably uses Google Search in Safari) than from the average Android user. Another company that has no apps on this list is Apple itself. If you look at the daily top list of apps in the Productivity category, you will see a lot of apps from Google and Microsoft. But you won’t find Keynote, Pages, or Numbers, because Apple recuses its own apps from such rankings. 
Here’s the list of the top 20 paid iPhone apps in 2025 in the U.S.:

1. HotSchedules
2. Shadowrocket
3. Procreate Pocket
4. AnkiMobile Flashcards
5. Paprika Recipe Manager 3
6. SkyView®
7. TonalEnergy Tuner & Metronome
8. AutoSleep Track Sleep on Watch
9. Forest: Focus for Productivity
10. RadarScope
11. Monash FODMAP Diet
12. Merge Watermelon for watch
13. Streaks
14. Wipr 2
15. µBrowser: Watch Web Browser
16. PeakFinder
17. Threema. The Secure Messenger
18. Things 3
19. Goblin Tools
20. ¡Verify Basic

There are a couple of real gems on this list — Procreate, Paprika, Streaks (multi-time DF sponsor), and Things are all apps that I use, or have used, and would recommend. But unlike the list of top free apps, where I’d at least heard of all of them (once I figured out what T-Life was), I have never even heard of most of these paid iPhone apps. Household names these are not. The market for paid apps isn’t just different from the market for free apps. It’s an entirely different world. This, in turn, made me wonder what the subscriber-count standings look like. I assumed T-Mobile was still in third place, but that assumption was wrong. According to Wikipedia, here are the number of U.S. subscribers per carrier as of Q3 2025:

Verizon — 146 million
T-Mobile — 140 million
AT&T — 119 million
Boost — 8 million

I’m a Verizon man myself, and pay handsomely for it. I don’t even remember why exactly, but I despised AT&T back when they were the exclusive U.S. carrier for the iPhone. ↩︎
Read more →

From the DF Archive: ‘Untitled Document Syndrome’

Yours truly back in 2009, hitting upon the same themes from the item I just posted about TextEdit vs. Apple Notes: This, I think, explains the relative popularity of Mac OS X’s included Stickies application. For years, Stickies’s popularity confounded me. Why would anyone use a note-taking utility that requires you to leave every saved note open in its own window on screen? The more you use it, the more cluttered it gets. But here’s the thing: cluttered though it may be, you never have to save anything in Stickies. Switch to Stickies, Command-N, type your new note, and you’re done. (And, yes, if you create a new sticky note, then force-quit Stickies, the note you just created will be there when next you launch the app. Stickies’s auto-save happens while you type, not just at quit time.) It feels easy and it feels safe. Stickies does not offer a good long-term storage design, but it offers a frictionless short-term jot-something-down-right-now design. Here we are in 2026, 17 years later, and, unsurprisingly, some things have changed. Apple Notes didn’t get a Mac version until Mac OS X 10.8 Mountain Lion in 2012. And Apple Notes didn’t really get good until 2016 or 2017. I still use Yojimbo, the library-based Mac app I wrote about in the above piece in 2009, but I don’t use it nearly as much as I used to. I use Apple Notes instead, for most notes, because it has good clients for iPhone and iPad (and Vision Pro and even Apple Watch). Other things, however, have not changed since 2009. Like the Stickies app, which is still around in MacOS 26 Tahoe, largely unchanged, except for a sad Liquid Glass-style icon. If you still use Stickies, you should consider moving to Apple Notes. There’s even a command (File → Export All to Notes...) to import all your notes from Stickies into Apple Notes, with subfolders in Notes for each color sticky note. Apple Notes on the Mac even supports one of Stickies’s signature features: the Window → Float on Top command will keep a note’s window floating atop the windows from other apps even when Apple Notes is in the background. (Stickies has another cool feature that no other current app I know of does: it still supports “window shading”. Double-click the title bar of a note in Stickies and the rest of the window will “roll up”, leaving only the title bar behind. Double-click again and it rolls down. This was a built-in feature for all windows in all apps on classic Mac OS, starting with Mac OS 8, but was replaced in favor of minimizing windows into the Dock with Mac OS X. Window shading was a better feature (and could have been kept alongside minimizing into the Dock). With the Stickies app, window shading works particularly well with the aforementioned Float on Top feature — you can keep a floating window available, atop all other windows, but while it’s rolled up it hardly takes up any space or obscures anything underneath.) ★
Read more →

‘TextEdit and the Relief of Simple Software’

Perhaps at the opposite end of the complexity and novelty spectrum from Federico Viticci’s intro to Clawdbot is this piece by Kyle Chayka, writing at The New Yorker, from October: Amid the accelerating automation of our computers — and the proliferation of assistants and companions and agents designed to execute tasks for us — I’ve been thinking more about the desktop that’s hidden in the background of the laptop I use every day. Mine is strewn with screenshots and Word documents and e-books. What I’ve accrued the most of by far, though, are TextEdit files, from the bare-bones Mac app that just lets you type stuff into a blank window. Apple computers have come with text-editing software since the original Mac was released, in 1984; the current iteration of the program launched in the mid-nineties and has survived relatively unchanged. Over the past few years, I’ve found myself relying on TextEdit more as every other app has grown more complicated, adding cloud uploads, collaborative editing, and now generative A.I. TextEdit is not connected to the internet, like Google Docs. It is not part of a larger suite of workplace software, like Microsoft Word. You can write in TextEdit, and you can format your writing with a bare minimum of fonts and styling. Those files are stored as RTFs (short for rich-text format), one step up from the most basic TXT file. TextEdit now functions as my to-do-list app, my e-mail drafting window, my personal calendar, and my stash of notes to self, which act like digital Post-its. I trust in TextEdit. It doesn’t redesign its interface without warning, the way Spotify does; it doesn’t hawk new features, and it doesn’t demand I update the app every other week, as Google Chrome does. I’ve tried out other software for keeping track of my random thoughts and ideas in progress — the personal note-storage app Evernote; the task-management board Trello; the collaborative digital workspace Notion, which can store and share company information. Each encourages you to adapt to a certain philosophy of organization, with its own formats and filing systems. But nothing has served me better than the brute simplicity of TextEdit, which doesn’t try to help you at all with the process of thinking. Using the app is the closest you can get to writing longhand on a screen. I could make lists on actual paper, of course, but I’ve also found that my brain has been so irredeemably warped by keyboards that I can only really get my thoughts down by typing. Old habits are hard to break. And trust me, I, of all people, know the value of writing stuff — all sorts of stuff — in plain text files. (RTF isn’t plain text, but it is a stable and standard format.) I’ve been using BBEdit since 1992, not just as an occasional utility, but as part of my daily arsenal of essential tools. But I get the feeling that Chayka would be better served switching from TextEdit to Apple Notes for most of these things he’s creating. Saving a whole pile of notes to yourself as text files on your desktop, with no organization into sub-folders, isn’t wrong. The whole point of “just put it on the desktop” is to absolve yourself of thinking about where to file something properly. That’s friction, and if you face a bit of friction every time you want to jot something down, it increases the likelihood that you won’t jot it down because you didn’t want to deal with the friction. You actually don’t need to save or name documents in TextEdit anymore. 
One of the best changes to MacOS in the last two decades has been the persistence of open document windows, including unsaved changes to existing files, and never-saved untitled document windows. Try this: open TextEdit, make a new untitled document, and type something — anything — into the new window. Next, don’t just quit TextEdit, but force quit it (⌥⌘Esc). Relaunch TextEdit, and your unsaved new document should be right where you left it, with every character you typed. But a big pile of unorganized RTF files on your desktop — or a big pile of unsaved document windows that remain open, in perpetuity, in TextEdit — is no way to live. You can use TextEdit like that, it supports being used like that, but it wasn’t designed to be used like that. Apple Notes was designed to be used like this. Open Notes, ⌘N, type whatever you want, and switch back to whatever you were doing before. There is no Save command. There are no files. And while a few dozen text files on your desktop starts to look messy, and makes individual items hard to find, you can stash thousands of notes in Apple Notes and they just organize themselves into a simple list, sorted, by default, by most recently modified. You can create folders and assign tags in Notes, but you don’t need to. Don’t make busy work for yourself. And with iCloud, you get fast reliable syncing of all your notes to all of your other Apple devices: iPhone, iPad, Vision Pro, even your Watch now. Sometimes you just want to stick with what you’re used to. I get it. I am, very much, a creature of habit. And TextEdit is comforting for its simplicity, reliability, and unchanging consistency spanning literally decades. But there’s no question in my mind that nearly everyone using TextEdit as a personal notes system would be better served — and happier, once they adjust to the change — by switching to Apple Notes. ★
Read more →

Federico Viticci on Clawdbot

Federico Viticci, writing at MacStories: If this intro just gave you whiplash, imagine my reaction when I first started playing around with Clawdbot, the incredible open-source project by Peter Steinberger (a name that should be familiar to longtime MacStories readers) that’s become very popular in certain AI communities over the past few weeks. I kept seeing Clawdbot being mentioned by people I follow; eventually, I gave in to peer pressure, followed the instructions provided by the funny crustacean mascot on the app’s website, installed Clawdbot on my new M4 Mac mini (which is not my main production machine), and connected it to Telegram. To say that Clawdbot has fundamentally altered my perspective of what it means to have an intelligent, personal AI assistant in 2026 would be an understatement. I’ve been playing around with Clawdbot so much, I’ve burned through 180 million tokens on the Anthropic API (yikes), and I’ve had fewer and fewer conversations with the “regular” Claude and ChatGPT apps in the process. Don’t get me wrong: Clawdbot is a nerdy project, a tinkerer’s laboratory that is not poised to overtake the popularity of consumer LLMs any time soon. Still, Clawdbot points at a fascinating future for digital assistants, and it’s exactly the kind of bleeding-edge project that MacStories readers will appreciate. Clawdbot can be overwhelming at first, so I’ll try my best to explain what it is and why it’s so exciting and fun to play around with. Overwhelming indeed. Clawdbot is undeniably impressive, and interest in it is skyrocketing. But because of its complexity and scope, it’s one of those things where all the excitement is being registered by people who already understand it. This essay from Viticci is the first thing I’ve seen that really helped me start to understand it. ★
Read more →

BellSoft bets Java expertise can beat hardened container wave

The hardened container market has been heating up with venture money and startups, but Java platform provider BellSoft thinks its eight years of building Java runtimes gives it something others don’t have: expertise in what’s actually running inside those secured containers. The company used the KubeCon conference in Atlanta last November to launch its BellSoft Hardened Images, betting it can stand out in a space where Chainguard pioneered the approach, and startups are now piling in. BellSoft’s angle is that it’s not just wrapping containers in security — it’s optimizing the Java workloads themselves. “The market for containers is emerging,” Alexander Belokrylov, co-founder and CEO of BellSoft, tells The New Stack in an interview. “I see how much money venture investors put in, and it looks like they feel that it has potential.” Belokrylov says the problem BellSoft is addressing is real. When development teams use base images, they often inherit a large attack surface, including unnecessary packages, shells, compilers, package managers, and unused libraries that may contain known vulnerabilities that haven’t been addressed, according to Janet Costello Worthington, a Forrester Research analyst who covers security. “This can lead to patching chaos, emergency rebuilds, or even production failures,” she says. “Hardened containers strip away these unnecessary components, reducing the risk of exploits and simplifying container management.” This all comes as Java faces a particular vulnerability problem: 44% of Java services contain known exploited vulnerabilities, compared to 5% for Go and just 2% for other languages. Typical container images carry 600 known vulnerabilities, nearly half of them years old. Two-thirds of organizations had a container security incident in the past year. What sets BellSoft apart BellSoft argues its differentiator is not just building secure containers — it’s understanding what goes inside them. The company ranks among the top five OpenJDK contributors. “Our differentiator is a deep technical expertise in the technologies we provide,” Belokrylov says. “We are not just the experts in building software; we’re experts in these kinds of projects.” That expertise started with Alpaquita Linux three years ago, an Alpine-like OS that began as a Java optimization project. “Originally, our idea was to optimize Linux to run Java workloads,” Belokrylov said. “However, it appeared that Linux optimized for Java workloads, optimized pretty much for everything.” Now BellSoft supports hardened images for .NET, C/C++, JavaScript, Python, and Go — all with near-zero common vulnerabilities and exposures (CVEs) and technical support. The company claims 95% fewer vulnerabilities than standard Java images and up to 30% resource savings with its Liberica JDK Lite. According to Costello Worthington, vendors that provide hardened container images deliver value by addressing key security and operational challenges. “These images come with less bloat, fewer inherited vulnerabilities, secure configuration defaults, and a smaller attack surface,” she says. Hardened images also offer essential transparency through provenance, attestations, and software bills of materials that detail what’s inside. Crowded field At KubeCon, Belokrylov says he saw plenty of competition. Chainguard has done “a very good job” pioneering hardened containers, but new players are emerging. “There were a number of startups who were making more or less the same, however they are making that from scratch,” he says. 
Dan Lorenc, CEO of Chainguard, acknowledges the sudden rush into the space. “It’s kind of baffling to watch, in some ways, how crowded the space has gotten in the last year,” he says in an interview with The New Stack. “We started doing it three years ago now, because there was clearly a need.” But Lorenc sees the proliferation of hardened container offerings as a symptom of a deeper issue. “The software supply chain is broken, and the recent explosion of hardened container offerings is the industry’s reaction,” he writes in an article in The New Stack. “The industry has responded by tightening inspection at the end of the assembly line (more checks, more scanners) while largely ignoring how the parts get sourced, assembled, and verified upstream.” In the article, Lorenc also writes, “The real issue is about trusting where software comes from, and why building open source software directly from source is the only way to secure the entire software supply chain.” The hardened container market now includes, in addition to BellSoft, established players like Chainguard, Docker, Red Hat, VMware; cloud providers like AWS, Azure, and Google Cloud Platform; startups like RapidFort, Wiz, Edera, Lineaje, Minimus; and others. The opportunity exists because enterprises now run security scanners on everything. “They are not blind now when they’re accepting software,” Belokrylov says. “They’re asking vendors to provide them the software with the limited number of CVEs.” BellSoft wants to handle that base layer. “The idea is here that there is a vendor like BellSoft who actually took care of the significant part of the software delivery package, like base images, and keeps them up to date and zero CVEs,” Belokrylov says. Developers can focus on their applications while BellSoft maintains the foundation, he says. For enterprises, Costello Worthington notes that customers who leverage hardened container images often find it easier to meet compliance requirements and streamline the process for achieving FedRamp authorization. “Providing development teams with a curated baseline of images ensures development can focus on roadmap features and functionality for the business, while making it easier to meet compliance requirements, reduce vulnerabilities, and accelerate development velocity,” she says. Technical approach The hardened images strip out package managers and nonessential components with locked configurations that can’t be changed at runtime. Alpaquita Linux supports both musl and glibc, letting teams migrate without rewriting code. Unlike competitors waiting for upstream patches, BellSoft writes its own when needed — a capability that comes from actively contributing to OpenJDK and GraalVM. The company also sells Liberica JDK Performance Edition, which backports the modern JVM from Java 25 into older versions. “Applications written for JDK 8 API, specifically, they perform as if they were migrated to the most modern Java version without any line of code change,” Belokrylov says. “That’s a killer feature for companies who still run Java 8 applications, specifically in the cloud.” Tiers available BellSoft offers a free Community tier with hardened containers for JDK 21 and 25+. A Standard tier covers all JDK versions plus GraalVM, Go, Python, C, and Alpaquita base with a 7-day CVE remediation SLA. Premium adds support and performance consulting. The post BellSoft bets Java expertise can beat hardened container wave appeared first on The New Stack.
Read more →

AI DevOps vs. SRE agents: Compare AI incident response tools

If you’ve seen the new crop of data about ops lately, you may have noticed a new category coming up: AI agents that promise to take charge of incident intervention, diagnose the root cause, and even solve problems themselves. AWS announced one. Microsoft has one. A dozen startups are creating them. And the terminology is, to put it mildly, inconsistent: AI DevOps engineer, AI site reliability engineering (SRE) agent, AIOps platform. Are these the same thing? Different things? Marketing fluff? I’ve spent time digging into what these systems actually do, how they differ, and what matters when you’re evaluating them. Here’s what I’ve learned.

Why this category exists now

Let’s start with this: Ops teams are drowning. The complexity of microservices architectures has skyrocketed. A single user request can touch 15 services across three clouds. When it is 2 a.m. and something breaks, you are looking at dashboards from six different tools and matching logs, metrics, and traces, trying to connect them all while Slack blows up with “Is the site down?” messages. Traditional monitoring lets you see what is really happening. It doesn’t tell you why or how to act. That gap — between sight and action — is where AI operations agents live. The pitch is clear: Rather than spending 45 minutes figuring out why things went wrong, an AI agent builds the connection in minutes, uncovers the likely root cause, and proposes a fix. Some go a step further and implement the fix with your consent.

What these agents actually do

Strip away the marketing, and AI DevOps engineers share a playbook. They connect to your observability stack — Datadog, Splunk, CloudWatch, and whatever you’re running — and consume telemetry. They integrate with your CI/CD pipelines and source control software, so they know which code has just shipped. They hook into ticketing systems like PagerDuty or ServiceNow to see incident history. When something goes wrong, they correlate signals across these systems and build a timeline: this deployment happened, latency started to rise, errors started to occur, then this downstream service began to fail. They map your infrastructure topology to interpret service dependencies and understand failures down the call chain. The better ones learn from past incidents. They identify patterns: “The last time we saw this error signature, the root cause was a misconfigured environment variable.” They surface that context so you can fix issues sooner. Some agents remain advisory — they investigate and recommend action items, but a human pulls the trigger. Others push toward automation, executing remediation workflows with appropriate guardrails.

AI DevOps engineer vs. AI SRE agent

The main distinction is marketing and the scope of work. SRE is all about reliability, availability, and error budgets. DevOps looks at the broader delivery life cycle. In reality, most AI operations agents cover both. They manage incidents — SRE territory — and improve pipelines or build Infrastructure as Code (IaC) — DevOps territory. The underlying tech is the same: machine learning (ML) models trained on operational data, natural language interfaces, and integration frameworks that plug into your toolchain. Don’t worry about what the vendor calls its product. Evaluate what it actually does. 
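The correlation step described above is simple to illustrate. The sketch below is my own toy example, not any vendor’s agent: it lines up deploy events and error-rate samples on one timeline and flags the most recent deployment that preceded an error spike, which is roughly the “this deployment happened, then errors started” reasoning these agents automate.

```python
# Toy incident correlation: detect the first error spike and point at the most
# recent deploy before it. Timestamps are minutes for simplicity; data is made up.
from dataclasses import dataclass

@dataclass
class Deploy:
    minute: int
    service: str
    version: str

error_rate = {0: 0.2, 5: 0.3, 10: 0.4, 15: 6.5, 20: 7.1}  # percent errors per sample
deploys = [Deploy(2, "inventory", "v41"), Deploy(12, "checkout", "v87")]

def first_spike(samples: dict[int, float], threshold: float = 5.0) -> int | None:
    # Return the earliest sample time where the error rate crosses the threshold.
    for minute in sorted(samples):
        if samples[minute] >= threshold:
            return minute
    return None

def suspect_deploy(samples: dict[int, float], events: list[Deploy]) -> Deploy | None:
    spike = first_spike(samples)
    if spike is None:
        return None
    before = [d for d in events if d.minute <= spike]
    return max(before, key=lambda d: d.minute) if before else None

culprit = suspect_deploy(error_rate, deploys)
print(f"Error spike likely follows {culprit.service} {culprit.version}")
# -> Error spike likely follows checkout v87
```

Real agents do this across richer signals (traces, topology, past incidents), but the shape of the reasoning is the same.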
How cloud providers are responding to AI ops AWS DevOps Agent, which launched in preview late last year, is worth understanding because it illustrates how cloud providers think about this problem. AWS built an agent that correlates data across CloudWatch, third-party monitoring tools, and CI/CD systems. It maps your infrastructure topology, tracks deployments, and generates recommendations when incidents occur. It integrates with ticketing systems to respond automatically when alerts fire. The agent is genuinely useful for investigation. It understands AWS resources deeply — EC2 instances, Lambda functions, EKS clusters, the whole catalog — and can trace relationships between them. But there is a catch: AWS thinks in terms of resources, not applications. It knows you have a Kubernetes cluster with certain pods. It doesn’t inherently know that those pods constitute your checkout service, which is distinct from your inventory service, which has different owners and different risk tolerances. This resource-centric view shapes what the agent can safely do. Without guaranteed application boundaries, automated remediation carries risk. What if scaling one service cascades into another? What if a rollback affects components you didn’t intend to touch? That’s why AWS DevOps Agent emphasizes investigation and recommendation over automated action. It’s a deliberate design choice, not a limitation. Microsoft’s Azure SRE Agent takes a similar approach. The true differentiator: Application context Here’s what I’ve come to think matters most: the degree of abstraction at which an agent operates. Agents acting at the infrastructure level know what resources they own and the relationships among them. They’re good at answering “What’s happening?” but have to be far more careful about “What should we do about it?” Some platforms also offer explicit application boundaries as a first-class concept. If an agent knows that these containers, this database, and these queues together form one application with defined ownership, acting on them becomes easier; it can readily scope its actions. Rollbacks remain within safe limits. Scaling decisions don’t bleed into unrelated services. This explains the range from advisory to automated. Without context, automation is dangerous; with it, the picture reverses: context creates clear boundaries and allows agents to act with confidence. What engineers should consider when evaluating agents If you’re evaluating AI operations agents, here’s what I’d think about: Start with investigation, not automation. Let the agent prove it understands your environment before you give it permissions to change anything. Build trust incrementally. Context quality matters enormously. These agents are only as good as the data and structure they have access to. Well-tagged resources, clear service ownership, and explicit application boundaries make agents dramatically more effective. Integration depth varies wildly. Some agents have deep, bidirectional integrations with popular tools. Others have shallow connections that limit what they can see and do. Ask hard questions about how an agent works with your specific stack. This doesn’t replace expertise. AI agents amplify engineering capability. They don’t substitute for understanding your systems, making judgment calls, or designing for reliability. Treat them as force multipliers, not replacements. Where this is headed The category is maturing fast. 
There is competition from cloud providers, observability vendors, and focused startups, and the result is rapid innovation and falling prices. Yes, the opportunity for engineering teams is real. A good agent reduces the average time to resolution, reduces the on-call load, and allows engineers to focus on building resilient systems rather than fighting fires. But the hype is also real. Assess agents not on their slide decks, but on how they actually behave in your environment. The teams that experiment with these tools thoughtfully now will be best placed as they become a standard part of the operations stack. At DuploCloud, we’re actively building AI agents designed to execute real DevOps and cloud operations workflows. In our sandbox, you can interact with purpose-built agents that operate across cloud infrastructure, Kubernetes management, and observability — running inside real environments to diagnose issues, apply changes, and automate day-to-day operations. The post AI DevOps vs. SRE agents: Compare AI incident response tools appeared first on The New Stack.
Read more →

Async Rust: Pinning demystified

This is the second of a four-part series. Read Part 1: How Rust does Async differently (and why it matters) In the previous part of this series, we explored the “pull-based” model of Rust’s asynchronous engine. We saw how the compiler transforms async functions into lazy state machines that only make progress when polled by an executor. However, if you looked closely at the poll method signature we implemented for our CountdownFuture, you might have noticed a peculiar wrapper around self: fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> In Part I, we focused on the logic — how a future decides whether it is Ready or Pending. In this post, we are shifting our focus to physics. Pin is often the most intimidating topic for those learning async Rust, yet it is the critical “secret sauce” that makes zero-cost futures possible. It ensures that when our state machine is halfway through an operation, it doesn’t suddenly move to a new location in memory and break every internal reference it holds. 1. The problem: Moving the unmovable In Rust, every type is movable by default. Whether you pass a variable into a function or assign it to a new name, Rust performs a bitwise copy (memcpy). For 99% of types, this is efficient and safe. But for self-referential structs, it is a disaster. Imagine a struct where one field is a pointer to another field within itself: struct SelfReferential { data: String, pointer_to_data: *const String, } If you move this struct to a new memory address, data moves with it. However, pointer_to_data still contains the old memory address. It is now a dangling pointer. Accessing it will cause undefined behavior. The async connection: A concrete example To understand why this matters for async Rust, we have to look at how the compiler treats an .await point. When you write an async function, the compiler transforms it into a struct that stores the “captured” state of your function. Consider this innocent-looking code: async fn process_data() { let val = String::from("Hello"); let val_ref = &val; // A reference to a local variable some_async_operation().await; // The function "pauses" here println!("{}", val_ref); // The reference is used after the pause } The ‘lowered’ state machine Internally, the compiler generates a struct to hold those variables so they survive while the function is paused. It looks roughly like this: struct ProcessDataFuture { val: String, val_ref: *const String, // Points to 'val' inside this same struct! state: State, } The memory disaster (the ‘move’) This is where the physical location of your data becomes critical. Let’s look at what happens in memory if we move this future after it has started. Before move (At the .await point): The future is located at address 0x1000. val (the string) is at address 0x1008. val_ref (the pointer) correctly stores the value 0x1008. The move: You move the future (perhaps by pushing it into a Vec or moving it to another thread). The future is now at address 0x2000. val has moved with the struct and is now at address 0x2008. The crash: val_ref still stores the value 0x1008. When the executor resumes the future and tries to use val_ref, it reaches back to address 0x1008, which is now garbage memory. Boom. How Pin saves the day When the executor polls this future, it doesn’t just take a normal reference; it requires a Pin<&mut Self>. By pinning the ProcessDataFuture, we are effectively telling the compiler: “This struct is now anchored at address 0x1000. 
It is illegal to move it until it is finished.“ Because the struct is guaranteed to stay at 0x1000, the internal pointer val_ref (pointing to 0x1008) remains valid for the entire life of the operation. This is the only way Rust can safely allow you to have references to local variables across .await points. 2. What Pin<P> actually is A common misconception is that Pin is a new pointer type. It isn’t. Pin is a wrapper around an existing pointer (like &mut T or Box<T>). It acts as a legal contract with the compiler: “The data pointed to by this pointer will never be moved again until its drop method is called.” The anatomy: You can move the Pin wrapper itself (such as swapping two Pin<Box<T>> variables), but you cannot move the T sitting inside it. Stability: Think of it like a foundation. You can’t move a house once the foundation is poured; you can only demolish it (Drop). 3. The Unpin marker trait Why does Pin<&mut i32> still allow you to move the integer? This is because of the Unpin trait. Auto-implemented: Almost every type in Rust (i32, String, Vec) automatically implements Unpin. These types are “safe” to move even if they are wrapped in a Pin. The role of !Unpin: Types that are not safe to move (like self-referential structs or compiler-generated futures) are marked as !Unpin. The distinction: If T: Unpin, then Pin<P<T>> behaves exactly like a normal pointer. The pinning logic only “activates” when the underlying type is !Unpin. 4. Stack pinning vs. heap pinning You have two main ways to anchor a value in memory, each with different trade-offs: Heap pinning (Box::pin) This is the “safe and easy” route. When you use Box::pin(value), the data is moved onto the heap. Since heap allocations have a stable address for their entire lifetime, pinning is trivial. Pros: Easy to use, no unsafe required. Cons: Requires a heap allocation (performance cost). Stack pinning (pin!) You can pin a value to the current stack frame using the std::pin::pin! macro. Pros: Zero-cost, no heap allocation. Cons: The pinned value cannot outlive the current function. It is much more restrictive than heap pinning. 5. Modern tooling: The pin-project crate Manually accessing fields of a pinned struct (called Pin Projection) is notoriously difficult to do safely because it often requires unsafe code. The industry standard is to use the pin-project crate. It allows you to safely “project” a pinned reference from a struct down to its individual fields without writing a single line of unsafe code: Practical example: The retryable future Here is how you implement a wrapper that retries a failing future up to a certain limit. Note how #[pin] allows us to safely handle the inner future even if it’s !Unpin. use std::pin::Pin; use std::task::{Context, Poll}; use std::future::Future; use pin_project::pin_project; #[pin_project] pub struct Retry<F, Fut> { // A factory function to create a new instance of the future for each retry factory: F, // The current future attempt we are polling #[pin] active_fut: Fut, retries_left: usize, } impl<F, Fut, T, E> Future for Retry<F, Fut> where F: Fn() -> Fut, Fut: Future<Output = Result<T, E>>, { type Output = Result<T, E>; fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> { let mut this = self.project(); match this.active_fut.as_mut().poll(cx) { // If it succeeded, or we're out of retries, return the result Poll::Ready(Ok(val)) => Poll::Ready(Ok(val)), Poll::Ready(Err(e)) => { if *this.retries_left > 0 { *this.retries_left -= 1; println!("Future failed. 
Retries remaining: {}", this.retries_left); // Reset the state: Create a new future and poll it let new_fut = (this.factory)(); this.active_fut.set(new_fut); // We must poll again to register the new waker cx.waker().wake_by_ref(); Poll::Pending } else { Poll::Ready(Err(e)) } } Poll::Pending => Poll::Pending, } } }
6. The Pin cheat sheet
Type property       | Wrapped in Pin? | Can it move?
Unpin (e.g. i32)    | No              | Yes
Unpin (e.g. i32)    | Yes             | Yes (Pin is ignored)
!Unpin (Self-ref)   | No              | Yes (Danger! ⚠️)
!Unpin (Self-ref)   | Yes             | No (Safe ✅)
Conclusion Pin is the invisible anchor that allows Rust’s async engine to be both safe and zero-cost. While it feels like a complex academic concept at first, it boils down to one simple rule: If data points to itself, it must stay put. By understanding the relationship between Pin, Unpin, and memory addresses, you are now equipped to handle complex async state machines and custom futures with confidence. What’s next: Building the engine Now that we understand the logic (Part I: state machines) and the physics (Part II: Pinning), it’s time to actually run our code. A future is just a dormant piece of data sitting in memory; it doesn’t do anything on its own. It needs an engine to drive it. In Part III, we will build a custom async runtime from scratch. We will explore: The executor: The loop that orchestrates polling. The waker: How a future tells the executor, “I’m ready to try again!” without wasting CPU cycles. The reactor: How we bridge the gap between OS-level events (like network I/O) and our Rust state machines. The post Async Rust: Pinning demystified appeared first on The New Stack.
Read more →

Power agentic workflows in your terminal with GitHub Copilot CLI

Since GitHub Copilot CLI launched in public preview in September 2025, we’ve been shipping frequent updates and improvements. Below, we’ll show you what makes Copilot CLI so special, why it’s great to have an agentic AI assistant right in your terminal, and how we’re building the Copilot CLI to connect more broadly to the rest of the GitHub Copilot ecosystem. Note: This blog is based on a GitHub Universe 2025 presentation. Watch below to see the functionality in action. 👇 Bringing the CLI to where you work If you use GitHub Copilot in VS Code or in a similar IDE, consider how often you spend your entire working day in the IDE, trying to avoid doing anything in any other working environment. We kept this thought top of mind when we conceptualized the GitHub Copilot CLI. Developers spend time using ssh to connect to servers, debug things in containers, triage issues on github.com, manage CI/CD pipelines, and write deployment scripts. There’s a lot of work that doesn’t neatly map into an individual IDE or even a multipurpose code editor like VS Code. To make sure that we brought the GitHub Copilot CLI to developers where they already are, it made sense to go through the terminal. After all, the terminal transcends all the different applications on your computer and, in the right hands, is where you can accomplish any task with fine-grained control. Bringing GitHub Copilot into the CLI and giving it access to the broader GitHub ecosystem lets you spend more time getting your work done, and less time hunting down man pages and scouring through documentation to learn how to do something. Showcasing the GitHub Copilot CLI functionality Often, the first step with a project is getting up to speed on it. Let’s consider an example where you’re filling in for a friend on a project, but you don’t know anything about it—you don’t know the codebase, the language, or even the framework. You’ve received a request to update a feedback form because the UI elements are not laid out correctly. Specifically, the Submit Feedback button overlaps the form itself, obscuring some fields. Whoever submitted the bug included a screenshot showing the UI error. To get started, you can launch the GitHub Copilot CLI and ask it to clone the repository. Clone the feedback repo and set us up to run it After sending this prompt, Copilot will get you everything you need: It will reference the documentation associated with the repository and figure out any dependencies you need in order to successfully run it. It’s a fast way to get started, even if you’re not familiar with the dependencies required. Copilot will prompt you before running any commands to make sure that it has permission to do so. It will tell you what it’s doing and make sure that you authorize any commands before it runs them. Now let’s say that your repository is set up and you go to run the server, but you receive an error that the port is already in use. This can be a workflow killer. You know that there are commands you can run in the terminal to identify the process using the port and safely shut it down, but you might not remember the exact syntax to do so. To make this much easier, you can just hand the task over to Copilot. What is using port 3000? Without you needing to look up the commands, Copilot can determine the PID of the process using the port. You can then either kill the process yourself or hand that task over to Copilot so you can focus on other tasks. 
Find and kill the process on port 3000 Continuing with our example, you now have the repository up and running and can verify the error with the Submit Feedback button. However, you don’t want to look through all of the code files to try and find what the bug might be. Why not have Copilot take a look first and see if it can identify any obvious issues? Copilot can analyze images, so you can use the image supplied in the bug report. Upload the screenshot showing the error to the repository, and ask Copilot if it has any ideas on how to fix the bug. Fix the bug shown in @FIX-THIS.PNG Copilot will attempt to find and fix the issue, supplying a list of suggested changes. You can then review the changes and decide whether or not to have Copilot automatically apply the fixes. And we’re able to do all of this in the terminal thanks to the GitHub Copilot CLI. However, before uploading these changes to the repository, you need to meet the team’s very strict accessibility requirements. You might not be familiar with what these are, but in this example, the team has a custom agent that defines them. It has all the right MCP tools to check on the guardrails, so you can leverage the agent to do an accessibility review of any proposed changes. /agent This command provides a list of available custom agents, so you can select the appropriate one you want to use. Once you select the appropriate agent, simply ask it to look over the proposed changes. Review our changes This prompt sets the agent to work, looking at your changes. If it finds any issues, it will let you know and suggest updates to make sure your changes are aligned with its instructions. With the appropriate agents to leverage, this can be an immensely powerful way to put checks on your code. Finally, let’s say you want to know if there are any open issues that map to the work that you’ve done, but you don’t want to manually search through all of the open issues. Luckily, Copilot CLI ships with the GitHub MCP server, so you can look up anything on the GitHub repository without needing to manually go to github.com. Are there any open issues that map to the work we're doing? The GitHub MCP server will then search through all of the issues and identify any that might match the work that you’ve addressed. If it pulls up issues that aren’t completely resolved by the work that you’ve done, you can still delegate this work to a coding agent so that you can continue working on whatever you’re focused on. /delegate Finish fixing the issue outlined in #1 and use the Playwright MCP server to ensure that it's fixed The /delegate command dispatches a coding agent to work on the task for you in the background while you turn your attention to other areas. It will open a pull request for the work that the coding agent performs. This is identical to the standard Copilot coding agent workflow—just started through GitHub Copilot CLI. Headless operation for scripting and automation GitHub Copilot CLI has even more functionality than what we’ve previously showcased. You can now perform tasks headlessly in the Copilot CLI. Remember the example where we talked about identifying and killing the process running on port 3000? You could do this through the CLI with the following command. copilot --allow-all-tools -p "Kill the process using port 3000" Copilot will then use the appropriate commands to identify and kill that process. 
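If you want to call that same headless invocation from your own tooling, a minimal sketch in Node might look like the following. This is our illustration, not an official GitHub example; it assumes the copilot binary is installed and already authenticated, and reuses the prompt and flags shown above.

// Wrap the headless copilot command in a small script (hypothetical helper).
import { execFile } from "node:child_process";

function freePort(port: number): void {
  execFile(
    "copilot",
    ["--allow-all-tools", "-p", `Kill the process using port ${port}`],
    (err, stdout, stderr) => {
      if (err) {
        console.error(`copilot exited with an error: ${stderr}`);
        return;
      }
      console.log(stdout);
    }
  );
}

freePort(3000);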
While this is a simple example, you can think of more complex scenarios where you could hook this up into a script or actions workflow and reuse it over and over again. Note that this included the flag --allow-all-tools, which is probably not something you want to include in an actual environment unless you’re running in a container. Luckily, we provide several flags that you can pass to only allow access to certain directories and tools. You can even restrict Copilot from using specific commands, so you can guarantee that a human is always involved, such as with pushing up to a repository. To see a list of possible flags, run the following command. copilot --help You can authenticate interactively with a login command or by using a personal access token. This way, you can use it in automations. We’re also actively working on additional authentication methods that are more enterprise friendly. Trying the Copilot CLI yourself We’re constantly shipping updates and are always looking for feedback from our users. We have several open issues and are tracking the items that customers want to see. If you want to see what we’re working on and provide feedback, check out our public GitHub Copilot CLI repository. And if you want to get started, it’s incredibly easy. It’s available for Windows (both WSL and natively in PowerShell), macOS, and Linux. We provide several platform-specific ways to install the CLI in the Copilot CLI README. Give it a try and come join the conversation on our public repository to help us build the best terminal-based AI system we possibly can. We look forward to hearing your feedback! Get started with GitHub Copilot CLI > The post Power agentic workflows in your terminal with GitHub Copilot CLI appeared first on The GitHub Blog.
Read more →

Anthropic extends MCP with a UI framework

With the MCP protocol, Anthropic created the de facto standard for AI models and agents to talk to third-party applications. After donating the MCP protocol to the Agentic AI Foundation last December, Anthropic today released a major new open extension to MCP that will allow MCP servers to serve up an interactive app-like experience right within the chat interface. Anthropic, of course, is building this feature right into the web and desktop experiences for Claude. It’s worth stressing, however, that this is an open protocol, so any other chatbot provider can adopt this protocol, and any third-party service will be able to build these apps. Already, support for MCP Apps is available in Goose, Visual Studio Code (for Insiders), and, later this week, ChatGPT from Anthropic competitor OpenAI. Some of Anthropic’s early partners include the likes of Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, and Slack. With the Box MCP App, for example, users will be able to search for files and preview documents inline in the chat experience — and then ask questions about those documents, too. With the Slack app, meanwhile, users can use the AI model to write and edit message drafts and then post them to Slack. Among other things, it’s the MCP Apps framework that allows for the direct editing of these messages right in Claude. The Slack MCP app (image credit: Slack). “Enterprises need more from AI than powerful models. They also need a reliable way for those models to operate inside real business environments. By partnering with Anthropic, we are bringing Salesforce directly into our customers’ flow of work and providing the execution layer with context, data, governance, and trust,” says Nick Johnston, SVP of Strategic Tech Partnerships at Salesforce in today’s announcement. “That’s what powers the Agentic Enterprise.” Soon, Slack owner Salesforce will also bring its Agentforce, Data 360 and Customer 360 apps to Claude. The Asana MCP app (credit: Asana). Some of the typical scenarios for using MCP Apps, which Anthropic first proposed in November, include interactive data exploration using dashboards, configuration wizards, document reviews and real-time monitoring. At its core MCP Apps rely on tools that supply user interface metadata and the user interface resources (HTML and JavaScript) to render them. Building MCP Apps The core primitives for defining MCP apps (credit: Anthropic). Bringing this interactive UI experience to Claude and other chat-centric AI tools feels like a logical next step. Chat is, for better or worse, still the default way to interact with AI models, but for a while now, it has felt quite limited. Anthropic isn’t the first one to think of this, of course. With its Apps SDK, OpenAI offers a somewhat similar framework, which also uses MCP at its core. Anthropic notes that both the OpenAI Apps SDK and the open-source MCP-UI project (created by Ido Salomon and Liad Yosef) pioneered many of these patterns. “The projects proved that UI resources can and do fit naturally within the MCP ecosystem, with enterprises of all sizes adopting both the OpenAI and MCP-UI SDKs for production applications,” the Anthropic team writes. And for the foreseeable future, developers who wrote MCP-UI apps will be able to continue to do so. “MCP Apps builds upon the foundations of MCP-UI and the ChatGPT Apps SDK to give people a rich, visually interactive experience,” says Nick Cooper, Member of Technical Staff, OpenAI. 
“We’re proud to support this new open standard and look forward to seeing what developers build with it as we grow the selection of apps available in ChatGPT.” On the security front, Anthropic notes that it implemented a number of guardrails to ensure the third-party code you are running on your MCP host cannot break out of its sandbox. These include sandboxed iframes with restricted permissions, the ability of hosts to review the HTML content before rendering, auditable UI-to-host messages, and the fact that users have to give explicit approval for UI-initiated tool calls. The post Anthropic extends MCP with a UI framework appeared first on The New Stack.
Read more →

Cisco is using eBPF to rethink firewalls, vulnerability mitigation

Networking giant Cisco purchased Isovalent in 2024 to get in on the cloud native action. In our cloud native community, Isovalent was primarily known for Cilium, an Extended Berkeley Packet Filter (eBPF) overlay network that worked well for Kubernetes environments, namely by replacing iptables with in-kernel traffic routing by eBPF. The company also built Tetragon, a vulnerability mitigation platform that Cisco has already embedded into its own smart switch software. Today, Cisco is one of the chief purveyors of network infrastructure, gear such as routers and switches, aimed primarily at enterprises. “They liked what we were doing, and they saw value and continue to see value in the solutions that we have for the Kubernetes world,” says Liz Rice, Isovalent chief open source officer, in an interview with The New Stack. “Cisco has this enormous global footprint across traditional networking, so being able to bridge those two things together is really nice.” Cisco runs its switches on Linux, which, like any software, has its share of vulnerabilities. Powering down a fleet of them just to apply a patch across each box isn’t ideal, however. Tetragon allows users to patch, or even upgrade, these switches while they continue to run. These eBPF technologies “are incredibly foundational to where Cisco wants to go from an existing product perspective, but also from a future perspective,” says Thomas Graf, Isovalent chief technology officer and a co-creator of Cilium, also in the TNS interview. “I think they’re learning that cloud native is not just Kubernetes, but a concept of how infrastructure will be done in general in the future, and that it will go beyond Kubernetes and containers,” Graf says. eBPF provides a way to interject programmability directly into the Linux kernel, allowing the kernel to make decisions about incoming and outgoing traffic even before it gets to the application. “With eBPF, we can attach miniature firewalls everywhere in the operating system or in the application code as well,” Graf says. “It’s a completely new era of firewalling that is not based on choke point firewalls that sit somewhere physically in the network.” Faster patching Today’s elaborate processes of vulnerability mitigation could change dramatically with eBPF, Graf says. In today’s world, if there is a bug in your software that can be exploited by a malicious party, it must be patched. This is “easier said than done,” Graf says, noting that most organizations have long lists of patches they need to apply, which are usually ranked by severity to set the priority of how quickly they should be applied. And if they run on dedicated hardware, such as an Internet of Things (IoT) device, the underlying operating systems will need patches as well. eBPF can either mitigate the attack itself or drastically reduce the blast area by blocking the specific action that the malicious user wants to take with the faulty software. The user still must patch, but it is not as urgent, and malicious actions can be blocked by eBPF in the meantime. Originally called the Berkeley Packet Filter, the technology first served as a network packet filter for the Berkeley Software Distribution (BSD). It has since been expanded into a virtual machine (VM) that can execute sandbox-secured code. 
Since its inclusion in the Linux kernel a decade ago, the Linux-based eBPF has found widespread adoption, particularly for observability, security, and compliance tools that benefit from its programmable in-line speed to analyze and filter packets without the need for cumbersome modules or dangerous kernel modifications. Cisco’s application Cisco has integrated eBPF into its Hypershield technology, available in its Cisco Nexus 9300 Series Smart Switches. It addresses the changing patterns in data center traffic. “Traditional security creates chokepoints. You route traffic through firewalls, IPS appliances, or virtual security functions. This made sense when your data center had clear boundaries and most traffic crossed them,” writes analyst Robb Boyd in a blog post. “But modern infrastructure doesn’t work that way anymore.” For one, a lot of network traffic used to run north-south, meaning between the server and the outside world. Today, especially with AI traffic and distributed Kubernetes deployments, a lot of traffic goes east-west, or across an internal network. Using eBPF, Hypershield adds an agent to each endpoint, such as VMs and Kubernetes pods, to get kernel-level visibility and control. “This agent sees everything: network packets, file operations, process behavior, system calls,” Boyd writes. eBPF as control plane The platform only hints at the possibilities down the road. One of Cisco’s goals with eBPF is to move away from centralized firewalls and towards distributed firewalls for each device and even each program. Patching an entire fleet of switches means each one must be rebooted individually, which is an expensive operation, one that would preferably be done during a scheduled maintenance window. Perhaps the reboots could even be spread out so that no downtime would be incurred at all. “You want to be able to pick your time of choice and not have the timeline be dictated by the vulnerability being disclosed,” Graf says. In fact, this is one of the reasons that Facebook/Meta got involved in eBPF. That company runs thousands of Linux servers, and to patch them all at once during a time of a critical vulnerability would be nearly impossible. “So they were very interested in essentially investing into eBPF to mitigate zero-day attacks where the entire Facebook server fleet was vulnerable,” he says. All attacks leverage an interface that the OS provides, either an API call or a system call. Think of eBPF as a miniature firewall, one located in working memory that can filter out specific actions. “eBPF can hook into all of these interfaces and to essentially be in the middle of whatever calls the interface and what uses the interface and can then filter out” any malicious activity, Graf says. This would work not only for OSes, but for any application on the network as well. Think of the severe ingress-nginx vulnerability unearthed last March (CVE-2025-1974). This vulnerability hit a lot of Kubernetes deployments, whose management teams had to figure out where they were using the Nginx software. An eBPF deployment within all the OSes could take care of the problem once: If you are running Nginx, apply this filter. eBPF’s next frontier: The laptop While eBPF may work to secure Linux servers, what about desktop computers? The ongoing work on bringing eBPF to Microsoft Windows is nearly completed, Graf says. This is an entirely new market for eBPF, he notes. Linux is dominant in the server market, but Windows rules the endpoint market for laptops, desktop computers, and small devices. 
“I think now we can apply eBPF for security purposes, not just for the workload and server side, but also for your laptop,” Graf says. eBPF excels at understanding programs as they run. It can operate at the operating system level without damaging it, he says. He points to how eBPF is already being used in Google’s Android-powered devices. If you want to know how much network bandwidth Android is using, eBPF is behind that. Developers running agents and models on their laptops will need to be protected, and here is where eBPF could come into play. Applications that run on your behalf under your user account, like AI agents, need a new form of security. Another challenge in the future will be connecting the identities of machines with those of users. Just because someone has your password shouldn’t mean that they get access to your company’s network. “It’s a network of agents and services that are connected together. So we have to carry the identity forward all the way to where you actually access the sensitive data,” Graf says. The post Cisco is using eBPF to rethink firewalls, vulnerability mitigation appeared first on The New Stack.
Read more →

How open standards enable zero trust on commodity hardware

Confidential computing has always held a certain promise. The idea that workloads could process sensitive data while remaining isolated even from the infrastructure that runs them has reshaped the way many enterprise security teams think about trust. For years, we have accepted that data should be encrypted at rest and in transit, but data in use has remained exposed to the platform beneath it. Confidential computing proposes to close that gap. What has slowed adoption is not a lack of interest but a reliance on specialized and expensive hardware. Trusted execution environments demand specific CPUs, constrained instance types, and operational trade-offs that place them out of reach for many real-world deployments. The result is a growing mismatch between the threat models enterprises care about and the tools they can practically deploy. At the same time, something important is happening in open source. A set of identity and isolation primitives is quietly maturing into an infrastructure layer that looks a lot like the public key infrastructure that underpins the modern web. Instead of encrypting sessions between browsers and servers, these systems establish cryptographic identities for workloads themselves. Let’s look at how those building blocks come together, why workload identity is becoming central to zero trust architectures, and how systems like Edera use open standards to deliver many of the benefits of confidential computing without requiring new hardware. SPIFFE and the meaning of workload identity To understand where this is going, it helps to define a few terms. Workload identity is the idea that software should be able to prove what it is and where it is running, independent of network location or static credentials. Workload attestation is the process of verifying those properties before granting identity. Zero trust is the assumption that no implicit trust exists based on network position, and that every interaction must be authenticated and authorized. Confidential computing, in its strictest sense, aims to ensure that workloads remain isolated and verifiable even from the host platform. SPIFFE, the Secure Production Identity Framework for Everyone, is a specification that addresses workload identity directly. It defines how workloads are identified, how those identities are represented, and how they can be verified across distributed systems. A SPIFFE ID is a structured identifier bound to a trust domain and a specific workload. It is not a secret and is not tied to an IP address or a long-lived credential. Instead, it becomes meaningful only when paired with a cryptographic document known as an SVID, or SPIFFE Verifiable Identity Document. An SVID binds a SPIFFE ID to a key pair and a signing authority. This allows workloads to authenticate to each other using short-lived credentials that can be rotated automatically. From the perspective of a developer or operator, this looks familiar. It mirrors the waycertificates work on the web, but the subject is a workload rather than a domain name. The important distinction is that SPIFFE does not dictate how trust is established. It defines the interface and the format, leaving attestation to the underlying platform. That flexibility is what makes it so powerful. SPIFFE can sit above cloud-provider metadata, operating-system signals, or, in our case, a hypervisor-rooted trust model. SPIRE as the runtime for trust SPIRE is the reference implementation of the SPIFFE specification. 
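As a purely illustrative aside before digging into SPIRE: a SPIFFE ID is just a URI in a fixed shape, so a small sketch like the one below can pull it apart. The trust domain and workload path shown are made up for the example; this is not an official SPIFFE library.

// Parse the SPIFFE ID format described above (illustrative only).
function parseSpiffeId(id: string): { trustDomain: string; path: string } {
  const url = new URL(id);
  if (url.protocol !== "spiffe:") {
    throw new Error("not a SPIFFE ID");
  }
  // The authority names the trust domain; the path names the workload.
  return { trustDomain: url.host, path: url.pathname };
}

console.log(parseSpiffeId("spiffe://prod.example.org/payments/api"));
// -> { trustDomain: "prod.example.org", path: "/payments/api" }

The ID itself carries no secret; it only becomes meaningful when bound to a key pair in an SVID, as described next.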
Where SPIFFE defines what workload identity looks like, SPIRE defines how it is issued and managed in practice. It introduces two main components: a SPIRE Server and SPIRE Agents. The SPIRE Server acts as the root of trust. It holds the signing keys for the trust domain and enforces registration policies that define which workloads are allowed to receive which identities. The SPIRE Agent runs on each node and performs two related tasks. First, it proves the identity of the node itself through node attestation. Then it performs workload attestation on behalf of processes running on that node. Node attestation determines whether a machine should be trusted to host workloads in the first place. Workload attestation answers whether a specific process meets the criteria to receive a given identity. Crucially, workloads never carry secrets with them. They request an identity from the local agent at runtime and receive an SVID only if attestation succeeds. Those identities are short-lived and automatically rotated, dramatically reducing the blast radius of compromise. This separation is what allows SPIRE to fit cleanly into zero trust models. Trust is established explicitly, continuously, and based on verifiable properties rather than assumptions about the environment. Combining zones and isolation Edera approaches isolation from a different starting point. Instead of sharing a kernel across workloads, Edera runs applications inside zones that behave like lightweight virtual machines. Each zone has its own kernel and is isolated by a type-1 hypervisor with a small trusted computing base. This removes the shared kernel from the trust boundary and eliminates an entire class of container escape attacks. In this model, zones become the natural unit of trust. A zone is not just a scheduling construct but a security boundary. That makes it an ideal foundation for workload identity. The challenge is proving to a remote party that a workload is actually running inside such a zone. This is where SPIFFE and SPIRE fit. By rooting node attestation in the hypervisor itself, Edera can use the hypervisor as the underlying platform authority. The hypervisor can vouch for the existence and integrity of zones, while standard workload attestation mechanisms operate inside those zones without modification. Key material and sensitive services like the SPIRE Server can themselves run inside hardened zones, further reducing exposure. The result is a system where workloads receive cryptographic identities only if they are running inside verified isolated environments. Data can be encrypted directly to those identities, and policies can be enforced based on where and how code is executing, not just who wrote it. This architecture delivers something subtle but important. It provides remote attestation of isolation properties without relying on specialized hardware enclaves. The guarantees come from strong isolation and verifiable identity rather than opaque hardware features. In practice, this covers a large set of real-world threat models that enterprises care about today. Why this matters now Enterprise security teams are increasingly forced to reason about workloads rather than hosts. Microservices, multitenant clusters, and AI systems that process sensitive data keep eroding traditional boundaries. At the same time, the cost and complexity of hardware-based confidential computing remain a barrier. Open standards like SPIFFE and implementations like SPIRE offer an incremental path forward. 
They allow organizations to adopt zero trust principles at the workload level, establish cryptographic identities, and build policy around verifiable execution contexts. Systems like Edera show how strong isolation and identity can work together to approximate the benefits of confidential computing using commodity infrastructure. This is not an argument against hardware enclaves. Those technologies will continue to matter for the most sensitive threat models. But it is an argument for paying attention to the broader evolution of workload identity. Just as it quietly became foundational to the web, workload identity is becoming foundational to modern distributed systems. Understanding how attestation, zones, zero trust, and identity intersect will be critical over the next few years. The pieces are already here. The opportunity now is to learn how they fit together and to build systems that can earn trust rather than assume it. The post How open standards enable zero trust on commodity hardware appeared first on The New Stack.
Read more →

A security checklist for your React and Next.js apps

Modern cloud native attacks don’t always rely on a single breakthrough exploit. Instead, threat actors chain together small assumptions, overlooked behaviors, and trusted components in ways defenders least expect. The recent React2Shell vulnerability is a perfect example of this, and the EtherRAT malware shows just how creative adversaries are. For teams that rely on React, the React2Shell vulnerability was a wake-up call. It doesn’t just affect React as a framework; it breaks assumptions many teams rely on in production. In December, it showed us how quickly attackers can use something subtle like server-side rendering (SSR) behaviors for server-side code execution and how difficult it is to spot once it’s live. If you run React or Next.js workloads in production, here’s what CVE-2025-55182 and CVE-2025-66478 actually break, what you should check immediately, and how to identify attackers hiding behind legitimate infrastructure. What React2Shell breaks If you’re unfamiliar, React2Shell is not just another vulnerability you can one-click patch away — the flaw is within the framework itself. React2Shell is a class of vulnerabilities that arise when React applications improperly handle user-controlled input during SSR. Exploitation allows server-side code execution, and the attacks began only hours after the vulnerability was published. Mitigation requires coordinated updates across React server components (RSC), Next.js, and related frameworks, in addition to an evaluation of application data flows. First, once React components render on the server, they no longer execute in a browser sandbox. Instead, they run inside the backend runtime. React is often treated as frontend code and therefore, it’s assumed the server is safe. An attacker can exploit this assumption and inject JavaScript that then runs on the server, not on the browser. At this point, the code runs with the same permissions as the application itself, potentially giving attackers access to cloud credentials, internal APIs, filesystems, and more. Second, client-side sanitization is not the same when rendering moves to server-side. Client-side input validation cannot be relied on to protect server-rendered execution paths. Patterns that are safe in the browser can become risky when evaluated during SSR. Inputs never intended to be executable can be evaluated as code when handled incorrectly by server-rendered components. Finally, server-rendered components are usually assumed to be safe because they originate from application logic rather than user input. React2Shell arises from implicit framework behavior and has little to do with obviously unsafe code. Risk increases in large codebases where SSR patterns are abstracted, reused, and left unchecked. Attackers exploit assumptions because, in this case, they can shift execution from the browser to the server. Once that boundary is crossed, the blast radius expands dramatically. Server-side execution enables credential access, lateral movement, and follow-on payload delivery. Detection requires understanding what the application is doing at runtime and how that behavior can be abused. What you need to check If you have React or Next.js workloads running in production, here’s your checklist: Inventory your environment Identify all services using RSCs, Next.js server components, or SSR. Don’t forget to check the admin panels and dashboards of all internal tools. Ensure framework and package versions are updated against advisory guidance. 
Audit data flows Is user-controlled input passed into server-rendered components? Are there dynamic rendering paths that evaluate data structures or serialized content? Has data from app logic been reviewed, or is it assumed safe? Review permissions Does this service need outbound internet access? Are credentials and permissions at the minimum requirements? Can containers write to disk or spawn child processes? What happens after exploitation React2Shell was being actively exploited by nation-state threat actors within hours of public disclosure. In one particular campaign investigated by the Sysdig Threat Research Team (TRT), the damage went far beyond smash-and-grab exploitation and financial motivation. A custom remote access trojan (RAT) dubbed EtherRAT was deployed in real-world React2Shell attacks. Instead of using traditional command-and-control (C2) infrastructure, EtherRAT uses something unconventional but resilient: The Ethereum blockchain. Commands are encoded into blockchain transactions and infected systems monitor the chain for instructions. EtherRAT payloads are delivered in stages, allowing the malware to pull down additional capabilities as needed. This approach offers several advantages for attackers: Resilience: Public blockchains are highly available and difficult to disrupt. Stealth: Blockchain traffic can appear legitimate and is increasingly common in enterprise environments, making it difficult to distinguish. Attribution challenges: There’s no central server to seize or sinkhole. This is not commodity malware opportunistically scanning the internet. It’s deliberately crafted and designed to blend into modern operational noise. The takeaway here is: You won’t always see “malware-like” behavior from vulnerability exploitation. EtherRAT indicates subtle runtime deviations in systems that otherwise look healthy, an issue easily overlooked. How to find hidden threats Detecting React2Shell abuse or other hidden threats requires observing what workloads are doing at runtime. You don’t need to know about specific threats to detect threats like these. You just need to know how your environment and applications normally behave. When identified, the following behaviors should be investigated when they’re unexpected or abnormal: Process-level Web server or js processes spawning shells Unexpected child processes Executions at runtime that don’t align with normal app startup behavior Network Outbound connections to unfamiliar external endpoints. Long-lived outbound connections with no relation to the application function. Blockchain-related traffic coming from web services that have no business requirement. File-system Writes to temporary directories from web-facing processes. Creation or execution of new binaries at runtime. What comes next Several broader trends emerge from these recent discoveries: The blurring of client and server boundaries. When JavaScript runs everywhere, blind assumptions become far more costly. Server-side JavaScript is server code. The weaponization of legitimate infrastructure. Blockchains, CI/CD systems, and cloud metadata services are all fair game. The limits of static security controls. You can’t scan your way out of logic flaws that only manifest during execution. So, what does “operating safely” look like in light of React2Shell and EtherRAT? Production behavior is a new security perimeter. Attackers are already operating comfortably inside it, and with clarity, defenders will catch up. There’s no blame or need to slow innovation. 
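To ground that advice, here is a deliberately simplified sketch — hypothetical code, not the actual React2Shell mechanics — of how input that is inert in the browser becomes server-side execution once an SSR path evaluates it, and what the data-only alternative looks like.

// Risky: evaluating user-controlled text during server-side rendering runs it
// in the backend runtime, with the application's own permissions.
export function renderGreetingUnsafe(template: string, name: string): string {
  // e.g. template = "`Hi ${name}`" -- but nothing stops "`${process.env.DB_PASSWORD}`"
  return new Function("name", "return " + template + ";")(name);
}

// Safer: treat user input strictly as data and never evaluate it.
export function renderGreetingSafe(name: string): string {
  return `Hi ${String(name)}`;
}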
Treat SSR code paths with the same scrutiny as backend logic and use runtime detections based on normal and irregular behaviors, not just known threats. The post A security checklist for your React and Next.js apps appeared first on The New Stack.
Read more →

Nvidia makes AI weather forecasting more accessible, no supercomputer needed

With very few exceptions, large-scale weather forecasting has been the domain of government agencies with access to massive supercomputers. But that is changing. Nvidia launched two open source weather forecasting models today: Earth-2 Medium Range and Earth-2 Nowcasting. In addition, it is launching a tool that will significantly speed up the generation of starting conditions for these models. Mike Pritchard, Nvidia’s director of climate simulation, tells The New Stack, “The stakes can’t be higher in weather.” “Worsening extreme weather, driven by climate change, is having impacts on all of us and nearly every aspect of modern life. Forecasting affects us all. It can drive improvements to agriculture, energy, aviation, and emergency response, but the science of forecasting is changing,” Pritchard says. AI has sparked a “scientific revolution in weather forecasting,” Pritchard argues, but researchers have struggled to move this work out of the lab and into practical solutions. “We need to lower the barrier to entry so developers can build tools in the open.” This isn’t Nvidia’s first foray into the weather forecasting business. As part of Earth-2, its effort to build a digital twin of Earth, it previously launched two other models. The first is Earth-2 CorrDiff, a model that takes continental-scale predictions and downscales them to high-res local ones up to 500 times faster than traditional methods. The second is Earth-2 FourCastNet3, a highly efficient global forecasting model that can run on a single Nvidia H100 GPU. Accurate forecasts aren’t just useful for deciding whether to take an umbrella or not. These models are critical infrastructure for airlines, insurers, energy providers, and agriculture. Nvidia’s new weather models Both of the previous models — and most other existing AI-based forecasting models — use specialized model architectures and do not use the transformer-based approach that is now the default for modern large language models (LLMs). For the new Medium Range and Nowcasting models, Nvidia adapted exactly this transformer architecture. Transformer-based architectures, after all, are backed by the performance and engineering tooling of virtually every other AI company. “Philosophically, scientifically, it’s a return to simplicity,” Pritchard says. “We’re moving away from hand-tailored niche AI architectures and leaning into the future of simple, scalable transformer architectures.” The Medium Range model, as its name implies, is meant to provide high-accuracy forecasts for up to 15 days in the future. The Nvidia Earth-2 Medium Range model in action. (Credit: Nvidia) Nvidia hasn’t provided The New Stack with detailed benchmarks yet, but Pritchard argues that the Medium Range model outperforms DeepMind’s GenCast, the current leader in this space, “across more than 70 weather variables,” including temperature, pressure, and humidity. The Nowcasting model is perhaps even more interesting, though: It generates country-scale forecasts at kilometer resolution — a very high resolution for any modern model. Most of the models that inform weather forecasts in Europe or North America have a resolution of two kilometers or more, while the U.S. National Oceanic and Atmospheric Administration’s (NOAA) GFS model, which is available for free and is often the default in free weather apps, has a resolution of 13 kilometers (though NOAA has also started implementing AI forecasts recently). 
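A rough back-of-the-envelope sketch shows why kilometer-scale output is such a jump from GFS-style 13-kilometer grids; the country-sized area below is approximate and only meant to illustrate the scaling.

// Approximate number of grid cells needed to cover an area at a given resolution.
function cellsForArea(areaKm2: number, resolutionKm: number): number {
  return Math.round(areaKm2 / (resolutionKm * resolutionKm));
}

const countryKm2 = 550_000; // roughly France-sized, for illustration
console.log(cellsForArea(countryKm2, 13)); // ~3,254 cells at a 13 km grid
console.log(cellsForArea(countryKm2, 1));  // ~550,000 cells at a 1 km grid
// Cell count grows with the square of the resolution improvement:
// 13x finer spacing means roughly 169x more cells to predict per time step.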
The Israeli Meteorological Service plans to use the Nowcasting model to generate high-resolution forecasts up to eight times daily going forward. The organization already uses Nvidia’s older CorrDiff model. Similarly, The Weather Company (the company behind weather.com) plans to use Nowcasting for localized severe-weather applications. No supercomputer needed For the Medium Range model, which comes in a few variants ranging from 2.4 billion parameters to 3.3 billion, the training was done on 32 80GB A100/H100 GPUs. But to run the model, you only need 26GB of GPU memory, and an A100 GPU can run a single time-step prediction that covers 6 or 12 hours. Depending on the model, it takes 140 seconds for the GenCast model, 94 and 88 seconds for the two other Medium Range variants (dubbed Atlas-SI and Atlas-EDM), and under four seconds for the Atlas-CRPS model (which has additional noise conditioning and is a bit larger, at 3.3 billion parameters). For the Nowcasting model, each 6km-resolution model requires only 5GB of GPU memory and can run in 33 seconds on a single H100 GPU at maximum precision. “We expect the inference speed to be greatly accelerated by techniques such as distillation and/or reduced precision,” an Nvidia spokesperson tells us. Data assimilation: The other 50% of the problem For weather forecasts, the starting data from which the model begins generating its forecast is crucial. That can be satellite imagery, radar data, sensor data from weather balloons, airplanes, and buoys. All of this data needs to be normalized and transformed so the models can work with it. Climate scientists call this process “assimilation.” To accelerate this hours-long process, Nvidia also launched the Global Data Assimilation model, which produces these initial snapshots of the global weather within seconds. “While the AI community and the research community have focused a lot on the prediction models over the past five years, this data assimilation task, this state estimation task, has remained largely unsolved by AI, yet it consumes roughly 50% of the total supercomputing loads of traditional weather [forecasting],” says Pritchard. The assimilation model is actually quite small, at 330M parameters. Using one H100 GPU, it can run the full inference pipeline in under a second, all while using less than 20GB of GPU memory. It still seems unlikely — but possible — that even these efficient models will allow hobbyists to start creating their own forecasts anytime soon. Simply acquiring and managing the starting data, after all, is a major data problem. But for an enterprise with the right use case and resources, this may just open the door to creating local forecasts without the need to access a supercomputing cluster. Update: We updated this post after publication to include the compute requirements for these models. The post Nvidia makes AI weather forecasting more accessible, no supercomputer needed appeared first on The New Stack.
Read more →

Meh

My thanks to Meh for sponsoring last week at DF. Meh puts up a new deal every day, and they do it with panache. As they say, “It’s actual, real, weird shit you didn’t know existed for half the price you would’ve guessed.” Don’t tell any of my other sponsors, but Meh is my favorite longtime DF sponsor. I love the way their orange graphics look against DF’s #4a525a background. And I always love their sponsored posts that go into the RSS feed at the start of the sponsorship week. I’ll just quote theirs from this week in full: Everything sucks. The whole world’s going to shit, especially our part of it, and it can feel like anything fun or silly is sticking your head in the sand. And yet. It doesn’t help to just be miserable. If you’re going to last, you’ve got to find your little moments of joy, or as a break from the misery. Buying our crap at Meh is not how you solve the world’s problems. We’re not that crass. But maybe a minute a day of reading our little write-up, and a couple minutes of catching up with the Meh community, of making a few new online friends, and yes, of occasionally picking up a weird gadget or strange snack you’ve never heard of is just a few minutes you get to take a break, not giving in to how bad everything else is. Of course we would say that. Of course we benefit from that. But it is also part of why we have a quirky write-up. Why we have a community. Why we’re selling whatever weird thing is over at Meh today. ★
Read more →

★ The iOS 26 Adoption Rate Is Not Bizarrely Low Compared to Previous Years

A few weeks ago there were a rash of stories claiming that iOS 26 is seeing bizarrely low adoption rates from iPhone users. The methodology behind these numbers is broken and the numbers are totally wrong. Those false numbers are so low, so jarringly different from previous years, that it boggles my mind that they didn’t raise a red flag for anyone who took a moment to consider them. The ball started rolling with this post from Ed Hardy at Cult of Mac on January 8, “iOS 26 Still Struggles to Gain Traction With iPhone Users”, which began: Only a tiny percentage of iPhone users have installed iOS 26, according to data from a web analytics service. The adoption rate is far less than previous iOS versions at this same point months after their releases. The data only reveals how few iPhone users run Apple’s latest operating system upgrade, not why they’ve chosen to avoid it. But the most likely candidate is the new Liquid Glass look of the update. [...] Roughly four months after launching in mid-September, only about 15% of iPhone users have some version of the new operating system installed. That’s according to data for January 2026 from StatCounter. Instead, most users hold onto previous versions. For comparison, in January 2025, about 63% of iPhone users had some iOS 18 version installed. So after roughly the same amount of time, the adoption rate of Apple [sic] newest OS was about four times higher. Those links point to Statcounter, a web analytics service. A lot of websites include Statcounter’s analytics tracker, and Statcounter’s tracker attempts to determine the version of the OS each visitor’s device is running. The problem is, starting with Safari 26 — the version that ships with iOS 26 — Safari changed how it reports its user agent string. From the WebKit blog, “WebKit Features in Safari 26.0”: Also, now in Safari on iOS, iPadOS, and visionOS 26 the user agent string no longer lists the current version of the operating system. Safari 18.6 on iOS has a UA string of: Mozilla/5.0 (iPhone; CPU iPhone OS 18_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.6 Mobile/15E148 Safari/604.1 And Safari 26.0 on iOS has a UA string of: Mozilla/5.0 (iPhone; CPU iPhone OS 18_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.0 Mobile/15E148 Safari/604.1 This matches the long-standing behavior on macOS, where the user agent string for Safari 26.0 is: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.0 Safari/605.1.15 It was back in 2017 when Safari on Mac first started freezing the Mac OS string. Now the behavior on iOS, iPadOS, and visionOS does the same in order to minimize compatibility issues. The WebKit and Safari version number portions of the string will continue to change with each release. In other words, Safari now reports, in its user agent string, that it’s running on iOS 18.6 when it is running on iOS 18.6, and reports that it’s running on iOS 18.6 when it’s running on iOS 26.0 or later. And it’s going to keep reporting that it’s running on iOS 18.6 forever, just like how Safari 26 on MacOS reports that it’s running on MacOS 10.15 Catalina, from 2019. Statcounter completely dropped the ball on this change, and it explains the entirety of this false narrative that iOS 26 adoption is incredibly low. (Statcounter has a “detect” page where you can see what browser and OS it thinks you’re using.) 
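To make Statcounter’s failure mode concrete, here is a minimal sketch in TypeScript of the two ways an analytics script could read that frozen user agent string. This is my own illustration, not Statcounter’s actual code:
// Safari 26 on iOS reports the OS token frozen at 18_6, per the WebKit blog post quoted above.
const ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 18_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.0 Mobile/15E148 Safari/604.1";
// Naive approach: trust the OS token, which now reads 18_6 forever.
const osToken = ua.match(/iPhone OS (\d+_\d+)/)?.[1];        // "18_6" even on iOS 26
// The Safari "Version/" token, by contrast, still increments with each release.
const safariVersion = ua.match(/Version\/(\d+\.\d+)/)?.[1];  // "26.0"
console.log({ osToken, safariVersion });
Any tracker keying OS share off that first token will bucket every Safari 26 visitor as iOS 18.6, which is exactly the undercount described below.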
The reason they reported that 15 percent of iPhone users were using iOS 26 is probably because that’s the amount of web traffic Statcounter sees from iOS 26 web browsers that aren’t Safari (most of which, I’ll bet, are in-app browser views in social media apps). Nick Heer, at Pixel Envy, wrote a good piece delving into this saga. And then he posted a follow-up item pointing out that (a) Statcounter’s CEO has acknowledged their error and they’re fixing it; and (b) Wikimedia publishes network-wide stats that serve as a good baseline. The audience for Wikipedia is, effectively, the audience for the web itself. And Wikipedia’s stats show that while iOS 26 adoption, in January 2026, isn’t absurdly low (as Statcounter had been suggesting, erroneously, and writers like Ed Hardy at Cult of Mac and David Price at Macworld foolishly regurgitated, no matter how little sense it made that the numbers would be that low), they are in fact lower than those for iOS 18 a year ago and iOS 17 two years ago. Per Wikimedia:
iOS 26, January 2026: 50%
iOS 18, January 2025: 72%
iOS 17, January 2024: 65%
So, no, iOS 26 adoption isn’t at just 15 percent, which only a dope would believe, but it’s not as high as previous iOS versions in previous years at this point on the calendar. Something, obviously, is going on. David Smith, developer of popular apps like Widgetsmith and Pedometer++, on Mastodon: I noticed iOS 26 adoption had entered a ‘third wave’ of rapid adoption. So I made a graph of the relative adoption versus iOS 18 at this point in the release cycle. While lower than iOS 18 at this point for my apps (65% vs. 78%), the shape of this graph says to me that Apple is in full control of the adoption rate and can tune it to their plans. The coordinated surges are Apple dialing up automatic updates. If this surge were as long as previous ones, we’d hit the saturation point very soon. What’s going on, quite obviously, is that Apple itself is slow-rolling the automatic updates to iOS 26. For years now Apple has steered users, via default suggestions during device setup, to adopt settings to allow OS updates to happen automatically, including updates to major new versions. Apple tends not to push these automatic updates to major new versions of iOS until two months after the .0 release in September. This year that second wave was delayed by about two weeks, and there’s now a third wave starting midway through January. It’s a different pattern from previous years — but it’s a pattern Apple controls. A large majority of users of all Apple devices get major OS updates when, and only when, their devices automatically update. Apple has been slower to push those updates to iOS 26 than they have been for previous iOS updates in recent years. With good reason! iOS 26 is a more significant — and buggier — update than iOS 18 and 17 were. People like you, readers of Daring Fireball, may well be hesitant to update to iOS 26, or (like me) to MacOS 26, or to any of the version 26 OS updates, because you are aware of things (like UI changes) that you are loath to adopt. But the overwhelming majority of Apple users — especially iPhone users — just let their devices update automatically. They might like iOS 26’s changes, they might dislike them, or they might not care or even notice. But they just let their software updates happen automatically — and they will form the entirety of their opinions regarding iOS 26 after it’s running on their iPhones.
Read more →

The Value of Things

Comments
Read more →

Box64 Expands into RISC-V and LoongArch territory

Comments
Read more →

★ Tahoe Added a Finder Option to Resize Columns to Fit Filenames

The main reason I’m sticking with MacOS 15 Sequoia, refusing to install 26 Tahoe, is that there are so many severe UI regressions in Tahoe. The noisy, distracting, inconsistent icons prefixing menu item commands, ruining the Mac’s signature menu bar system. Indiscriminate transparency that renders so many menus, windows, and sidebars inscrutable and ugly. Windows with childish round corners that are hard to resize. The comically sad app icons. Why choose to suffer? But the thing that makes the decision to stay on 15 Sequoia a cinch is that I honestly struggle to think of any features in Tahoe that I’m missing out on. What is there to actually like about Tahoe? One small example is Apple’s Journal app. I’ve been using Journal ever since it debuted as an iPhone-only app in iOS 17.2 in December 2023. 785 entries and counting. With the version 26 OSes, Apple created versions of Journal for iPad and Mac (but not Vision Pro). Syncing works great via iCloud too. All things considered, I’d like to have a version of Journal on my main Mac. But I’m fine without it. I’ve been writing entries without a Mac app since 2023, so I’ll continue doing what I’ve been doing if I want to create or edit a Journal entry from my Mac: using iPhone Mirroring. That’s it. The Journal app is the one new feature Tahoe offers that I wish I had today. I’m not missing out on the latest version of Safari because Apple makes Safari 26 available for MacOS 15 Sequoia (and even 14 Sonoma). Some years, Apple adds new features to Apple Notes, and to get those features on every device, you need to update every device to that year’s new OS. This year I don’t think there are any features like that. Everything is perfectly cromulent running iOS 26 on my iPhone and iPad, but sticking with MacOS 15 Sequoia on my primary Mac. But now that we’ve been poking around at column view in the Tahoe Finder, Jeff Johnson has discovered another enticing new feature. On MacOS 26, the Finder has a new view option (accessed via View → Show View Options) to automatically resize columns to fit the longest visible filename. See Johnson’s post for screenshots of the new option in practice. [Update: Turns out, this auto-resizing feature has been a hidden preference setting in the Finder for a few years now.] Column view is one of the best UI innovations from NeXTStep, and if you think about it, has always been the primary metaphor for hierarchical navigation in iOS. It’s a good idea for the desktop that proved foundational for mobile. The iPhone Settings app is column view — one column at a time. It’s a way to organize a multi-screen app in a visual, spatial way even when limited to a 3.5-inch display. Thanks to Greg’s Browser, a terrific indie app, I’d been using column view on classic Mac OS since 1993, a few years before Apple even bought NeXT, let alone finally shipped Mac OS X (which was when column view first appeared in the Finder). One frustration inherent to column view is that it doesn’t work well with long filenames. It’s a waste of space to resize all columns to a width long enough to accommodate long filenames, but it’s frustrating when a long filename doesn’t fit in a regular-width column. This new feature in the Tahoe Finder attempts to finally solve this problem. I played around with it this afternoon and it’s ... OK. It feels like an early prototype for what could be a polished feature.
For example, it exacerbates some layering bugs in the Finder — if you attempt to rename a file or folder that is partially scrolled under the sidebar, the Tahoe Finder will just draw the rename editing field right on top of the sidebar, even though it belongs to the layer that is scrolled underneath. Here’s what it looks like when I rename a folder named “Example ƒ” to “How is this possible?”: On MacOS 15, if you attempt to rename an item that is scrolled under the sidebar in column view, the column containing that item snaps into place next to the sidebar, so it’s fully visible. That snapping into place just feels right. The way Tahoe works, where the column doesn’t move and the text editing field for the filename just gets drawn on top of the sidebar, feels gross, like I’m using a computer that is not a Macintosh. Amateur hour. I wish I could set this new column-resizing option only to grow columns to accommodate long filenames, and never to shrink columns when the visible items all have short filenames. But the way it currently works, it adjusts all columns to the width of the longest visible filename each column is displaying — narrowing some, and widening others. I want most columns to stay at the default width. With this new option enabled, it looks a bit higgledy-piggledy that every column is a different width. Also, it’s an obvious shortcoming that the feature only adjusts columns to the size of the longest currently visible filename. If you scroll down in a column and get to a filename that is too long to fit, nothing happens. It just doesn’t fit. Even a future polished version of this column view feature wouldn’t, in and of itself, be enough to tempt me to upgrade to Tahoe. After 30-some years of columns that don’t automatically adjust their widths, I can wait another year. But we don’t yet have a polished version of this feature. The unpolished version of the feature we have today only reiterates my belief that Tahoe is a mistake to be avoided. It’s a good idea though, and there aren’t even many of those in Tahoe.
Read more →

OmniOutliner 6

Ken Case, on The Omni Group blog: The features noted above already make for a great upgrade. But as I mentioned last year, one of the interesting problems we’ve been pondering is how best to link to documents in native apps. We’ve spent some time refining our solution to that problem, Omni Links, which are now shipping first in OmniOutliner 6. With Omni Links, we can link to content across all our devices, and we can share those links with other people and other apps. Omni Links support everything we said document links needed to have. Omni Links work across all of Apple’s computing platforms and can be shared with a team. They leverage existing solutions for syncing and sharing documents, such as iCloud Drive or shared Git repositories. They are easy to create, easy to use, and easy to share. Omni Links also power up Omni Automation, giving scripts and plug-ins a way to reference and update content in linked documents — documents that can be shared across all your team’s devices. There’s lots more in version 6, including a modernized UI, and many additions to Omni Automation, Omni’s scripting platform that works across both Mac and iOS — including really useful integration with Apple’s on-device Foundation Models, with, of course, comprehensive (and comprehensible) documentation. It’s Omni Links, though, that strikes me as the most interesting new feature. The two fundamental models for apps are library-based (like Apple Notes) and document-based (like TextEdit). Document-based apps create and open files from the file system. Library-based apps create items in a database, and the location of the database in the file system is an implementation detail the user shouldn’t worry about. OmniOutliner has always been document-based, and version 6 continues to be. There are advantages and disadvantages to both models, but one of the advantages to library-based apps is that they more easily allow the developer to create custom URL schemes to link to items in the app’s library. Omni Links is an ambitious solution to bring that to document-based apps. Omni Links let you copy URLs that link not just to an OmniOutliner document, but to any specific row within an OmniOutliner document. And you can paste those URLs into any app you want (like, say, Apple Notes or Things, or events in your calendar app). From the perspective of other apps, they’re just URLs that start with omnioutliner://. They’re not based on anything as simplistic as a file’s pathname. They’re a robust way to link to a unique document, or a specific row within that document. Create an Omni Link on your Mac, and that link will work on your iPhone or iPad too — or vice versa. This is a very complex problem to solve, but Omni Links delivers on the age-old promise of “It just works”, abstracting all the complexity. I’ve been using OmniOutliner for at least two decades now, and Omni Links strikes me as one of the best features they’ve ever added. It’s a way to connect your outlines, and the content within your outlines, to any app that accepts links. The other big change is that OmniOutliner 6 is now a single universal purchase giving you access to the same features on Mac, iPhone, iPad, and Vision. ★
Read more →

Lolgato 1.7

Free Mac utility by Zendit Oy: A macOS app that enhances control over Elgato lights, offering features beyond the standard Elgato Control Center software. Features: automatically turn lights on and off based on camera activity; turn lights off when locking your Mac; sync light temperature with macOS Night Shift. Lolgato also lets you set global hotkeys for toggling the lights and changing their brightness. I’ve had a pair of Elgato Key Lights down at my podcast recording desk for years now. Elgato’s shitty software drove me nuts. Nothing seemed to work, so I gave up on controlling my lights from software. I set the color temperature and brightness the way I wanted them (which you have to do via software) and after that, I just turned them off and on using the physical switches on the lights. I forget how I discovered Lolgato, but I installed it back on November 10. I connected Lolgato to my lights, and set it to turn them on whenever the Mac wakes up, and off whenever the Mac goes to sleep. It has worked perfectly for over two months. Perfect little utility. ★
Read more →

Playing the Percentages

Dr. Drang: For weeks — maybe months, time has been hard to judge this past year — Trump has been telling us that he’s worked out deals with pharmaceutical companies to lower their prices by several hundred percent. Commentators and comedians have pointed out that you can’t reduce prices more than 100% and pretty much left it at that, suggesting that Trump’s impossible numbers are due to ignorance. Don’t get me wrong. Trump’s ignorance is nearly limitless — but only nearly. I’ve always thought that he knew the right way to calculate a price drop; he did it the wrong way so he could quote a bigger number. And that came out in yesterday’s speech. Trump sophistry + math pedantry = Daring Fireball catnip. ★
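For anyone who wants the arithmetic spelled out, here is a tiny worked example with made-up prices, not figures from any actual deal. Computed the right way, against the old price, a drop can never exceed 100 percent; computed against the new price, you can quote “several hundred percent”:
// Hypothetical prices, purely to illustrate the two calculations.
const oldPrice = 400;
const newPrice = 100;
const correctDrop = ((oldPrice - newPrice) / oldPrice) * 100;  // 75%; can never exceed 100%
const inflatedDrop = ((oldPrice - newPrice) / newPrice) * 100; // 300%; wrong denominator, bigger number
console.log({ correctDrop, inflatedDrop });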
Read more →

MacOS 26 Tahoe Broke Column View in the Finder

Jeff Johnson: Finder has four view modes, represented by the four consecutive toolbar icons in the screenshot below, if you can even call that free-floating monstrosity a toolbar anymore: Icons, List, Columns, and Gallery. My preference is columns view, which I’ve been using for as long as I remember, going back to Mac OS X. At the bottom of each column is a resizing widget that you can use to change the width of the columns. Or rather, you could use it to change the width of the columns. On macOS Tahoe, the horizontal scroller covers the resizing widget and prevents it from being clicked! I joked last week that it would make more sense if we found out that the team behind redesigning the UI for MacOS 26 Tahoe was hired by Meta not a month ago, but an entire year ago, and secretly sabotaged their work to make the Mac look clownish and amateur. More and more I’m wondering if the joke’s on us and it actually happened that way. It’s like MacOS, once the crown jewel of computer human interface design, has been vandalized. ★
Read more →

Where to Sleep in LAX

Comments
Read more →

EmulatorJS

Comments
Read more →

Why Walmart Still Doesn’t Support Apple Pay

Chance Miller, writing at 9to5Mac: When you use Walmart Pay, it’s incredibly easy for Walmart to build that customer profile on you. When you use Scan and Go, all of that same information is handed over. When you use Apple Pay or other payment methods, it’s much harder for Walmart (and other retailers) to do this. Apple Pay’s privacy and security protections, like not sharing any information about your actual card with the retailer, makes this type of tracking trickier. This is why Walmart wants people to use Walmart Pay if they want to pay from their phone. If you check out with Walmart Pay or Scan and Go, everything is linked to your Walmart account. If you had the option to pay with Apple Pay, you’d share a lot less information with Walmart. Using Walmart Pay gives Walmart more information than a regular credit or debit card transaction does. When you use the same traditional credit card for multiple purchases over time, a retailer like Walmart can build a profile associated with that card number. Charles Duhigg, all the way back in 2012, reported a story for The New York Times about how Target used these profiles — which customers don’t even know about — to statistically determine when women are likely to be pregnant based on purchases like, say, cocoa-butter lotion and vitamin supplements. When you use an in-house payment app like Walmart Pay (or swipe a store’s “loyalty” card at the register), the store doesn’t have to do any guesswork to associate the transaction with your profile. Your Walmart Pay account is your profile. Using Apple Pay gives a retailer less — or at least no more — identifying information than a traditional card transaction. So if the future is paying via devices, Walmart wants that future to give them more information. I think the situation with Walmart and Apple Pay is a lot like Netflix and Apple TV integration. Most retailers, even large ones, support Apple Pay. Most streaming services, even large ones, support integration with Apple’s TV app. Walmart doesn’t support Apple Pay because they want to control the customer transaction directly, and they’re big enough, and their customers are loyal enough, that they can resist supporting Apple Pay. Netflix doesn’t support TV app integration because they want to control the customer viewing experience directly, and they’re big enough, and their customers are loyal enough, that they can resist supporting Apple’s TV app. Amazon — which is also very large, whose customers are also very loyal, and which absolutely loves collecting data — does not support Apple Pay either. See also: Michael Tsai. ★
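A toy sketch of the mechanics Miller is describing, with entirely hypothetical data: when every visit presents the same stable identifier (a card number, or your Walmart Pay account), purchases trivially join into one profile over time. Per Miller, Apple Pay’s point is that the retailer never sees your actual card number to use as that key:
// Hypothetical illustration of retailer profiling; the data and identifiers are made up.
type Purchase = { identifier: string; item: string };
const purchases: Purchase[] = [
  { identifier: "4111-1111-1111-1111", item: "cocoa-butter lotion" },
  { identifier: "4111-1111-1111-1111", item: "vitamin supplements" },
];
// The same identifier on every visit lets purchases accumulate into one profile.
const profiles = new Map<string, string[]>();
for (const p of purchases) {
  profiles.set(p.identifier, [...(profiles.get(p.identifier) ?? []), p.item]);
}
console.log(profiles); // one profile, keyed on the stable identifier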
Read more →

Trump Administration Shares Doctored Photo of Minnesota Activist After Her Arrest

Violet Jira, reporting for NOTUS: The White House communications team posted a digitally altered photo of Nekima Levy Armstrong, a Minnesota social justice activist, on Thursday that makes it appear that she was weeping during her arrest by federal agents. The image is highly realistic, bearing no watermark or other indicator that the image has been doctored. The change is only apparent when compared to a different version of the same image posted by the Department of Homeland Security earlier in the day. The White House, which has adopted a combative, flippant tone on its widely viewed social media pages, drew some backlash for the post online. In response, White House deputy communications director Kaelan Dorr called the image a “meme.” It’s not a meme. It’s propaganda — an altogether false image presented as an actual photograph. ★
Read more →

The Information: ‘With Google Deal, Apple’s Craig Federighi Plots a Cautious Course in AI’

Aaron “Homeboy” Tilley and Wayne Ma, reporting for The Information (paywalled, alas, and with a miserly gift-link policy): But there are also potential risks to making Federighi head of AI. Giving oversight of AI to him reflects Apple’s cautious approach to the technology. He is known at Apple as a penny-pincher who keeps a tight rein on salaries and hesitates to invest in risky projects when the payoff from them isn’t clear, according to people who have worked with him. He tends to scrutinize every detail of his team’s expenses, down to their budgets for bananas and other office snacks, those people said. Meanwhile, Apple’s rivals are pouring vast amounts of capital into AI, building data centers and paying fortunes to woo AI researchers. I have no idea what Federighi’s stance is on break-room bananas, but it seems a stretch to think it offers clues to Apple’s strategy on data centers. For years, lieutenants of Federighi would try to get him on board with AI. He often shot those efforts down, former Apple executives said. For example, he rejected proposals from his team to use AI to dynamically change the iPhone home screen, believing it would disorient users, who are used to knowing where their apps are located, said former Apple employees familiar with the proposal. Jesus H. Christ, thank god Federighi shot this down. I wouldn’t want good AI rearranging my home screen behind my back, let alone Apple Intelligence as we know it. ★
Read more →

The Information Says Apple Is Working on an AI Wearable Pin

Wayne Ma and Qianer Liu, reporting for The Information (paywalled, alas): Apple is developing an AI-powered wearable pin the size of an AirTag that is equipped with multiple cameras, a speaker, microphones and wireless charging, according to people with direct knowledge of the project. The device could be released as early as 2027, they said. Don’t make the mistake of thinking that because existing AI pins have sucked (and in one notable case, flopped in spectacular fashion), they’re all going to suck. Google Glass was an embarrassment but glasses are a great form factor. MP3 players used to suck too. Such a product would position Apple to compete more effectively with OpenAI, which is planning its own AI-powered devices, and Meta Platforms, which is already selling smart glasses that offer access to its AI assistant. It is very strange to put OpenAI’s upcoming io device(s) in the same sentence as Meta’s glasses, which are a real product you can buy today. None of these things are setting the world on fire though. ★
Read more →

Ternus Now Overseeing Design at Apple, Reports Gurman

Mark Gurman, reporting at Bloomberg: Apple Inc. has expanded the job of hardware chief John Ternus to include design work, solidifying his status as a leading contender to eventually succeed Chief Executive Officer Tim Cook. Cook, who has led Apple since 2011 and turned 65 in November, quietly tapped Ternus to manage the company’s design teams at the end of last year, according to people with knowledge of the matter. That widens Ternus’ role to add one of the company’s most critical functions. And on Twitter/X: Ternus is now the “executive sponsor” of Apple’s design team, representing the critical function on Apple’s executive team. The move was under-the-radar: on paper, the teams report to Tim Cook despite Ternus’s role. Here’s to hoping Ternus is as pissed as the rest of us are about MacOS 26 Tahoe. ★
Read more →

Jackass of the Week: Utah State Senate Majority Leader Kirk Cullimore

Bridger Beal-Cvetko and Daniel Woodruff, reporting for KSL News: SB138, sponsored by Cullimore, R-Sandy, would make Android, the world’s most popular mobile device operating system, an official state symbol, joining the ranks of the official state cooking pot (the dutch oven), the official state crustacean (the brine shrimp), and the official state mushroom (the porcini). “Someday, everybody with an iPhone will realize that the technology is better on Android,” Cullimore told reporters during a media availability on Wednesday, the second day of the legislative session. But, he added, “I’m the only one in my family — all my kids, my wife, they all have iPhones — but I’m holding strong.” [...] “I don’t expect this to really get out of committee,” he said. (Via Joe Rossignol.) ★
Read more →

Taegan Goddard: ‘There’s No Going Back’

Taegan Goddard, writing at Political Wire, in a post that pairs perfectly with Om Malik’s re: velocity bestowing authority: The new Democratic argument isn’t about restoring guardrails. It’s about moving fast — and using power unapologetically — to undo what Trump has done. New Jersey will inaugurate Mikie Sherrill as governor today, one of the party’s rising stars who steamrolled Republicans in November. She has promised to govern with urgency — leaning on emergency powers, acting decisively, and skipping the old incrementalism. This, she argues, is what voters now expect. She told The New Yorker that if Democrats don’t learn to work at Donald Trump’s pace, “we’re going to get played.” Rep. Alexandria Ocasio-Cortez is even more explicit: “In order for us to correct the abuses that are happening now, we have to act in the same capacities that Trump has given himself.” The only way to counter “move fast and break things” is to move fast and fix things. ★
Read more →

Om Malik: ‘Velocity Is the New Authority’

Om Malik: That’s why we get all our information as memes. The meme has become the metastory, the layer where meaning is carried. You don’t need to read the thing; you just need the gist, compressed and passed along in a sentence, an image, or a joke. It has taken the role of the headline. The machine accelerates this dynamic. It demands constant material; stop feeding it and the whole structure shakes. The point of the internet now is mostly to hook attention and push it toward commerce, to keep the engine running. Anyone can get their cut. [...] We built machines that prize acceleration and then act puzzled that everything feels rushed and slightly manic. Crackerjack essay. Malik is focused here on the ways we’ve changed media and how those changes to media have changed us — as a society, and as individuals. But I think it explains how the Trump 2.0 administration has been so effective (such that it can be said to be effective). They recognize that velocity is authority and are moving as fast as they can. It’s an adaptation to a new media age. ★
Read more →

‘Inside Trump’s Head-Spinning Greenland U-Turn’

The Wall Street Journal (gift link; News+ link): When President Trump arrived in the snow-covered Swiss Alps on Wednesday afternoon, European leaders were panicking that his efforts to acquire Greenland would trigger a trans-Atlantic conflagration. By the time the sun set, Trump had backed down. After a meeting with Rutte on Wednesday, Trump called off promised tariffs on European nations, contending that he had “formed the framework of a future deal” with respect to the largest island in the world. [...] During an hourlong speech at the World Economic Forum, the U.S. president said he wouldn’t deploy the military to take control of Greenland. It was a stark shift in tone for Trump, who just days earlier had declined to rule out using the military to secure ownership of Greenland and posted an image online of the territory with an American flag plastered across it. No need for panic. Alarm, yes. Panic, no. The TACO theory holds. Stand up to Trump and he’ll chicken out. ★
Read more →

The Scale of ICE Protests in Minnesota

Margaret Killjoy, in a thread on Bluesky (via Kottke): I came to Minneapolis to report on what’s going on, and one of the main questions I showed up with is “just what is the scale of the resistance?” After all, we’re all used to the news calling Portland a “war zone” or whatever when it’s just some protests in one part of town. [...] Half the street corners around here have people — from every walk of life, including republicans — standing guard to watch for suspicious vehicles, which are reported to a robust and entirely decentralized network that tracks ICE vehicles and mobilizes responders. I have been actively involved in protest movements for 24 years. I have never seen anything approaching this scale. Minneapolis is not accepting what’s happening here. ICE fucking murdered a woman for participating in this, and all that did is bring out more people, from more walks of life. It’s genuinely a leaderless (or leaderful) movement, decentralized in a way that the state is absolutely unequipped to handle. There are a few basic skills involved, and so people teach each other those skills, and people are collectively refining them. Apple’s “whatever you say, boss” compliance with the Trump administration’s “demand” back in October that they remove ICEBlock from the App Store — with no legal basis, nor any evidence backing the administration’s claims that the app was being used to put members of the ICE goon squads in danger — is looking more and more like a decision on the wrong side of popular opinion. And, ultimately, on the wrong side of history. ICEBlock was designed for exactly what these protestors are doing. ★
Read more →

Fragments: January 22

My colleagues here at Thoughtworks have announced AI/works™, a platform for our work using AI-enabled software development. The platform is in its early days, and is currently intended to support Thoughtworks consultants in their client work. I’m looking forward to sharing what we learn from using and further developing the platform in future months. ❄ ❄ ❄ ❄ ❄ Simon Couch examines the electricity consumption of using AI. He’s a heavy user: “usually programming for a few hours, and driving 2 or 3 Claude Code instances at a time”. He finds his usage of electricity is orders of magnitude more than typical estimates based on the “typical query”. On a median day, I estimate I consume 1,300 Wh through Claude Code—4,400 “typical queries” worth. But it’s still not a massive amount of power - similar to that of running a dishwasher. A caveat to this is that this is “napkin math” because we don’t have decent data about how these models use resources. I agree with him that we ought to. ❄ ❄ ❄ ❄ ❄ My namesake Chad Fowler (no relation) considers that the movement to agentic coding creates a similar shift in rigor and discipline as appeared in Extreme Programming, dynamic languages, and continuous deployment. In Extreme Programming’s case, this meant a lot of discipline around testing, continuous integration, and keeping the code-base healthy. My current view is that with AI-enabled development we need to be rigorous about evaluating the software, both for its observable behavior and its internal quality. The engineers who thrive in this environment will be the ones who relocate discipline rather than abandon it. They’ll treat generation as a capability that demands more precision in specification, not less. They’ll build evaluation systems that are harder to fool than the ones they replaced. They’ll refuse the temptation to mistake velocity for progress. ❄ ❄ ❄ ❄ ❄ There’s been much written about the dreadful events in Minnesota, and I’ve not felt I’ve had anything useful to add to them. But I do want to pass on an excellent post from Noah Smith that captures many of my thoughts. He points out that there is a “consistent record of brutality, aggression, dubious legality, and unprofessionalism” from ICE (and CBP) who seem to be turning into MAGA’s SD. Is this America now? A country where unaccountable and poorly trained government agents go door to door, arresting and beating people on pure suspicion, and shooting people who don’t obey their every order or who try to get away? “When a federal officer gives you instructions, you abide by them and then you get to keep your life” is a perfect description of an authoritarian police state. None of this is Constitutional, every bit of it is deeply antithetical to the American values we grew up taking for granted. My worries about these kinds of developments were what animated me to urge against voting for Trump in the 2016 election. Mostly those worries didn’t come to fruition because enough constitutional Republicans were in a position to stop them from happening, so even when Trump attempted a coup in 2020, he wasn’t able to get very far. But now those constitutional Republicans are absent or quiescent. I fear that what we’ve seen in Minneapolis will be a harbinger of worse to come. I also second John Gruber’s praise of bystander Caitlin Callenson: But then, after the murderous agent fired three shots — just 30 or 40 feet in front of Callenson — Callenson had the courage and conviction to stay with the scene and keep filming. 
Not to run away, but instead to follow the scene. To keep filming. To continue documenting with as best clarity as she could, what was unfolding. The recent activity in Venezuela reminds me that I’ve long felt that Trump is a Hugo Chávez figure - a charismatic populist who’s keen on wrecking institutions and norms. Trump is old, so won’t be with us for that much longer - but the question is: “who is Trump’s Maduro?” ❄ ❄ ❄ ❄ ❄ With all the drama at home, we shouldn’t ignore the terrible things that happened in Iran. The people there again suffered the consequences of an entrenched authoritarian police state.
Read more →

Build an agent into any app with the GitHub Copilot SDK

Building agentic workflows from scratch is hard. You have to manage context across turns, orchestrate tools and commands, route between models, integrate MCP servers, and think through permissions, safety boundaries, and failure modes. Even before you reach your actual product logic, you’ve already built a small platform. GitHub Copilot SDK (now in technical preview) removes that burden. It allows you to take the same Copilot agentic core that powers GitHub Copilot CLI and embed it in any application. This gives you programmatic access to the same production-tested execution loop that powers GitHub Copilot CLI. That means instead of wiring your own planner, tool loop, and runtime, you can embed that agentic loop directly into your application and build on top of it for any use case. You also get Copilot CLI’s support for multiple AI models, custom tool definitions, MCP server integration, GitHub authentication, and real-time streaming. How to get started We’re starting with support for Node.js, Python, Go, and .NET. You can use your existing GitHub Copilot subscription or bring your own key. The github/copilot-sdk repository includes: Setup instructions Starter examples SDK references for each supported language A good first step is to define a single task like updating files, running a command, or generating a structured output and letting Copilot plan and execute steps while your application supplies domain-specific tools and constraints. Here’s a short code snippet to preview how you can call the SDK in TypeScript: import { CopilotClient } from "@github/copilot-sdk"; const client = new CopilotClient(); await client.start(); const session = await client.createSession({ model: "gpt-5", }); await session.send({ prompt: "Hello, world!" }); Visit github/copilot-sdk to start building. What’s new in GitHub Copilot CLI Copilot CLI lets you plan projects or features, modify files, run commands, use custom agents, delegate tasks to the cloud, and more, all without leaving your terminal. Since we first introduced it, we’ve been expanding Copilot’s agentic workflows so it: Works the way you do with persistent memory, infinite sessions, and intelligent compaction. Helps you think with explore, plan, and review workflows where you can choose which model you want at each step. Executes on your behalf with custom agents, agent skills, full MCP support, and async task delegation. How does the SDK build on top of Copilot CLI? The SDK takes the agentic power of Copilot CLI (the planning, tool use, and multi-turn execution loop) and makes it available in your favorite programming language. This makes it possible to integrate Copilot into any environment. You can build GUIs that use AI workflows, create personal tools that level up your productivity, or run custom internal agents in your enterprise workflows. Our teams have already used it to build things like: YouTube chapter generators Custom GUIs for their agents Speech-to-command workflows to run apps on their desktops Games where you can compete with AI Summarizing tools And more! Think of the Copilot SDK as an execution platform that lets you reuse the same agentic loop behind the Copilot CLI, while GitHub handles authentication, model management, MCP servers, custom agents, and chat sessions plus streaming. That means you are in control of what gets built on top of those building blocks. Start building today! Visit the SDK repository to get started. The post Build an agent into any app with the GitHub Copilot SDK appeared first on The GitHub Blog.
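To connect that snippet to the “define a single task” advice above, here is a sketch that reuses only the calls shown in the post; the model name and prompt are placeholders of my own, not anything prescribed by the SDK:
import { CopilotClient } from "@github/copilot-sdk";
// Same calls as the snippet above, pointed at one concrete, well-scoped task.
const client = new CopilotClient();
await client.start();
const session = await client.createSession({ model: "gpt-5" });
await session.send({ prompt: "Read package.json and summarize the available npm scripts." });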
Read more →

[Sponsor] Meh

Everything sucks. The whole world’s going to shit, especially our part of it, and it can feel like anything fun or silly is sticking your head in the sand. And yet. It doesn’t help to just be miserable. If you’re going to last, you’ve got to find your little moments of joy, or as a break from the misery. Buying our crap at Meh is not how you solve the world’s problems. We’re not that crass. But maybe a minute a day of reading our little write-up, and a couple minutes of catching up with the Meh community, of making a few new online friends, and yes, of occasionally picking up a weird gadget or strange snack you’ve never heard of is just a few minutes you get to take a break, not giving in to how bad everything else is. Of course we would say that. Of course we benefit from that. But it is also part of why we have a quirky write-up. Why we have a community. Why we’re selling whatever weird thing is over at Meh today. ★
Read more →

Conversation: LLMs and the what/how loop

A conversation between Unmesh Joshi, Rebecca Parsons, and Martin Fowler on how LLMs help us shape the abstractions in our software. We view our challenge as building systems that survive change, requiring us to manage our cognitive load. We can do this by mapping the “what” of what we want our software to do into the “how” of programming languages. This “what” and “how” are built up in a feedback loop. TDD helps us operationalize that loop, and LLMs allow us to explore that loop in an informal and more fluid manner. more…
Read more →

A cheat sheet to slash commands in GitHub Copilot CLI

Do you ever feel like you’re spending more time moving between different tools than you are writing code? If you thrive in the terminal and want faster, more predictable ways to run tests, fix code, and manage context, Copilot CLI slash commands give you that control without breaking your flow. You can use slash commands to perform a variety of tasks like configuring which AI model to use or setting up an MCP server, or even sharing your session externally. Slash commands offer fast, repeatable actions without needing to craft a new prompt each time. TL;DR: See all the slash commands and what they do at the bottom of this post. 😉 What are slash commands? A slash command is a simple instruction, like /clear or /session, that tells Copilot exactly what you want to do. They are prefixed with a / and instantly trigger Copilot to carry out context-aware actions. To start using slash commands , open Copilot CLI and type / to see a list of available commands. How to use slash commands Type / in the Copilot CLI to see a list of available slash commands and their descriptions. You can also use /help to get more details about what each command does and how to use it. For instructions and examples, keep scrolling! Start here (two minutes) Open Copilot CLI Type /help to see available commands Run /clear to reset context Run /cwd to confirm Copilot is scoped to the right directory. You can jump to the sections below based on what you’re trying to do. Learn more in our docs > In addition to Copilot CLI, you can use slash commands across Copilot Chat and with agent mode, too. Why use slash commands? As developers, we want tools that work fast in the terminal. Slash commands in Copilot CLI do just that. Instead of writing a new prompt for each task, you use quick, explicit, and repeatable commands directly in your workflow. In practice, they help with: Speed and predictability: With slash commands, Copilot’s actions are more transparent and predictable. Unlike natural language prompts, which can be interpreted in different ways, slash commands always trigger the same response. This removes guesswork because you always know what you’re going to get, instantly. Productivity: Before slash commands, you might have copied and pasted code, written long prompts, or switched back and forth between tools. Now you can clean up errors, run tests, and get code explanations right from the CLI, without leaving your terminal. Clarity and security: Commands like /add-dir and /list-dirs give clear boundaries for file access and create an auditable trail, which is essential for teams working in sensitive environments. This eliminates uncertainty about what’s happening behind the scenes, reduces the risk of accidental data exposure, and helps teams maintain control in sensitive environments. Better accessibility: Slash commands fit seamlessly into keyboard-driven and accessible workflows. Commands like /help provide an instant overview of available actions, while /list-dirs or /list-files let users browse without navigating complex interfaces. These commands enable users who rely on keyboard shortcuts or assistive technologies to quickly discover and use Copilot features. Trust and compliance: Slash commands enhance trust by making every Copilot action explicit and traceable. For example, teams can use /add-dir to grant Copilot access to a specific directory. This ensures that sensitive files stay protected. With slash commands like /session or /usage, teams can manage tool access, monitor activity, and stay compliant. 
Custom workflows and extensibility: As support for slash commands expands, you can tailor Copilot to work with your own tasks and automations. Delegate pull requests, switch agents, or connect to CI/CD pipelines, all from the CLI, with commands like /delegate, /agent, and /mcp. Think of slash commands as explicit shortcuts for things you already do. There’s a lot you can do with Copilot CLI, and slash commands make the process easier. Useful Copilot CLI slash commands for your everyday workflow Below are the most commonly used slash commands, grouped by what you typically need to control in your workflows: context, scope, configuration, and collaboration. 💡 Tip: If you only remember three commands, start with /clear, /cwd, and /model. These give you immediate control over context, scope, and output quality. Session management commands /clear: Delete the current session’s conversation history. Copilot accumulates context as you work. This inherited context can muddy suggestions when you have too much of it, or when you’re trying to switch tasks. /clear lets you quickly wipe the slate when you’re multitasking or working between projects. When to use: Switching to a new task or repository Copilot responses are referencing old files or earlier conversations You want to avoid context bleed between projects /exit, /quit: Exit the CLI. The commands /exit and /quit provide a direct way to end your session and disconnect from Copilot, ensuring resource cleanup and a clear boundary for session-based work. When to use: Wrapping up your session Logging out of a shared terminal /session, /usage: Display session usage metrics about the current CLI session. These commands give visibility into the actions Copilot has performed during your session, helping with audits, troubleshooting, and resource tracking. When to use: Auditing team/individual Copilot CLI usage Reviewing model or tool usage during a session Debugging runs or model use When you run either the /session or /usage commands, Copilot shows output similar to the following, displaying usage metrics about your session: Session ID: 221b5571-3998-47e1-b57a-552cf9078947 Started: 11/24/2025, 11:18:54 AM Last Modified: 11/24/2025, 11:18:54 AM Duration: 50s Working Directory: /Users/jacklynlee31 Usage: Total usage est: 0 Premium requests Total duration (API): 0s Total duration (wall): 50s Total code changes: 0 lines added, 0 lines removed Hit Enter or Esc to continue Directory and file access commands /add-dir: Allow Copilot to access a directory. By limiting Copilot’s access to the files you choose, you can ensure responses are relevant to your current scope and increase security. When to use: Scoping Copilot to a specific repository or subdirectory Navigating large codebases with sensitive files /add-dir <directory> For example, here I am adding the Documents directory to the allowed list for file access: /add-dir /Users/jacklynlee31/Documents Copilot then gives me the following output: Added directory to allowed list: /Users/jacklynlee31/Documents /list-dirs: Show allowed directories. This command helps keep file access transparent. This can help with team compliance policies. When to use: Verifying Copilot’s scope Troubleshooting access issues Reviewing permissions before running commands /list-dirs After running the command, Copilot will show you the list of directories. For example: Allowed directories for file access: 1. /Users/jacklynlee31 2. /Users/jacklynlee31/Documents Total: 2 directories /cwd: Show or change the working directory. 
This keeps Copilot focused on the part of your codebase you’re actively working in. When to use: Navigating complex project trees Switching between repositories Narrowing context for better suggestions /cwd For example, after using the command, Copilot gave me the following output: Current working directory: /Users/jacklynlee31/Downloads When using /cwd [directory], you are able to switch to a different directory: /cwd /Users/jacklynlee31/Downloads Copilot will give you a similar output to show the new working directory path: Changed working directory to: /Users/jacklynlee31/Downloads Configuration commands /model: Select an AI model. Copilot supports multiple models, but you don’t need to overthink it. Start with the default model, then experiment when you notice differences in speed, reasoning depth, or cost. When to use: Comparing outputs Testing new or preview models Troubleshooting unexpected responses /model After running the command, Copilot will display an interactive model selection menu similar to the following: Choose the AI model to use for Copilot CLI. The selected model will be persisted and used for future sessions. ❯ 1. Claude Sonnet 4.5 (1x) (default) (current) 2. Claude Opus 4.5 (Preview) (1x) 3. Claude Haiku 4.5 (0.33x) 4. Claude Sonnet 4 (1x) 5. GPT-5.1 (1x) 6. GPT-5.1-Codex-Mini (0.33x) 7. GPT-5.1-Codex (1x) 8. GPT-5 (1x) 9. GPT-5-Mini (0x) 10. GPT-4.1 (0x) 11. Gemini 3 Pro (Preview) (1x) 12. Cancel (Esc) You can select a model from the list with the number or arrow keys and press Enter. You can also use /model [model] to directly change the AI model. /theme [show|set|list] [auto|dark|light]: Configure the terminal theme. Show: shows the current theme preference. Set: used to set the terminal theme to auto, dark, or light. List: shows a list of available themes. When to use: Improving readability Matching team or environment standards /theme set dark After setting the theme, Copilot will confirm your preference and prompt you to restart the CLI to apply the new theme: ● Theme preference set to: dark The new theme will be applied on the next restart of the CLI. /terminal-setup: Enable multiline inputs. This is especially helpful for complex instructions or multi step code changes. This command ensures your terminal is ready for advanced tasks and collaborative workflows. When to use: Writing longer prompts Performing large refactors or reviews Improving prompt formatting during large code edits /terminal-setup /reset-allowed-tools: Reset tool permissions. This command helps you quickly roll back the allowed tools set to a clean slate, removing obsolete or risky items. When to use: After team or role changes Cleaning up after demos or experiments /reset-allowed-tools After using the command, Copilot will show a confirmation: The list of allowed tools has been reset. External services commands /agent: Select a custom agent. Custom agents let you target specialized tasks or integrations. When to use: Switching agent configurations by repository/org/project Testing specialized or third-party agents /delegate <prompt>: Create an AI-generated pull request. This lets you automate changes and create pull requests without leaving the terminal. When to use: Applying changes across multiple repositories Kicking off reviewable work quickly For example, here I generated a pull request in my repository to add dark mode support: /delegate Add dark mode support /share [file|gist] [path]: Export your session. 
Documentation is critical—and this command lets you capture entire session histories to share or archive. When to use: Async handoffs Documenting decisions or experiments Attaching context to issues or pull requests /share file /Users/jacklynlee31/Desktop After sharing the file, Copilot will confirm that the session was shared successfully to your chosen location: ● Session shared successfully to: /Users/jacklynlee31/Desktop/copilot-session-221b5571-3998-47e1-b57a-552cf9078947.md /login, /logout: Log in or out of Copilot. When to use: Rotating credentials Switching accounts on a shared device /mcp [show|add|edit|delete|disable|enable]: Manage MCP configurations. Managing MCP server configuration directly from the terminal means you don’t have to switch between tools or interfaces. Show: show the list of available MCP servers. Add: add a new MCP server. Edit: edit an existing MCP server. Delete: delete a MCP server. Disable: disable a MCP server. Enable: enable a MCP server. /user [show|list|switch]: Manage what GitHub account you’re using. Multi-user and enterprise development often means switching between accounts. /user can help you with your role-based workflows and testing. When to use: Multi-user machines Managing service accounts vs. personal accounts Rotating between organizations /user show /user list /user switch /help: Show all available commands. When to use: Discovering new features Quick reference while using the CLI Bringing it all together With slash commands in Copilot CLI, you can make common workflow tasks fast and repeatable. You’re gaining explicit control over context, scope, and automation without leaving the terminal. The best way to experience this is to dive in and try slash commands yourself. Start with /clear, /cwd, and /help. Then layer in others as your workflows grow. As slash command capabilities grow, your feedback helps us shape what comes next. Use /feedback to share what’s working, and what isn’t. 
Quick reference
/clear: Clears session history/context. When to use: shift tasks, reset Copilot’s context, resolve confusion.
/exit, /quit: Exits the Copilot session. When to use: finish a session, reset the CLI.
/session, /usage: Shows current session and usage stats. When to use: audit activity, monitor Copilot CLI usage.
/add-dir <directory>: Adds allowed directory for file access. When to use: limit scope, improve security/auditing.
/list-dirs: Lists directories Copilot can access. When to use: confirm or manage file access permissions.
/cwd [directory]: Changes/outputs the working directory. When to use: navigate projects, limit Copilot context.
/model [model]: Changes Copilot AI model for the CLI. When to use: experiment, troubleshoot, optimize model behavior.
/theme [show|set|list]: Manage terminal output theme. When to use: customize for environment or team standards.
/reset-allowed-tools: Resets allowed external tools. When to use: remove tool permissions, reset for audits.
/agent: Selects a custom Copilot agent. When to use: using specialized agents by repo/org.
/delegate <prompt>: Delegates changes as a PR in a remote repository. When to use: automate changes, multi-repo workflows.
/share [file|gist]: Shares session as markdown or GitHub Gist. When to use: document sessions, async handoff, team sharing.
/login, /logout: Sign in/out of Copilot in the CLI. When to use: change user, rotate credentials.
/mcp [show|add|edit|...]: MCP server configuration management. When to use: update CI/CD proxy config, enterprise setups.
/user [show|list|switch]: GitHub user management. When to use: multi-user or team CLI management.
/help: Lists all CLI commands and shortcuts. When to use: onboarding, discoverability.
/feedback: Submit feedback about Copilot CLI. When to use: share suggestions or bug reports with GitHub.
Try Copilot slash commands in GitHub Copilot CLI and speed up your workflow. Install Copilot CLI or read the docs to get started. Additional resources: Copilot feature page, Copilot CLI, Copilot Chat Cookbook. The post A cheat sheet to slash commands in GitHub Copilot CLI appeared first on The GitHub Blog.
Read more →

AI-supported vulnerability triage with the GitHub Security Lab Taskflow Agent

Triaging security alerts is often very repetitive because false positives are caused by patterns that are obvious to a human auditor but difficult to encode as a formal code pattern. But large language models (LLMs) excel at matching the fuzzy patterns that traditional tools struggle with, so we at the GitHub Security Lab have been experimenting with using them to triage alerts. We are using our recently announced GitHub Security Lab Taskflow Agent AI framework to do this and are finding it to be very effective. 💡 Learn more about it and see how to activate the agent in our previous blog post. In this blog post, we’ll introduce these triage taskflows, showcase results, and share tips on how you can develop your own—for triage or other security research workflows. By using the taskflows described in this post, we quickly triaged a large number of code scanning alerts and discovered many (~30) real-world vulnerabilities since August, many of which have already been fixed and published. When triaging the alerts, the LLMs were only given tools to perform basic file fetching and searching. We have not used any static or dynamic code analysis tools other than to generate alerts from CodeQL. While this blog post showcases how we used LLM taskflows to triage CodeQL queries, the general process creates automation using LLMs and taskflows. Your process will be a good candidate for this if: You have a task that involves many repetitive steps, and each one has a clear and well-defined goal. Some of those steps involve looking for logic or semantics in code that are not easy for conventional programming to identify, but are fairly easy for a human auditor to identify. Trying to identify them often results in many monkey patching heuristics, badly written regexp, etc. (These are potential sweet spots for LLM automation!) If your project meets those criteria, then you can create taskflows to automate these sweet spots using LLMs, and use MCP servers to perform tasks that are well suited for conventional programming. Both the seclab-taskflow-agent and seclab-taskflows repos are open source, allowing anyone to develop LLM taskflows to perform similar tasks. At the end of this blog post, we’ll also give some development tips that we’ve found useful. Introduction to taskflows Taskflows are YAML files that describe a series of tasks that we want to do with an LLM. In this way, we can write prompts to complete different tasks and have tasks that depend on each other. The seclab-taskflow-agent framework takes care of running the tasks one after another and passing the results from one task to the next. For example, when auditing CodeQL alert results, we first want to fetch the code scanning results. Then, for each result, we may have a list of tasks that we need to check. For example, we may want to check if an alert can be reached by an untrusted attacker and whether there are authentication checks in place. These become a list of tasks we specify in a taskflow file. We use tasks instead of one big prompt because LLMs have limited context windows, and complex, multi-step tasks often are not completed properly. Some steps are frequently left out, so having a taskflow to organize the task avoids these problems. Even with LLMs that have larger context windows, we find that taskflows are useful to provide a way for us to control and debug the task, as well as to accomplish bigger and more complex tasks. The seclab-taskflow-agent can also perform a batch “for loop”-style task asynchronously. 
When we audit alerts, we often want to apply the same prompts and tasks to every alert, but with different alert details. The seclab-taskflow-agent lets us create templated prompts that iterate through the alerts and substitute the details specific to each alert when running the task.

Triaging taskflows: from a code scanning alert to a report

The GitHub Security Lab periodically runs a set of CodeQL queries against a selected set of open source repositories. The process of triaging these alerts is usually fairly repetitive, and for some alerts, the causes of false positives are fairly similar and can be spotted easily. For example, when triaging alerts for GitHub Actions, false positives often result from checks that have been put in place to make sure that only repo maintainers can trigger a vulnerable workflow, or from the vulnerable workflow being disabled in the configuration. These access control checks come in many different forms without an easily identifiable code pattern to match and are thus very difficult for a static analyzer like CodeQL to detect. However, a human auditor with general knowledge of code semantics can often identify them easily, so we expect an LLM to be able to identify these access control checks and remove false positives.

Over the course of a couple of months, we've tested our taskflows with a few CodeQL rules, using mostly Claude Sonnet 3.5, and have identified a number of real, exploitable vulnerabilities. The taskflows do not perform an "end-to-end" analysis, but rather produce a bug report with all the details and conclusions so that we can quickly verify the results. We did not instruct the LLM to validate the results by creating an exploit, nor did we provide any runtime environment for it to test its conclusions. The results, however, remain fairly accurate even without an automated validation step, and we were able to remove false positives in the CodeQL queries quickly. The rules were chosen based on our own experience of triaging these types of alerts and on whether the list of tasks could be formulated into clearly defined instructions for LLMs to consume.

General taskflow design

Taskflows generally consist of tasks that are divided into a few different stages. In the first stage, the tasks collect various bits of information relevant to the alert. This information is then passed to an auditing stage, where the LLM looks for common causes of false positives drawn from our own experience of triaging alerts. After the auditing stage, a bug report is generated using the information gathered. In the actual taskflows, the information gathering and audit stages are sometimes combined into a single task, or they may be separate tasks, depending on how complex the task is. To ensure that the generated report has sufficient information for a human auditor to make a decision, an extra step checks that the report has the correct formatting and contains the correct information. After that, a GitHub Issue is created, ready to be reviewed.

Creating a GitHub Issue not only makes it easy for us to review the results, but also provides a way to extend the analysis. After reviewing and checking the issues, we often find that there are causes of false positives that we missed during the auditing process. Also, if the agent determines that the alert is valid, but the human reviewer disagrees and finds that it's a false positive for a reason that was unknown to the agent so far, the human reviewer can document this as an alert dismissal reason or issue comment.
When the agent analyzes similar cases in the future, it will be aware of all the past analysis stored in those issues and alert dismissal reasons, incorporate this new intelligence into its knowledge base, and be more effective at detecting false positives.

Information collection

During this stage, we instruct the LLM (examples are provided in the Triage examples section below) to collect relevant information about the alert, taking into account the threat model and human knowledge of the alert in general. For example, in the case of GitHub Actions alerts, it will look at what permissions are set in the GitHub workflow file, what events trigger the GitHub workflow, whether the workflow is disabled, and so on. These generally involve independent tasks that follow simple, well-defined instructions to ensure the information collected is consistent. For example, checking whether a GitHub workflow is disabled involves making a GitHub API call via an MCP server. To ensure that the information collected is accurate and to reduce hallucination, we instruct the LLM to include precise references to the source code, with both file and line numbers, to back up the information it collected:

You should include the line number where the untrusted code is invoked, as well as the untrusted code or package manager that is invoked in the notes.

Each task then stores the information it collected in audit notes, which are a kind of running commentary on an alert. Once a task is completed, its notes are serialized to a database, and subsequent tasks append their own notes when they are done. In general, the information gathering tasks are independent of one another and do not need to read each other's notes. This helps each task focus on its own scope without being distracted by previously collected information. The end result is a "bag of information" in the form of notes associated with an alert that is then passed to the auditing tasks.

Audit issue

At this stage, the LLM goes through the information gathered and performs a list of specific checks to reject alert results that turn out to be false positives. For example, when triaging a GitHub Actions alert, we may have collected information about the events that trigger the vulnerable workflow. In the audit stage, we'll check whether these events can be triggered by an attacker and whether they run in a privileged context. After this stage, a lot of the false positives that are obvious to a human auditor will have been removed.

Decision-making and report generation

For alerts that have made it through the auditing stage, the next step is to create a bug report using the information gathered, as well as the reasoning for the decision at the audit stage. Again, in our prompt we are very precise about the format of the report and what information we need. In particular, we want it to be concise but also to include information that makes it easy for us to verify the results, with precise code references and code blocks. The generated report uses the information gathered in the notes from previous stages and only looks at the source code to fetch code snippets that are needed in the report. No further analysis is done at this stage. Again, the very strict and precise nature of the tasks reduces the amount of hallucination.
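As a rough illustration of the "bag of information" idea, and not the Security Lab's actual schema, an audit note backed by citations might be modeled like this (all type and field names are hypothetical):

```typescript
// Hypothetical sketch of the audit-notes idea described above: each
// information-gathering task appends findings backed by file:line citations,
// and the serialized bag of notes is what the auditing tasks consume.
interface AuditNote {
  task: string;        // e.g. "workflow_trigger_analysis"
  finding: string;     // e.g. "workflow is triggered by pull_request_target"
  citations: string[]; // e.g. [".github/workflows/build.yml:3"]
}

interface AlertNotes {
  alertId: number;
  repo: string;
  notes: AuditNote[];
}

// Append a task's notes and persist them so the next task can pick them up.
// The key-value store is injected; any simple database would do.
async function appendNotes(
  store: {
    load(key: string): Promise<string | null>;
    save(key: string, value: string): Promise<void>;
  },
  alertId: number,
  repo: string,
  newNotes: AuditNote[],
): Promise<void> {
  const key = `${repo}#${alertId}`;
  const existing = await store.load(key);
  const bag: AlertNotes = existing
    ? (JSON.parse(existing) as AlertNotes)
    : { alertId, repo, notes: [] };
  bag.notes.push(...newNotes);
  await store.save(key, JSON.stringify(bag));
}
```

Keeping each finding tied to explicit citations is what later lets a human (or a later task) verify the claim quickly.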
Report validation and issue creation

After the report is written, we instruct the LLM to check the report to ensure that all the relevant information is contained in it, as well as to check the consistency of that information:

Check that the report contains all the necessary information: - This criteria only applies if the workflow containing the alert is a reusable action AND has no high privileged trigger. You should check it with the relevant tools in the gh_actions toolbox. If that's not the case, ignore this criteria. In this case, check that the report contains a section that lists the vulnerable action users. If there isn't any vulnerable action users and there is no high privileged trigger, then mark the alert as invalid and using the alert_id and repo, then remove the memcache entry with the key {{ RESULT_key }}.

Missing or inconsistent information often indicates hallucinations or other causes of false positives (for example, not being able to track down an attacker-controlled input). In either case, we dismiss the report. If the report contains all the information and is consistent, then we open a GitHub Issue to track the alert.

Issue review and repo-specific knowledge

The GitHub Issue created in the previous step contains all the information needed to verify the issue, with code snippets and references to lines and files. This provides a kind of "checkpoint" and a summary of the information that we have, so that we can easily extend the analysis. In fact, after creating the issue, we often find that there are repo-specific permission checks or sanitizers that render the issue a false positive. We are able to incorporate this knowledge by creating taskflows that review these issues with repo-specific knowledge added in the prompts. One approach that we've experimented with is to collect dismissal reasons for alerts in a repo and instruct the LLM to take these dismissal reasons into account when reviewing the GitHub Issue. This allows us to remove false positives caused by reasons specific to a repo. In one such case, the LLM was able to identify an alert as a false positive after taking into account a custom check-run permission check that was recorded in the alert dismissal reasons.

Triage examples and results

In this section we'll give some examples of what these taskflows look like in practice. In particular, we'll show taskflows for triaging some GitHub Actions and JavaScript alerts.

GitHub Actions alerts

The specific Actions alerts that we triaged are checkout of untrusted code in a privileged context and code injection. The triaging of these queries shares a lot of similarities. For example, both involve checking the workflow triggering events, the permissions of the vulnerable workflow, and tracking workflow callers. In fact, the main differences involve local analysis of specific details of the vulnerabilities. For code injection, this involves whether the injected code has been sanitized, how the expression is evaluated, and whether the input is truly arbitrary (for example, a pull request ID is unlikely to cause a code injection issue). For untrusted checkout, this involves whether there is a valid code execution point after the checkout. Since many elements in these taskflows are the same, we'll use the code injection triage taskflow as an example. Note that because these taskflows have a lot in common, we made heavy use of reusable features in the seclab-taskflow-agent, such as prompts and reusable tasks.
When manually triaging GitHub Actions alerts for these rules, we commonly run into false positives because:

- The vulnerable workflow doesn't run in a privileged context. This is determined by the events that trigger the vulnerable workflow. For example, a workflow triggered by the pull_request_target event runs in a privileged context, while a workflow triggered by the pull_request event does not. This can usually be determined by simply looking at the workflow file.
- The vulnerable workflow is explicitly disabled in the repo. This can be verified easily by checking the workflow settings in the repo.
- The vulnerable workflow explicitly restricts permissions and does not use any secrets, in which case there is little privilege to gain.
- There are vulnerability-specific issues, such as invalid user input or a sanitizer in the case of code injection, or the absence of a valid code execution point in the case of untrusted checkout.
- The vulnerable workflow is a reusable workflow but is not reachable from any workflow that runs in a privileged context.

Very often, triaging these alerts involves many simple but tedious checks like the ones listed above, and an alert can be determined to be a false positive very quickly by one of the above criteria. We therefore model our triage taskflows on these criteria. So, our Actions triage taskflows consist of the following tasks during the information gathering and auditing stage:

- Workflow trigger analysis: This stage performs both information gathering and auditing. It first collects the events that trigger the vulnerable workflow, as well as the permissions and secrets that are used in the vulnerable workflow. It also checks whether the vulnerable workflow is disabled in the repo. All information is local to the vulnerable workflow itself and is stored in running notes, which are then serialized to a database entry. As the task is simple and involves only looking at the vulnerable workflow, preliminary auditing based on the workflow trigger is also performed to remove some obvious false positives.
- Code injection point analysis: This is another task that only analyzes the vulnerable workflow and combines information gathering and audit in a single task. This task collects information about the location of the code injection point and the user input that is injected. It also performs local auditing to check whether a user input is a valid injection risk and whether it has a sanitizer.
- Workflow user analysis: This performs a simple caller analysis that looks for the callers of the vulnerable workflow. As it can potentially retrieve and analyze a large number of files, this step is divided into two main tasks that perform information gathering and auditing separately. In the information gathering task, callers of the vulnerable workflow are retrieved and their trigger events, permissions, and use of secrets are recorded in the notes. This information is then used in the auditing task to determine whether the vulnerable workflow is reachable by an attacker.

Each of these tasks is applied to the alert, and at each step false positives are filtered out according to the criteria in the task. After the information gathering and audit stage, our notes will generally include information such as the events that trigger the vulnerable workflow, the permissions and secrets involved, and (in the case of a reusable workflow) the other workflows that use the vulnerable workflow, as well as their trigger events, permissions, and secrets. This information forms the basis for the bug report.
As a sanity check to ensure that the information collected so far is complete and consistent, the review_report task is used to check for missing or inconsistent information before a report is created. After that, the create_report task is used to create a bug report which will form the basis of a GitHub Issue. Before creating an issue, we double-check that the report contains the necessary information and conforms to the format that we required. Missing information or inconsistencies are likely the result of failed steps or hallucinations, and we reject those cases.

[Diagram: main components of the triage_actions_code_injection taskflow]

We then create GitHub Issues using the create_issue_actions taskflow. As mentioned before, the GitHub Issues created contain sufficient information and code references to verify the vulnerability quickly, as well as serving as a summary of the analysis so far, allowing us to continue further analysis from the issue.

[Screenshot: example of a GitHub Issue created by the taskflow]

In particular, we can use GitHub Issues and alert dismissal reasons as a means to incorporate repo-specific security measures and to further the analysis. To do so, we use the review_actions_injection_issues taskflow to first collect alert dismissal reasons from the repo. These dismissal reasons are then checked against the alert stated in the GitHub Issue. In this case, we simply use the issue as the starting point and instruct the LLM to audit the issue and check whether any of the alert dismissal reasons applies to the current issue. Since the issue contains all the relevant information and code references for the alert, the LLM is able to use the issue and the alert dismissal reasons to further the analysis and discover more false positives.

[Screenshot: an alert rejected based on the dismissal reasons]

[Diagram: main components of the issue creation and review taskflows]

JavaScript alerts

Similarly to triaging Actions alerts, we also triaged, to a lesser extent, code scanning alerts for the JavaScript/TypeScript languages. In the JavaScript world, we triaged code scanning alerts for the client-side cross-site scripting CodeQL rule (js/xss). The client-side cross-site scripting alerts have more variety with regard to their sources, sinks, and data flows compared to the GitHub Actions alerts. The prompts for analyzing those XSS vulnerabilities are focused on helping the person responsible for triage make an educated decision, not on making the decision for them. This is done by highlighting the aspects that seem to make a given alert exploitable by an attacker and, more importantly, what likely prevents the exploitation of a given potential issue. Other than that, the taskflows follow a similar scheme to the one described in the GitHub Actions alerts section. While triaging XSS alerts manually, we've often identified false positives due to these reasons:

- Custom or unrecognized sanitization functions (e.g., using regex) that the SAST tool cannot verify.
- Reported sources that are likely unreachable in practice (e.g., they would require an attacker to send a message directly from the webserver).
- Untrusted data flowing into potentially dangerous sinks, whose output is then only used in a non-exploitable way.
- The SAST tool not knowing the full context where the given untrusted data ends up.

Based on these false positives, the prompts in the relevant taskflow, or even in the active personality, were extended and adjusted.
If you encounter certain false positives in a project you are auditing, it makes sense to extend the prompt so that those false positives are correctly marked (and likewise if alerts for certain sources or sinks are not considered a vulnerability in that project). In the end, after executing the taskflows triage_js_ts_client_side_xss and create_issues_js_ts, an alert results in a GitHub Issue such as the following:

[Screenshot: example GitHub Issue created for a client-side XSS alert]

While this is a sample of an alert worth following up on (which turned out to be a true positive, being exploitable by using a javascript: URL), alerts that the taskflow agent decided were false positives get their issue labelled with "FP" (for false positive):

[Screenshot: an issue labelled "FP" for an alert the agent judged to be a false positive]

Taskflows development tips

In this section we share some of our experiences from working on these taskflows and what we think is useful when developing taskflows. We hope these tips will help others create their own.

Use of a database to store intermediate state

While developing a taskflow with multiple tasks, we sometimes encounter problems in tasks that run at a later stage. These can be simple software problems, such as API call failures, MCP server bugs, prompt-related problems, token problems, or quota problems. By keeping tasks small and storing the results of each task in a database, we avoided rerunning lengthy tasks when failures happen. When a task in a taskflow fails, we simply rerun the taskflow from the failed task and reuse the results from earlier tasks that are stored in the database. Apart from saving us time when a task fails, this also helps us isolate the effects of each task and tweak each task using the database created by the previous task as a starting point.

Breaking down complex tasks into smaller tasks

When we were developing the triage taskflows, the models that we used did not handle large contexts and complex tasks very well. When trying to perform complex and multiple tasks within the same context, we often ran into problems such as tasks being skipped or instructions not being followed. To counter that, we divided tasks into smaller, independent tasks, each starting with a fresh new context. This helped reduce the context window size and alleviated many of the problems that we had. One particular example is the use of templated repeat_prompt tasks, which loop over a list of tasks and start a new context for each of them. By doing this, instead of going through a list in the same prompt, we ensured that every single task was performed, while the context of each task was kept to a minimum. An added benefit is that we are able to tweak and debug the taskflows with more granularity. By having small tasks and storing the results of each task in a database, we can easily separate out part of a taskflow and run it on its own.

Delegate to MCP servers whenever possible

Initially, when checking and gathering information such as workflow triggers from the source code, we simply incorporated instructions in prompts, because we thought the LLM should be able to gather the information from the source code. While this worked most of the time, we also noticed some inconsistencies due to the non-deterministic nature of the LLM. For example, the LLM would sometimes record only a subset of the events that trigger the workflow, or it would sometimes make inconsistent conclusions about whether the trigger runs the workflow in a privileged context. Since this information gathering and these checks can easily be performed programmatically, we ended up creating tools in the MCP servers to gather the information and perform the checks.
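As a rough illustration of the kind of check that is trivial and deterministic in code (and not the Security Lab's actual tooling), extracting workflow triggers from a GitHub Actions file with the js-yaml parser might look like this; the set of "privileged" events is an illustrative subset:

```typescript
// Hypothetical MCP-style tool: parse a workflow file and report its triggers
// deterministically instead of asking the model to read the YAML.
import { load } from "js-yaml";

// Events that run workflows in a privileged context (illustrative subset).
const PRIVILEGED_TRIGGERS = new Set(["pull_request_target", "workflow_run", "issue_comment"]);

export function getWorkflowTriggers(workflowYaml: string): { triggers: string[]; privileged: boolean } {
  const doc = load(workflowYaml) as Record<string, unknown>;
  const on = doc["on"]; // the trigger section of a GitHub workflow file
  let triggers: string[] = [];
  if (typeof on === "string") triggers = [on];
  else if (Array.isArray(on)) triggers = on.map(String);
  else if (on && typeof on === "object") triggers = Object.keys(on);
  return {
    triggers,
    privileged: triggers.some((t) => PRIVILEGED_TRIGGERS.has(t)),
  };
}
```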
This led to a much more consistent outcome. By moving most of the tasks that can easily be done programmatically into MCP server tools, while leaving the more complex logical reasoning tasks, such as finding permission checks, to the LLM, we were able to leverage the power of LLMs while keeping the results consistent.

Reusable taskflows to apply tweaks across taskflows

As we were developing the triage taskflows, we realized that many tasks can be shared between different triage taskflows. To make sure that tweaks in one taskflow can be applied to the rest, and to reduce the amount of copy and paste, we needed ways to refactor the taskflows and extract reusable components. We added features like reusable tasks and prompts. Using these features allowed us to reuse components and apply changes consistently across different taskflows.

Configuring models across taskflows

As LLMs are constantly developing and new versions are released frequently, it soon became apparent that we needed a way to update model version numbers across taskflows. So we added a model configuration feature that allows us to change models across taskflows, which is useful when the model version needs updating or when we just want to experiment and rerun the taskflows with a different model.

Closing

In this post we've shown how we created taskflows for the seclab-taskflow-agent to triage code scanning alerts. By breaking down the triage into precise and specific tasks, we were able to automate many of the more repetitive tasks using LLMs. By setting out clear and precise criteria in the prompts and asking the LLM for precise answers that include code references, the LLM was able to perform the tasks as instructed while keeping the amount of hallucination to a minimum. This allows us to leverage the power of LLMs to triage alerts and greatly reduces the number of false positives, without the need to validate the alerts dynamically. As a result, we were able to discover ~30 real-world vulnerabilities from CodeQL alerts after running the triaging taskflows. The discussed taskflows are published in our repo, and we're looking forward to seeing what you're going to build using them! More recently, we've also done some further experiments in the area of AI-assisted code auditing and vulnerability hunting, so stay tuned for what's to come!

Get the guide to setting up the GitHub Security Lab Taskflow Agent >

Disclaimers: When we use these taskflows to report vulnerabilities, our researchers carefully review all generated output before sending the report. We strongly recommend you do the same. Note that running the taskflows can result in many tool calls, which can easily consume a large amount of quota. The taskflows may create GitHub Issues. Please be considerate and seek the repo owner's consent before running them on somebody else's repo.

The post AI-supported vulnerability triage with the GitHub Security Lab Taskflow Agent appeared first on The GitHub Blog.
Read more →

Context windows, Plan agent, and TDD: What I learned building a countdown app with GitHub Copilot

In our last Rubber Duck Thursdays stream of 2025, I wanted to build something celebratory. Something that captures what Rubber Duck Thursdays is all about: building together, learning from mistakes, and celebrating everyone who tunes in from across the world. Along the way, I picked up practical patterns for working with AI that you can apply to your own projects, whether you're building a countdown app or something entirely different: managing context windows to avoid cluttered conversations, using the Plan agent for requirement discovery, and catching edge cases through test-driven development with Copilot. And… why world maps are harder than they look. 👀 See the full stream below. 👇

Starting simple: The basic countdown

Countdown timers are a straightforward concept. Days count down to hours, minutes count down to seconds. But sometimes it's the simple ideas that allow us to be our most creative. I figured I'd use this as an opportunity to use Copilot in a spec- or requirements-driven approach, to build a countdown app that built anticipation and displayed fireworks as it turned to the new year.

💡 What is spec-driven development? Instead of coding first and writing docs later, spec-driven development, you guessed it, starts with a spec. This is a contract for how your code should behave and becomes the source of truth your tools and AI agents use to generate, test, and validate code. The result is less guesswork, fewer surprises, and higher-quality code. Get started with our open source Spec Kit >

Fortunately, software development is an iterative process, and this livestream embraced that fully. While some requirements were well-defined, others evolved in real time, shaped by suggestions from our livestream audience. Custom agents like the Plan agent helped bridge the gap, turning ambiguous ideas into structured plans I could act on.

So let's start at the very beginning: setting up the project. I generated a new workspace with GitHub Copilot, using a very specific prompt. The prompt explained that we're building a countdown app and that I wanted to use Vite, TypeScript, and Tailwind CSS v4. It also spelled out some of the requirements, including the dark theme, centred layout, large bold digits with subtle animation, and a default target of midnight on January 1, 2026, with some room for customization.

#new 1. Create a new workspace for a New Year countdown app using Vite, TypeScript, and Tailwind CSS v4.
**Setup requirements:**
- Use the @tailwindcss/vite plugin (Tailwind v4 style)
- Dark theme by default (zinc-900 background)
- Centered layout with the countdown as the hero element

**Countdown functionality:** Create a `countdown.ts` module with:
- A `CountdownTarget` type that has `{ name: string, date: Date }` so we can later customize what we're counting down to
- A `getTimeRemaining(target: Date)` function returning `{ days, hours, minutes, seconds, total }`
- A `formatTimeUnit(n: number)` helper that zero-pads to 2 digits
- Default target: midnight on January 1st of NEXT year (calculate dynamically from current date)

**Display:**
- Large, bold countdown digits (use tabular-nums for stable width)
- Labels under each unit (Days, Hours, Minutes, Seconds)
- Subtle animation when digits change (CSS transition)
- Below the countdown, show: "until [target.name]" (e.g., "until 2026")

**Architecture:**
- `src/countdown.ts` - pure logic, no DOM
- `src/main.ts` - sets up the interval and updates the DOM
- Use `requestAnimationFrame` or `setInterval` at 1 second intervals
- Export types so they're reusable

Keep it simple and clean—this is the foundation we'll build themes on top of.

What I love about the "generate new workspace" feature is that Copilot generated custom instruction files for me, automatically capturing my requirements, including the countdown app, Vite, TypeScript, and dark theme. It was all documented before writing a single line of code. Within minutes, I had a working countdown: days, hours, minutes, and seconds ticking down to 2026. While it worked, it wasn't visually exciting. In fairness, I hadn't specified any design or theme preferences in my initial prompt. So it was time to iterate and make it more interesting.

The community suggestion that steered our course

During the stream, viewers were joining from India, Nigeria, Italy, the United States (the list goes on!); developers from around the world, coming together to learn. One person in the chat made a suggestion that adjusted what we'd do next: what about time zones? It wasn't a requirement I'd expected to work on during the stream, so I didn't have a clear plan of how it would work. Maybe there could be a globe you spin to select time zones. Maybe a world map with a time travel theme. That's a lot of maybes. My requirements were vague, which was where I turned to the Plan agent.

Plan agent: The questions I hadn't thought to ask

I've been using the Plan agent more deliberately lately, especially when I feel that my requirements aren't fully defined. The Plan agent doesn't just create a plan based on my initial prompt; it asks clarifying questions that can reveal edge cases you may not have considered. I gave it my rough idea: interactive time zone selector, time travel theme, animate between zones, maybe a world map. The Plan agent came back with questions that made me think:

- Should the circular dial be primary with the world map as secondary, or vice versa? (Why it mattered: I hadn't decided the visual hierarchy.)
- What happens on mobile: dropdown fallback or touch-friendly scroll? (Why it mattered: I was only thinking of a desktop implementation for this initial version. Mobile could be a future requirement.)
- When a time zone passes midnight, show "already celebrating" with confetti, or a timer showing how long since midnight? (Why it mattered: I wanted the celebration, not a reverse countdown. I wasn't clear on my requirements.)
I wasn’t clear on my requirements.Would there be subtle audio feedback when spinning the dial, or visual only?Bringing audio into the app was scope creep, but it could be a future requirement. This is the beauty of working with AI in this way. The Plan agent makes you think, potentially asking a clarifying question and offering options A or B. But as you reflect, you realize the answer is somewhere in between. For example, in my second iteration of requirements, the plan asked whether fireworks should run continuously, burst once, or loop subtly. I replied that there’s probably a performance consideration, and we should opt for somewhere in the middle. We also asked the livestream viewers to vote on whether we should implement the component as a dial or map. Map won, so we pivoted to a world map as the primary selector with eight featured locations. Context window management: Just keep what you need Before implementing, I deliberately started a new chat session. The context from our previous conversation (workspace creation, basic countdown logic) wasn’t needed anymore. And any context that might have been useful was now included in our custom instructions file. When working with AI tools, that context window is precious. Bringing in irrelevant history clutters the conversation and dilutes focus. So I cleared it, bringing only what mattered: the new requirements, the Plan agent output (which I’d asked Copilot to write to a separate Markdown file), and fresh focus on time zones. I also reused some custom instruction files, custom agents, and prompt files from another personal project to help steer Copilot in the right direction, and incorporate specialized agents for relevant tasks. This included a UI Performance Specialist agent. 💡 Did you know? GitHub Copilot’s custom agents let you create specialised personas for different development tasks. The UI Performance Specialist agent that I built during the stream is just one example. You can create agents for security reviews, architecture planning, or any role-specific workflow. The awesome-copilot repository has a number of examples. Implementation: Modular, test-driven, and that map With the Plan agent’s work complete, I switched to my UI Performance Specialist agent and asked it to review the plan, suggesting deeper implementation details based on its expertise. Context is important here, so I didn’t create a new conversation. Instead, I continued the existing one. The agent came back with a detailed set of considerations: Frame time budgets for animations Map SVG size optimisation strategies Celebration particle limits (DOM element concerns) and cleanup considerations Animation property recommendations (transform/opacity only) Reduced motion support It looked good, but I added a couple of additional requirements. I asked the custom agent to make the implementation modular, to write the tests first based on expected behaviour, and once it had failing tests, to write the implementation. That’s right: test-driven development with Copilot. The TDD Cycle Copilot created test files for time zone utilities, city state management, and the countdown logic. All failing tests in a red state. Good (one of the few times where we want to see failing tests)! Then it implemented: Time zone utilities using the Intl.DateTimeFormat API City state with featured locations (New York, London, Tokyo, Sydney, etc.) localStorage persistence for selected time zones App state management With access to tools, the custom agent also executed tests in the terminal. 
Two test cases failed: the logic that determined whether the celebration was triggered correctly across the year rollover. The tests expected celebrations to be handled at midnight, along with the duration since the celebrations began. Since Copilot had access to the output, the custom agent caught the test failures, adjusted the time zone implementation, and the tests went green.

💡 Thought: This is exactly why TDD and thinking about code quality matters. Just like us developers, AI-assisted development can get things wrong. Tests help us catch bugs before users do. The year rollover edge case would have been embarrassing to discover on December 31, given that it was the core capability of the app!

But some bugs turn into features. I found one bug too funny to fix immediately. Let's talk about the world map.

The World map, maybe?

When I opened the app, the countdown worked. The time zone selector worked. The calculations were correct, and switching from New York to Tokyo showed the proper time difference. But the world map? It didn't quite render as expected. What appeared on screen was more abstract art than geography. But it really made me laugh on stream.

💡 Thought: I was ambitious specifying a world map without providing enough context. No SVG asset, no reference to an existing mapping library. Just "add a mini world map." A reminder that AI can get things wrong.

Could I have fixed it? Absolutely. But we were over an hour into the stream, and had more features to build. So I left it. The map was a perfect example of iterative development where things don't always go right the first time. (Can you tell that we build things live yet?)

Fireworks: Building anticipation toward midnight

A countdown on its own is functional, but fireworks add celebration and give some visual flare (see what I did there?). I switched back to the Plan agent and created a new chat thread (again, context window management), prompting Copilot to build out a plan:

- Use Fireworks.js for the effects
- Set the fireworks behaviour based on time remaining
- If the timer has more than 24 hours left, don't display fireworks, just show ambient stars
- If the timer has between 24 and 12 hours remaining, set off fireworks every 30 seconds
- Between one hour and 10 minutes remaining, the intensity of the fireworks should build
- And finally, in the last 10 seconds we should have continuous fireworks for maximum celebration

I also asked for a skyline silhouette at the bottom of the screen, a dark night sky gradient, and a theme controller. Plus, one critical testing requirement: "Add a query parameter so I can specify how many minutes away we are from midnight as an override for manual testing." While I enjoy streaming with our community, I'm not sure that everyone would have enjoyed hanging around until the turn of 2026 to see the results!

The Plan agent asked for further clarification on how to display the stars (either setting them as CSS, or setting them as low-intensity fireworks), as well as some considerations around performance. It also asked about toggle placement, which caught me out. I didn't remember asking for a toggle button and may have missed that in an iteration of the plan. After carefully reviewing the plan, I found that I had indeed originally requested an animation toggle for accessibility. This is why I like the Plan agent. It's rubber ducking with AI that has the context of your conversation, and it can check whether those requirements still make sense.
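Before moving on, here's a rough sketch of the kind of intensity schedule those requirements imply, plus the ?minutesToMidnight testing override. The thresholds follow the tiers listed above; the function names, the return shape, and how the gaps between tiers are filled are my own assumptions, not the stream's actual code:

```typescript
type FireworksMode = "stars-only" | "every-30s" | "building" | "continuous";

// Map time remaining onto the stream's stated tiers. Where the spec leaves a
// gap (e.g. between 12 hours and 1 hour), the nearest tier is reused.
export function fireworksMode(msRemaining: number): FireworksMode {
  const seconds = msRemaining / 1000;
  if (seconds <= 10) return "continuous";       // last 10 seconds: continuous fireworks
  if (seconds <= 60 * 60) return "building";    // final hour: intensity builds
  if (seconds <= 24 * 3600) return "every-30s"; // within a day: a burst every 30 seconds
  return "stars-only";                          // more than 24 hours out: ambient stars only
}

// Manual-testing override: ?minutesToMidnight=1 pretends midnight is a minute away.
export function effectiveMsRemaining(
  realMsRemaining: number,
  search: string = window.location.search,
): number {
  const override = new URLSearchParams(search).get("minutesToMidnight");
  return override !== null ? Number(override) * 60_000 : realMsRemaining;
}
```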
Once Copilot and I renegotiated the requirements, we used that familiar test-driven development approach. One test failed initially, as the JSDOM environment setup was missing. Copilot spotted the failure, identified the misconfigured test setup, and made the fix. After that, all tests went green. We now had an app with fireworks at different intensity levels, an animated starfield using CSS, a city skyline, reduced motion support, and a query parameter override.

Testing the Intensity Levels

I added ?minutesToMidnight=1 to the URL. Fireworks appeared with medium intensity, building excitement with increasing numbers of colors and particles across the sky. At "midnight," Happy New Year appeared with even more celebration. The intensity curve felt right: the buildup created anticipation and the finale delivered.

Reveal: What I built that morning

But I didn't stop there. Throughout the stream, I'd been teasing that I'd made another countdown app earlier that morning, something with a very relevant theme. Our viewers guessed another fireworks countdown, a confetti timer, and even an "elicitation-powered tic-tac-toe" (which, to be fair, we have built before). But as a GitHub stream, there was only one way that we could finish it off. We had to have a contribution graph themed countdown!

The countdown sat in the centre in front of an animated contribution graph. Each square flickered with green contributions appearing and disappearing across the grid in waves. And just like the fireworks theme, as the countdown ticked closer to zero, more squares lit up and the intensity built.

This stream was a celebration. A way to bring our community together across time zones, all of us building and counting down to the same moment in our own corners of the world. During the stream, someone asked about the best programming languages for landing jobs. My answer was the same as my approach to this project: find the thing that brings you joy, and then the right tools and languages just fall into place. I built this GitHub countdown theme because it brought me joy. Because I wanted to make something "GitHubby," and because I enjoy building visual experiences.

Since that stream, I've worked on bringing these two projects into a unified open source countdown app, Timestamp. It has a centralized theme orchestrator, allowing developers to plug into a common architecture and extend it with new themes. Every countdown is a URL, so it can be easily shared, and there are several countdown modes to choose from (local time, absolute moments, and timers). You can check out the live app and review the codebase. You're welcome to take a look at the repository, star it, fork it, and even contribute a new theme. I hope this inspires you to build that one project that has been on the backlog, and spend some time on the thing that brings you a little bit of joy.

What have we learned?

Context window management is a skill. Start new chat sessions when old context isn't needed. Keep conversations focused. It's context engineering, not just prompt engineering.

The Plan agent asks questions you may have forgotten. Use it when requirements are vague. Let it reveal edge cases through clarifying questions. Sometimes the answer to A or B is "somewhere in the middle."

Custom agents are specialised helpers. My UI Performance Specialist had expertise in frame budgets, animation properties, and accessibility. It gave implementation details while the Plan agent helped ask clarifying questions to determine the scope. Specialisation matters.
TDD with Copilot works. Write tests first. Let them fail. Implement to pass. Just like us developers, AI-assisted tools produce bugs. We need to use those same practices that we're used to for checking quality (builds, linters, and tests) to catch issues before users do.

Things won't always work the first time. That's okay. The world map didn't render as expected, and I left it that way until my significant refactor and rebuild of the countdown app. Authentic development means showing the messy middle, not just polished outcomes. We learn from unexpected results as much as from successes.

Scope ambitiously, implement iteratively. We went from a basic countdown, to time zones, to intense fireworks, to a separate contribution graph themed countdown. Rome wasn't built in a day, and you don't need everything on day one.

What will you build in 2026? Drop by the next Rubber Duck Thursdays stream at 10:30 a.m. UK time and 2:00 p.m. Eastern time, and let's build something that brings us joy: that project that hasn't quite reached the top of the "some day" list!

The post Context windows, Plan agent, and TDD: What I learned building a countdown app with GitHub Copilot appeared first on The GitHub Blog.
Read more →

★ Crazy People Do Crazy Things

Donald Trump, in a message (I wouldn't call it a letter) sent to Norwegian Prime Minister Jonas Gahr Støre, confirmed by several news organizations:

Dear Jonas: Considering your Country decided not to give me the Nobel Peace Prize for having stopped 8 Wars PLUS, I no longer feel an obligation to think purely of Peace, although it will always be predominant, but can now think about what is good and proper for the United States of America. Denmark cannot protect that land from Russia or China, and why do they have a "right of ownership" anyway? There are no written documents, it's only that a boat landed there hundreds of years ago, but we had boats landing there, also. I have done more for NATO than any other person since its founding, and now, NATO should do something for the United States. The World is not secure unless we have Complete and Total Control of Greenland. Thank you! President DJT

There's a simple explanation for this. Trump is in cognitive decline and it's accelerating from age-related dementia. He lives in an imaginary world that is increasingly cleaved from reality. (Norway, it should be pointed out, is not Denmark, the country of which Greenland is a part.)

Trump's Venezuela operation was brazenly illegal. But it wasn't crazy. Venezuela was not a U.S. ally. President Nicolas Maduro lost an election but stayed in power. Venezuela was producing military drones for the hostile regime in Iran, a self-declared enemy of the U.S., NATO, and Israel. Venezuela had a burgeoning alliance with China, the U.S.'s primary geopolitical rival. What Trump is threatening with Greenland is simply bonkers. Greenland is under no threat from China or Russia because it's part of NATO, and thus — ostensibly — under the full protection of the entire NATO alliance including and especially the United States. If China or Russia attempted to take Greenland it would trigger a world war led by the United States.

Compare and contrast with Ukraine and Taiwan. Ukraine, long before Vladimir Putin invaded, was known to be under threat of Russian invasion. Taiwan has long been known to be threatened by China. These threats have been in our geopolitical discourse for decades because the threats were real (and, unfortunately, came to pass in Ukraine). No one has ever talked about Greenland being under threat of takeover by Russia or China because there is no such threat. It's no more realistic than Russia taking over Alaska or China taking over Hawaii. It sounds nuts because it is nuts, and the threat only exists in Trump's disintegrating mind.

Eight of our NATO allies have made clear, through action, not mere words, their intention to defend Greenland. Trump, obviously angry that our ostensible allies won't just roll over and accede to his madness, is now petulantly turning to his favorite word, tariffs. If that's "the hard way", that's pathetic. Stand up to bullies and they usually fold.

The threat to Greenland, and thus to NATO — and thus, quite literally, to the entire world — is not that Trump authorized an illegal military operation in Venezuela, so he might do it in Greenland too. Again, what the U.S. did in Venezuela was obviously illegal, and probably stupid, but it wasn't crazy. Breaking up NATO and starting a war with Europe would be batshit crazy. The threat is that Trump is showing us, every day, that he is crazy. Crazy people do crazy things, and crazy cult leaders surround themselves with cultists. The rest of us need to stop sane-washing this. You cannot make sense out of nonsense.
If Trump declares that the U.S. is laying claim to all of the green cheese on the moon — say, to lower the price of dairy groceries — the news media should not respond with fact-finding articles with headlines like “How Much Cheese Is on the Moon?” They should respond with headlines like “How Many Marbles Are Left in Trump’s Dementia-Addled Head?” But threatening to take Greenland by military force is nuttier than laying claim to the moon’s cheese. Laying claim to non-existent green cheese wouldn’t trigger a shooting war that blows apart the most powerful alliance in military history.
Read more →

Building an agentic memory system for GitHub Copilot

Our vision is to evolve GitHub Copilot into an ecosystem of agents that collaborate across the entire development lifecycle from coding and code review to security, debugging, deployment, and maintenance. To unlock the full potential of multi-agent workflows, we need to move beyond isolated interactions—that start from scratch each session—and toward a cumulative knowledge base that grows with every use. Cross-agent memory allows agents to remember and learn from experiences across your development workflow, without relying on explicit user instructions. Each interaction teaches Copilot more about your codebase and conventions, making it increasingly effective over time.

For example, if Copilot coding agent learns how your repository handles database connections as it's fixing a security vulnerability, Copilot code review can then use that knowledge to spot inconsistent patterns in future pull requests. Or if Copilot code review notices that certain files must stay synchronized, in the future Copilot coding agent will automatically update them together when generating new code.

Where memory works today in GitHub Copilot (public preview)

Copilot's new memory system is available in public preview, starting with Copilot coding agent, Copilot CLI, and Copilot code review for all paid Copilot plans, with other agents to follow shortly (learn about how it works in our Docs). It's off by default and fully opt-in, so you decide when and where Copilot should start learning from your workflows. You can turn on memory in your GitHub Copilot settings. Learn how to enable memory in our Docs >

The challenge: What to remember and when to forget

Our agents continuously improve at extracting the context needed for specific tasks. The core challenge for memory systems isn't about information retrieval, but ensuring that any stored knowledge remains valid as code evolves across branches and time. In practice, this means a memory system must handle changes to code, abandoned branches, and conflicting observations—all while ensuring that agents only act on information that's relevant to the current task and code state. For example, a logging convention observed in one branch may later be modified, superseded, or never merged at all.

One option would be to implement an offline curation service to deduplicate, resolve conflicts, track branch status, and expire stale information. At GitHub's scale, however, such an approach would introduce significant engineering complexity and LLM costs, while still requiring mechanisms to reconcile changes at read time. We started by exploring a simpler, more efficient approach.

Our solution: just-in-time verification

Information retrieval is an asymmetrical problem: It's hard to solve, but easy to verify. By using real-time verification, we gain the power of pre-stored memories while avoiding the risk of outdated or misleading information. Instead of offline memory curation, we store memories with citations: references to specific code locations that support each fact. When an agent encounters a stored memory, it verifies the citations in real-time, validating that the information is accurate and relevant to the current branch before using it. This verification boils down to a small number of simple read operations, adding no significant latency to agent sessions in our testing.

Memory creation as a tool call

We implemented memory creation as a tool that agents can invoke when they discover something that's likely to have actionable implications for future tasks.
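As a rough illustration of the memory-with-citations idea (the post walks through a concrete example next), the record an agent stores and the just-in-time citation check might look something like the sketch below. The types, helper names, and the heuristic check are invented for illustration; the actual verification is carried out by the agent itself, prompted to read the cited locations:

```typescript
// Hypothetical shapes only, not GitHub's actual API.
interface Memory {
  subject: string;
  fact: string;
  citations: string[]; // e.g. "path/to/file.ts:12"
  reason: string;
}

// Cheap pre-check before the agent reads the cited code: every citation must
// point to an existing file, and the cited line should still be non-empty.
async function citationsStillResolve(
  memory: Memory,
  readFile: (path: string) => Promise<string | null>,
): Promise<boolean> {
  for (const citation of memory.citations) {
    const [path, lineStr] = citation.split(":");
    const contents = await readFile(path);
    if (contents === null) return false; // cited file no longer exists
    const line = contents.split("\n")[Number(lineStr) - 1];
    if (line === undefined || line.trim() === "") return false; // cited line gone
  }
  return true; // citations resolve; the agent can now read them and confirm the fact
}
```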
[Diagram: How Copilot agents store learnings worth remembering as they carry out their tasks.]

Consider this example: While reviewing a pull request from an experienced developer, Copilot code review discovers that API version tracking must stay synchronized across different parts of a codebase. It might encounter these three updates in the same pull request:

- In src/client/sdk/constants.ts: export const API_VERSION = "v2.1.4";
- In server/routes/api.go: const APIVersion = "v2.1.4"
- In docs/api-reference.md: Version: v2.1.4

In response, Copilot code review can invoke the memory storage tool to create a memory like this:

{
  subject: "API version synchronization",
  fact: "API version must match between client SDK, server routes, and documentation.",
  citations: ["src/client/sdk/constants.ts:12", "server/routes/api.go:8", "docs/api-reference.md:37"],
  reason: "If the API version is not kept properly synchronized, the integration can fail or exhibit subtle bugs. Remembering these locations will help ensure they are kept synchronized in future updates."
}

The result: The next time an agent updates the API version in any of these locations, it will see this memory and realize that it must update the other locations too, preventing a versioning mismatch that could break integrations. Similarly, if an inexperienced developer opens a pull request that updates only one of these locations, Copilot code review will flag the omission and suggest the missing updates, automatically transferring knowledge from a more experienced team member to a newer one. 💥

Memory usage

Retrieval

When an agent starts a new session, we retrieve the most recent memories for the target repository and include them in the prompt. Future implementations will enable additional retrieval techniques, such as a search tool and weighted prioritization.

[Diagram: How Copilot enriches agent prompts with memories from previous tasks.]

Verification

Before applying any memory, the agent is prompted to verify its accuracy and relevance by checking the cited code locations. If the code contradicts the memory, or if the citations are invalid (e.g., they point to nonexistent locations), the agent is encouraged to store a corrected version of the memory reflecting the new evidence. If the citations check out and the memory is deemed useful, the agent is encouraged to store it again in order to refresh its timestamp.

Privacy and security

It's important to note that memories are tightly scoped. Memories for a given repository can only be created in response to actions taken within that repository by contributors with write permissions, and can only be used in tasks on that same repository initiated by users with read permissions. Much like the source code itself, memories about a repository stay within that repository, ensuring privacy and security.

Cross-agent memory sharing

The full power of our memory system emerges as different Copilot agents learn from one another:

- Copilot code review discovers a logging convention while reviewing a pull request: "Log file names should follow pattern 'app-YYYYMMDD.log'. Use Winston for logging with format: timestamp, error code, user ID."
- Copilot coding agent is later assigned a task to implement a new microservice. It sees and verifies the memory and automatically applies the same logging format.
- Copilot CLI helps a developer debug an issue, efficiently retrieving the correct log file and finding the relevant timestamps based on the logging format learned by the code review agent.
Each agent contributes to and benefits from the shared knowledge base, allowing agents to reuse validated repository knowledge across tasks. As additional agents adopt memory—whether for development workflows, debugging, or security analysis—they'll contribute to and benefit from the same evolving understanding of your codebase.

Evaluation

Stress-testing agent resilience

Our biggest concern was the impact of outdated, incorrect, or even maliciously injected memories. To test the system's resilience, we deliberately seeded repositories with adversarial memories (facts that contradicted the codebase) with citations pointing to irrelevant or nonexistent code locations. Across all test cases, agents consistently verified citations, discovered contradictions, and updated incorrect memories. The memory pool self-healed as agents stored corrected versions based on their observations. The citation verification mechanism robustly prevented the risk of misleading memories.

Simulating a realistic memory pool

For each repository in our evaluation set, we ran agents on diverse historical tasks (predating our target evaluation tasks) and let them populate the memory database organically, using the "store_memory" tool we provided. To simulate worst-case conditions, we overrepresented memories from branches that were abandoned or closed without merging, ensuring realistically noisy memories. When we ran Copilot code review on the pull requests in our evaluation set, memory usage led to a 3% increase in precision and a 4% increase in recall.

Measuring impact on developers

The ultimate test of our memory system was its impact on real developers in their everyday workflows. We ran A/B tests on the first two Copilot agents to deploy memory, Copilot code review and Copilot coding agent, measuring the impact on key user metrics.

- Copilot coding agent: 7% increase in pull request merge rates (90% with memories vs. 83% without). This means developers are saving more time and getting the desired results more often when they assign tasks to Copilot.
- Copilot code review: 2% increase in positive feedback on comments (77% with memories vs. 75% without). This means automated code review is yielding improved quality assurance.

Both increases are highly statistically significant, with p-values < 0.00001. These results demonstrate that cross-agent memory delivers measurable value to developers in their daily workflows.

What's next

We've deployed repository-scoped memory storage and usage in Copilot CLI, Copilot coding agent, and Copilot code review on an opt-in basis. We're listening to user feedback and tracking performance metrics closely as we iterate and prepare for a wider rollout across more Copilot workflows. We're also exploring a range of approaches to tuning memory generation, curation, prioritization, and usage. Cross-agent memory reduces the need to re-establish context at the start of each task by allowing validated information to persist across agentic workflows. We're excited about the possibilities memory will unlock, and we're just getting started. We look forward to your feedback so we can ensure GitHub Copilot continues to evolve in ways that best support your needs. Happy coding!

Read our Docs to learn how to enable memory in Copilot >

The post Building an agentic memory system for GitHub Copilot appeared first on The GitHub Blog.
Read more →

Stop Picking Sides: Manage the Tension Between Adaptation and Optimization

Jim Highsmith notes that many teams have turned into tribes wedded exclusively to either adaptation or optimization. But he feels this misses the point that both of these are important, and we need to manage the tension between them. We can do this by thinking of two operating modes: explore (adaptation-dominant) and exploit (optimization-dominant). We tailor a team's operating model to a particular blend of the two, considering uncertainty, risk, cost of change, and an evidence threshold. We should be particularly careful at the points where there is a handoff between the two modes. more…
Read more →

My favorite musical discoveries of 2025

My favorite albums from last year. Balkan brass, an acoustic favorite of the 80s returns, Ethio-jazz, a Guatemalan singer-guitarist, jazz-rock/Indian classical fusion, and a unique male vocalist. more…
Read more →

Fragments: January 8

Anthropic report on how their AI is changing their own software development practice. Key points: most usage is for debugging and helping understand existing code; a notable increase in using it for implementing new features; developers use it for 59% of their work and report a 50% productivity increase; 14% of developers are “power users” reporting much greater gains; Claude helps developers work outside their core area; and there are concerns about changes to the profession, career evolution, and social dynamics. ❄ ❄ ❄ ❄ ❄ Much of the discussion about using LLMs for software development lacks details on workflow. Rather than just hearing people gush about how wonderful it is, I want to understand the gritty details. What kinds of interactions occur with the LLM? What decisions do the humans make? When reviewing LLM outputs, what are the humans looking for, and what corrections do they make? Obie Fernandez has written a post that goes into these kinds of details. Over the Christmas / New Year period he used Claude to build a knowledge distillation application that takes transcripts from Claude Code sessions, Slack discussions, GitHub PR threads, etc., turns them into an RDF graph database, and provides a web app with natural-language ways to query them. Not a proof of concept. Not a demo. The first cut of Nexus, a production-ready system with authentication, semantic search, an MCP server for agent access, webhook integrations for our primary SaaS platforms, comprehensive test coverage, deployed, integrated and ready for full-scale adoption at my company this coming Monday. Nearly 13,000 lines of code. The article is long, but worth the time to read it. An important feature of his workflow is relying on Test-Driven Development (a small illustrative sketch of the cycle appears at the end of this post): Here’s what made this sustainable rather than chaotic: TDD. Test-driven development. For most of the features, I insisted that Claude Code follow the red-green-refactor cycle with me. Write a failing test first. Make it pass with the simplest implementation. Then refactor while keeping tests green. This wasn’t just methodology purism. TDD served a critical function in AI-assisted development: it kept me in the loop. When you’re directing thousands of lines of code generation, you need a forcing function that makes you actually understand what’s being built. Tests are that forcing function. You can’t write a meaningful test for something you don’t understand. And you can’t verify that a test correctly captures intent without understanding the intent yourself. The account includes a major refactoring, and much evolution of the initial version of the tool. It’s also an interesting glimpse of how AI tooling may finally make RDF useful. ❄ ❄ ❄ ❄ ❄ When thinking about requirements for software, most discussions focus on prioritization. Some folks talk about buckets such as the MoSCoW set: Must, Should, Could, and Won’t. (The old joke being that, in MoSCoW, the cow is silent, because hardly any requirements end up in those buckets.) Jason Fried has a different set of buckets for interface design: Obvious, Easy, and Possible. This immediately resonates with me: a good way to think about how to allocate the cognitive costs for those who use a tool. ❄ ❄ ❄ ❄ ❄ Casey Newton explains how he followed up on an interesting story of dark patterns in food delivery, and found it to be a fake story, buttressed by AI image and document creation. On one hand, it clarifies the important role reporters play in exposing lies that get traction on the internet. 
But time taken to do this is time not spent on investigating real stories: For most of my career up until this point, the document shared with me by the whistleblower would have seemed highly credible in large part because it would have taken so long to put together. Who would take the time to put together a detailed, 18-page technical document about market dynamics just to troll a reporter? Who would go to the trouble of creating a fake badge? Today, though, the report can be generated within minutes, and the badge within seconds. And while no good reporter would ever have published a story based on a single document and an unknown source, plenty would take the time to investigate the document’s contents and see whether human sources would back it up. The internet has always been full of slop, and we have always needed to be wary of what we read there. AI now makes it easy to manufacture convincing-looking evidence, and this is never more dangerous than when it confirms strongly held beliefs and fears. ❄ ❄ ❄ ❄ ❄ Kent Beck: The descriptions of Spec-Driven development that I have seen emphasize writing the whole specification before implementation. This encodes the (to me bizarre) assumption that you aren’t going to learn anything during implementation that would change the specification. I’ve heard this story so many times told so many ways by well-meaning folks–if only we could get the specification “right”, the rest of this would be easy. Like him, I’ve heard that story as a constant background siren throughout my career in tech. But the learning loop of experimentation is essential to the model building that’s at the heart of any kind of worthwhile specification. As Unmesh puts it: Large Language Models give us great leverage—but they only work if we focus on learning and understanding. They make it easier to explore ideas, to set things up, to translate intent into code across many specialized languages. But the real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping. When Kent defined Extreme Programming, he made feedback one of its four core values. It strikes me that the key to making full use of AI in software development is how to use it to accelerate the feedback loops. ❄ ❄ ❄ ❄ ❄ As I listen to people who are serious about AI-assisted programming, the crucial thing I hear is managing context. Programming-oriented tools are getting more sophisticated for that, but there are also efforts at providing simpler tools that allow customization. Carlos Villela recently recommended Pi, and its developer, Mario Zechner, has an interesting blog on its development. So what’s an old guy yelling at Claudes going to do? He’s going to write his own coding agent harness and give it a name that’s entirely un-Google-able, so there will never be any users. Which means there will also never be any issues on the GitHub issue tracker. How hard can it be? If I ever get the time to sit and really play with these tools, then something like Pi would be something I’d like to try out. Although as an addict to The One True Editor, I’m interested in some of the libraries that work with that, such as gptel. That would enable me to use Emacs’s inherent programmability to create my own command set to drive the interaction with LLMs. ❄ ❄ ❄ ❄ ❄ Outside of my professional work, I’ve been posting regularly about my boardgaming on the specialist site BoardGameGeek. 
However, its blogging environment doesn’t do a good job of providing an index to my posts, so I’ve created a list of my BGG posts on my own site. If you’re interested in my regular posts on boardgaming and you’re on BGG, you can subscribe to me there. If you’re not on BGG, you can subscribe to the blog’s RSS feed. I’ve also created a list of my favorite board games.
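Returning to the TDD point above: purely as an illustration (not Obie’s code), here is what one turn of the red-green-refactor cycle looks like in Python/pytest style, the kind of forcing function he describes.

# Illustrative only: one turn of red-green-refactor with a made-up slugify() example.
# Step 1 (red): write a failing test that pins down the intent.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Agentic Memory System") == "agentic-memory-system"

# Step 2 (green): the simplest implementation that makes the test pass might be
#     return title.lower().replace(" ", "-")
# Step 3 (refactor): tidy up while the test stays green, e.g. collapse repeated spaces.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

The point of the exercise is the loop itself: you cannot write the failing test without understanding the intent, whoever (or whatever) ends up writing the implementation.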
Read more →

Fragments: December 16

Gitanjali Venkatraman does wonderful illustrations of complex subjects (which is why I was so happy to work with her on our Expert Generalists article). She has now published the latest in her series of illustrated guides, tackling the complex topic of Mainframe Modernization. In it she illustrates the history and value of mainframes, why modernization is so tricky, and how to tackle the problem by breaking it down into tractable pieces. I love the clarity of her explanations, and smile frequently at her way of enhancing her words with her quirky pictures. ❄ ❄ ❄ ❄ ❄ Gergely Orosz on social media: Unpopular opinion: current code review tools just don’t make much sense for AI-generated code. When reviewing code I really want to know: the prompt made by the dev; what corrections the other dev made to the code; clear marking of AI-generated code not changed by a human. Some people pushed back saying they don’t (and shouldn’t) care whether it was written by a human, generated by an LLM, or copy-pasted from Stack Overflow. In my view it matters a lot - because of the second vital purpose of code review. When asked why we do code reviews, most people will answer the first vital purpose - quality control. We want to ensure bad code gets blocked before it hits mainline. We do this to avoid bugs and to avoid other quality issues, in particular comprehensibility and ease of change. But I hear the second vital purpose less often: code review is a mechanism to communicate and educate. If I’m submitting some sub-standard code, and it gets rejected, I want to know why so that I can improve my programming. Maybe I’m unaware of some library features, or maybe there are some project-specific standards I haven’t run into yet, or maybe my naming isn’t as clear as I thought it was. Whatever the reasons, I need to know in order to learn. And my employer needs me to learn, so I can be more effective. We need to know the writer of the code we review both so we can communicate our better practice to them, and so we know how to improve things. With a human, it’s a conversation, and perhaps some documentation if we realize we’ve needed to explain things repeatedly. But with an LLM it’s about how to modify its context, as well as humans learning how to better drive the LLM. ❄ ❄ ❄ ❄ ❄ Wondering why I’ve been making a lot of posts like this recently? I explain why I’ve been reviving the link blog. ❄ ❄ ❄ ❄ ❄ Simon Willison describes how he uses LLMs to build disposable but useful web apps. These are the characteristics I have found to be most productive in building tools of this nature: A single file: inline JavaScript and CSS in a single HTML file means the least hassle in hosting or distributing them, and crucially means you can copy and paste them out of an LLM response. Avoid React, or anything with a build step. The problem with React is that JSX requires a build step, which makes everything massively less convenient. I prompt “no react” and skip that whole rabbit hole entirely. Load dependencies from a CDN. The fewer dependencies the better, but if there’s a well known library that helps solve a problem I’m happy to load it from CDNjs or jsdelivr or similar. Keep them small. A few hundred lines means the maintainability of the code doesn’t matter too much: any good LLM can read them and understand what they’re doing, and rewriting them from scratch with help from an LLM takes just a few minutes. His repository includes all these tools, together with transcripts of the chats that got the LLMs to build them. 
❄ ❄ ❄ ❄ ❄ Obie Fernandez: while many engineers are underwhelmed by AI tools, some senior engineers are finding them really valuable. He feels that senior engineers have an oft-unspoken mindset, which, in conjunction with an LLM, enables the LLM to be much more valuable. Levels of abstraction and generalization problems get talked about a lot because they’re easy to name. But they’re far from the whole story. Other tools show up just as often in real work: A sense for blast radius. Knowing which changes are safe to make loudly and which should be quiet and contained. A feel for sequencing. Knowing when a technically correct change is still wrong because the system or the team isn’t ready for it yet. An instinct for reversibility. Preferring moves that keep options open, even if they look less elegant in the moment. An awareness of social cost. Recognizing when a clever solution will confuse more people than it helps. An allergy to false confidence. Spotting places where tests are green but the model is wrong. ❄ ❄ ❄ ❄ ❄ Emil Stenström built an HTML5 parser in Python using coding agents, using GitHub Copilot in agent mode with Claude 3.7 Sonnet. He automatically approved most commands. It took him “a couple of months on off-hours”, including at least one restart from scratch. The parser now passes all the tests in the html5lib test suite. After writing the parser, I still don’t know HTML5 properly. The agent wrote it for me. I guided it when it came to API design and corrected bad decisions at the high level, but it did ALL of the gruntwork and wrote all of the code. I handled all git commits myself, reviewing code as it went in. I didn’t understand all the algorithmic choices, but I understood when it didn’t do the right thing. Although he gives an overview of what happens, there’s not very much information on his workflow and how he interacted with the LLM. There’s certainly not enough detail here to try to replicate his approach. This is in contrast to Simon Willison (above), who has detailed links to his chat transcripts - although they are much smaller tools and I haven’t looked at them properly to see how useful they are. One thing that is clear, however, is the vital need for a comprehensive test suite. Much of his work is driven by having that suite as a clear guide for him and the LLM agents. JustHTML is about 3,000 lines of Python with 8,500+ tests passing. I couldn’t have written it this quickly without the agent. But “quickly” doesn’t mean “without thinking.” I spent a lot of time reviewing code, making design decisions, and steering the agent in the right direction. The agent did the typing; I did the thinking. ❄ ❄ Then Simon Willison ported the library to JavaScript: Time elapsed from project idea to finished library: about 4 hours, during which I also bought and decorated a Christmas tree with family and watched the latest Knives Out movie. One of his lessons: If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed. I called this designing the agentic loop a few months ago. I think it’s the key skill to unlocking the potential of LLMs for complex tasks. Our experience at Thoughtworks backs this up. We’ve been doing a fair bit of work recently in legacy modernization (mainframe and otherwise) using AI to migrate substantial software systems. Having a robust test suite is necessary (but not sufficient) to make this work. 
I hope to share my colleagues’ experiences on this in the coming months. But before I leave Willison’s post, I should highlight his final open questions on the legalities, ethics, and effectiveness of all this - they are well worth contemplating.
Read more →

Writing Fragments

If you’re a regular reader of my site, you’ll have noticed that in the last few months I’ve been making a number of “fragments” posts. Such a post is a short one with a bunch of little, unconnected segments. These are usually a reference to something I’ve found on the web, sometimes a small thought of my own. A few years ago, I wouldn’t have covered these topics with posts on my own site. Instead I would use Twitter, either retweeting someone else’s point, or just highlighting something I’d found. But since the Muskover, Twitter has effectively died. I’m not saying that due to any technical issues with the site, which has mostly just been fine, nor directly due to any of the policy changes there. The point is that lots of people have left, so that the audience I would have reached with Twitter is now fragmented. Some remain on X, but I see more activity on LinkedIn. There’s also Fediverse/Mastodon and Bluesky. What this means for short posts is that I can no longer just post in one place. When I announce new articles on martinfowler.com, I now announce on four social media sites (X, LinkedIn, Fediverse, and Bluesky). It makes sense to do this, but I don’t want to go through all this hassle for the kind of micro-post that Twitter served so well. So I’ve started to batch them up. When I see something interesting, I make a note. When I have enough notes, I post a fragments post. Initially I did this in a rather ad-hoc way, just using the same mechanisms I use for most articles, but last week I started to put some more deliberate mechanisms into the site. (If you’re observant, you’ll spot that in the URLs.) One benefit of all of this, at least in my book, is that it means my material is now fully visible in RSS. I’m probably showing my age, but I’m a big fan of RSS (or in my case, strictly Atom) feeds. I miss the feel of the heyday of the “blogosphere” before it got steamrolled by social media, and these fragment posts are, of course, just the same as the link blogs from that era. I still use my RSS reader every day to keep up with writers I like. (I’m pleased that Substack makes its content available via RSS.) It bothered me a bit that my micro-founts of Twitter knowledge weren’t visible on RSS, but I was too lazy to do something about it. Now I don’t need to - the fragments are available in my RSS feed.
Read more →

Fragments Dec 11

Why does AI write like… that (NYT, gift link). Sam Kriss delves into the quiet hum of AI writing. AI’s work is not compelling prose: it’s phantom text, ghostly scribblings, a spectre woven into our communal tapestry. ❄ ❄ ❄ ❄ ❄ Emily Bache has written a set of Test Desiderata, building on some earlier writing from Kent Beck. She lists the characteristics of good tests, and how they support her four “macro desiderata” - the properties of a sound test suite: predict success in production, fast to get feedback, support ongoing code design change, and low total cost of ownership. She also has a great list of other writers’ lists of good test characteristics. ❄ ❄ ❄ ❄ ❄ Daphne Keller explains that the EU’s fines on X aren’t about free speech. There are three charges against X, which all stem from a multi-year investigation that was launched in 2023. One is about verification — X’s blue checkmarks on user accounts — and two are about transparency. These charges have nothing to do with what content is on X, or what user speech the platform should or should not allow. ❄ ❄ ❄ ❄ ❄ Cory Doctorow: The Reverse-Centaur’s Guide to Criticizing AI. Start with what a reverse centaur is. In automation theory, a “centaur” is a person who is assisted by a machine. … And obviously, a reverse centaur is machine head on a human body, a person who is serving as a squishy meat appendage for an uncaring machine. Like an Amazon delivery driver… the van can’t drive itself and can’t get a parcel from the curb to your porch. The driver is a peripheral for a van, and the van drives the driver, at superhuman speed, demanding superhuman endurance.
Read more →

Fragments Dec 4

Rob Bowley summarizes a study from Carnegie Mellon looking at the impact of AI on a bunch of open-source software projects. Like any such study, we shouldn’t take its results as definitive, but there seems enough there to make it a handy data point. The key point is that the AI code probably reduced the quality of the code base - at least if static code analysis can be trusted to determine quality. And perhaps there are some worrying second-order effects: This study shows more than 800 popular GitHub projects with code quality degrading after adopting AI tools. It’s hard not to see a form of context collapse playing out in real time. If the public code that future models learn from is becoming more complex and less maintainable, there’s a real risk that newer models will reinforce and amplify those trends, producing even worse code over time. ❄ ❄ ❄ ❄ ❄ Rob’s post is typical of much of the thoughtful writing on AI. We can see its short-term benefits, but worry about its long-term impact. But on a much deeper note is this lovely story from Jim Highsmith. Jim has turned 0x50, and has spent the last decade fighting Parkinson’s disease. To help him battle it he has two AI-assisted allies. Between my neural implants and Byron’s digital guidance, I now collaborate with two adaptive systems: one for motion, one for thought. Neither replaces me. Both extend me. If you read anything on AI this week, make it this. It offers a positive harbinger for our future and opens my mind to a whole different perspective on the role of AI in it. ❄ ❄ ❄ ❄ ❄ Anthropic recently announced that it disrupted a Chinese state-sponsored operation abusing Claude Code. Jim Gumbley looks at the core lesson to learn from this: that we have to understand the serious risk of AI Jailbreaking. New AI tools are able to analyze your attack surface at the next level of granularity. As a business leader, that means you now have two options: wait for someone else to run AI-assisted vulnerability detection against your attack surface, or run it yourself first. ❄ ❄ ❄ ❄ ❄ There are plenty of claims that AI Vibe Coding can replace software developers, something that folks like me (perhaps with a bias) think unlikely. Gergely Orosz shared this tidbit: Talked with an exec at a tech company who is obsessed with AI and has been for 3 years. Not a developer but company makes software. Uses AI for everything, vibe codes ideas. Here’s the kicker: Has a team of several devs to implement his vibe coded prototypes to sg workable. I’d love to hear more about this (and similar stories). ❄ ❄ ❄ ❄ ❄ Nick Radcliffe writes about a month of using AI: I spent a solid month “pair programming” with Claude Code, trying to suspend disbelief and adopt a this-will-be-productive mindset. More specifically, I got Claude to write well over 99% of the code produced during the month. I found the experience infuriating, unpleasant, and stressful before even worrying about its energy impact. Ideally, I would prefer not to do it again for at least a year or two. The only problem with that is that it “worked”. He stresses that his approach is the “polar opposite” of Vibe Coding. The post is long, and rambles a bit, but is worthwhile because he talks in detail about his workflow and how he uses the tool. Such posts are important so we can learn the nitty-gritty of how our programming habits are changing. ❄ ❄ ❄ ❄ ❄ Along similar lines is a post by Brian Chambers on his workflow, which he calls Issue-Driven Development (and yes, I’m also sick of the “something-driven” phraseology). 
As with much of the better stuff I’ve heard about AI assisted work, it’s all about carefully managing the context window, ensuring the AI is focused on the right things and not distracted by textual squirrels.
Read more →

Fragments Nov 19

I’ve been on the road in Europe for the last couple of weeks, and while I was there Thoughtworks released volume 33 of our Technology Radar. Again it’s dominated by the AI wave, with lots of blips capturing our explorations of how to use LLMs and similar technology. “Agents” are the big thing these days but we’re also seeing growing movements in infrastructure orchestration, coding workflows - and the inevitable antipatterns. Many thanks to my colleagues for putting this together again. ❄ ❄ ❄ ❄ My trip to Europe started in Amsterdam, for a Thoughtworks event for a few of our clients there. Since I was in that lovely city, I got in touch with Gergely Orosz, host of The Pragmatic Engineer, and he arranged to record a podcast with me. No surprise that AI was front-and-center of the conversation, as I said it was the biggest shift I’d seen in programming during my career, comparable only to the shift to high-level languages, which even I am not old enough to have experienced. It was a fun chat and I really enjoyed myself. Gergely later joined me, James Lewis, and Giles Edwards-Alexander at the Thoughtworks event the next day. ❄ ❄ ❄ ❄ My travels also took me to Nuremberg, where I attended an internal conference for Siemens on the future of software architecture. When we think of technology, it’s easy to focus on the Faangs of Silicon Valley, but Siemens have a huge workforce of software developers working on heavy engineering systems like trains and factory automation. It was good to hear them talk about federated architectures, data mesh, and their use of AI. ❄ ❄ ❄ ❄ I’ve often used pseudo-graphs to help explain why high quality software is cheaper. This time, Kent Beck offers a unique perspective on this chart, dispensing with the temporal axis to help think in terms of optionality. ❄ ❄ ❄ ❄ And in another life, Edward has finally finished the great migration of the Heavy Cardboard studio and returns to the tubes with our first game in the new digs. (No surprise that it’s Age of Steam.)
Read more →

My Foreword to "Frictionless"

I find most writing on software productivity to be twaddle, but Nicole Forsgren and Abi Noda are notable exceptions. I had a chance to take a look at their new book, published today, and liked it so much I wrote a foreword. more…
Read more →

The Learning Loop and LLMs

Unmesh Joshi finds LLMs to be a useful tool, but explains why their help becomes illusory if we use them to shortcut the learning loop that's an essential part of our professional practice. more…
Read more →

Fragments Nov 3

I’m very concerned about the security dangers of LLM-enabled browsers, as it’s just too easy for them to contain the Lethal Trifecta. For up-to-date eyes on these issues, I follow the writings of the coiner of that phrase, Simon Willison. Here he examines a post on how OpenAI is thinking about these issues. My takeaways from all of this? It’s not done much to influence my overall skepticism of the entire category of browser agents, but it does at least demonstrate that OpenAI are keenly aware of the problems and are investing serious effort in finding the right mix of protections. ❄ ❄ ❄ ❄ Rob Bowley: Unsurprisingly, there are a lot of strong opinions on AI assisted coding. Some engineers swear by it. Others say it’s dangerous. And of course, as is the way with the internet, nuanced positions get flattened into simplistic camps where everyone’s either on one side or the other. A lot of the problem is that people aren’t arguing about the same thing. They’re reporting different experiences from different vantage points. His view is that beginners are very keen on AI coding but they don’t see the problems they are creating. Experienced folks do see this, but it takes a further level of experience to realize that when used well these tools are still valuable. Interestingly, I’ve regularly seen sceptical experienced engineers change their view once they’ve been shown how you can blend modern/XP practices with AI assisted coding. The upshot is that you have to be aware of the experience level of whoever is writing about this stuff - and that experience is not just in software development generally, but also in how to make use of LLMs. One thing that rings clearly from reading Simon Willison and Birgitta Böckeler is that effective use of LLMs is a skill that takes a while to develop. ❄ ❄ ❄ ❄ Charlie Brown and Garfield, like most comic strip characters, never changed over the decades. But Doonesbury’s cast aged, had children, and some have died (I miss Lacey). Garry Trudeau retired from writing daily strips a few years ago, but his reruns of older strips are one of the best things in the shabby remains of Twitter. A couple of weeks ago, he reran one of the most memorable strips in its whole run. The very first frame of Doonesbury introduced the character “B.D.”, a football jock never seen without his football helmet, or when on duty, his military helmet. This panel was the first time in over thirty years that B.D. was shown without a helmet; readers were so startled that they didn’t immediately notice that the earlier explosion had removed his leg. This set off a remarkable story arc about the travails of a wounded veteran. It’s my view that future generations will find Doonesbury to be a first-class work of literature, and a thoughtful perspective on contemporary America.
Read more →

Agentic AI and Security

Agentic AI systems are amazing, but introduce equally amazing security risks. Korny Sietsma explains that their core architecture opens up security issues through what Simon Willison named the “Lethal Trifecta”. Korny goes on to talk about how to mitigate this through removing legs of the trifecta and splitting complex tasks. more…
Read more →

Fragments and Links

Mathias Verraes writes about the relationship between Domains and Bounded Contexts in Domain-Driven Design. It’s a common myth that there should always be a 1:1 relationship between them, but although it’s sometimes the case, deeper modeling often exposes a more interesting structure. Gary Marcus: (NYT Gift Link) If the strengths of A.I. are truly to be harnessed, the tech industry should stop focusing so heavily on these one-size-fits-all tools and instead concentrate on narrow, specialized A.I. tools engineered for particular problems. Because, frankly, they’re often more effective. One of the truly annoying things about the US tax system is that we can’t easily file our tax returns electronically. In recent years an initiative called “Direct File” sought to fix that. Matt Bracken tells the story of how they developed a highly regarded system in 25 states, but it was canned by the Trump administration. He also explains how the creators of Direct File are working to prepare the ground for it to reappear. Security issues are only getting worse, but the US government agency for cybersecurity is having its staff reassigned to other duties. Detailed story in Bloomberg (paywalled) and an open (but more polemic) summary on Techdirt. Changes have hit particularly hard in CISA’s Capacity Building team, which writes emergency directives and oversees cybersecurity for the government’s highest value assets, the employees said. Defense and law enforcement are valuable things for a government to do, but here they seem to be walking away from a growing crisis.
Read more →

Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl

Birgitta Böckeler has been trying to understand one of the latest AI coding buzzwords: Spec-driven development (SDD). She looked at three of the tools that label themselves as SDD tools and tried to untangle what it means, as of now. more…
Read more →

Anchoring AI to a reference application

Service templates are a typical building block in the “golden paths” organisations build for their engineering teams, to make it easy to do the right thing. The templates are supposed to be the role models for all the services in the organisation, always representing the most up to date coding patterns and standards. One of the challenges with service templates, though, is that once a team has instantiated a service with one, it’s tedious to feed template updates back to those services. Birgitta Böckeler considers whether GenAI can help with that. more…
Read more →

To vibe or not to vibe

Birgitta Böckeler examines the risk assessment around when to use vibe coding, using three dimensions of risk: Probability, Impact, and Detectability more…
Read more →

Some thoughts on LLMs and Software Development

I’m about to head away from looking after this site for a few weeks (part vacation, part work stuff). As I contemplate some weeks away from the daily routine, I feel an urge to share some scattered thoughts about the state of LLMs and AI. ❄ ❄ ❄ ❄ I’ve seen a few early surveys on the effect AI is having on software development: is it really speeding folks up, and does it improve or wreck code quality? One of the big problems with these surveys is that they aren’t taking into account how people are using the LLMs. From what I can tell the vast majority of LLM usage is fancy auto-complete, often using Copilot. But those I know who get the most value from LLMs reckon that auto-complete isn’t very useful, preferring approaches that allow the LLM to directly read and edit source code files to carry out tasks. My concern is that surveys that ignore the different workflows of using LLMs will produce data that’s going to send people down the wrong paths. (Another complication is the varying capabilities of different models.) ❄ ❄ ❄ ❄ I’m often asked, “what is the future of programming?” Should people consider entering software development now? Will LLMs eliminate the need for junior engineers? Should senior engineers get out of the profession before it’s too late? My answer to all these questions is “I haven’t the foggiest”. Furthermore I think anyone who says they know what this future will be is talking from an inappropriate orifice. We are still figuring out how to use LLMs, and it will be some time before we have a decent idea of how to use them well, especially if they gain significant improvements. What I suggest is that people experiment with them. At the least, read about what others are doing, but pay attention to the details of their workflows. Preferably experiment yourself, and do share your experiences. ❄ ❄ ❇ ❄ I’m also asked: “is AI a bubble”? To which my answer is “OF COURSE IT’S A BUBBLE”. All major technological advances have come with economic bubbles, from canals and railroads to the internet. We know with near 100% certainty that this bubble will pop, causing lots of investments to fizzle to nothing. However what we don’t know is when it will pop, and thus how big the bubble will have grown, generating some real value in the process, before that happens. It could pop next month, or not for a couple of years. We also know that when the bubble pops, many firms will go bust, but not all. When the dot-com bubble burst, it killed pets.com, it killed Webvan… but it did not kill Amazon. ❄ ❄ ❄ ❄ I retired from public speaking a couple of years ago. But while I don’t miss the stress of giving talks, I do miss hanging out with my friends in the industry. So I’m looking forward to catching up with many of them at GOTO Copenhagen. I’ve been involved with the GOTO conference series since the 1990s (when it was called JAOO), and continue to be impressed with how they put together a fascinating program. ✢ ❄ ❄ ❄ My former colleague Rebecca Parsons has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful. One of the consequences of this is that we should always consider asking the LLM the same question more than once, perhaps with some variation in the wording. Then we can compare answers, indeed perhaps ask the LLM to compare answers for us. The difference in the answers can be as useful as the answers themselves. 
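A minimal sketch of that ask-more-than-once idea, assuming a hypothetical ask_llm() helper (any real client library would do): ask the same question several times with slightly varied wording and look at the spread of answers before trusting any of them.

# Hypothetical sketch: ask_llm() stands in for whatever LLM client you actually use.
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM client of choice")

def ask_with_variation(question: str, rephrasings: list[str]) -> Counter:
    """Ask the same question several ways and tally the answers.
    Wide disagreement is itself useful information."""
    answers = [ask_llm(p) for p in [question, *rephrasings]]
    return Counter(a.strip().lower() for a in answers)

# e.g. ask_with_variation(
#     "How many regions does our deployment span?",
#     ["Count the deployment regions.", "In how many regions do we deploy?"],
# )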
Certainly if we ever ask a hallucination engine for a numeric answer, we should ask it at least three times, so we get some sense of the variation. Furthermore we shouldn’t ask an LLM to calculate an answer that we can calculate deterministically (yes, I’ve seen this). It is OK to ask an LLM to generate code to calculate an answer (but still do it more than once). ❄ ❄ ❄ ❄ Other forms of engineering have to take into account the variability of the world. A structural engineer builds in tolerance for all the factors she can’t measure. (I remember being told early in my career that the unique characteristic of digital electronics was that there was no concept of tolerances.) Process engineers consider that humans are executing tasks, and will sometimes be forgetful or careless. Software Engineering is unusual in that it works with deterministic machines. Maybe LLMs mark the point where we join our engineering peers in a world of non-determinism. ❄ ❄ ❄ ❄ I’ve often heard, with decent reason, an LLM compared to a junior colleague. But I find LLMs are quite happy to say “all tests green”, yet when I run them, there are failures. If that was a junior engineer’s behavior, how long would it be before H.R. was involved? ❄ ❄ ❄ ❄ LLMs create a huge increase in the attack surface of software systems. Simon Willison described the Lethal Trifecta for AI agents: an agent that combines access to your private data, exposure to untrusted content, and a way to externally communicate (“exfiltration”). That “untrusted content” can come in all sorts of ways: ask it to read a web page, and an attacker can easily put instructions on the website in 1pt white-on-white font to trick the gullible LLM into giving up that private data. This is particularly serious when it comes to agents acting in a browser. Read an attacker’s web page, and it could trick the agent to go to your bank account in another tab and “buy you a present” by transferring your balance to the kind attacker. Willison’s view is that “the entire concept of an agentic browser extension is fatally flawed and cannot be built safely”.
Read more →

From Black Box to Blueprint

A common enterprise problem: crucial legacy systems become “black boxes”—key to operations but opaque and risky to touch. Thiyagu Palanisamy and Chandirasekar Thiagarajan worked with a client to use AI-assisted reverse engineering to reconstruct functional specifications from UI elements, binaries, and data lineage to overcome analysis paralysis. They developed a methodical “multi-lens” approach—starting from visible artifacts, enriching incrementally, triangulating logic, and always preserving lineage. Human validation remains central to ensure accuracy and confidence in extracted functionality. This engagement revealed that turning a system from black box to blueprint empowers modernization decisions and accelerates migration efforts. more…
Read more →

Research, Review, Rebuild: Intelligent Modernisation with MCP and Strategic Prompting

The Bahmni open-source hospital management system began over nine years ago with a front end using AngularJS and an OpenMRS REST API. Rahul Ramesh wished to convert this to use a React + TypeScript front end with an HL7 FHIR API. In exploring how to do this modernization he used a structured prompting workflow of Research, Review, and Rebuild - together with Cline, Claude 3.5 Sonnet, Atlassian MCP server, and a filesystem MCP server. Changing a single control would normally take 3–6 days of manual effort, but with these tools it was completed in under an hour at a cost of under $2. more…
Read more →

Building your own CLI Coding Agent with Pydantic-AI

CLI coding agents are a fundamentally different tool from chatbots or autocomplete tools - they're agents that can read code, run tests, and update a codebase. Ben O'Mahony explains that while commercial tools are impressive, they don't understand the particular context of our environment and the eccentricities of our specific project. Instead we can build our own coding agent by assembling open source tools, using our specific development standards for testing, documentation production, code reasoning, and file system operations. more…
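As a rough sketch of the approach (assuming pydantic-ai's Agent and tool-registration interface; exact names and signatures may differ between versions, so treat this as illustrative rather than the article's code), a minimal coding agent registers a couple of project-specific tools and runs a prompt against them:

# A sketch only, assuming pydantic-ai's Agent API; check the library docs for current signatures.
import subprocess
from pathlib import Path
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",  # any supported model identifier
    system_prompt="You are a coding assistant. Follow this project's testing and documentation standards.",
)

@agent.tool_plain
def read_file(path: str) -> str:
    """Return the contents of a file in the repository."""
    return Path(path).read_text(encoding="utf-8")

@agent.tool_plain
def run_tests() -> str:
    """Run the project's test suite and return its output."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.stdout + proc.stderr

if __name__ == "__main__":
    # "tests/test_parser.py" is a made-up example path for illustration.
    result = agent.run_sync("Read tests/test_parser.py and tell me what is not yet covered.")
    print(result.output)  # attribute name may vary across pydantic-ai versions

The point is that the tools encode your own standards (which tests to run, which docs to generate), rather than relying on a commercial agent's defaults.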
Read more →

Chatting with Unmesh about building language with LLMs

A few weeks ago, Unmesh Joshi and I started having a conversation about how he likes to grow a language of abstractions when working with an LLM. We thought this was a conversation that others might find interesting so we turned it into an article. We talk about how programming is about both building and applying abstractions and how the LLM helps us in different ways with each activity. more…
Read more →

Bliki: Expansion Joints

Back in the days when I did live talks, one of my abilities was to finish on time, even if my talk time was cut at the last moment (perhaps due to the prior speaker running over). The key to my ability to do this was to use Expansion Joints - parts of the talk that I'd pre-planned so I could cover them quickly or slowly depending on how much time I had. The way I'd do this would be to plan for some topics to be optional. The talk would work if I skipped over them, but I could also witter on about them for five (or ten) minutes. Ideally, each of these topics would get one slide, usually with a bunch of key phrases on it - the headings of what I'd talk about should I be talking about it. When I got to the slide, I'd look at how time was going with the talk. If (as was usually the case) I was running short of time, I could cover the slide in about thirty seconds, saying something like: “in doing this, there's a bunch of things you need to consider, but they are out of scope for today's talk”. If, however, I did have time, I could then spend some time talking about them. The slide would be simple, and not provide much of a Visual Channel, but that wasn't so important, after all this material was optional in the first place. The single flex-slide was my favorite Expansion Joint, as it was easy to use. Sometimes however my optional topic required a proper visual channel, necessitating dedicated slides. My solution here was good control over slide handling. Presentation tools include the ability to skip over slides while I'm talking, and I made sure I practiced how to use them so I could skip a bunch of slides without the audience knowing. It's crucial here that it's invisible to the audience, I find it looks sloppy if anyone says “in the interests of time I'll skip over these slides”. To do this, however, I do need access to my laptop while presenting, venues that only provide a clicker while loading the slides on some other machine lack that control. That started to happen in my last couple of years, much to my annoyance. When creating talks, I was always worried that I would run out of things to say, even though experience told me I reliably crammed more stuff in than I could possibly cover. Expansion Joints helped with this, I could aggressively trim the core talk to less than I needed, and rely on the Expansion Joints to fill the gap. In practice I usually didn't need the Expansion Joints anyway, but their presence helped my confidence. Using Expansion Joints was particularly important for me as I never rehearsed my talks. I was always someone whose ability to present was driven by adrenaline. Talking to a rubber duck just didn't work, the duck was clearly every bit as bored as I was. Consequently the first time I gave a talk, I was hazy as to how long it would take. Yet with Expansion Joints in place, I was able to finish a talk right on time. Expansion Joints enabled me to give the same talk to different time slots. Sometimes I'd have thirty minutes, sometimes forty-five. With Expansion Joints, I didn't need to change my slides, particularly handy if a time cut (or more rarely a time increase) appeared at the last moment. (Although in my later years, I handled this by doing a Suite Of Talks.) Talks that encourage audience interaction need these because we can never predict how much time the interaction will use up. Sometimes we get a steady stream of questions, other times (particularly in Scandinavia, or upper-Midwest America) a lack of questions had me blasting through the agenda. 
Any such talk needed a double dose of this temporal ballast. Expansion Joints are at their most useful in later parts of the talk, as it's then that I have the most information on how much time I have. Earlier ones can still be handy, particularly if they come after an interactive section when I'd like to rebase my timing.
Further Reading
The name was coined by Neal Ford, Matthew McCullough, and Nathaniel Schutta in their excellent book Presentation Patterns.
Read more →

Team OKRs in Action

OKRs have become a popular way to connect strategy with execution in large organizations. But when they are set in a top‑down cascade, they often lose their meaning. Teams receive objectives they didn’t help create, and the result is weak commitment and little real change. Paulo Caroli describes how high‑performing teams can work in another way. They define their own objectives in an organization that uses a collaborative process to align the team’s OKRs with the broader strategy. With these Team OKRs in place, they create a shared purpose and become the base for a regular cycle of planning, check‑ins, and retrospectives. more…
Read more →

Impact Intelligence, addressing common objections

Sriram Narayan concludes his article on impact intelligence by addressing five common objections to this activity, including slowing down, lack of agility and collaboration, and the unpredictability of innovation. more…
Read more →

Quick but worthwhile links

Abi Noda observes: Just met with a 2000+ eng company. Their developers are saving 2+ hours per week thanks to Copilot. But they’re also losing: 3 hrs per week due to slow builds, 4 hrs per week on dev environment toil, and 2 hrs per week waiting for code reviews. AI is not a silver bullet. Nik Malykhin found it useful to get an AI assistant to write its own coding rules by analyzing his code, and then asking it to refine them as he worked with it: “the central paradox of using AI assistants effectively: to offload cognitive work to an AI, you must first do the meta-cognitive work of codifying your own development philosophy and collaboration style.” I agree with Charity Majors that there is a valuable distinction between disposable and durable code, and that makes a difference in how we use AI with it. The difference between disposable code and durable code is not about whether the code was generated by AI or written by a human, or even how difficult it was to write. The cost is defined by the standards you are building to, and the rest of the software development lifecycle: how well you expect to maintain it, extend it, migrate it, understand its behavior, or fix it when it breaks. This is the expensive part of software development, the type that requires deep expertise and familiarity with your language and environment. Disposable code is cheap because you don’t even try to maintain it. Jim Highsmith thinks that we should think of AI as Alternative Intelligence: It’s not fake intelligence, or artificial empathy, or HAL 9000 with manners. It’s something else. Something that thinks differently, not defectively. Rod Johnson asserts that we know that memory is important to AI systems, but we forget that Domain Models are an important form of memory: Event Sourcing provides perfect episodic memory by storing the complete history of domain changes as immutable events. Every decision, every state transition, every business event is preserved with full context. Repository patterns offer domain-focused memory interfaces that understand business concepts. A CustomerRepository knows how to retrieve customer information in ways that preserve business meaning, not just raw data. Bounded contexts from Domain-Driven Design partition memory into semantic boundaries, preventing the concept pollution that plagues pure vector-based approaches. Aggregates function as cohesive memory clusters with consistency boundaries—exactly what we need for reliable agent behavior.
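To illustrate the event-sourcing point (my sketch, not Rod Johnson's code): an event-sourced aggregate keeps its full history and rebuilds state by replaying it, which is exactly the "episodic memory" behaviour he's describing.

# Minimal illustration of event sourcing as episodic memory.
from dataclasses import dataclass

@dataclass(frozen=True)
class FundsDeposited:
    amount: int

@dataclass(frozen=True)
class FundsWithdrawn:
    amount: int

class Account:
    """State is never stored directly; it is always derived by replaying events."""

    def __init__(self) -> None:
        self.events: list = []   # the complete, immutable history of what happened

    def deposit(self, amount: int) -> None:
        self.events.append(FundsDeposited(amount))

    def withdraw(self, amount: int) -> None:
        self.events.append(FundsWithdrawn(amount))

    @property
    def balance(self) -> int:
        total = 0
        for event in self.events:   # replaying history reconstructs the current state
            if isinstance(event, FundsDeposited):
                total += event.amount
            elif isinstance(event, FundsWithdrawn):
                total -= event.amount
        return total

account = Account()
account.deposit(100)
account.withdraw(30)
assert account.balance == 70   # and account.events still holds every decision ever made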
Read more →

Actions to improve impact intelligence

Sriram Narayan continues his article on impact intelligence by outlining five actions that can be taken to improve impact intelligence: introduce robust demand management, pay down measurement debt, introduce impact validation, offer your CFO/COO an alternative to ROI, and equip your teams. more…
Read more →

Entertainment

Kennedy Center’s new programming head resigns days after hire was announced - The Washington Post

Kevin Couch, appointed the Kennedy Center’s senior vice president of artistic programming, resigned less than two weeks after his hire was announced.
Read more →

HBO Max Says ‘Heated Rivalry’ Is Huge — So Why Hasn’t It Registered With Nielsen? - hollywoodreporter.com

The answer may come down to how the ratings service classifies the show in its streaming ratings.
Read more →

Bruce Springsteen sings out against Trump in ‘Streets of Minneapolis’ - AP News

Bruce Springsteen has released a new song, “Streets of Minneapolis,” criticizing President Donald Trump's immigration enforcement. The song describes Minneapolis as “a city aflame” under “King Trump’s private army.” Springsteen says he wrote and recorded it o…
Read more →

Ray J Says He’s in His ‘Last Days’ Due to Deteriorating Heart Health: ‘I F-cked Up’ - Rolling Stone

Ray J said he has months to live due to a heart condition stemming from substance abuse and claimed his sister Brandy is covering his medical bills.
Read more →

Brandon Sanderson’s Literary Fantasy Universe ‘Cosmere’ Picked Up by Apple TV (Exclusive) - hollywoodreporter.com

It's an unprecedented deal for the author, whose 'Mistborn' series and 'The Stormlight Archive' are being eyed for film and television adaptation, respectively.
Read more →

Weekend Preview: SEND HELP and IRON LUNG Compete for Top Spot as MELANIA Surges in Limited Release - Boxoffice Pro

The Boxoffice Podium: Forecasting the Top 3 Movies at the Domestic Box Office, Week 5 | January 30 – February 1, 2026. 1. Send Help (20th Century Studios | NEW), Opening Weekend Range: $12M – $17M. 2. Iron Lung (Markiplier Studi…
Read more →

Saint Laurent Fall 2026: Menswear Doesn’t Get Much Kinkier Than This - GQ

Anthony Vaccarello’s latest show had everything: big bold suits, Connor Storrie, and patent leather gimp boots.
Read more →

Kim Kardashian Explains Why Larsa Pippen Friendship Faded, Plus See Larsa’s 2020 Explanation (Including Why the Kardashians Allegedly Unfollowed Her) - Just Jared

See more...
Read more →

Taylor Swift's private texts leaked: How does she really feel about Blake Lively drama? - The Economic Times

Taylor Swift feels her privacy has been impacted. Private text messages between Swift and Blake Lively were released. This happened during Lively's legal dispute. Swift is reportedly trying to stay away from the drama. The messages were shared publicly withou…
Read more →

Matt Lauer Accuser Details Alleged 2014 Rape and Why She Didn’t Call the Police: ‘I Was in Freaking Russia. Who Would I Call? Putin? The KGB? There Was Only NBC’ - Variety

Brooke Nevils, the NBC employee who accused Matt Lauer of rape, is publishing a new book that details Lauer's alleged sexual misconduct.
Read more →

Bruce Willis Is Unaware He Has Dementia but Still Recognizes His Family, Wife Reveals - hollywoodreporter.com

The 70-year-old actor "never connected the dots that he had this disease," his wife Emma Heming Willis recently said on a podcast.
Read more →

Ms. Shirley Raines, Beauty 2 The Streetz founder who cared for the homeless on Skid Row and in Nevada, dies at 58 - abc7.com

Shirley Raines, the beloved nonprofit founder and CEO who helped care for the homeless on L.A.'s Skid Row, has died, her organization said.
Read more →

Tom Morello announces surprise benefit concert at First Avenue - bringmethenews.com

Tom Morello, guitarist of Rage Against the Machine, has announced a last-minute benefit concert at First Avenue in Minneapolis.
Read more →

Sydney Sweeney’s SYRN Line Features 4 Categories and These SI Swimsuit Snapshots Embody Each One - Sports Illustrated Swimsuit

Seductress, romantic, playful and comfy energy are no stranger to the annual issue.
Read more →

404 - OutKick

Brittney Griner is using her new documentary to draw parallels between ICE enforcement in Minnesota and her Russian detention.
Read more →

Rob Schneider’s Wife Patricia Files for Divorce - TMZ

Rob Schneider's TV producer wife Patricia is cancelling their marriage ... because we've learned she filed for divorce.
Read more →

Hailey Bieber's Sister Reportedly Facing Prison Time Over Alleged Tampon Mishap - Yahoo

The update in Alaia Baldwin Aronow's case comes days after her younger sister Hailey Bieber shut down speculations about her marriage.
Read more →

CBS News Seeks Buyouts at ‘Evening News’ - Yahoo News Canada

No content available
Read more →

‘Bluey’ Defeats ‘Stranger Things,’ Everything Else to Retain Title as Most Streamed Show in 2025 - hollywoodreporter.com

Nielsen also crowns ‘Family Guy’ creator Seth MacFarlane as a “streaming icon” in its year-end tallies.
Read more →

World

The U.S. measles outbreaks. - Tangle News

A closer look at the rise in measles cases across the country.
Read more →

A Comprehensive Network for the Discovery and Characterization of Interstellar Objects Like… - Avi Loeb – Medium

Inspired by the unresolved anomalies displayed by the latest interstellar visitor 3I/ATLAS (as listed here), I co-authored a new paper with…
Read more →

Portland Fire reveals home and away jerseys for 2026 season - oregonlive.com

The 2026 Portland Fire jerseys are here.
Read more →

Sports

What happened on day four of F1's secretive 2026 shakedown

Aston Martin finally appeared on track in the latter stages of the fourth day of the Barcelona Formula 1 shakedown, while Mercedes kept collecting miles and McLaren's day was cut short with trouble. Fresh from turning over 90 laps apiece on Wednesday, Andrea Kimi Antonelli and George Russell traded turns again on Thursday with a combination of long runs and shorter outings to get on top of ... Keep reading
Read more →

Exclusive interview: Lowdon on why Cadillac F1 hired on values over ability

As Formula 1's first expansion team in a decade, Cadillac is very much up against it to take on F1's establishment when its veteran race winners Valtteri Bottas and Sergio Perez join the grid in March. Following an intense recruitment spree for its bases in Silverstone and North America, which the team says yielded over 140,000 job applications for around 600 positions, the organisation ...Keep reading
Read more →

How Williams benefits from F1 Barcelona shakedown - despite no running

It has been a very weird week in the world of Formula 1: cars have hit the track for the first time in 2026 but the general public, media included, has only been drip-fed information. In a world where everything is usually on constant demand, putting the Barcelona shakedown behind closed doors as teams gear up for this year’s regulation change certainly caused a stir. But alas, what can you ...Keep reading
Read more →

Alpine's innovative 2026 F1 rear wing explained

Alpine needs to turn a page in its history. Team boss Flavio Briatore insisted on swapping from Renault to Mercedes power, removing one long-standing excuse for its underperformance. The Enstone-based team finished last year at the bottom of the constructors’ championship and its engineers have responded by exploiting the new regulations with curiosity and creativity. The A526, overseen by ...Keep reading
Read more →

Mercedes impresses with first race sim at F1's Barcelona shakedown

Mercedes has delivered a strong start to the 2026 Formula 1 pre-season as Andrea Kimi Antonelli completed a full race simulation on day two. The Mercedes W17 ran reliably on both Monday and Wednesday, with George Russell having completed 92 laps by lunchtime yesterday before handing over the car to Antonelli. The youngster then reeled off 91 laps after the break, including what he said was a ...Keep reading
Read more →

What the new F1 rules mean for driver workload: ‘An element of subjectivity’

Almost every driver who has run so far during the Formula 1 shakedown in Barcelona came up with the same line after climbing out of the car: “It’s very different from what we’re used to.” That starts with the fact that the 2026 cars have considerably less downforce, and less downforce usually means more complaints from the person behind the wheel. The FIA, however, hopes that this ...Keep reading
Read more →

How Jose Mourinho's Benfica stunned Real Madrid to qualify for Champions League play-offs - BBC

Goalkeeper Anatoly Trubin's header gives Jose Mourinho an unforgettable moment as Benfica beat Real Madrid to stay alive in the Champions League.
Read more →

Todd Monken bringing George Warhop back as Browns offensive line coach - cleveland.com

George Warhop was hired as Browns offensive line coach.
Read more →

What They're Saying | NFL media reacts to Bills agreeing to terms for Joe Brady to become next head coach - buffalobills.com

Continuity is the standout point for Buffalo’s latest head coach decision.
Read more →

Bo Nix: I wasn’t predisposed to ankle injury - NBC Sports

Broncos head coach Sean Payton said this week that doctors found quarterback Bo Nix was "predisposed" to breaking his ankle while surgically repairing the injury he suffered against the Bills in the divisional round, but Nix said that wasn't the case on Wedne…
Read more →

Did the Lightning hit the weather jackpot for Sunday’s outdoor game? - tampabay.com

If you expected balmy, Florida “winter” weather to crash the party, rethink it. Rare cold temps (for Tampa) are likely.
Read more →

Patriots' Robert Kraft says Bill Belichick unequivocally deserves to be first-ballot Hall of Famer - AP News

Count New England Patriots team owner Robert Kraft among those shocked that Bill Belichick reportedly will not be selected to the Pro Football Hall of Fame in his first year of eligibility. In a statement to The Associated Press, Kraft said he believes Belich…
Read more →

Mariners broadcaster Rick Rizzs issues tearful farewell ahead of final season - The Seattle Times

Much like his broadcasts, Rick Rizzs' retirement news conference was filled with the earnest joy, passion and humility that has endeared him to generations of Mariners fans.
Read more →

Norris: "Surreal" to see number 1 on my car as McLaren kicks off F1 2026 testing

Lando Norris says driving around as the reigning Formula 1 world champion was a "surreal" feeling as he took to the track in Barcelona for a day of discovering his 2026 McLaren MCL40. McLaren delayed its start to pre-season testing until Wednesday, the third of five shakedown days at the Circuit de Barcelona-Catalunya. With teams able to pick and choose a maximum of three days to run at the ...Keep reading
Read more →

Joe Lacob's make-or-break Warriors moment has arrived - SFGATE

Joe Lacob's make-or-break Warriors moment has arrived, with Giannis Antetokounmpo now available for a trade.
Read more →

Alexander Volkanovski welcomes fight with top UFC lightweight contender - MMA Fighting

Alexander Volkanovski says he would be happy to fight Arman Tsarukyan someday.
Read more →

Ben: 'Mike McCarthy has heart for this team' - Steelers.com

Ben Roethlisberger shared his take on the Mike McCarthy hire on SNR's 'In the Locker Room'
Read more →

From The Desk of Allen Greene (Jan. 28, 2026) - pittsburghpanthers.com

Panther Nation,
Read more →

Giannis Antetokounmpo reportedly 'ready for a new home,' Milwaukee Bucks 'starting to listen' to trade offers - NBC Sports

While the Bucks and Antetokounmpo are apparently more interested in talking trade, it's still far more likely to happen in the offseason.
Read more →

Jets Hire Brian Duker as Defensive Coordinator - New York Jets

HC Aaron Glenn: ‘I’m Confident His Energy and Knowledge of the Game Will Help Elevate Our Players’
Read more →

What happened on day three of F1's secretive 2026 shakedown

There was a lot more action on the third day of Formula 1’s pre-season shakedown which, as is common knowledge by now, is taking place behind closed doors in Barcelona. Six out of 11 teams took to the track, including McLaren, which skipped the first two days but is planning to run continuously until Friday evening – each team is allowed three days of testing this week. Lando Norris was ...Keep reading
Read more →

Patrick Reed announces plan to return to PGA TOUR, eyes status for 2027 season - PGA Tour

Nine-time PGA TOUR winner Patrick Reed has announced plans to return to TOUR competition later this year as he looks to reinstate his membership for the 2027 season.
Read more →

Report: Todd Monken open to keeping Jim Schwartz as DC in 2026 - NBC Sports

The Browns are hiring Todd Monken to be their next head coach.
Read more →

Williams missed Barcelona F1 test due to production delays, denies significant weight issue

Williams Formula 1 team boss James Vowles says it is "incredibly painful" for his squad to miss out on the Barcelona shakedown test this week, but denies rumours the team's car will be significantly overweight. Last week Williams abandoned its plans to attend F1's first testing opportunity of the 2026 pre-season in Barcelona, losing three days of running in Spain. The reason given was "delays ...Keep reading
Read more →

It's Not a 'Moral Victory,' but Nebraska Made a Statement Against Michigan - Sports Illustrated

When you have only one loss on the year, it is inherently your worst loss of the season. However, on a Tuesday night in Ann Arbor, a shorthanded Nebraska made arguably the loudest statement of the night.
Read more →

Top 25 Mets Prospects for 2026: A.J. Ewing (6) - Amazin' Avenue

Next on our list is an outfielder.
Read more →

Why Neuville struggled in "most difficult" Rally Monte Carlo

Thierry Neuville has previously conquered Rally Monte Carlo twice, but a fundamental lack of confidence to push his Hyundai to the limit left the 2024 world champion on the back foot. Neuville had flagged even before last weekend’s season opener that he would be “lying a bit” if he said he felt confident behind the wheel of his updated Hyundai, admitting he was “missing the feeling he ...Keep reading
Read more →

Ferrari tested wet-weather active aerodynamics in F1 Barcelona shakedown

Although only two teams took to the track on the second day of Formula 1's five-day 'shakedown' behind closed doors at the Circuit de Barcelona-Catalunya, there was plenty of material for analysis. During the morning, as Ferrari focused on data collection and logging mileage on its first proper day of running the SF-26, Charles Leclerc was able to complete his first laps on a soaking track ...Keep reading
Read more →

Why Hadjar's Red Bull testing crash doesn’t mean a Gasly 2019 repeat

Seven years ago, Red Bull signed sophomore Formula 1 driver Pierre Gasly as Max Verstappen’s new team-mate following the Frenchman’s convincing rookie season, forgoing a more experienced option in Carlos Sainz – but Gasly crashed twice at Barcelona in pre-season testing and was demoted to Toro Rosso after a nightmare first half of the season. Seven years later, Red Bull signed sophomore ...Keep reading
Read more →

Ford: Horner deserves respect, but Mekies' engineering background an asset in F1

The initial filming day for Racing Bulls at Imola and the collective shakedown in Barcelona mark the first steps for Red Bull-Ford Powertrains on track. The partnership came about after Red Bull’s negotiations with Porsche – which wanted to be a partner “on equal footing” – broke down and Ford Performance director Mark Rushbrook saw his opportunity. By his own admission, he simply ...Keep reading
Read more →

Red Bull undecided on third Barcelona F1 test day after "very unfortunate" Hadjar crash

The Red Bull Formula 1 team is still evaluating its plans for a final day of Barcelona running after Isack Hadjar suffered a crash with the new RB22 on Tuesday. In tricky wet conditions, Hadjar spun backwards into the wall at Barcelona's final corner, damaging the rear of Red Bull's 2026 challenger in the process and ending the Frenchman's day. Given the limited information available from the ...Keep reading
Read more →

What happened on day two of F1's secret 2026 test

The second day of Formula 1’s secretive ‘shakedown week’ at the Circuit de Catalunya proved much quieter as only two teams took to the track. Ferrari, like McLaren, had signalled its intention to miss the opening day but the reigning constructors’ champion also did not appear on the second. Each team is permitted three days of running during the five-day ‘shakedown’, and these do ...Keep reading
Read more →

USA edging closer to WRC return in 2027

The World Rally Championship making a return to the USA next year is a step closer, with a candidate test rally planned later this year. The WRC has long held an ambition to return to the USA for the first time since the 1988 Olympus Rally, with the project a key part of its plan to grow the category. In 2024 the championship announced a “clear roadmap” to achieving a USA event in 2026 that ...Keep reading
Read more →

Hadjar crashes Red Bull’s new F1 car

Isack Hadjar has crashed Red Bull’s RB22 car on the second day of Formula 1’s first pre-season test at Barcelona. A five-day test is taking place behind closed doors at the Catalan track this week, as teams get to grips with new machinery meeting the overhauled chassis and engine regulations for 2026, with each outfit allowed to run on three of those days. Red Bull is the only squad ...Keep reading
Read more →

Why the FIA is so confident in unprecedented F1 2026 rule changes

After what seemed like an endless back and forth about Formula 1's much-vaunted 2026 rules, cars have finally hit the track in Barcelona, a prelude to the series' new era. With the five-day shakedown held behind firmly closed doors, and coverage limited to guerrilla reporting from a grassy knoll, the real answers of F1's new pecking order and its racing product are yet to follow, starting at ...Keep reading
Read more →

Mercedes' unique 2026 F1 front wing design revealed in Barcelona test

When new rules come into force in Formula 1, it's natural to see many different interpretations across the grid, especially on those components that define the car’s overall concept. From tail to tip, from the sidepods to the suspension, all the way to the front wing, changes have swept the 2026 cars in line with the new philosophy this season. The FIA has sought to limit the outwash effect ...Keep reading
Read more →

Hadjar surprised by Red Bull F1 engine: "More laps than expected"

The first serious test for Red Bull Ford Powertrains – following an earlier filming day for Racing Bulls at Imola – has gone largely according to plan. Liam Lawson caused a red flag at the start of the lunch break, but still completed 88 laps in his Racing Bulls. Max Verstappen was not behind the wheel on Monday, but saw his team-mate Isack Hadjar lap the Circuit de Barcelona-Catalunya 107 ...Keep reading
Read more →

Russell impressed by Red Bull and Haas: "It's not quite how it was in 2014!"

George Russell has been impressed by the amount of running completed by several rival teams, including Red Bull and Haas, during the opening day of Formula 1’s 2026 Barcelona shakedown. The first day of running at the Circuit de Barcelona-Catalunya proved productive for a number of teams, despite the major changes introduced for the new regulation cycle covering both chassis and power ...Keep reading
Read more →

Different, "but still a racing car" - Drivers share early verdict on F1's 2026 cars

Other than a few short filming runs, day one of Barcelona's five-day shakedown was the first real opportunity for Formula 1 drivers to put the brand-new 2026 generation of cars to the test. Seven of the 11 teams made it out on day one, with Williams forced to skip the week completely and Aston Martin scrambling to make it out for at least two of the three days allowed per team. No reliable ...Keep reading
Read more →

What happened behind closed doors on day one of F1’s secretive 2026 shakedown

Formula 1’s shakedown week began on a crisp but chilly day at the Circuit de Barcelona-Catalunya with Mercedes’ W17 emerging from the garages first in the hands of Andrea Kimi Antonelli, followed in short order by Audi’s Gabriel Bortoleto and Alpine’s Franco Colapinto. Both Bortoleto and Colapinto would later be delayed with technical issues but the closed-door policy of what is billed ...Keep reading
Read more →

Aston Martin to lose one F1 test day, intends to run in Barcelona on Thursday

The Aston Martin Formula 1 team has announced its "intention is to run Thursday and Friday" at Barcelona's 2026 shakedown, which means it will not take up at least one of its three days of running. F1 heads to the Circuit de Barcelona-Catalunya this weekend for a five-day pre-season test, dubbed the shakedown as official testing takes place in Bahrain in February with two three-day tests. At ...Keep reading
Read more →

More than 10 tuners show interest in WRC 2027 rules

More than 10 tuners have expressed interest in the World Rally Championship’s new technical regulations for 2027, according to the FIA. Next year the WRC will embark upon a new technical era that aims to increase the number of constructors competing in the pinnacle of rallying. The new technical regulations, which will span a 10-year period, are designed to be more affordable and flexible ...Keep reading
Read more →

Aston Martin set to skip first two days of F1 2026 Barcelona test

Aston Martin is set to skip the opening two days of the first 2026 Formula 1 pre-season test in Barcelona, Autosport understands. F1 is currently hosting a five-day shakedown at Circuit de Barcelona-Catalunya (26-30 January), ahead of further tests in Bahrain (11-13 and 18-20 February) before the forthcoming campaign. This season will introduce widespread regulation changes and as a ...Keep reading
Read more →

Audi signs Slater as first academy driver

Audi has signed reigning Formula Regional European champion Freddie Slater as its Driver Development Programme’s first member. Audi has taken over the Sauber Formula 1 outfit after completing its acquisition in 2024 and announced on Friday it was launching its own academy, managed by former F1 driver and three-time Le Mans 24 Hours winner Allan McNish – who triumphed with Audi on two ...Keep reading
Read more →

Red Bull reveals actual 2026 F1 car as Barcelona test begins

Red Bull has lifted the covers off its RB22 Formula 1 car for the 2026 season. The Milton Keynes-based outfit previously revealed its livery on 15 January at an event in Detroit, Michigan, where its new engine partner Ford is based, but its actual machinery remained to be seen. The first pre-season test gets under way on Monday at Barcelona, with all teams entitled to three days of running ...Keep reading
Read more →

McLaren reveals renders of new MCL40 F1 car

McLaren has unveiled the car tasked with mounting a successful Formula 1 title defence in 2026, as it unleashed renders of its new MCL40 in a testing livery ahead of the behind-closed-doors Barcelona test. The Woking squad claimed both world championship titles last year and had sewn up the constructors' championship as early as Singapore, well before Lando Norris clinched the drivers' title in ...Keep reading
Read more →

The factors that led to Solberg’s “crazy dream” Monte Carlo win

Before the weekend, Oliver Solberg had modest expectations: tipping a top five result as his goal for his first start as a full-time factory Toyota World Rally Championship driver. However, that quickly changed after he delivered a stunning drive to win what was regarded as the toughest Monte Carlo for a generation. Extreme wintry weather plagued the asphalt event, offering up incredibly ...Keep reading
Read more →

Explained: The diffuser opening on Mercedes’ and Ferrari’s 2026 F1 cars

Caution is always required when analysing Formula 1 launches – especially with the introduction of new regulations. A few years ago, Red Bull played games with its sidepod inlets by showing different designs at the launch in Milton Keynes and on renders. During the subsequent test days in Bahrain, the design was different again, which illustrates the steps teams take to stop rivals gaining more ...Keep reading
Read more →

WRC Monte Carlo: Solberg dominates ‘proper Monte’ to claim sensational win

Oliver Solberg outlined his World Rally Championship credentials with a stunning Rally Monte Carlo victory in one of the most challenging season openers in recent memory. Toyota’s new signing defied expectations in extreme snow and icy conditions to deliver an emphatic victory, beating his more experienced Toyota team-mates Elfyn Evans [+51.8s] and reigning nine-time world champion and ...Keep reading
Read more →

Why the first 2026 F1 test is really being held in secret – and what to expect

The human brain is hard-wired to greet changes and unexpected circumstances with a stress response: the amygdalae, biochemical arbiters of the fight-or-flight instinct, flag the change as a threat. This is just one deep-seated explanation for the orgy of negativity that has surrounded the 2026 Formula 1 regulations and their introduction, from drivers hating their first experiences of the '26 ...Keep reading
Read more →

Why WRC drivers hailed return of Monaco GP circuit stage

The return of World Rally Championship cars to Monaco’s famous Grand Prix circuit has proved a hit with drivers, who would like the initiative to become a more permanent fixture in the future. Monaco’s famous circuit echoed to the sound of WRC for the first time since 2008 as a shortened version of the Formula 1 track played host to a 2.65km super special stage for this year’s Monte Carlo ...Keep reading
Read more →

WRC Monte Carlo: Solberg survives scare with healthy lead intact

A wild off-road excursion failed to derail Oliver Solberg’s Rally Monte Carlo victory bid as wintry conditions wreaked havoc at the World Rally Championship curtain raiser. Solberg continued to defy expectations, ending Saturday with a 59.3s lead over Toyota’s Elfyn Evans. Reigning world champion Sebastien Ogier had threatened to shake up the order at the front, but his charge from third ...Keep reading
Read more →

Why there are "no excuses" for Alpine in F1 2026

As Flavio Briatore said at Alpine's 2026 launch in Barcelona, his team has no more excuses. But rather than dreading the added pressure, a character-building 2025 has meant the team has been counting the days until it could finally show what it can do. It was hard not to notice Alpine was in a buoyant mood as it kickstarted its new year on the MSC World Europa, with Pierre Gasly and Franco ...Keep reading
Read more →

Haas completes 2026 F1 car shakedown ahead of Barcelona test

Haas has become the latest team to shake down its 2026 Formula 1 car, as the grid makes its final preparations for the first official test under the new regulations. F1 sophomore Oliver Bearman turned the first laps in the Haas VF-26 at Ferrari’s Fiorano circuit on Saturday, running Pirelli’s special demonstration tyres as part of the team’s filming allocation. In a short social media ...Keep reading
Read more →

WRC Monte Carlo: Solberg in control, Evans holds off Ogier as conditions worsen

Oliver Solberg remains in control of Rally Monte Carlo with a lead of more than a minute, as the wintry conditions worsened at the World Rally Championship season opener on Saturday morning. Overnight snow showers meant crews faced conditions more akin to Rally Sweden than Monte Carlo, and despite initially losing time, Solberg fought back to restore his lead to 1m02.8s over Toyota team-mate ...Keep reading
Read more →

Briatore: Horner interested in Otro's Alpine F1 team stake

Flavio Briatore has said Christian Horner is one of the interested parties in buying Otro Capital’s stake in the Alpine Formula 1 team. Since being sacked as Red Bull F1 team boss last July, Horner's future has been the subject of numerous rumours and he has been linked to several teams. Horner had been in talks with Aston Martin and Haas in recent months, but the most serious possibility ...Keep reading
Read more →

Stella wants F1 to continue to openly communicate new regulations to fans

McLaren team principal Andrea Stella has urged Formula 1 to keep up its push to communicate the nuts and bolts of the 2026 regulations to the fans due to how different the racing is set to look. The forthcoming campaign will introduce what’s arguably the biggest rule change in F1 history: the car chassis is becoming lighter and smaller, while there’ll be a near 50-50 split between the ...Keep reading
Read more →

WRC Monte Carlo: Dominant Solberg exceeds Toyota’s expectations to lead Monte Carlo Rally

Oliver Solberg’s sensational run to lead Rally Monte Carlo by more than a minute has exceeded Toyota’s expectations for its new signing at the World Rally Championship season opener. Solberg starred in Thursday night’s three stages to take an impressive 44.2s lead into Saturday where he continued his stunning drive. The Swede delivered another masterclass in challenging snowy, icy and ...Keep reading
Read more →

FIA offers update on new WRC commercial rights holder search

The FIA expects to announce the new World Rally Championship commercial rights holder within the next “couple of months” with an agreement “very close”, according to FIA Deputy President for Sport Malcolm Wilson. The future promotion of the WRC has been a hot topic after it was first reported that the previous commercial rights holder WRC Promoter, owned by energy drinks giant Red Bull ...Keep reading
Read more →

JA on F1 podcast: Red Bull F1 team boss Mekies on why 2026 is a new dawn

This week we have a special episode with two conversations from the Autosport Business Exchange in London. ABX is a gathering of leaders from across motorsport, exploring relevant themes. It takes place in London, Monaco and New York every year. As we head into a new season with so much renewal and so many question marks, the theme for London this year was The Power Shift. We hear ...Keep reading
Read more →

Audi launches its own F1 young driver programme

In the same week it confirmed a five-year plan to win the world championship by 2030, Audi has announced a driver development programme which will scout and nurture young talent from karting through the single-seater ladder, and perhaps ultimately to Formula 1. The move places Audi in the mainstream of F1 teams, the majority of which operate similar schemes with varying degrees of structure ...Keep reading
Read more →

WRC Monte Carlo: Solberg continues domination despite puncture

Oliver Solberg survived a slow puncture to hold a healthy Rally Monte Carlo lead, as Toyota’s new World Rally Championship signing continued his domination of the event. Solberg, co-driven by Elliott Edmondson, chalked up two stage wins from Friday morning’s three tests that served up extremely challenging snow- and ice-covered roads. The son of 2003 world champion Petter Solberg headed to ...Keep reading
Read more →

Williams to miss Barcelona test as 2026 F1 car is late

Williams will not take part in next week’s Formula 1 pre-season test at Barcelona, the team has revealed. F1 will reconvene next week at the Catalan track as the world championship’s new era begins, with overhauled technical regulations featuring active aerodynamics and a near-50:50 split between combustion and electric power. F1 squads have been open about the scale of the challenge ...Keep reading
Read more →

Alpine launches livery for 2026 F1 season on a cruise ship

Alpine has revealed the livery for its 2026 car ahead of the upcoming Formula 1 season. The Enstone-based outfit hosted its season launch on a cruise ship off the Catalan coast near Barcelona, celebrating the team's partnership with MSC Cruises. The new design is not radically different from its predecessor, with Alpine's blue paired with title sponsor BWT's pink. Alpine has been ...Keep reading
Read more →

Controversial 2026 F1 engine loophole won't be resolved before Australian GP

Here we are and here we go: the status quo will remain as the 2026 Formula 1 season gets under way. Mercedes and Red Bull Powertrains will be able to race with power units which are understood to employ clever metallurgy to increase the compression ratio of the internal combustion engine beyond the permitted 16:1. The issue has been a matter of great intrigue since before news leaked out to ...Keep reading
Read more →

Ferrari reveals 2026 F1 car at Fiorano

Ferrari has become the latest Formula 1 team to unveil its 2026 car, the SF-26, ahead of a shakedown at its Fiorano test track. As detailed last month, the Italian team has stuck to its traditional launch plan of revealing its new car on the same day as its shakedown at Fiorano, with both Lewis Hamilton and Charles Leclerc on hand to complete the first laps of Ferrari’s 2026 ...Keep reading
Read more →

Fallows joins Racing Bulls after short-lived Aston Martin F1 stint

Racing Bulls has hired former Red Bull and Aston Martin engineer Dan Fallows as Formula 1 technical director. Fallows will report to chief technical officer Tim Goss and “take responsibility for the overall technical direction of the team, working across design, aerodynamics and performance”, the Faenza-based squad stated in a press release. Keep reading
Read more →

The challenges facing Alpine ahead of F1 2026

After finishing dead last in the 2025 constructors’ championship, Alpine will attempt to bounce back in the upcoming Formula 1 season. What can the French outfit do with a largely unchanged team alongside a new engine partner? Ahead of its season launch in Barcelona on Friday, let’s delve into its prospects. What's new at Alpine? The main change for 2026 at Alpine is the team’s ...Keep reading
Read more →

The challenges facing Ferrari ahead of F1 2026

The car launch season for the 2026 Formula 1 season is firmly under way with Red Bull, Racing Bulls, Haas, Audi and Mercedes all having revealed their liveries for this year. Up next is Ferrari on Friday after what was a highly disappointing 2025 campaign, as the Italian outfit slipped to fourth in the championship and failed to win a grand prix for the first time since 2021. But 2026 ...Keep reading
Read more →

WRC Monte Carlo: Solberg stuns to lead Evans as fog red flags SS3

Oliver Solberg made a stunning start to life as a full-time World Rally Championship Rally1 driver to emerge from treacherous wintry conditions with the Monte Carlo Rally lead. Solberg produced a masterclass on the challenging snow- and ice-covered mountain asphalt roads to reach service with a 44.2s lead over Toyota’s Elfyn Evans. After nominal times were awarded following the red flag in ...Keep reading
Read more →

McLaren likely won’t upgrade 2026 F1 car before Australian GP

“The car everyone will see in Barcelona won’t be the car that races in Australia. I think that will be across the board, because it's simply too early.” A few days ago, Haas team principal Ayao Komatsu was confident all Formula 1 cars would evolve significantly by the Australian Grand Prix – but McLaren differs. The team won’t officially launch its MCL40 until 9 February, long ...Keep reading
Read more →

Binotto fears Audi engine performance deficit to F1 rivals in 2026

Audi Formula 1 chief Mattia Binotto is expecting his team to have an inferior power unit compared to its more established rivals in the forthcoming 2026 campaign. The German marque will make its debut as both an F1 team and engine supplier this year, after completing a takeover of Sauber to become a full factory works outfit. It coincides with what is arguably the biggest rule change ...Keep reading
Read more →

Mercedes completes 2026 F1 car shakedown at Silverstone

Mercedes’ 2026 Formula 1 car has hit the track for the first time as the Brackley-based squad ramps up preparations for the series’ new technical era. Just hours after the official unveiling of the car on Thursday, George Russell and Andrea Kimi Antonelli put the W17 through its paces at Silverstone, running Pirelli’s grooved ‘demo’ tyres. Keep reading
Read more →

Can Lancia enjoy success on its anticipated WRC return?

Lancia is confident it can immediately be in a position to fight for victories and a championship title on its return to the World Rally Championship this year. The famous Italian brand will return to the WRC stages at this weekend’s season opener in Monte Carlo with its all-new Ypsilon HF Integrale Rally2 car to do battle in the championship’s second tier WRC2 category. Lancia’s ...Keep reading
Read more →

The headaches WRC crews must soothe ahead of a ‘proper Monte’

World Rally Championship crews are braced for a ‘proper, old school’ Rally Monte Carlo with snow and wintry conditions set to become a major factor at the 2026 season opener. In recent seasons, the annual WRC curtain raiser – held on the famous twisty mountain roads in the French Alps – has been largely run in dry conditions, devoid of the notorious snow and icy conditions synonymous ...Keep reading
Read more →

Norris shares 2026 F1 target in defiant message after 2025 championship title

Lando Norris has said his goal is to secure back-to-back Formula 1 world drivers' championships in 2026, after winning his first in 2025. The McLaren driver accepted the Autosport Champion award at the 2026 Autosport Awards and confirmed to the cheering crowd that his eyes are on the title once again in the upcoming season. "It's absolutely the goal. Yes, it's absolutely, absolutely the ...Keep reading
Read more →

FIA aims to "resolve" engine loophole controversy before start of F1 2026 season

The FIA has said it is keen to settle Formula 1's first major technical controversy before the 2026 era gets under way in Australia. Several manufacturers believe Mercedes and Red Bull Powertrains have come up with a trick to cleverly exploit F1's 2026 power unit regulations, which prescribe a compression ratio of 16:1, down from 18:1 last year. That compression ratio is measured when the ...Keep reading
Read more →

Microsoft switches F1 sponsorship from Alpine to Mercedes

Mercedes has signed a multi-year deal with Microsoft, starting from the 2026 season. Microsoft was a long-time partner of the rival Alpine squad, which it first sponsored in 2012 when the outfit was named Lotus, but has now switched allegiances as its Alpine deal ended following the 2025 campaign. The Microsoft logo will be displayed on the airbox and the front wing endplates of the newly ...Keep reading
Read more →

Mercedes reveals new-look F1 design for 2026

Mercedes has lifted the covers off the W17, its new Formula 1 car for the 2026 season. The new machinery meets F1’s new technical regulations on the chassis and engine sides, featuring active aerodynamics and a near-50:50 split between combustion and electric energy. The W17 sports a mostly unchanged black and silver design, with turquoise accents as a nod to long-standing sponsor ...Keep reading
Read more →

Why McLaren won't run on the first day of F1's Barcelona test

McLaren will not run on the opening day of Formula 1's behind-closed-doors test at Barcelona as it sought to maximise the development time of its new title defender - the MCL40. Audi, Cadillac, Racing Bulls, and now Alpine have given their new 2026 models track time in private shakedown events, aiming to gather early reference points for F1's week of running at Barcelona. This begins on the 26 ...Keep reading
Read more →

"It doesn't matter how good the car is": Stewart on what makes Norris a true champion

Three-time Formula 1 world champion and motorsport legend Sir Jackie Stewart has offered his verdict on Lando Norris after the Briton secured his first drivers' title in 2025. Speaking at the Autosport Awards, where the McLaren driver was celebrated by the organisation, Sir Jackie was asked how he rated Norris as a champion. “He’s a very stable young man,” he said. "First of ...Keep reading
Read more →

Ogier hungry for record 10th WRC title on eve of Rally Monte Carlo

Sebastien Ogier says a repeat of his 2025 World Rally Championship success will be difficult, but admits the motivation “to go for it” remains amid talk of a record-breaking 10th title. Barely hours after matching Sebastien Loeb as a fellow nine-time world champion in November, Ogier was already facing questions about the possibility of fighting for a 10th title in 2026. On the eve of ...Keep reading
Read more →

Why Hyundai is confident of challenging dominant Toyota in WRC 2026

After coming agonisingly close to a drivers' and manufacturers' double title success in 2024, Hyundai found itself resoundingly beaten by rivals Toyota in the WRC last season, winning just two rallies (Greece, Saudi Arabia) compared to Toyota’s tally of 12 victories. Hyundai's 2025 struggles can be pinpointed to a number of variables. The squad heavily invested in an ‘Evo’ version of its ...Keep reading
Read more →

Sesks set to make WRC return in 2026

Martins Sesks has announced plans to contest a partial campaign in the 2026 World Rally Championship with M-Sport-Ford. Sesks and co-driver Renars Francis are set to team up with the British squad for a third season, aiming to pilot a Ford Puma Rally1 in seven WRC events beginning with Rally Sweden (12-15 February) next month. Outings in Portugal, Greece, Estonia, Finland, Sardinia and Saudi ...Keep reading
Read more →

Hyundai unleashes refreshed 2026 WRC challenger

Hyundai has revealed its new-look i20 N Rally1 that it hopes will close the gap to rivals Toyota in the 2026 World Rally Championship. The Korean brand will sport a new livery on its car for the 2026 season that will be driven by 2024 champion Thierry Neuville and Adrien Fourmaux, while the third car will be shared across Dani Sordo, Esapekka Lappi and Hayden Paddon, who rejoins the team after ...Keep reading
Read more →

M-Sport reveals 2026 WRC Ford Puma

M-Sport-Ford has taken the covers off the final iteration of the current Ford Puma Rally1 car that will tackle the 2026 World Rally Championship. The British squad has once again opted for a livery change for the new season with the purple look, featuring Red Bull branding from 2025, replaced with a striking white, green and blue colour scheme. The change of livery has been partly ...Keep reading
Read more →

How five-time runner-up Evans plans to finally become WRC champion

Elfyn Evans says lessons have been learned and areas for improvement identified to become world rally champion after the agony of losing the title by four points to Sebastien Ogier last year. The Toyota driver heads into the 2026 season as a five-time runner-up and the most successful driver of the current World Rally Championship crop yet to secure a world title. Last year, Evans came the ...Keep reading
Read more →

How did Rovanpera's single-seater debut go?

As two-time World Rally champion Kalle Rovanpera tackles a new career in single-seaters, moving to Super Formula for 2026, he’s getting crucial experience in the Formula Regional Oceania Trophy. Formerly known as the Toyota Racing Series, the championship is typically used by young drivers to compete in the winter break, as it takes place in New Zealand’s summer. Previously won by Lance ...Keep reading
Read more →

Why M-Sport chose youth over experience for its 2026 WRC line-up

M-Sport favouring youth over experience when it comes to its World Rally Championship driver line-up is nothing new, having developed a reputation for being a perennial producer of star talent. The British squad has provided a valuable proving ground for WRC stars of the future, with many of those going on to win or challenge for world titles. Its most recent success stories being Ott Tanak and ...Keep reading
Read more →

Heavy snow forces Hyundai to postpone Neuville Monte Carlo test

Heavy snowfall has forced Hyundai to postpone Thierry Neuville’s pre-event test ahead of the World Rally Championship season opener in Monte Carlo. WRC teams have headed to the south of France this week to test in preparation for the annual asphalt curtain raiser to be held from 22-25 January. While Toyota began testing on Tuesday with Elfyn Evans and Oliver Solberg running in largely dry ...Keep reading
Read more →

Toyota unveils new look 2026 WRC challenger

Toyota has revealed a bold new look for its 2026 GR Yaris Rally1 cars that will contest this year’s World Rally Championship. The Japanese marque, which won last year’s title, has opted for a fresh new livery utilising the team’s red, black and white scheme with red now the predominant colour. For the past two seasons an all-black livery has adorned Toyota’s factory WRC entries ...Keep reading
Read more →

Toyota selects Solberg to score manufacturer points in WRC 2026 opener

New Toyota signing Oliver Solberg has been nominated to score manufacturer points at the opening round of the 2026 World Rally Championship season in Monte Carlo. Event organisers have today released the 66-car entry list for the annual WRC curtain raiser (22-25 January), which will feature 11 Rally1 entries and 27 Rally2 crews, while 25 of those cars are registered to score WRC2 ...Keep reading
Read more →

Ogier pays tribute to Tanak: He "pushed me the hardest"

Sebastien Ogier says Ott Tanak pushed him “harder than anyone else” as the nine-time world rally champion paid tribute to his rival and friend, who will take a sabbatical in 2026. Tanak shocked the rally world in November when the 2019 champion announced plans to take a break from full-time competition in the WRC next year, forsaking a 2026 seat at Hyundai in the process. Keep reading
Read more →

Toyota WRC team principal buys top WRC2 squad

Toyota World Rally Championship team principal Jari-Matti Latvala has purchased the WRC2 title-winning Printsport Racing outfit. JML-WRT Oy, a company created by the 18-time WRC rally winner, has acquired the Finnish rally team that has recently guided Sami Pajari and Oliver Solberg to WRC2 titles in 2024 and 2025 respectively. Both drivers have since graduated to Toyota’s Rally1 WRC ...Keep reading
Read more →

Why Hyundai expects to be stronger in WRC 2026

Hyundai will be "better prepared" and "stronger" in the World Rally Championship next year after a difficult 2025, according to team principal Cyril Abiteboul. The Korean marque was soundly beaten this season, scoring two victories, at Acropolis Rally Greece and in Saudi Arabia, while rivals Toyota chalked up a stunning 12 wins spread across drivers Sebastien Ogier, Elfyn Evans, Kalle Rovanpera ...Keep reading
Read more →

Munster set for Dakar Rally, WRC Monte Carlo double-header

Gregoire Munster will rejoin M-Sport-Ford for the opening round of the 2026 World Rally Championship next month off the back of competing as a co-driver at the Dakar Rally. After two full WRC seasons piloting a Ford Puma Rally1, Munster’s rallying future appeared uncertain after M-Sport announced Josh McErlean and new recruit Jon Armstrong as its drivers to contest the full 2026 ...Keep reading
Read more →

Toyota brings Corolla back to rallying with all-new rally car

Toyota will bring the Corolla name back to rallying with its newly-developed GR Corolla RC2 rally car, which will compete in the American Rally Association (ARA) National Championship next year. The Japanese brand initially showcased its GR Corolla rally car concept at the Tokyo Auto Salon in January this year. The car has since undergone further development led by Toyota’s World Rally ...Keep reading
Read more →

Autosport Top 50 of 2025: #34 Oliver Solberg

Oliver Solberg produced as close to a perfect season as is possible in 2025, while confirming his future World Rally Championship title contender credentials. Solberg and co-driver Elliott Edmondson delivered the shock of the season by claiming a deserved and stunning maiden outright WRC win in Estonia, in a one-off drive for Toyota. On top of that, Solberg secured five WRC2 victories to ...Keep reading
Read more →

Autosport Top 50 of 2025: #30 Kalle Rovanpera

For large parts of the season Kalle Rovanpera looked unusually lost as he struggled to understand and extract speed from the new Hankook tyres. But when he did gel with the rubber, he was untouchable, leading to dominant wins for Toyota in the Canary Islands, Finland and Central Europe. That run ignited a title bid that went to the final round, which had seemed unlikely in mid-season. This ...Keep reading
Read more →

Autosport Top 50 of 2025: #27 Ott Tanak

The fact that Ott Tanak was able to take the fight to Sebastien Ogier and Toyota in a Hyundai lacking the speed and reliability of its rivals was hugely impressive. He claimed 56 stage wins, only four shy of Ogier’s season-best tally. Tanak even led the title race after Rally Estonia, and in Greece managed to beat Ogier in a head-to-head to claim Hyundai’s only victory before Thierry ...Keep reading
Read more →

Autosport Top 50 of 2025: #24 Elfyn Evans

The history books will forever say 2025 was the year Elfyn Evans became a five-time WRC runner-up, but in truth this was his best campaign so far. The Toyota driver led the championship for most of the season after making a blistering start with wins in Sweden and Kenya. He was also the WRC’s most consistent driver, finishing every event inside the top six. Opening the road during the summer ...Keep reading
Read more →

Autosport Top 50 of 2025: #4 Sebastien Ogier

Sebastien Ogier produced arguably the best season of his WRC career to date to equal Sebastien Loeb as a nine-time world champion, and assert himself for many as the greatest rally driver of all time. The feat is even more impressive considering Ogier and co-driver Vincent Landais contested only 11 of the 14 rounds and were up against Elfyn Evans, Kalle Rovanpera, Ott Tanak and 2024 champion ...Keep reading
Read more →

FIA reveals first look at WRC 2027 cars

The FIA has offered up a first look at the World Rally Championship cars of the future, built under the new technical regulations that will come into force from 2027. The new technical regulations, which will span a 10-year period, are designed to be more affordable and flexible in a bid to attract new manufacturers and teams to the series. Cars will be built to a €345,000 cost cap, deliver ...Keep reading
Read more →

Does Sesks have a future with M-Sport in WRC?

M-Sport Ford is involved in ongoing discussions to add Martins Sesks to its World Rally Championship driver line-up for 2026. The British squad announced its full-time driver line-up for next year this week, which sees Josh McErlean joined by European Rally Championship title runner-up Jon Armstrong – who will make the leap to Rally1 machinery for the first time. The decision comes as part ...Keep reading
Read more →

FIA announces new constructor set to join WRC 2027

The FIA has revealed details of a new constructor that is developing a car to compete under the World Rally Championship’s new technical regulations in 2027. Founded by experienced motorsport engineer Lionel Hansen, former FIA rally director and Citroen WRC boss Yves Matton and Prospeed, Project Rally One represents the first project to be officially led by a tuner under the WRC’s new ...Keep reading
Read more →

M-Sport hands Armstrong WRC drive alongside McErlean for 2026

European Rally Championship title runner-up Jon Armstrong will join Josh McErlean as part of a new-look M-Sport Ford World Rally Championship driver line-up for 2026. The British squad’s decision to retain McErlean after an impressive maiden Rally1 season and add two-time ERC rally winner Armstrong to the line-up comes as part of an expanded collaboration with the Motorsport Ireland Rally ...Keep reading
Read more →

The WRC second shot born out of never giving up

It’s fair to say Hayden Paddon’s return to the World Rally Championship next year after an eight-year hiatus was a surprise for the rally world to digest. It was even a shock for Paddon, but it was one of those “good surprises” that underlines why it is important to never give up on dreams. That dream is now reality, as the 38-year-old finds himself preparing to contest next month’s ...Keep reading
Read more →

Lancia confirms driver line-up for 2026 WRC return

Lancia has unveiled its driver line-up ahead of its long-awaited return to the World Rally Championship next year. The Italian brand that won a record 10 WRC manufacturers’ titles named Yohan Rossel and Nikolay Gryazin as its drivers for its comeback, which will see the automaker contest the second tier WRC2 championship. Rossel and Gryazin will pilot Lancia’s all-new Ypsilon HF ...Keep reading
Read more →

FIA shares final details of 2027 WRC regulations

The FIA has today confirmed the final elements of the World Rally Championship's new technical regulations, which will come into force from 2027. The 2027 regulations, originally unveiled in December last year, are designed to be more affordable and flexible in a bid to attract new manufacturers and teams to the series. Cars will be built to a €345,000 cost cap, deliver approximately 300 ...Keep reading
Read more →

What Fourmaux and Solberg learned from 2026 Monte Carlo test

Preparations for the 2026 World Rally Championship began in earnest last weekend just a matter of days after Rally Saudi Arabia brought the curtain down on the 2025 campaign. With the opening round of the 2026 season in Monte Carlo a little more than a month away on 22-25 January, Toyota and Hyundai both fielded cars in last weekend’s Rallye National Hivernal du Devoluy in France. The ...Keep reading
Read more →

Paddon, Lappi and Sordo join Hyundai 2026 WRC line-up

World Rally Championship event winners Dani Sordo, Esapekka Lappi and Hayden Paddon will rejoin Hyundai to share its third car next year following the departure of Ott Tanak. The trio will join 2024 world champion Thierry Neuville and multiple podium finisher Adrien Fourmaux after the pair agreed contract extensions to pilot the team’s two other i20 N Rally1 entries on a full-time ...Keep reading
Read more →

Solberg steps up WRC 2026 preparations with Rally1 outing

Toyota's new signing Oliver Solberg will drive a GR Yaris Rally1 car on asphalt for the first time this week as he steps up his preparations for the 2026 World Rally Championship. A week on from Rally Saudi Arabia, where Solberg concluded a 2025 campaign that yielded a WRC2 title and a maiden outright WRC victory at Rally Estonia, the Swedish rally ace and co-driver Elliott Edmondson are back ...Keep reading
Read more →

Hyundai upgrading Rally2 car to “cover all bases” as 2027 WRC decision looms

Hyundai has started upgrading its Rally2 car as an option should it decide to continue its involvement in the World Rally Championship in 2027. The Korean manufacturer's long-term future in rallying’s top tier has been shrouded in uncertainty for months, with a call to contest next season - the final year of Rally1 regulations - only announced in August. Keep reading
Read more →

How Ogier matched Loeb's WRC record in demanding desert duel

At the finish of the most gruelling rally of the year there was a familiar smile on the face of Sebastien Ogier as he and co-driver Vincent Landais unfurled a Tricolore with a nine slapped bang in the centre. The inaugural Rally Saudi Arabia will forever be associated with a huge moment in World Rally Championship history, as Ogier matched the nine title record of the great Sebastien Loeb and ...Keep reading
Read more →

Was Rally Saudi Arabia too extreme for WRC?

World Rally Championship drivers feel Rally Saudi Arabia’s unique challenges deserve a place on the calendar, but the event is too extreme for the cars and tyres to host a title decider. Saudi Arabia made its WRC debut at the weekend after signing a 10-year deal with the championship to host the final round for at least five years of the deal. The rally, based out of Jeddah, delivered a ...Keep reading
Read more →

Rovanpera calls time on WRC career: ‘Rallying always came naturally to me’

Kalle Rovanpera says rallying will always hold a special place in his heart after closing the chapter on a record-breaking World Rally Championship career in favour of a move to single-seater racing. A fairytale finish to his WRC career eluded the Toyota driver, who missed out on securing a third world title, finishing seventh at the Rally Saudi Arabia season finale after suffering ...Keep reading
Read more →

No regrets: Tanak grateful to have chased WRC dream

Rally Saudi Arabia marked Ott Tanak’s final outing with Hyundai after the 2019 world champion announced plans to take a sabbatical earlier this month to reset and spend more time with his family. Tanak had been in the victory hunt in Saudi Arabia before a series of punctures on Friday ended any hopes of a podium to sign off his time in the WRC. While the 38-year-old ...Keep reading
Read more →

Weather

Current Weather Conditions

Current Conditions: Clouds - scattered clouds

Temperature: 31.55°F (Feels like: 24.96°F)

Wind: 7 mph

Humidity: 75%

Sunrise: 07:39, Sunset: 17:41

5-Day Weather Forecast

Thursday

Clouds: few clouds

High: 40.91°F, Low: 23.88°F

Friday

Clouds: broken clouds

High: 44.76°F, Low: 26.47°F

Saturday

Clouds: broken clouds

High: 49.84°F, Low: 37.53°F

Sunday

Clouds: overcast clouds

High: 54.23°F, Low: 40.69°F

Monday

Clouds: overcast clouds

High: 52.9°F, Low: 41.47°F

Last updated: 2026-01-29 20:30:22