Good morning, Tony

Tuesday, May 26 · 07:26 KST · JARVIS ONLINE
AI News · Live · 189 today · 189 unread · HN 25 / RSS 50 / Reddit 40 / arxiv 30 / dev.to 20 / YT 24
JARVIS DAILY DIGEST · 2026.05.26
AI News · HACKER NEWS · AI
HN 4229 · 1668

Don't post generated/AI-edited comments. HN is for conversation between humans

HN 2544 · 296

Airfoil

HN 2360 · 887

Open source AI is the path forward

HN 2356 · 2826

My AI skeptic friends are all nuts

HN 2346 · 951

An AI agent published a hit piece on me

HN 2135 · 1602

Gemini AI

HN 2104 · 1269

I believe there are entire companies right now under AI psychosis

HN 2084 · 997

IDF killed Gaza aid workers at point blank range in 2025 massacre: Report

HN 2004 · 440

Bypassing airport security via SQL injection

HN 2001 · 478

Air Con: $1697 for an on/off switch

HN 1903 · 749

Local AI needs to be the norm

HN 1875 · 750

Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone

HN 1755 · 1143

Google Chrome silently installs a 4 GB AI model on your device without consent

HN 1753 · 206

Paper Airplane Designs

HN 1697 · 744

AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard'

HN 1683 · 2991

The young, inexperienced engineers aiding DOGE

HN 1581 · 511

Zoom terms now allow training AI on user content with no opt out

HN 1541 · 836

Project Glasswing: Securing critical software for the AI era

HN 1539 · 197

Show HN: I'm an airline pilot – I built interactive graphs/globes of my flights

HN 1535 · 304

Show HN: Airmash – Multiplayer Missile Warfare HTML5 Game

HN 1520 · 353

Can I run AI locally?

HN 1481 · 783

Andrej Karpathy: Software in the era of AI [video]

HN 1474 · 705

Contra Wirecutter on the IKEA air purifier

HN 1448 · 809

Meta’s AI smart glasses and data privacy concerns

HN 1424 · 462

AirPods liberated from Apple's ecosystem

Official blogs · Tech media: OPENAI · ANTHROPIC · GOOGLE · TC · VERGE …
TechCrunch AI · 15 hours ago

What ClickUp’s mass layoff tells us about the future of work

The nine-year-old startup is replacing hundreds of employees with thousands of AI agents.

TechCrunch AI · 16 hours ago

The pope’s AI encyclical isn’t really about AI

Pope Leo XIV's first encyclical uses AI as a lens to diagnose older problems: concentrated power, eroding democracy, and a tech elite that shapes the world to its own advantage.

The Verge AI · 16 hours ago

Pope Leo calls for being ‘profoundly human’ in the age of AI

Pope Leo XIV warned of the risks of AI and unconstrained technological power in his first major papal document released on Monday. Magnifica Humanitas is the pope's manifesto on "safeguarding the human person in the time of artificial intelligence," in which he discusses the dangers of AI-powered warfare, the effects o

TechCrunch AI · 17 hours ago

Startup Battlefield 200 applications close in days: Apply before May 27

The deadline to apply or nominate for Startup Battlefield 200 is May 27. This is your shot at VC access, global visibility, TechCrunch coverage, and $100,000. Apply now.

TechCrunch AI · 17 hours ago

5 days left: Save up to $410 on TechCrunch Disrupt 2026 passes before prices increase

Early Bird savings for TechCrunch Disrupt 2026 in San Francisco end May 29 at 11:59 p.m. PT. Register now to save up to $410 before prices increase.

Hugging Face · 1 day ago

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

OpenAI · 1 day ago

OpenAI, Grupo Folha and Grupo UOL announce strategic content partnership

OpenAI partners with Grupo Folha and Grupo UOL to bring trusted Brazilian journalism to ChatGPT, expanding access to news with attribution and transparency.

TechCrunch AI · 1 day ago

Everyone is navigating AI security in real time — even Google

We're in the transition period -- all of us.

TechCrunch AI · 2 days ago

I tried Amazon’s Bee wearable and am both intrigued and slightly creeped out

Like other AI wearables, Amazon's Bee offers an odd combination of convenience and privacy anxiety.

The Verge AI · 2 days ago

Hackers are learning to exploit chatbot ‘personalities’

This is The Stepback, a weekly newsletter breaking down one essential story from the tech world. For more on AI mischief, follow Robert Hart. The Stepback arrives in our subscribers' inboxes at 8AM ET. Opt in for The Stepback here. How it started Hacking the first generation of AI chatbots was a laughably simple affair

TechCrunch AI · 3 days ago

Ferrari is using IBM’s AI to create F1 superfans

IBM and Scuderia Ferrari HP take TechCrunch inside how they are redefining the fan experience.

TechCrunch AI · 3 days ago

Elon Musk has given up on solar power (on Earth)

Elon Musk's xAI has gone all in on natural gas, while SpaceX is obsessed with orbital data centers. What happened to the "solar-electric economy" he promised?

The Verge AI · 3 days ago

Google’s new anything-to-anything AI model is wild

Last year I deepfaked my kid's stuffed animal to make it look like his plush deer was on vacation. It was an experiment to see if I could re-create the events depicted in a Gemini ad Google was running, and I never showed the videos of Buddy the deer on his adventures to my four-year-old. […]

Hugging Face · 3 days ago

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

The Verge AI · 3 days ago

Google’s AI search is so broken it can ‘disregard’ what you’re looking for

Google's AI Overviews are running into an interesting problem right now. Earlier on Friday, if you searched for the term "disregard," the AI Overview section would include a response like what you'd see from a more traditional AI chatbot instead of the typical AI summary, as spotted on X. As you can see in the […]

Google AI · 4 days ago

Catch up on the Dialogues stage at Google I/O 2026.

A recap of the 2026 I/O Dialogues, where leaders discuss the future of AI, quantum computing, robotics and creativity.

The Verge AI · 4 days ago

Elon, stop trying to make Grok happen

There is a harsh truth about Elon Musk's "truth-seeking" AI chatbot Grok: It's not very good, and not many people are using it. That's the takeaway of a new Reuters report, which found that Grok barely appears in federal records of how the US government used AI last year. It's not the only sign xAI's […]

Hugging Face · 4 days ago

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook

The Verge AI · 4 days ago

The literary world isn’t prepared for AI

Since 2012, the British literary magazine Granta has published the regional winners of the annual Commonwealth Short Story Prize. This year, however, there was something off about one of the selections for the prestigious award: It appears to have been written by AI. Jamir Nazir's "The Serpent in the Grove" has many of

The Verge AI · 4 days ago

Spotify says its AI remix tool is for superfans, but I’m not convinced

AI covers and remixes of songs are already a blight on the internet. Spotify, YouTube, TikTok, and Instagram are awash in flat reggae versions of "Smells Like Teen Spirit," dinky country renditions of The Weeknd, and monotonous Motown reimaginings of AC/DC. Now, a new tool from Spotify will make them even easier to gen

The Verge AI · 4 days ago

Samsung’s memory chip employees negotiated $340,000 bonuses this year

Details have emerged about a tentative deal struck between Samsung and semiconductor employees who had threatened to strike. The deal reportedly makes some workers eligible for average annual bonuses of $340,000. The proposed 18-day strike had hinged on Samsung's bonus cap for employees in the semiconductor division an

MIT Tech Review AI · 4 days ago

Google I/O showed how the path for AI-driven science is shifting

During Tuesday’s Google I/O keynote, Demis Hassabis, the CEO of Google DeepMind, proclaimed that we are currently “standing in the foothills of the singularity.” It was a striking statement—the singularity is the theoretical future moment when AI rapidly exceeds human intelligence and dramatically transforms the world.

OpenAI · 4 days ago

How Virgin Atlantic ships faster with Codex

How Virgin Atlantic used Codex to ship its revamped mobile app on a fixed holiday travel deadline, reaching near-total unit test coverage and zero P1 defects.

OpenAI · 4 days ago

OpenAI named a Leader in enterprise coding agents by Gartner

OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.

MIT Tech Review AI · 4 days ago

Roundtables: Can AI Learn to Understand the World?

AI companies want to build systems that understand the external world and overcome the limitations of LLMs. Recent developments have brought world models to the forefront of the AI discussion. Watch a conversation with editor in chief Mat Honan, senior AI editor Will Douglas Heaven,

Google DeepMind · 5 days ago

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

MIT Tech Review AI · 5 days ago

Scaling creativity in the age of AI

Storytelling is core to humanity’s DNA, stemming from our impulse to express ideals, warnings, hopes, and experiences. Technology has always been woven through the medium and the distribution: from early humans’ innovation of natural pigments and charcoals for cave paintings to literal representation by the

MIT Tech Review AI · 5 days ago

Anthropic’s Code with Claude showed off coding’s future—whether you like it or not

The vibes were strong at Code with Claude, Anthropic’s two-day event for software developers in London that kicked off on May 19, the same day as Google’s I/O in Palo Alto. (A coincidence, not a flex, Anthropic staffers assured me.) “Who here has shipped a pull request in the last week that was completely written…

OpenAI · 5 days ago

AdventHealth advances whole-person care with OpenAI

AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.

Google AI · 5 days ago

We’re announcing new community investments in Missouri.

We’re helping build the state’s next-generation workforce and investing in energy programs.

Google AI · 6 days ago

100 things we announced at I/O 2026

This year at Google I/O 2026, we announced Gemini Omni, Google Antigravity, Universal Cart and so much more. Here are the highlights.

Google AI · 6 days ago

A new experiment brings better group meetings to Google Beam

See and hear your colleagues in true-to-life size and sound, making hybrid meetings feel more inclusive and connected.

OpenAI · 6 days ago

An OpenAI model has disproved a central conjecture in discrete geometry

An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.

OpenAI · 6 days ago

How Ramp engineers accelerate code review with Codex

How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.

OpenAI · 6 days ago

The next phase of OpenAI’s Education for Countries

OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes.

OpenAI · 6 days ago

Introducing OpenAI for Singapore

OpenAI for Singapore launches a multi-year AI partnership to expand deployment, build local talent, and support businesses and public services with AI.

MIT Tech Review AI · 6 days ago

Roundtables: Inside the Musk v. Altman Trial

Elon Musk lost his suit against OpenAI, in which he alleged CEO Sam Altman and President Greg Brockman had deceived him over the company’s non-profit status. Watch as AI reporter and attorney Michelle Kim, who covered the trial for MIT Technology Review, joins in conversation with e

Hugging Face · 7 days ago

OlmoEarth v1.1: A more efficient family of Earth observation models

VentureBeat AI · 7 days ago

Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.

For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list of blue links. On Tuesday, Google will formally retire that paradigm. At its annual I/O developer conference, Google announced a sweeping re

Google AI · 7 days ago

I/O 2026

At Google I/O 2026, we shared how we’re making AI more helpful for everyone. See everything we announced.

Google AI · 7 days ago

How AI Mode is changing the way people search in the U.S.

One year after launch, see how AI Mode’s users are shifting from keywords to natural language queries.

Google AI · 7 days ago

New ways to create and get things done in Google Workspace

Announcing new voice capabilities in Gmail, Docs and Keep, a new design tool called Google Pics and updates to AI Inbox.

Google AI · 7 days ago

I/O 2026: Welcome to the agentic Gemini era

The latest from Google I/O: See how we’re helping you get more done with Gemini.

MIT Tech Review AI · 7 days ago

Here’s why Elon Musk lost his suit against OpenAI

On Monday, the jury in Musk v. Altman dealt Elon Musk a major blow—reaching a unanimous advisory verdict that he had sued OpenAI too late and, as a result, his claims are barred by the applicable statutes of limitations. US District Judge Yvonne Gonzalez Rogers immediately accepted it. Musk announced on X that he

Hugging Face · 7 days ago

Introducing the Ettin Reranker Family

Google DeepMind · 8 days ago

Fast-tracking genetic leads to reverse cellular aging

Biologists use Co-Scientist to find novel factors that successfully rejuvenate human cells.

MIT Tech Review AI · 8 days ago

What to expect from Google this week

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. When Google opens its doors tomorrow for its annual developer conference, I/O, it will do so as a clear third place in the foundation model race. A year ago, at Google I/O…

MIT Tech Review AI · 8 days ago

Inside Anduril and Meta’s quest to make smart glasses for warfare

The defense-tech company Anduril has shared new details about the augmented-reality headset for the military it’s prototyping with Meta, including a vision for ordering drone strikes via eye-tracking and voice commands. Quay Barnett, who leads the efforts as a vice president at Anduril following a career in the Army’s

Hugging Face · 8 days ago

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

Hugging Face · 8 days ago

The Open Agent Leaderboard

Reddit AI communities: r/MachineLearning · r/LocalLLaMA · r/singularity …
r/OpenAI · just now

Al-Qaeda used ChatGPT to plan Delhi blast, asking questions like 'how to make a rocket and what should be the ratio of the mixture'

↑3 · 0 comments

r/LocalLLaMA · just now

Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs. NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats

Safetensors, llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved: [https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved](https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved) GGUFs, llmfa

↑11 · 2 comments

r/LocalLLaMA · just now

CXMT started selling ram to corsair

They started producing cheaper ram for corsair, hopefully it will get cheaper for consumers [https://www.tomshardware.com/pc-components/ddr5/chinese-memory-maker-cxmt-enters-the-mainstream-consumer-memory-with-corsair-vengeance-ddr5-kit-chinese-made-dram-emerges-as-an-anti

↑6 · 0 comments

r/OpenAI · 1 hour ago

No more file upload limits on AI models!

Tired of constantly hitting ChatGPT upload limits or splitting huge docs/code into 10 parts? I built DocShareAI for exactly that. Upload or paste anything, get one AI-readable link back, and send it to ChatGPT, Gemini, Grok, etc. No more broken formatting, chunking logs manual

↑2 · 2 comments

r/OpenAI · 1 hour ago

AI doomsday: Hollywood vs. The real threat

↑37 · 6 comments

r/LocalLLaMA · 2 hours ago

One letter to appease them all

↑26 · 18 comments

r/MachineLearning · 3 hours ago

Already 11 000 submissions for EMNLP? [D]

Is this normal? I searched it up and last year it was only 8000.

↑13 · 9 comments

r/singularity · 3 hours ago

Scientists trained an AI model using an IBM quantum computer — and it answered questions correctly that the base model couldn't

↑90 · 21 comments

r/singularity · 4 hours ago

Why do some/most people think AI will never be good enough? What are their arguments?

It might never reach the heights of Einstein, but it’s far superior to the average Joe, even in areas he is completely unfamiliar with. And it’s usually the Joes who think AI won’t be good enough. (No offence)

↑14 · 77 comments

r/ClaudeAI · 4 hours ago

😢😢

↑102 · 8 comments

r/singularity · 5 hours ago

Users who rage quit my software

I make mods for a game called Rimworld. They are pretty popular (together about 2M subs on Steam). Recently I found that there are users in the official Rimworld discord that simply uninstall all my mods as soon as they hear that I updated them with AI. This has nothing to do w

↑204 · 249 comments

r/singularity · 7 hours ago

What’s this sub’s take on the Vatican response to AI?

I’m curious: What do you guys think of Pope Leo XIV's encyclical "Magnifica humanitas"?

↑32 · 51 comments

r/ClaudeAI · 7 hours ago

Weird Injection Prompt In Chat??

Claude inserted an injection prompt at the end of its message out of the blue, and i have repeatedly asked where it got it from or why it inserted this message, but Claude keeps denying it ever did it, no matter how many screenshots or replies i use or whatever i do, Claude just

↑363 · 51 comments

r/LocalLLaMA · 7 hours ago

New local model reaching near frontier on PII removal at 9 ms CPU inference

Hi all, I've been working on this model to strip sensitive information from computer use data and would love some feedback!

↑14 · 8 comments

r/ClaudeAI · 8 hours ago

Why does my Claude Code go crazy like this sometimes?

↑63 · 29 comments

r/MachineLearning · 9 hours ago

Aiki my local Wikipedia Retrieval-Augmented Generation system [R]

# Hey i built Aiki a lightweight tool that let's you chat with Wikipedia locally. https://i.redd.it/67mzfsrc6f3h1.gif **what it does:** * Downloads and chunks wikipedia articles (u can choose those articles by their name or articles and also the option of downloading the simi

↑5 · 0 comments

r/LocalLLaMA · 10 hours ago

Update on 12x32gb sxm v100 cluster / local AI for legal drafting

Update from the lawyer with the V100 server. A few of you asked what I actually ended up running once the dust settled, so here it is. Still just a lawyer, still driving the whole thing through Claude Code, still not fully sure what I'm doing — but it works now, which is more tha

↑188 · 67 comments

r/ClaudeAI · 10 hours ago

How does life find its way back into this subreddit?

As AI assistance has made us more productive, I feel more disconnected. People come here to pump their projects, ask questions they could simply google, complain about the same thing 10 other people did on the same day, post LLM generated walls of text, and more. More posts tha

↑37 · 37 comments

r/OpenAI · 10 hours ago

Got my first ad today!

Targeted ads boutta go crazy...

↑14 · 2 comments

r/ClaudeAI · 10 hours ago

Does the “Indexing” status ever change in Claude Projects?

Hi everyone, I set up a Claude Project a few weeks ago, but the status indicator in the top right corner has never changed from "Indexing" and still shows that little black dot. Does this status ever clear up once processing finishes? Also, my larger PDFs aren't showing visual

↑17 · 6 comments

r/LocalLLaMA · 12 hours ago

Using Local LLMs for Generating Custom Interactive Recursive Textbooks on the Fly

↑42 · 16 comments

r/OpenAI · 12 hours ago

the only person apart from tech bros who is earning using AI

He is the only guy who is not making any videos with titles "How to earn money using AI" !

↑13 · 1 comment

r/singularity · 13 hours ago

New Gemini Omni Blows Competition Away

Saw people giving Google a hard time. Now look at them 😁

↑233 · 43 comments

r/MachineLearning · 13 hours ago

The famous METR AI time horizons graph contains numerous severe errors [D]

Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, [writes](https://www.transformernews.ai/p/against-the-metr-graph-coding-capabilities-software-jobs-task-ai) damningly about the famous METR AI time horizons graph in the Substack publication Transformer: >I

↑46 · 58 comments

r/MachineLearning · 13 hours ago

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

Just thought I'd share, I ran a DCGAN on a dual core RISC-V microcontroller, the CH32H417 generating 64x64 cat faces. This is a new RISC-V MCU, so no TFLite, no CMSIS NN and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs. The model

↑10 · 2 comments

r/ClaudeAI · 13 hours ago

How I protect my health when using Claude (and how I didn't before)

Tagged as productivity because without your health, what can you do? All of a sudden, I just felt tired, and I had this banging headache. I thought, okay. It's just a headache. And then I got home, and I knew it was more. Looking back now, it was a combination of many things, bu

↑134 · 53 comments

r/ClaudeAI · 14 hours ago

Stop letting Claude glaze your bad product ideas

Take this from someone who has pitched to investors, works in a C-Suite job, and has constantly been pitched to. Building something from a phrase or an idea can provide a productivity high that can make you feel on top of the world. Claude would help me build whatever I des

↑69 · 47 comments

r/MachineLearning · 14 hours ago

Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

I’ve been thinking about a problem in current agent systems: Most agents are becoming very good at execution, but the decision layer before execution is still unclear. Coding agents, research agents, tool loops, sandboxes, workflows, and harnesses are all improving quickly. Onc

↑3 · 0 comments

r/singularity · 14 hours ago

One of the authors of "Attention is All You Need" just argued we should move past it. Pathway’s Post-Transformer debate is worth watching

↑250 · 109 comments

r/singularity · 15 hours ago

Hyundai/Boston Dynamics is going to train Atlas the humanoid robot by watching football videos, and they'll document its progress in an online series called 'School of Football'

↑292 · 41 comments

r/MachineLearning · 16 hours ago

Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

🌟 Announcing the 2nd Workshop on Efficient Reasoning (ER) at @colm2026 — Oct 9! 📣 We welcome submissions! Submit your work here: [https://openreview.net/group?id=colmweb.org/COLM/2026/Workshop/Efficient\_Reasoning](https://openreview.net/group?id=colmweb.org/COLM/2026/Workshop

↑3 · 0 comments

r/LocalLLaMA · 16 hours ago

Is Qwen3.6 current king for local agentic use?

I've been testing other models but it seems like nothing even come close to Qwen3.6 35B A3B for agentic use. The worse I'd get is a loop sometimes, while Gemma4 produced broken tool calls occasionally and I couldn't even get GLM 4.7 Flash REAP past 2 or 3 messages before it start

↑162 · 144 comments

r/ClaudeAI · 17 hours ago

6 months of .md memory, conflicting facts are the hard part

I've been using a .md filesystem for my (mostly coding) agents for over 6 months now and it's been a big improvement, so rn I'm migrating my local fs to the cloud. I've been adding cross linking, truncating, knowledge extraction, etc. The structure ended up having a "warm" layer

↑136 · 48 comments

r/LocalLLaMA · 17 hours ago

MiniCPM5-1B

↑106 · 24 comments

r/LocalLLaMA · 17 hours ago

The Financial Times has published an article about Heretic

https://www.ft.com/content/5630ed79-a263-41ed-9a1a-321617ae310e “The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.” “Heretic creat

↑780 · 200 comments

r/OpenAI · 17 hours ago

Humanity's greatest hits: things we actually paused

↑1285 · 263 comments

r/ClaudeAI · 18 hours ago

I've been using Claude Code as a motion graphics engine for my YouTube videos. It writes the JSX, I render. Edit time roughly halved.

Found a really clean Claude Code use case that's not coding-coding. Remotion (React for video) means motion graphics are JSX components. So I describe what I want in plain English, Claude Code writes the component, I render. Lower thirds, intros, overlays, all reusable across vi

↑107 · 38 comments

r/singularity · 18 hours ago

Demis: Solving erdos problems are far from true invention

↑219 · 107 comments

r/ClaudeAI · 18 hours ago

Are we nearly there?

Implying tech companies besides Anthropic, Google, and Nvidia have any money left over by 2027 after they all ran through cash on hand for tokens. I feel like there are reasonable people, like the guy behind the "ijustvibecodedthis" newsletter who are realistic and help you ACTU

↑1340 · 144 comments

r/LocalLLaMA · 18 hours ago

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable)

Disclaimer: I work for Numind, the company behind this open-weight model TLDR: Image/text to Markdown :-) We just released a 4B model based on Qwen3.5-4B, under Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open mod

↑217 · 51 comments

arxiv latest papers: cs.AI · cs.CL · cs.LG
arxiv cs.CL / cs.AI · 13 hours ago

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reach for everyday apps: verifiable outcome signals through deterministic state-based judgin

Authors: Dingbang Wu, Rui Hao, Haiyang Wang

arxiv cs.AI / cs.LG · 13 hours ago

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the harness: treating the structured execution layer around a foundation model as a

Authors: Shangding Gu

arxiv cs.AI / cs.LG · 13 hours ago

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation

Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instructions. Existing approaches often encode text and reference images separately. This limits cross-modal reasoning abilities and causes copy-paste artifacts. Recent frameworks that c

Authors: Shuhong Zheng, Aashish Kumar Misraa, Yu-Teng Li

arxiv cs.CL / cs.LG · 13 hours ago

Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

Multimodal Large Language Models (MLLMs) achieve versatility by reformulating diverse tasks into a unified instruction-following framework via instruction tuning. However, real-world deployment requires continuous adaptation to emerging tasks, motivating Multimodal Continual Instruction Tuning (MCIT). Despite its growi

Authors: Jun-Tao Tang, Yu-Cheng Shi, Zhen-Hao Xie

arxiv cs.LG · 13 hours ago

Looped Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that selectively looping the early-middle transformer layers significantly improves both tra

Authors: Sanghyun Lee, Chunsan Hong, Seungryong Kim

arxiv cs.AI · 13 hours ago

Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models

Code review is a critical practice in software engineering, yet the growing scale and frequency of code patches in modern projects, together with the widespread adoption of AI code assistants, make manual review increasingly challenging. Identifying the types of changes within a patch, such as renames, moves, or logic

Authors: Bar Weiss, Antonio Abu-Nassar, Adi Sosnovich

arxiv cs.CL / cs.AI · 14 hours ago

Language Models Need Sleep

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its

Authors: Sangyun Lee, Sean McLeish, Tom Goldstein

arxiv cs.LG · 14 hours ago

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast, language models can sample from their own training distribution, and we show that these

Authors: Martin Marek, Dongkyu Cho, Shikai Qiu

arxiv cs.LG · 14 hours ago

Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty

Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, reducing parameter uncertainty does not necessarily improve downstream decisions, as only specific parameter directions relevant to the objective truly matter. We

Authors: Jinwoo Go, Xiaoning Qian, Byung-Jun Yoon

읽기
arxiv cs.AI14시간 전

OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

The deployment of Large Language Models (LLMs) and Vision Transformers (ViTs) on edge devices is significantly constrained by memory limitations and the critical timing bottlenecks introduced by dense Multiply-Accumulate (MAC) arrays. In the ultra-low bit regime, logarithmic Power-of-Two (PoT) quantization provides a h

Authors: Maoyang Xiang, Bo Wang, Tao Luo

arxiv cs.AI · 14 hours ago

Channel-wise Vector Quantization

We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns a discrete token to each patch feature vector, CVQ quantizes each channel of the feature map. This formulation represent

Authors: Wei Song, Tianhang Wang, Yitong Chen

arxiv cs.LG · 14 hours ago

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately d

Authors: Matt L. Wiemann, Lindsay M. Smith, Peter Melchior

arxiv cs.AI · 14 hours ago

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet current systems operate over only narrow slices of that world, limiting context-sensitive reasoning and effective assistance. Existing benchmarks similarly provide o

Authors: Yusong Lin, Xinyuan Liang, Haiyang Wang

arxiv cs.AI · 14 hours ago

VeriTrace: Evolving Mental Models for Deep Research Agents

Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations should look like, but leave their evolution to the LLM's implicit reasoning. Without explicit regulation, the intermediate layer is easily contaminated by mixed-qual

Authors: Haolang Zhao, Yunbo Long, Lukas Beckenbauer

arxiv cs.CL · 14 hours ago

Automated Benchmark Auditing for AI Agents and Large Language Models

Modern AI benchmarks operate at a complexity that outpaces traditional verification methods. Tasks authored by domain experts often contain implicit assumptions, incomplete environment specifications, and brittle evaluation logic that human annotation cannot reliably catch. We introduce Auto Benchmark Audit (ABA), an a

Authors: Junlin Wang, Federico Bianchi, Shang Zhu

arxiv cs.LG · 14 hours ago

Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning

Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditional policy by transporting it along the action gradient of the soft Q-function

Authors: Zhaoyu Zhu, Rui Gao, Shuang Li

arxiv cs.CL · 14 hours ago

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what speakers have committed to in the market. We introduce StakeBench, an evaluation framework for language understanding grounded in market commitment. StakeBench links 560,876 comment

Authors: Yunhua Pei, Jingyu Hu, Yiwei Shi

arxiv cs.LG · 14 hours ago

Active Query Synthesis for Preference Learning

Efficient learning of user preferences is crucial for many modern decision making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, i

Authors: Namrata Nadagouda, Nauman Ahad, Maegan Tucker

arxiv cs.CL · 14 hours ago

WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification

Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and culturally variable. We propose a human-large language model (LLM) collaborative re-annotation framework for stabilizing multilingual speaker-attribute labels under p

Authors: Lingyu Gao, Will Monroe, David Smith

arxiv cs.CL · 14 hours ago

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion; however, they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolki

Authors: Parth Darshan, Abhishek Divekar

arxiv cs.CL · 14 hours ago

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Activation oracles aim to make the activations of other models legible to humans and yield promising results compared to white-box interpretability techniques. However, uncertainty quantification (UQ) for the natural-language outputs of such activation oracles is so far understudied. Here, we investigate 6 different me

Authors: Federico Torrielli, Peter Schneider-Kamp, Lukas Galke Poech

arxiv cs.CL · 14 hours ago

Peak-Then-Collapse and the Four Interface Channels of Knowledge-Graph Tool Use

We test the standard RLVR tool-use recipe -- GRPO on Qwen2.5-7B-Instruct -- on a deliberately minimal knowledge-graph tool API: four Freebase navigation verbs over Complex WebQuestions. Under a self-verifiable retrieval reward, the policy's tool-grounded answer rate climbs from $3.8\%$ to $9.6\%$ over 250 steps, then c

Authors: Tianda Sun, Dimitar Kazakov

arxiv cs.CL · 14 hours ago

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using causal evidence and whether its answer is supported by a correct hypothesis about the underlying causal mechanism. Each ep

Authors: Junlin Yang, Dylan Zhang, Xiangchen Song

dev.to AI Posts · Developer Blogs
dev.to · 16 hours ago

Why does AI forget what you said (and how to fix it)

I received the following comment on my hallucinations blog post. Comment on Why...

by Rohini Gaonkar

dev.to · 20 hours ago

Don’t let AI break your collective thinking: a practical guide for engineering teams

Over the past few years, my workflow as an engineer has changed a lot. I went from the occasional...

by Julien Avezou

dev.to · 1 day ago

MIA: A Futuristic AI Desktop Assistant Built with Voice, Gestures, and Controlled Chaos

Most desktop assistants today feel like they were designed by someone whose greatest ambition was...

by TROJAN

dev.to · 19 hours ago

Every Tool Eventually Becomes Tuesday

I opened my email this morning and there were three product announcements from companies I have never...

by Evan Lausier

dev.to · 2 days ago

I Ditched Cloud LLMs for Gemma 4 4B: A DevOps Engineer's 48-Hour Reality Check

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. I Ditched Cloud LLMs...

by Asmae

dev.to · 2 days ago

Build It, Then Use It: How I wrote 435 AI engineering lessons from scratch

The first time I wrote a tokenizer, I did it with a for loop. I counted byte pairs by hand, merged...

by Rohit Ghumare

dev.to · 2 days ago

My Impression of AI in Programming

My impression of AI use in programming as a principal engineer.

by Paul J. Lucas

dev.to · 1 day ago

Now I See Why Translators Are Panicking Over AI—Should Coders Panic Too?

Last year, I met a young translator reinventing herself. She studied Translation for five years at a...

by Cesar Aguirre

dev.to · 19 hours ago

Why AI-Generated Code Is Always Good Enough — And Never Great

AI wrote a function for me last week. It worked. Tests passed. Edge cases handled. I shipped it. But...

by Harsh

dev.to · 16 hours ago

If Microsoft and Uber can't afford AI coding, what chance do the rest of us have?

Two stories landed in the same news cycle. Microsoft cancelled most internal Claude Code licenses....

by Jonathan Murray

dev.to · 1 day ago

Google I/O 2026: AI Built an OS in 12 Hours. I Spent Mine Sorting Screenshots. 🤦

This is a submission for the Google I/O Writing Challenge. I haven't watched a tech keynote in a...

by Aabhas Sao

dev.to · 2 days ago

From Govhack Win to Something That Actually Matters

This is a submission for the GitHub Finish-Up-A-Thon Challenge. What I Built. Project...

by ujja

dev.to · 2 days ago

Everyone's Talking About Gemini 3.5 Flash. The Real Story at Google I/O 2026 Was a Skill File.

This is a submission for the Google I/O Writing Challenge. Everyone walked away from Google I/O...

by Sreejit Pradhan

dev.to · 3 hours ago

Behind The Badge: How We Built 2,000 Hackable Badges For Temporal Replay

From a design prompt to a factory floor in Shenzhen: the story behind Temporal's Replay conference badge.

by Shy Ruparel

dev.to · 5 hours ago

Cursor 3 ships parallel AI agents. Here is the multi-agent workflow that actually works.

by GDS K S

dev.to · 17 hours ago

Less Toil, More Flow - Automating the Path from Request to Implementation

How I connected Slack, Linear, and Kiro to automate ticket creation, AI-assisted triage, and implementation planning for a platform team supporting hundreds of engineers.

by Davide de Paolis

dev.to · 14 hours ago

Building Cursor for Community: A Buildathon Built on Time Pressure

Over the weekend, I attended an event hosted by Cursor Kenya, bringing together developers, builders,...

by Valery Odinga

dev.to · 2 hours ago

FairLens AI: An Intelligent Dashboard for Automated Bias Auditing

This is a submission for the GitHub Finish-Up-A-Thon Challenge. What I Built. FairLens AI...

by Bibhu Pradhan

dev.to · 22 hours ago

Thoughts on Codingame 2026 Spring challenge

Trolls in woods be choppin'

by Augusts Bautra

dev.to · 2 days ago

Google I/O 2026: The Year Google Stopped Building AI Assistants and Started Shipping AI Engineers

This is a submission for the Google I/O Writing Challenge. The moment that changed how I thought...

by Prakhar Shukla

YouTube AI Channels · VIDEO FEED
Two Minute Papers

DeepMind’s Insane AI Breakthroughs With CEO Demis Hassabis

Siraj Raval

I Let AI Cold-Call 100 Plumbers (Genspark)

Two Minute Papers

DeepSeek’s New AI Is A Game Changer

Machine Learning Street Talk

Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)

Two Minute Papers

NVIDIA New AI Is An Efficiency Monster

Two Minute Papers

OpenAI's ChatGPT 5.5 Instant: The Good, The Bad And The Insane

Anthropic

Translating Claude’s thoughts into language

Lex Fridman

FFmpeg: The Incredible Technology Behind Video on the Internet | Lex Fridman Podcast #496

Machine Learning Street Talk

The AI Progress Chart Everyone Is Misreading — Beth Barnes & David Rein

Lex Fridman

Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age | Lex Fridman Podcast #495

Anthropic

An initiative to secure the world's software | Project Glasswing

Anthropic

When AIs act emotional

Siraj Raval

This AI made me $2,345 in 24 hours

Siraj Raval

i gave chatgpt $2,000 to trade stocks for 24 hours

Lex Fridman

Jensen Huang: NVIDIA - The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494

Siraj Raval

I Let an AI Run My Life for 50 Days

Machine Learning Street Talk

When AI Discovers the Next Transformer — Robert Lange

Lex Fridman

Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming | Lex Fridman Podcast #493

Machine Learning Street Talk

The Dangerous Illusion of AI Coding? - Jeremy Howard

Anthropic

Introducing Claude Opus 4.6

James Briggs

Predictive Query Language (PQL) Explained

James Briggs

Data Science as a Service | Kumo AI Full Walkthrough

James Briggs

Agents are coming for Ecom

James Briggs

Build Agentic Ecommerce with KumoRFM
