NVIDIA has open-sourced its latest AI model!
Release Time: 2025-12-19 16:30:24

NVIDIA has officially open-sourced its new generation of AI models: NVIDIA Nemotron 3. The Nemotron 3 series consists of three models: Nano, Super, and Ultra, which NVIDIA says offer powerful agentic, reasoning, and chat capabilities.



Nemotron 3 is offered in three sizes:


Nano: The smallest model, with 3.2B active parameters (3.6B including embeddings) and 31.6B total parameters. It targets well-defined tasks with strict efficiency requirements, outperforming models of similar size on accuracy while remaining highly cost-effective at inference.


Super: Roughly four times the size of Nano, at about 100B parameters. It is designed for multi-agent applications and high-accuracy reasoning.


Ultra: Roughly 16 times the size of Nano, at about 500B parameters, with stronger reasoning capability, suited to the most complex application scenarios.


Nemotron 3 supports a 1M-token context window, enabling the model to reason continuously over large codebases, long documents, extended conversations, and aggregated retrieval results. Unlike fragmented chunking heuristics, an agent can keep its complete evidence set, history buffer, and multi-stage plan in a single context window.
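
To make the single-window idea concrete, here is a minimal sketch of packing an evidence set, history buffer, and plan into one prompt and checking that it fits the window. The checkpoint ID is a hypothetical placeholder, not an official NVIDIA model name.

```python
# Minimal sketch: keep evidence, history, and plan in one context window.
# "nvidia/nemotron-3-nano" is a hypothetical placeholder ID, not an
# official checkpoint name.
from transformers import AutoTokenizer

MAX_CONTEXT = 1_000_000  # the 1M-token window reported for Nemotron 3

tokenizer = AutoTokenizer.from_pretrained("nvidia/nemotron-3-nano")

def build_prompt(evidence_docs: list[str], history: list[str], plan: str) -> str:
    """Concatenate the full evidence set, conversation history, and plan,
    verifying the result fits in a single context window."""
    prompt = "\n\n".join([*evidence_docs, *history, plan])
    n_tokens = len(tokenizer.encode(prompt))
    if n_tokens > MAX_CONTEXT:
        raise ValueError(f"{n_tokens} tokens exceeds the {MAX_CONTEXT:,}-token window")
    return prompt
```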


NVIDIA attributes this ultra-long-context capability to Nemotron 3's hybrid Mamba-Transformer architecture, which handles extremely long sequences efficiently, while the MoE routing mechanism lowers the compute cost per token, making sequences of this scale practical at inference time.
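
As a rough illustration of how MoE routing keeps per-token compute low, the sketch below shows a generic top-k router: every token activates only a few experts out of many, so per-token FLOPs track the active parameter count (3.2B for Nano) rather than the total (31.6B). The sizes and routing scheme are generic assumptions, not NVIDIA's published implementation.

```python
import torch
import torch.nn.functional as F

# Generic top-k MoE routing sketch; dimensions are illustrative, not
# Nemotron 3's published configuration.
NUM_EXPERTS, TOP_K, D_MODEL = 64, 4, 1024

gate = torch.nn.Linear(D_MODEL, NUM_EXPERTS)
experts = torch.nn.ModuleList(
    torch.nn.Linear(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:  # x: (tokens, D_MODEL)
    scores = F.softmax(gate(x), dim=-1)
    weights, idx = scores.topk(TOP_K, dim=-1)  # each token picks TOP_K experts
    out = torch.zeros_like(x)
    for t in range(x.size(0)):
        # Only TOP_K of NUM_EXPERTS experts run per token, so per-token
        # compute scales with active parameters, not total parameters.
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])
    return out
```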


In scenarios such as enterprise retrieval-augmented generation (RAG), compliance analysis, hours-long voice agents, or whole-repository code understanding, a 1M-token context window can significantly improve factual grounding and reduce context fragmentation.


NVIDIA says the Nano version is available now, while the Super and Ultra versions are expected in the first half of 2026.


The standout technical feature of the Nemotron 3 series is its open hybrid Mamba-Transformer MoE architecture, targeted at high-speed, long-context inference in multi-agent systems.


NVIDIA has adopted hybrid Mamba-Transformer designs in several of its models, including Nemotron-Nano-9B-v2.


Nemotron 3 integrates three components into a single backbone network (see the sketch after this list):


Mamba layer: Used for efficient sequence modeling


Transformer layer: Used for high-precision reasoning


MoE routing mechanism: Delivers scalable compute efficiency
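
A minimal sketch of how such a hybrid stack might be arranged, assuming a simple interleaving in which most mixing layers are Mamba-style blocks and attention appears periodically; the block internals, layer count, and 1-in-8 ratio are illustrative stand-ins, not the published Nemotron 3 topology.

```python
import torch
import torch.nn as nn

D = 1024  # model width; illustrative only

class MambaBlock(nn.Module):
    """Stand-in for a Mamba layer (real Mamba uses a selective SSM)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(D, D)
    def forward(self, x):
        return x + self.proj(x)

class AttentionBlock(nn.Module):
    """Standard self-attention block for high-precision reasoning."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D, num_heads=8, batch_first=True)
    def forward(self, x):
        return x + self.attn(x, x, x)[0]

class MoEBlock(nn.Module):
    """Stand-in for a routed mixture-of-experts feed-forward."""
    def __init__(self):
        super().__init__()
        self.ffn = nn.Linear(D, D)
    def forward(self, x):
        return x + self.ffn(x)

class HybridBackbone(nn.Module):
    """Mostly Mamba blocks for cheap sequence mixing, periodic attention
    blocks for precision, and an MoE feed-forward after every mixer.
    The 1-in-8 interleaving is an assumption, not the published layout."""
    def __init__(self, n_layers=48, attn_every=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                AttentionBlock() if (i + 1) % attn_every == 0 else MambaBlock(),
                MoEBlock(),
            )
            for i in range(n_layers)
        )
    def forward(self, x):  # x: (batch, seq, D)
        for layer in self.layers:
            x = layer(x)
        return x

out = HybridBackbone()(torch.randn(1, 16, D))  # smoke test
```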


Mamba tracks long-range dependencies with very low memory overhead and remains stable even across hundreds of thousands of tokens. The Transformer layers complement this with precise attention, capturing the structural and logical relationships needed for tasks such as code manipulation, mathematical reasoning, and complex planning.
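
The memory claim can be made concrete with a toy state-space recurrence: the recurrent state has a fixed size, so memory stays constant no matter how many tokens stream through, unlike attention's key/value cache, which grows with sequence length. This is a plain linear SSM, not Mamba's actual selective mechanism.

```python
import numpy as np

# Toy linear state-space recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# Simplified: real Mamba makes the parameters input-dependent ("selective").
d_state, d_in = 16, 8
rng = np.random.default_rng(0)
A = np.eye(d_state) * 0.9
B = rng.normal(size=(d_state, d_in)) * 0.1
C = rng.normal(size=(d_in, d_state)) * 0.1

h = np.zeros(d_state)
for _ in range(200_000):   # hundreds of thousands of steps...
    x = rng.normal(size=d_in)
    h = A @ h + B @ x      # ...but the state never grows
    y = C @ h
print(h.shape)             # (16,): memory is O(d_state), independent of length
```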


NVIDIA says that compared with Nemotron 2 Nano, this design "can achieve a token throughput increase of up to 4 times" and significantly cuts inference cost by generating up to 60% fewer reasoning tokens.


Alongside stronger accuracy and reasoning performance, Nemotron 3 Super and Ultra also introduce a breakthrough innovation: latent MoE (a mixture of experts in latent space).


Each expert computes in a shared latent representation space and then projects its result back into token space. This design lets the model invoke up to four times as many experts at the same inference cost, yielding stronger specialization for fine-grained semantic structure, domain abstractions, and multi-hop reasoning patterns.
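
The description above maps naturally onto a small sketch: a shared projection compresses tokens into a latent space, experts compute there, and a shared projection maps the mixture back. All dimensions and the top-k routing are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Latent-MoE sketch: experts operate in a shared latent space smaller than
# the token space. Sizes and routing are illustrative assumptions.
D_TOKEN, D_LATENT, NUM_EXPERTS, TOP_K = 1024, 256, 32, 4

down = nn.Linear(D_TOKEN, D_LATENT)  # shared projection into latent space
up = nn.Linear(D_LATENT, D_TOKEN)    # shared projection back to token space
gate = nn.Linear(D_TOKEN, NUM_EXPERTS)
experts = nn.ModuleList(nn.Linear(D_LATENT, D_LATENT) for _ in range(NUM_EXPERTS))

def latent_moe(x: torch.Tensor) -> torch.Tensor:  # x: (tokens, D_TOKEN)
    z = down(x)  # experts see the cheaper latent view
    weights, idx = F.softmax(gate(x), dim=-1).topk(TOP_K, dim=-1)
    out = torch.zeros_like(z)
    for t in range(x.size(0)):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](z[t])
    return up(out)  # project the expert mixture back to token space
```

Because each expert call here costs on the order of D_LATENT² rather than D_TOKEN² operations, shrinking the latent width frees budget to route each token to several times more experts at roughly the same cost, which is the intuition behind the "up to four times the number of experts" claim.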


To better align Nemotron 3 with real agent behavior, the model is post-trained with multi-environment reinforcement learning in NeMo Gym.


NeMo Gym is an open-source library for building and scaling reinforcement learning environments. These environments assess a model's ability to execute action sequences rather than single-turn responses: generating correct tool calls, writing runnable code, or producing multi-step plans that satisfy verifiable criteria.
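
The interaction pattern described might look like the generic rollout loop below; every name in it (env.reset, env.step, policy) is a hypothetical sketch of a gym-style interface, not NeMo Gym's actual API.

```python
# Generic rollout loop in the style described for NeMo Gym. All names here
# (env.reset/step, policy) are hypothetical; consult the NeMo Gym docs for
# the real interface.
def rollout(env, policy, max_steps=32):
    obs = env.reset()  # e.g. a task prompt plus available tool specs
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)  # a tool call, a code edit, or a plan step
        obs, reward, done = env.step(action)  # reward from a verifiable
        trajectory.append((action, reward))   # check, e.g. "did the code run?"
        if done:
            break
    return trajectory  # a scored action sequence, not a single-turn response
```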


This trajectory-based reinforcement-learning approach makes the model more stable and reliable in multi-step workflows, reduces reasoning drift, and better handles the structured operations common in agent pipelines.


At this point, some readers may wonder: isn't NVIDIA a hardware company that makes GPUs? Why build its own AI models? In fact, beyond chips and GPUs, NVIDIA offers a large number of its own models, covering fields such as physical simulation and autonomous driving. In 2024, NVIDIA released the first batch of models under the Nemotron brand, built on Meta's Llama 3.1. Since then, NVIDIA has launched multiple Nemotron models of different sizes, tuned for specific scenarios and all released as open source for other companies to use. Some enterprises, including Palantir Technologies, have already integrated NVIDIA's models into their products.


Just last week, NVIDIA also announced a new open reasoning vision-language model, Alpamayo-R1, focused on autonomous driving research. NVIDIA says it has added more workflows and guides covering its Cosmos world models, which are open source under permissive licenses, to help developers use them to build physical AI. Taken together, these moves show NVIDIA deliberately cultivating an open-source ecosystem, and the company says as much: Kari Briski, vice president of enterprise generative AI, said NVIDIA's goal is to provide a "model that people can trust".




