San Francisco, California, December 19th, 2024,
Chainwire
Constellation Network, a Web3 ecosystem validated by the US
Department of Defense, today announced the launch of a customized
blockchain developed in partnership with the Common Crawl
Foundation, to create the industry's first cryptographically
secure, immutable archive of internet data for AI training and
development.
The collaboration introduces a new approach to validating and
securely accessing 17 years of internet crawl data, spanning
nearly 9 petabytes and used by 80% of Large Language Models (LLMs)
for training, through an immutable, cryptographically secured
blockchain network built on Constellation. This
application-specific network, or Metagraph, addresses pressing
concerns in AI development, namely data provenance, privacy, and
ethical sourcing, while exploring new use cases for blockchain
technology in emerging industries. Furthermore, the network will
use Constellation's DAG utility asset to secure the archived
internet crawls. This represents a significant advancement in
using cryptocurrency as a mechanism for businesses to notarize
data, shifting the cost from the consumer-facing gas fees typical
of many other layer-one networks to a business operational expense.
Key Technological Innovations
- Comprehensive Data Archiving: A fully
immutable copy of internet history, providing unprecedented
transparency and traceability for AI training datasets
- End-to-End Encryption: Cryptographic security
that ensures data integrity throughout the AI development
lifecycle
- Ethical AI Framework: A robust solution for
addressing concerns around data collection, storage, and usage in
large language models
"This integration is a critical step forward in securing the
future of AI development," said Alex Brandes, CTO of Constellation
Network. "By ensuring cryptographic integrity and immutability of
training data, we are addressing one of the most pressing
challenges in the field today: trustworthiness and provenance of
datasets. We believe our platform will grow to become a cornerstone
in the field of responsible AI development, setting new standards
for data integrity and trust."
Industry Applications
The blockchain-enabled data archive is already attracting
attention from advanced AI research initiatives. TraceAI, a project
developed through the National Science Foundation (NSF) and SBIR
program, is testing its own application-specific network, built on
Constellation, to add immutability, auditability, and proof of
authorship to its training models and to develop advanced
watermarking technologies. TraceAI will also leverage Common
Crawl’s Constellation-built solution to extend its work in
blockchain-encrypted AI to include tracking the source origin of
data.
Kevin Jackson, Vice President of Space Domain Communications
& Commercialization for Forward Edge-AI, emphasizes the
significance of this breakthrough: "This represents the natural
evolution of AI and machine learning model development—transforming
data management from a technical challenge to a trusted business
tool that drives global standardization and verification."
Looking Forward
Over the coming months, Constellation Network and Common Crawl
Foundation will work together to expand their solutions for AI
developers and to integrate cryptographically validated access to
the crawl into the standard release process.
“For users of the Crawl who are concerned about the provenance
of the data, especially those using it for AI models, Constellation
and their hypergraph blockchain provides an elegant solution,” said
Rich Skrenta, Executive Director of the Common Crawl. “We are
looking forward to adding the ability to securely validate the
crawl as part of our standard distribution by partnering with
Constellation.”
Evidence of this integration can be found on Constellation’s
transaction viewer, called the “DAG Explorer,” and developers can
get started using verified historical crawls for AI applications.
Please follow along for further solutions to be developed by
Constellation, Forward Edge-AI, and Common Crawl.
About Constellation Network
Constellation is a
leading blockchain network advancing innovation through on-chain
data security, partnering with critical global stakeholders,
including the U.S. Department of Defense, to deliver
transformative, next-generation technologies.
About Common Crawl Foundation
The Common Crawl
Foundation is a 501(c)(3) non-profit organization dedicated to
providing a copy of the internet to the public, free of charge.
Their web archive consists of petabytes of data collected over
years of web crawling, serving as a critical resource for
researchers, businesses, and developers worldwide.
About Forward Edge-AI
Forward Edge-AI is at the
forefront of a revolution in responsible and inclusive Artificial
Intelligence (AI) for the betterment of humanity. Since its
founding in 2019, its goal has been to become the dominant player
in Artificial Intelligence and lead the revolution in augmenting
edge technology with human intelligence.
Contact
Email: press@constellationnetwork.io
Website: https://constellationnetwork.io/
Twitter: https://x.com/conste11ation
GitHub: https://github.com/Constellation-Labs/tessellation
DAG Explorer: https://mainnet.dagexplorer.io/
Contact
Dagnum
PI
dagnum@stardust-collective.org