Comprehensive Research Contribution: Web3 Privacy Projects
Date: October 25, 2025
PR: #1997
TL;DR
Just submitted a large research contribution to the Web3Privacy Explorer. 40 privacy projects now have comprehensive documentation including:
- GitHub code analysis
- Team details
- Security assessments
- Technical deep-dives
Total contribution: 265 files, 12,378 lines of analysis
What Was Submitted
Research Scope: 171 Analyzed → 40 Submitted, 788 More in Pipeline
This submission represents the first wave of a comprehensive ecosystem analysis:
Initial Analysis Pool:
- 171 projects analyzed or research attempted in initial deep-dive
- 40 projects passed quality threshold for submission (had sufficient verified data)
- 131 projects had incomplete or insufficient data for submission
Research Pipeline:
- 788 projects in research-required/ folder awaiting analysis
- 705 from Web3Privacy Explorer database
- 48 verified from original archive
- 35 privacy ecosystem projects (funders, infrastructure, tools)
Total ecosystem coverage: 959 Web3 privacy projects (171 analyzed + 788 pipeline)
This PR includes the 40 projects where we found enough information to meet our quality standards through multi-source verification. The methodology that worked for these 40 will now be applied to the remaining 919 projects.
See examples of research depth:
Mysterium Network OSINT data (2,212 lines - infrastructure & team OPSEC)
Tornado Cash smart contracts (on-chain data)
What Each Submitted Project Includes
CODE_REVIEW.md - Repository analysis (stars, forks, contributors, commit activity, languages)
TEAM.md - Verified leadership information and organization structure
SECURITY.md - Security features, audit status, privacy mechanisms
TECHNICAL.md - Technology stack, architecture, capabilities
File Structure Example
web3privacy/explorer-data/src/projects/cake-wallet/
├── index.yaml            ← Basic metadata + links to reports
├── logo.png              ← Project logo
├── README.md             ← Project overview
├── project_metadata.json ← Aggregated data
└── reports/
    ├── CODE_REVIEW.md    ← 160 lines of GitHub analysis
    ├── TEAM.md           ← 98 lines of team info
    ├── SECURITY.md       ← 117 lines of security analysis
    └── TECHNICAL.md      ← 172 lines of tech details
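As a sketch of how this layout can be checked for completeness before submission (the file names come from the example above; the helper function itself is hypothetical, not part of the actual pipeline):

```python
from pathlib import Path

# Required files for a complete project submission, per the layout above
REQUIRED = ["index.yaml", "README.md", "project_metadata.json"]
REQUIRED_REPORTS = ["CODE_REVIEW.md", "TEAM.md", "SECURITY.md", "TECHNICAL.md"]

def missing_files(project_dir: Path) -> list[str]:
    """Return the required files absent from a project folder."""
    gaps = [f for f in REQUIRED if not (project_dir / f).is_file()]
    gaps += [f for f in REQUIRED_REPORTS
             if not (project_dir / "reports" / f).is_file()]
    return gaps
```

A check like this would run over every folder in src/projects/ and block submission of any project with gaps.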
Research Methodology
This research follows a systematic, multi-phase approach based on best available information:
Information Flow Diagram
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: BROAD SEARCH                                           │
│ Automated discovery across official sources (parallel agents)   │
└─────────────────────────────────────────────────────────────────┘
                │
                ▼
┌──────────────┬──────────────┬──────────────┬──────────────────┐
│  Website     │  GitHub      │  Social      │  News/Blog       │
│  Scraping    │  API         │  Media       │  Aggregators     │
└──────────────┴──────────────┴──────────────┴──────────────────┘
                │
                ▼
        ┌─────────────────┐
        │ verified_data   │
        │ (confidence     │
        │  scores 0-1)    │
        └─────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: DEEP DIVES                                             │
│ Specialized analysis with domain-specific tools                 │
└─────────────────────────────────────────────────────────────────┘
                │
                ▼
┌──────────────┬──────────────┬──────────────┬──────────────────┐
│  GitHub      │  OSINT       │  On-Chain    │  Smart           │
│  Analysis    │  Tools       │  Analysis    │  Contract        │
│  (commits,   │  (Spiderfoot,│  (APIs,      │  Review          │
│  languages)  │  OPSEC)      │  explorers)  │  (Solidity)      │
└──────────────┴──────────────┴──────────────┴──────────────────┘
                │
                ▼
        ┌─────────────────┐
        │ analysis/       │
        │ (JSON files     │
        │  with data)     │
        └─────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: REPORT GENERATION                                      │
│ Clean markdown reports (internal methodology removed)           │
└─────────────────────────────────────────────────────────────────┘
                │
                ▼
┌──────────────┬──────────────┬──────────────┬──────────────────┐
│  CODE_       │  TEAM.md     │  SECURITY.md │  TECHNICAL.md    │
│  REVIEW.md   │              │              │                  │
└──────────────┴──────────────┴──────────────┴──────────────────┘
                │
                ▼
      ┌───────────────────────┐
      │ Web3Privacy Explorer  │
      │  (Public Database)    │
      └───────────────────────┘
Four-Layer Data Architecture
Our research maintains a strict separation between internal methodology and public presentation:
Layer 1: Internal Research (sources/)
- verified_data.json - Raw data with confidence scores (0.0-1.0)
- Multi-source verification metadata
- Research methodology tracking
- Never publicly exposed
Layer 2: Analysis Data (analysis/)
- github_analysis.json - Repository metrics, languages, activity
- smart_contracts.json - Contract addresses, deployment info
- osint_data.json - Infrastructure and team OSINT research
- oso_data.json - Open Source Observer data
- Intermediate processing layer
Layer 3: Public Reports (reports/)
- CODE_REVIEW.md - Clean GitHub analysis
- TEAM.md - Verified team information
- SECURITY.md - Security features and audits
- TECHNICAL.md - Technology and capabilities
- Clean, professional markdown - no internal methodology exposed
Layer 4: Export Metadata (root)
- project_metadata.json - Aggregated data for APIs
- index.yaml - Web3Privacy Explorer format
- README.md - Project landing page
Research Quality Standards
Every data point follows our "constitutional research" methodology:
No Placeholders or Fabrication
Best available information with honest limitations
- No placeholder text like "Team of 5-10 developers"
- No estimates like "Approximately $2M in funding"
- If we don't know it, we document it as a gap
- Note: Sources themselves may contain errors, tools can hallucinate, and experimental pipelines may introduce issues; we ask the community to submit corrections via PRs/issues
Multi-Source Verification
Critical facts require 2+ independent sources
Founder: Vikrant Sharma
├── Source 1: Official interview (changenow.io)
├── Source 2: LinkedIn profile
└── Confidence: 0.95
Confidence Scoring (0.0 - 1.0)
- 1.0 - Official source (website, GitHub)
- 0.9-0.95 - Secondary source (verified interview, LinkedIn)
- 0.7-0.85 - Tertiary source (news, community)
- < 0.7 - Not included in public reports
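As a minimal sketch of how the publication threshold and the 2+ source rule could be enforced in code (the function and field names are assumptions for illustration, not the actual pipeline implementation):

```python
# Facts below this confidence never reach public reports (per the rubric above)
PUBLIC_THRESHOLD = 0.7

def publishable(facts: list[dict]) -> list[dict]:
    """Keep only facts at or above the threshold; critical facts
    additionally require at least two independent sources."""
    out = []
    for fact in facts:
        if fact["confidence"] < PUBLIC_THRESHOLD:
            continue
        if fact.get("critical") and len(fact.get("sources", [])) < 2:
            continue
        out.append(fact)
    return out

# The founder example from above, expressed as a fact record
founder = {
    "claim": "Founder: Vikrant Sharma",
    "confidence": 0.95,
    "critical": True,
    "sources": ["official interview (changenow.io)", "LinkedIn profile"],
}
rumor = {"claim": "Team size ~10", "confidence": 0.5, "sources": ["forum post"]}
```

With these two records, only the founder fact survives the filter; the low-confidence rumor is documented as a gap instead.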
Honest Gap Reporting
We explicitly document what we DON'T know.
This prevents others from wasting time on failed research approaches.
Decentralized Agent Swarms (Tool Orchestration)
To scale this methodology across 40 projects, we deployed parallel LLM agents for task coordination:
Batch Processing Architecture
┌─────────────────────────────────────────────────────────────┐
│ CONTROL AGENT (Task Orchestrator)                           │
│ Assigns projects to worker agents, monitors progress        │
│ LLM manages which tools to run, not data generation         │
└─────────────────────────────────────────────────────────────┘
                │
   ┌────────────┼────────────┬────────────┐
   ▼            ▼            ▼            ▼
┌─────────┐ ┌─────────┐ ┌─────────┐   ┌─────────┐
│ Agent 1 │ │ Agent 2 │ │ Agent 3 │...│ Agent 6 │
│ Batch 1 │ │ Batch 2 │ │ Batch 3 │   │ Batch 6 │
│ 7 proj  │ │ 7 proj  │ │ 7 proj  │   │ 7 proj  │
└─────────┘ └─────────┘ └─────────┘   └─────────┘
     │           │           │            │
     ▼           ▼           ▼            ▼
For each project (LLM decides which tools to run):
1. WebSearch official sources → real-time data
2. GitHub API calls → live repo data
3. Python scripts → process/analyze results
4. Generate markdown reports → format output
5. Verify quality → cross-check sources
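The batch layout above can be sketched in a few lines; this is an illustrative skeleton (the batching helper, worker function, and worker count of 6 mirror the diagram, but none of this is the actual orchestration code):

```python
from concurrent.futures import ThreadPoolExecutor

def make_batches(projects: list[str], n_workers: int = 6) -> list[list[str]]:
    """Round-robin assignment so batch sizes differ by at most one
    (40 projects across 6 agents -> batches of 6-7)."""
    batches: list[list[str]] = [[] for _ in range(n_workers)]
    for i, name in enumerate(projects):
        batches[i % n_workers].append(name)
    return batches

def research_project(name: str) -> dict:
    # Placeholder for the real per-project pipeline:
    # WebSearch -> GitHub API -> Python analysis -> reports -> quality check
    return {"project": name, "status": "researched"}

def run_swarm(projects: list[str]) -> list[dict]:
    """Run all projects through the worker pool in parallel."""
    with ThreadPoolExecutor(max_workers=6) as pool:
        return list(pool.map(research_project, projects))
```

In the real pipeline each worker is an LLM agent choosing tools rather than a Python function, but the batching and fan-out/fan-in shape is the same.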
What LLMs Did vs Didn't Do
LLMs handle (coordination & transformation layers):
- Deciding which tools to run (WebSearch, GitHub API, Python scripts)
- Coordinating task sequences (search first, then analyze)
- Extracting structured data from API responses
- Formatting results into markdown reports (hallucination risk)
- Cross-validating between different data sources (hallucination risk)
LLMs DIDN'T do:
- Generate data from training knowledge
- Make up facts or statistics
- Fill in missing information with guesses
Where hallucination risk exists:
- Coordination layer: LLM might choose wrong tool or misinterpret task requirements
- Transformation layer: LLM might misextract data when parsing API responses or formatting markdown
Risk mitigation through validation layers:
- Multi-agent cross-validation (different agents verify same data)
- Confidence scoring on extracted data
- Manual spot-checks of outputs
- Source URLs preserved in internal files for verification
All actual research data comes from live web searches, API calls, and real-time sources - not LLM training data.
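The multi-agent cross-validation step can be sketched as a simple comparison: two agents extract the same fields independently, and any disagreement flags the field for manual review instead of publication (function and field names here are assumptions, not the actual implementation):

```python
def cross_validate(agent_a: dict, agent_b: dict) -> dict:
    """Compare two agents' independently extracted fields.
    Matching values are accepted; mismatches go to manual review."""
    flagged, agreed = [], {}
    for key in agent_a.keys() & agent_b.keys():
        if agent_a[key] == agent_b[key]:
            agreed[key] = agent_a[key]
        else:
            flagged.append(key)
    return {"agreed": agreed, "needs_review": sorted(flagged)}
```

For example, if both agents extract the same founder name but report different star counts, the founder is accepted while the star count is re-checked against the source URL preserved in the internal files.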
Parallel Execution Benefits
Quality assurance: Each agent runs independent verification using real-time tools, control agent validates outputs before acceptance. Multiple agents working in parallel allows faster processing while maintaining research quality through cross-validation.
Quality Metrics
By The Numbers
| Metric | Value |
|---|---|
| Projects committed | 40 |
| Total research files | 265 |
| Lines of analysis | 12,378 |
| Average confidence | 0.85-0.95 |
| No intentional fabrication | Yes |
| Multi-source verification | 100% |
| Community corrections welcome | Yes |
Data Quality Breakdown
Tier 1 (Basic Info): 100% complete
- Website, GitHub, description, status
Tier 2 (Detailed Data): 70-80% complete
- Team information, technology stack, security features
Tier 3 (Advanced): 40-60% complete
- Full funding details, complete team rosters, detailed on-chain metrics
ATTEMPTED Files: Honest Gap Reporting
18 projects have blockchain_metrics_ATTEMPTED.md files documenting research that found no data:
- Attempted: Etherscan, DeFiLlama, block explorers
- Result: No verifiable on-chain data found
- Reason: Could be testnet-only, private chains, or insufficient documentation
- Purpose: Prevent others from repeating failed approaches
Research Tools & Technology Stack
Data Collection Tools
- WebFetch - Automated website content extraction (real-time data)
- WebSearch - Multi-source verification and news discovery (real-time data)
- GitHub API - Repository metrics and code analysis (live API calls)
- Blockchain APIs - Etherscan, DeFiLlama, custom block explorers (on-chain data)
- Playwright MCP - Browser automation for debugging and verification
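As an illustration of the GitHub API step, here is how a /repos/{owner}/{repo} response could be reduced to the CODE_REVIEW.md metrics; the field names match the real GitHub REST API, but the sample payload values and the repo name are made up, and the real pipeline would fetch this over HTTPS rather than use a hardcoded dict:

```python
# Sample of the relevant fields from a GitHub /repos/{owner}/{repo} response
sample_repo = {
    "full_name": "example-org/example-wallet",  # hypothetical repo
    "stargazers_count": 1234,
    "forks_count": 210,
    "language": "Rust",
    "pushed_at": "2025-10-01T12:00:00Z",
}

def code_review_summary(repo: dict) -> dict:
    """Extract the repository metrics reported in CODE_REVIEW.md."""
    return {
        "repo": repo["full_name"],
        "stars": repo["stargazers_count"],
        "forks": repo["forks_count"],
        "primary_language": repo["language"],
        "last_push": repo["pushed_at"],
    }
```

Keeping the raw API response in analysis/github_analysis.json and only the extracted summary in the public report is what the layer separation above describes.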
Analysis Tools
- Python scripts - Custom data processing and analysis
- Spiderfoot - OSINT reconnaissance on team members and infrastructure
- Smart Contract Review - Solidity analysis, security pattern detection
Agent Coordination
- LLM agents - Used for tool orchestration and task management (NOT for data generation)
- Important: LLMs coordinated which tools to run and how to process results
- Data sources: All actual research data came from web APIs, searches, and live sources - not LLM training data
Quality Assurance
- "Constitutional research" methodology - No intentional fabrication, best available sources
- Multi-agent verification - Cross-checking between agents
- Confidence scoring - Automated quality metrics
- Manual spot-checks - Random sampling of 5+ projects per batch
- Community review - Open to corrections via PRs and issues
Whatβs Next
Immediate Next Steps
1. Tag all 40 project repositories on GitHub
   - Notify project teams that their research is available
   - Encourage them to review and submit corrections/updates
   - Invite them to fill data gaps (team info, funding, on-chain metrics)
2. Research-required projects (47 remaining)
   - Work to obtain basic surface-level information (website, repo, description)
   - Many are promising projects but lack public documentation
   - Once basics are found, apply the same comprehensive research methodology
3. Regular updates as projects evolve
   - New releases, team changes, security audits
   - On-chain metric updates for active protocols
   - Community-submitted corrections and additions
Projects Awaiting Further Research (47 total)
These projects need basic surface-level information before comprehensive research can begin:
1inch-privacy, aleo, anoma, brave-browser, curve-privacy, curvy, dark-forest, dash, dusk-network, eth2-deposit-cli, farcaster, gitcoin-grants, hinkal, horizen, hurricane-core, inco, keep-network, lens-protocol, maci, mask, metamask-snaps, mina-protocol, night, nighthawk-wallet, nocturne, nuconstruct, nucypher, polygon-hermez, polygon-zero, pse-privacy-scaling-explorations, railway, rarime, ronin, samourai-wallet, snapshot-x, starknet, taceo, token-shielder, zama, zecrey, zkbob, zksync-era, zupass
If you're involved with any of these projects - please submit basic info (website, GitHub, brief description) so we can conduct comprehensive research!
Long-term Vision: Decentralized Research Infrastructure
The goal is to scale this methodology across the entire Web3 privacy ecosystem through:
- Scale to 700+ projects - Apply this research methodology broadly
- Continuous monitoring - Detect new commits, releases, security advisories
- Automated updates - Generate PRs when significant changes detected
- Community-driven - Multiple contributors using various tools and approaches
- Open methodology - Anyone can apply this research framework with their own infrastructure
Why Code Review & OPSEC Matter for Privacy Projects
Privacy projects require deeper analysis beyond basic features. Hereβs why:
Code Review is Critical for Privacy Claims
- Trust verification: "Privacy-preserving" is a claim that requires code-level validation
- Implementation quality: Privacy features in a whitepaper ≠ privacy features in production code
- Active development: Regular commits indicate ongoing security maintenance
- Community involvement: Contributor count and diversity signal decentralization
- Example: A mixer claiming βtrustless privacyβ but with centralized admin keys visible in code
OPSEC Research Protects Users
- Infrastructure analysis: Centralized servers for "decentralized" privacy tools
- Funding transparency: Whoβs backing privacy infrastructure matters for trust
- Attack surface: OSINT reveals potential vectors (DNS, hosting, dependencies, team member vulnerabilities)
These layers are especially important for privacy projects because:
- Users trust these tools with sensitive data
- Privacy failures can have severe real-world consequences
- Marketing claims often exceed technical reality
- Decentralization claims need verification
- Security research requires technical depth
Why This Matters
For Projects:
- Professional, accurate documentation of their work
- Technical validation of privacy claims
- Regular updates as they evolve
- Opportunity to correct errors and fill gaps
For Users:
- Comprehensive, up-to-date information
- Technical verification of privacy features
- Honest assessment of limitations and risks
- Verified, multi-source data
For Researchers:
- Avoid duplicating failed research
- Build on existing verified work
- Contribute to growing knowledge base
- Technical depth beyond surface-level analysis
Contributing & Feedback
How to Help
Project Teams: Found an error? Have updated information?
- Submit PR to: GitHub - web3privacy/explorer-data: Privacy Explorer Data Repository
- Or open issue: Tag @M0nkeyFl0wer
Researchers: Want to contribute?
- Use our methodology: GitHub - M0nkeyFl0wer/ethereum-cypherpunk-research: an ongoing open-source research project focused on Ethereum privacy tools and the related ecosystem
- Focus on projects in research-required/ folder
- Apply the same quality standards (multi-source verification, no fabrication, honest gaps)
Community: Questions or suggestions?
- Reply to this thread
- DM me on the forum
- Open a discussion on GitHub
What Should We Prioritize Next?
I'd love the community's input on next steps:
Potential priorities:
- Tag all 40 projects on GitHub to notify teams?
- Focus on the 47 research-required projects to get basic info?
- Deeper OPSEC research on high-profile projects (infrastructure, dependencies)?
- On-chain analysis for protocols with smart contracts?
- Automated monitoring for security advisories and major updates?
- Something else entirely?
Specific questions:
- Which projects from the research-required list are most important to the community?
- What additional data layers would be most valuable? (Funding details? Token metrics? Audit history?)
- Should we prioritize breadth (more projects) or depth (more detailed analysis)?
- Any tools or data sources we should integrate?
Drop your thoughts below! This is community-driven research - your input shapes the roadmap.
Repository Links
Main Research Repo:
PR to Web3Privacy Explorer:
Forked Explorer (with all files):
Acknowledgments
Huge thanks to:
- Web3Privacy team for maintaining this essential database
- All project teams for building privacy-preserving infrastructure
- Web3Privacy community for feedback and support
- Future contributors who will help scale this research methodology
License
All research data submitted under Open Database License (ODbL) to match Web3Privacy Explorer licensing.
Important Disclaimers
This research is based on publicly available information and best-effort analysis. Limitations include:
- Source reliability: We rely on official sources, but those sources themselves may contain errors or outdated information
- Tool limitations: Research tools (APIs, OSINT tools, browser automation) can miss data, timeout, or introduce errors
- LLM coordination & transformation risks: While research data comes from live sources (not LLM training data), LLMs can introduce errors in:
- Coordination layer: Choosing wrong tools or misinterpreting tasks
- Transformation layer: Misextracting data from API responses or incorrectly formatting outputs
- Validation layers (multi-agent verification, manual spot-checks) mitigate but donβt eliminate this risk
- Experimental pipeline: This methodology uses experimental agent swarms and novel approaches - bugs may exist
- Point-in-time data: Information accurate as of October 2025 - projects evolve rapidly
- Coverage gaps: Some areas (funding, complete team rosters, on-chain metrics) have limited public data
We actively encourage corrections! If you spot errors, outdated info, or have additional data:
- Submit a PR to the Web3Privacy Explorer
- Open an issue on our research repo
- Comment on this thread
This is a living dataset that improves through community collaboration.
Questions? Comments? Let's discuss below!
I'm happy to explain the methodology in more detail, share specific examples, or discuss how we can scale this to cover more projects.
Privacy is a fundamental right.
Research conducted October 2025
Contributor: @M0nkeyFl0wer