DeepSeek Private Cloud Deployment

DeepSeek Large Language Model Private Deployment Plan – Enterprise Edition
Our recent focus has been on digital information technology, specializing in generative AI and large language models (LLMs). We are committed to driving digital and intelligent transformation for SMBs, enterprises, and research institutions. Our core expertise lies in the training, inference, private deployment, and application development of domain-specific LLMs. Working closely with industry needs, we have developed a series of intelligent applications, such as model-driven ticket processing systems and emergency command mobilization platforms. In October 2024, we demonstrated a deployment-ready private large-model inference appliance and software platform. These products have since been applied in emergency command systems and local-government digital service systems with strong results.
Goals of the Deployment
The aim of deploying DeepSeek as a private LLM service is to establish a secure, controllable, and efficient intelligent work platform. The platform runs entirely within an organization's internal network, ensuring data security and regulatory compliance, and it is scalable, supporting future AI capability upgrades and optimizations. Users get DeepSeek's full feature set, significantly enhancing work efficiency. The solution also integrates organizational data into a dedicated knowledge base, enabling domain-specific AI applications that provide intelligent decision support. After deployment, an organization steps directly into intelligent digital operations, laying a strong foundation for future competitiveness in intelligent systems.
DeepSeek Large Language Models
- DeepSeek-R1-Distill-Qwen-32B: 32 billion parameters
- DeepSeek-R1-Distill-Llama-70B: 70 billion parameters
- DeepSeek-R1-671B: 671 billion parameters
Model Name | Output Quality | Resource Consumption | Applicable Scenarios |
---|---|---|---|
DS-R1-32B | Medium | Medium | Text summaries, keyword extraction, style analysis |
DS-R1-70B | High | High | Complex text generation, instruction adherence, logical reasoning |
DS-R1-671B | Very High | Very High | General-purpose applications |
Interaction Interface
Users interact with the system through natural-language dialogue; it supports multilingual text understanding and generation.
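As a minimal illustration, the sketch below sends one chat request to a privately deployed model. It assumes the inference engine exposes an OpenAI-compatible HTTP endpoint (as engines such as vLLM typically do); the host, port, and model identifier are placeholders for an actual deployment.

```python
# Minimal chat example against a private DeepSeek deployment.
# Assumes an OpenAI-compatible endpoint (e.g., served by vLLM);
# the host, port, and model id below are illustrative placeholders.
import requests

API_URL = "http://llm.internal.example:8000/v1/chat/completions"

def ask(question: str) -> str:
    """Send one user question and return the model's reply text."""
    payload = {
        "model": "deepseek-r1-distill-llama-70b",  # placeholder model id
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.6,
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize the key points of our incident report process."))
```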
Advantages of Private Deployment
A local deployment on an organization's internal network includes both hardware (compute servers) and software (inference engine and interaction interface). Users access the service via a web interface. Key advantages include:
- Higher Privacy: Enhanced control over data and systems ensures sensitive data remains internal, reducing risks of data leakage or unauthorized access.
- Faster Response Times: Locally hosted AI reduces latency, offering a quicker response and better user experience.
- Customizability: Organizations can tailor and optimize the AI model for specific scenarios and performance needs.
- Lower Long-Term Costs: While initial setup costs may be high, local deployment reduces dependency on external cloud services, saving ongoing fees.
Core Application Features
We summarize core application features as follows:
- Intelligent Q&A: Answers diverse questions, such as those about science, history, and military topics, with dynamic follow-up capabilities.
- Rapid Decision-Making: Processes large volumes of complex data efficiently for informed commercial or operational decisions.
- Content Generation: Creates documents like reports, speeches, and emails with clear logic and rich content.
- Knowledge Base: Supports document uploads in Word, PDF, or TXT formats, building a domain-specific knowledge library (see the retrieval sketch after this list).
- Programming Support: Aids in code generation, debugging, and optimization.
- Multilingual Support: Covers languages like Chinese, English, Japanese, Korean, French, and German.
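To make the knowledge-base and Q&A features concrete, here is a deliberately minimal retrieval sketch that ranks stored text snippets against a question using bag-of-words cosine similarity. A production deployment would use an embedding model and a vector store instead; the function names and sample documents are illustrative only.

```python
# Minimal knowledge-base retrieval sketch (bag-of-words cosine similarity).
# A real deployment would use an embedding model and a vector store;
# all names and sample texts here are illustrative.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = vectorize(query)
    return sorted(docs, key=lambda d: cosine(qv, vectorize(d)),
                  reverse=True)[:k]

docs = [
    "Emergency command mobilization procedure for flood response.",
    "Quarterly sales report template and style guidelines.",
    "Role-based access policy for internal document systems.",
]
print(top_k("how do we mobilize for an emergency", docs, k=1))
```

The top-ranked snippets would then be prepended to the model prompt so that answers stay grounded in internal documents, following the usual retrieval-augmented generation pattern.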
Service Workflow
- Needs Assessment: Model, hardware, and software selection (1–2 weeks).
- Deployment and Testing: Hardware setup and optimization (1–2 weeks, hardware procurement time excluded).
- Ongoing Maintenance: Regular updates and system health checks.
Reference Configurations
Model | Hardware |
---|---|
DS-R1-32B | 8-card inference server / NVIDIA 4-card workstation |
DS-R1-70B | 8-card inference server / NVIDIA 8-card inference server |
DS-R1-671B | 4 x 8-card inference server / NVIDIA high-end inference servers |
This plan ensures a robust deployment while bringing out DeepSeek's full capabilities. Configurations can be adapted to specific use cases, performance requirements, and user needs.
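As a rough sanity check on these configurations, the back-of-the-envelope sketch below estimates the GPU memory needed just to hold each model's weights, from its parameter count and weight precision. It ignores KV cache and activations, and the 20% overhead margin is our assumption, not a published figure.

```python
# Back-of-the-envelope GPU memory estimate for model weights only.
# Ignores KV cache and activations; the 20% overhead margin is an assumption.
def weight_memory_gb(params_billion: float, bytes_per_param: float,
                     overhead: float = 0.20) -> float:
    """Approximate VRAM (GB) needed just to hold the weights."""
    return params_billion * bytes_per_param * (1 + overhead)

for name, params in [("DS-R1-32B", 32), ("DS-R1-70B", 70), ("DS-R1-671B", 671)]:
    fp16 = weight_memory_gb(params, 2.0)   # 16-bit weights
    int4 = weight_memory_gb(params, 0.5)   # 4-bit quantized weights
    print(f"{name}: ~{fp16:.0f} GB at FP16, ~{int4:.0f} GB at INT4")
```

At FP16, for example, the 70B model's weights alone come to roughly 140 GB before overhead, which is why multi-card servers appear in the table above, while 4-bit quantization brings the 32B model within reach of a single high-memory card.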
Finally, we offer two reference server configurations. First, the thin (entry-level) configuration:
Component | Model |
---|---|
CPU | AMD EPYC 7262 |
DRAM | 256 GB |
Storage | 2 x 1 TB SATA SSD |
GPU | 1 x 48 GB or 1 x 96 GB GPU |
Second, the thick (full) configuration:
Component | Model |
---|---|
CPU | 2 x Intel Xeon Gold 6330 (28 cores, 2.0 GHz, 42 MB cache, 11.2 GT/s UPI, 205 W) |
DRAM | 16 x 32 GB DDR4-3200 |
Storage card | 1 x Broadcom SAS3008 (IT mode) |
Storage I | 2 x 480 GB 6 Gb/s 2.5-inch SSD |
Storage II | 2 x 3.84 TB 6 Gb/s 2.5-inch SSD |
Network I | 1 x Intel 82599 dual-port 10 GbE PCIe fiber NIC (optical modules not included) |
Network II | 1 x Intel I350 dual-port 1 GbE RJ45 OCP 3.0 NIC |
GPU | 2 x NVIDIA L20 48 GB 350 W |
Power supply | 2 x 2000 W |
Software Services
- DeepSeek-R1 70B deployment
- API inference service integration
- Reasoning service integration via API
- Smart document management (DOCX, TXT, PDF)
- AI semantic indexing
- Role-based access control
- Knowledge-base-grounded content generation
- Code auto-completion
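As one illustration of how role-based access control could gate the document services listed above, here is a minimal sketch built on FastAPI. The framework choice, role names, header scheme, and route are assumptions for illustration, not a description of the shipped implementation.

```python
# Minimal role-based access control sketch for a document service.
# Framework (FastAPI), roles, header scheme, and routes are illustrative
# assumptions, not the shipped implementation.
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

# Hypothetical role table; a real system would back this with a user store.
ROLES = {"alice": "admin", "bob": "editor", "carol": "viewer"}
UPLOAD_ROLES = {"admin", "editor"}

def current_role(x_user: str = Header(...)) -> str:
    """Resolve the caller's role from a (trusted) identity header."""
    role = ROLES.get(x_user)
    if role is None:
        raise HTTPException(status_code=401, detail="unknown user")
    return role

@app.post("/documents")
def upload_document(role: str = Depends(current_role)):
    """Only admins and editors may add documents to the knowledge base."""
    if role not in UPLOAD_ROLES:
        raise HTTPException(status_code=403, detail="insufficient role")
    return {"status": "accepted"}
```

A real gateway would authenticate users properly (e.g., via SSO) rather than trust a plain header; the point is only that upload and generation endpoints can check roles before touching the knowledge base.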
Contact us if you are interested in setting up your private LLM (hello@vdufu.com).