The Federal Government is looking to analyze and use the volume of data it collects, creates, and stores to improve service delivery and operational efficiencies.  While this strategy is conceptually appealing, the aggregation, storage, management, sharing, and analysis of this accumulated “big data” come with their own challenges.

We recently had the opportunity to sit down with Rob Jaeger, a Systems Engineering Manager with Juniper Networks, to discuss the technology considerations that federal agencies need to evaluate before they undertake big data initiatives.  Here is what Rob had to say:

The Modern Network: Exploitation of big data is increasingly popular across Federal agencies.  What impact does aggregating, storing, managing, and sharing this data have on government networks and data centers? More specifically, what data center technologies can facilitate efforts to unlock the potential of big data analysis? 

Rob Jaeger: Data centers serve a range of essential functions within and across diverse Federal agency operations.

The 2010 Federal Data Center Consolidation Initiative (FDCCI) was launched to motivate government agencies of all sizes to eliminate unnecessary, underutilized, and inefficient computing resources, and where possible, to undertake thoughtful IT infrastructure consolidation. It was anticipated that fewer data centers would result in an overall improvement in agency security postures and their individual and collective capabilities for incident response.

In parallel, the Federal Government began broader adoption of cloud computing alternatives that provide for dynamic service delivery over networks from an abstract set of resources to meet specific agency computing requirements.  These hosted resources are scalable and available on demand, quite transparently to end users.

To be most effective and accommodate a variety of applications, this enterprise shift to cloud computing must leverage several types of technology resources (computing, storage, and networking) to enable optimized, dynamic allocation in an automated, orchestrated, and logically diversified environment.

Using orchestration, resources can be pooled within and across multiple data centers to provide an environment that responds dynamically to user needs.

The Modern Network: What technologies are most useful for data analysis given the volume, variety, and velocity of information that agencies manage?  What is the impact of this data analysis, if any, on networks and data centers?

Rob Jaeger: Analysis tools vary depending upon the nature and structure of the data.  Traditional data warehouses for structured data, accessed through a variety of standard or customized tools, have been and will remain important for many enterprises.

As the big data discussion has advanced, innovative approaches now provide for storage of unstructured data, where architectures like MapReduce (e.g., Hadoop clusters) and expanded cloud computing applications have changed the way enterprise data may be indexed, stored, and searched.  The network traffic in a computing ecosystem that is processing big data can be intense, causing incast issues in large search environments.
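
To make the MapReduce pattern concrete, here is a minimal, single-process Python sketch of the map and reduce phases that frameworks such as Hadoop distribute across a cluster. The function names and sample documents are illustrative only and are not part of any Hadoop API.

```python
from collections import defaultdict

# A minimal, single-process illustration of the MapReduce pattern.
# Hadoop distributes these same two phases across many cluster nodes,
# which is what generates the many-to-one traffic that leads to incast.

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

if __name__ == "__main__":
    docs = ["big data in the data center", "data center fabric"]
    print(reduce_phase(map_phase(docs)))
    # {'big': 1, 'data': 3, 'in': 1, 'the': 1, 'center': 2, 'fabric': 1}
```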

The network architecture can directly influence the performance of systems designed for big data collection and analysis. One essential component of an effective virtual computing infrastructure is the underlying Data Center Fabric, which provides high-performance, any-to-any connectivity and is capable of flattening the entire data center network into a single tier where all access points are equal. This approach eliminates the effects of network locality and provides an ideal network foundation for cloud-ready, virtualized data centers.

Key characteristics of an effective virtualized data center include a highly scalable network that improves application performance through low latency and converged services, built on a non-blocking, lossless architecture that supports Layer 2, Layer 3, and Fibre Channel over Ethernet (FCoE) capabilities.

Lossless queuing in the fabric benefits traditional data warehouses and cloud services, as well as Network-Attached Storage (NAS) and Hadoop environments.  For example, in a joint effort with Panasas, a provider of scale-out storage solutions for big data workloads, to develop a high-performance distributed filer, Juniper Networks found that incast killed performance without lossless queuing. With lossless queuing enabled in tandem with other tuning, however, the fabric was able to move data at the maximum capacity of the filer.
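
As a rough, hypothetical illustration of the incast effect (not a model of any specific Juniper or Panasas system), the toy simulation below sends traffic from many synchronized senders toward one switch egress port: a tail-drop buffer loses packets that must then be retransmitted, while a lossless, pause-based port holds senders back and drops nothing. The parameter values are arbitrary.

```python
def simulate_incast(senders, packets_each, buffer_slots, drain_per_tick, lossless):
    """Toy model of many senders converging on a single switch egress port.

    A lossy port tail-drops whatever overflows its buffer (in real TCP those
    drops trigger retransmission timeouts that collapse throughput); a
    lossless, pause-based port holds senders back so nothing is dropped.
    """
    queue = 0                       # packets currently buffered at the port
    remaining = senders * packets_each
    dropped = 0
    ticks = 0
    while remaining > 0 or queue > 0:
        ticks += 1
        offered = min(senders, remaining)       # one packet per sender per tick
        if lossless:
            accepted = min(offered, buffer_slots - queue)   # pause the rest
        else:
            accepted = offered
            overflow = max(0, queue + accepted - buffer_slots)
            dropped += overflow                 # tail-dropped, must be resent
            accepted -= overflow
        queue += accepted
        remaining -= accepted
        queue -= min(drain_per_tick, queue)     # port drains toward the filer
    return ticks, dropped

for mode in (False, True):
    ticks, drops = simulate_incast(senders=32, packets_each=10,
                                   buffer_slots=16, drain_per_tick=8,
                                   lossless=mode)
    print(f"lossless={mode}: finished in {ticks} ticks with {drops} drops")
```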

The Modern Network: The volume of sensitive citizen data being gathered by government agencies raises important questions about information security and individual privacy. Is the aggregation of citizen data creating “honey pots” for hackers and other malicious actors? What steps should agencies take to protect this information?

Rob Jaeger: Although one may be concerned that data aggregation is creating vulnerable honey pots, it is important to understand that even if this personally identifiable information (PII) were not aggregated, it would still need to be protected.

Among the steps agency professionals can take are:

- Keeping PII and other confidential data encrypted, whether it is at rest or in motion (see the encryption sketch after this list).

- Implementing Network Access Control (NAC). NAC adds another layer of security that limits information access to authorized personnel and enforces encryption between the requesting endpoint and the enforcement point in the data center or cloud environment. NAC can also help identify or stop data leakage by monitoring flash drive activity on personal computers connected to a network.

- Utilizing a multi-layered security strategy. Overall, the best security strategy consists of layers designed to protect the data without becoming a barrier for authorized users.
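
As a small illustration of the first point above (encrypting PII at rest), the sketch below uses the Python cryptography package's Fernet symmetric cipher to store a record only as ciphertext and decrypt it only at the moment it must be processed. The record contents, file name, and key handling are hypothetical placeholders; a real deployment would draw its keys from an agency-approved key-management service.

```python
from cryptography.fernet import Fernet

# Hypothetical example: keep a PII record encrypted at rest and decrypt it
# only for processing. In practice the key would come from a hardware
# security module or key-management service, never from source code.
key = Fernet.generate_key()
cipher = Fernet(key)

pii_record = b'{"name": "Jane Citizen", "ssn": "000-00-0000"}'

# Data at rest: only the ciphertext is ever written to storage.
with open("citizen_record.enc", "wb") as f:
    f.write(cipher.encrypt(pii_record))

# Decrypt only when the record must be processed or reported on.
with open("citizen_record.enc", "rb") as f:
    plaintext = cipher.decrypt(f.read())
print(plaintext.decode())
```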

The Modern Network: What about insider threats? Are solutions, such as NAC, capable of preventing government employees from accessing constituent data that they shouldn’t?

Rob Jaeger: The best way to keep unauthorized employees from accessing PII is to encrypt the data both at rest and in motion, with provisions for decryption only during processing and reporting. In addition, NAC, intrusion detection, intrusion deception, and stateful firewalls should be part of the total security program.

The agency’s overall security strategy should balance security risks with the cost of exposure. For the most critical components, advanced authentication, authorization, and audit models with alerting, coupled with correlation of incidents and behavior, should be applied. This advanced model should have several warning methods in place to immediately alert network administrators when a security threshold has been reached.
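
To make the alerting idea concrete, here is a minimal, hypothetical sketch of a threshold-based warning: it counts a user's sensitive-record accesses within a sliding time window and notifies administrators once a configured limit is crossed. The threshold, window, and notification hook are illustrative placeholders rather than features of any particular product.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300        # examine the last five minutes of activity
ACCESS_THRESHOLD = 50       # alert when a user exceeds this many reads

_recent_accesses = defaultdict(deque)   # user -> timestamps of recent reads

def record_access(user, now=None):
    """Log one PII access and alert if the user's recent volume is abnormal."""
    now = time.time() if now is None else now
    history = _recent_accesses[user]
    history.append(now)
    # Discard accesses that have aged out of the sliding window.
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()
    if len(history) > ACCESS_THRESHOLD:
        alert_administrators(user, len(history))

def alert_administrators(user, count):
    # Placeholder: a real system would page staff, open an incident ticket,
    # and feed the event into incident/behavior correlation.
    print(f"ALERT: {user} accessed {count} records in the last {WINDOW_SECONDS}s")
```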

By combining these security solutions and strategies, an agency should be able to mitigate risks and protect data more effectively against external and internal threats.