Guest Post: A Balancing Act – Developer Agility vs Cloud Data Security

Mar 21, 2023 | Blogs

The cloud software development ecosystem operates in a complex and dynamic environment: identities and access are spread across many different systems, and access is generally overly permissive, backed by lax identity and access management (IAM) practices.

Security teams struggle to balance developer agility against least-privilege access decisions. In AWS alone there are more than 13,000 permissions spread across hundreds of cloud services. Tools and mechanisms exist through which privileged access to cloud services and systems can be restricted and granted only when needed (just-in-time), while maintaining a complete audit trail of the actions performed. However, implementing and rolling out such technologies, whether built in or procured from a third-party vendor, requires a complete overhaul of how, when, and why engineers access cloud environments (staging or production), not to mention the cost of implementation.
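One building block of just-in-time access in AWS is a policy statement that stops granting access after a deadline. The sketch below is a minimal illustration, not a full JIT system: it assumes you attach the generated document yourself, and the helper name `time_bound_policy` and the bucket name are hypothetical. It relies on the IAM global condition key `aws:CurrentTime` with the `DateLessThan` operator.

```python
import json
from datetime import datetime, timedelta, timezone


def time_bound_policy(actions, resources, duration, now=None):
    """Build a (hypothetical helper) IAM policy whose Allow expires after `duration`.

    The DateLessThan / aws:CurrentTime condition means requests made after the
    expiry are no longer allowed by this statement. The policy itself stays
    attached until someone removes it, so cleanup is still required.
    """
    now = now or datetime.now(timezone.utc)
    expiry = (now + duration).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": list(resources),
                "Condition": {"DateLessThan": {"aws:CurrentTime": expiry}},
            }
        ],
    }


# Hypothetical example: one hour of read access to a single staging bucket.
policy = time_bound_policy(
    ["s3:GetObject"],
    ["arn:aws:s3:::example-staging-bucket/*"],
    timedelta(hours=1),
    now=datetime(2023, 3, 21, 9, 0, tzinfo=timezone.utc),
)
print(json.dumps(policy, indent=2))
```

Because the expiry lives inside the policy, a forgotten grant fails closed rather than staying open indefinitely.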

What does a user really need access to?

The biggest problem with access is that, most of the time, users do not know the minimum set of privileges, or the set of roles, they need to carry out their responsibilities. Builders focus on deployment velocity and request administrative access to eliminate friction. Without a tool or process that helps accurately separate needs from nice-to-haves, overly permissive access (a form of shadow access) is a disaster waiting to happen.
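One data-driven way to separate needs from nice-to-haves is to compare what an identity has been granted with what it has actually used, for example the actions recorded in audit logs such as AWS CloudTrail. A minimal sketch, assuming both sets of action names have already been collected; the role's permissions and usage data below are hypothetical.

```python
def unused_permissions(granted, used):
    """Return granted actions that were never observed in use.

    Both arguments are collections of IAM action names such as "s3:GetObject".
    Wildcards are not expanded here; a real tool would need to resolve
    patterns like "s3:*" against the full list of actions.
    """
    return sorted(set(granted) - set(used))


# Hypothetical data: what a developer role was granted vs. what the logs
# show it actually called during an observation window.
granted = {"s3:GetObject", "s3:PutObject", "s3:DeleteBucket", "iam:PassRole"}
used = {"s3:GetObject", "s3:PutObject"}

candidates = unused_permissions(granted, used)
print(candidates)  # ['iam:PassRole', 's3:DeleteBucket']
```

Unused does not automatically mean unneeded (break-glass or infrequent jobs may not appear in a short window), so the output is best treated as a list of candidates for review rather than an automatic revocation list.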

Hypothetical Scenario: Service Account with S3 Access Merged into Source Code

Consider a simple scenario in which the credentials for a service account with read-only access to all S3 buckets in the staging environment are accidentally merged into source code. Most services use the account for automated and manual testing, and the engineering team justifies keeping it in code by arguing that staging can only be reached from behind the VPN, so the account grants nothing a malicious user could leverage without access to the company's engineering VPN.
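Credentials committed to a repository can often be caught before merge by scanning for known key formats. A minimal sketch, assuming the service account authenticates with an AWS access key pair: AWS access key IDs are 20 characters long, with long-term IAM user keys starting with `AKIA` and temporary STS keys starting with `ASIA`. The function name and sample file are hypothetical, and the sample uses the placeholder key ID from AWS documentation.

```python
import re

# 4-character prefix + 16 uppercase letters/digits = 20-character key ID.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) pairs for strings shaped like AWS key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.findall(line):
            hits.append((lineno, match))
    return hits


# Hypothetical file contents.
sample = """\
BUCKET = "example-staging-bucket"
AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
"""
print(find_access_key_ids(sample))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```

A hit should lead to rotating the key, not just deleting the line: the credential remains in the repository's history and in every existing clone.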

A few facts and assumptions in the narrative above create security risks:

  1. While most engineering systems that access staging environments sit behind a VPN, the source code repository itself usually does not.
  2. Engineers have admin access to production and staging environments.
  3. Although infrastructure-as-code is mandated or recommended, admins can make changes to IAM and the cloud environment directly through the console in emergencies.

Murphy’s Law In Action

The security team is woken up on Christmas Eve by a flurry of alerts from the canary token in production. The team declares an incident and begins triage, only to find a surge of anomalous (and successful) activity in the production environment. They are even more baffled to discover that the account used to access the environment is a staging account. Digging further, the team finds an inline policy, created within the past 7 days, that allowed the account to access the production environment. We'll leave what happens next for another time; what went wrong is more relevant to this topic.
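Finding that inline policy during triage is essentially a search of the IAM change history. `PutUserPolicy`, `PutRolePolicy` and `PutGroupPolicy` are the IAM API calls that create or replace inline policies. A minimal sketch, assuming AWS CloudTrail records have already been exported and reduced to the fields shown; the function name, principals, policy names and dates are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# IAM API calls that create or replace an inline policy.
INLINE_POLICY_EVENTS = {"PutUserPolicy", "PutRolePolicy", "PutGroupPolicy"}


def recent_inline_policy_changes(events, days=7, now=None):
    """Return inline-policy events from the last `days` days, newest first.

    `events` is a list of simplified CloudTrail-style records (dicts with
    eventSource, eventName, eventTime and requestParameters).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    hits = []
    for event in events:
        if event["eventSource"] != "iam.amazonaws.com":
            continue
        if event["eventName"] not in INLINE_POLICY_EVENTS:
            continue
        when = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        if when.replace(tzinfo=timezone.utc) >= cutoff:
            hits.append(event)
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return sorted(hits, key=lambda e: e["eventTime"], reverse=True)


# Hypothetical records: a recent inline policy on the staging account,
# an older policy change, and an unrelated S3 read.
events = [
    {"eventSource": "iam.amazonaws.com", "eventName": "PutUserPolicy",
     "eventTime": "2023-12-20T16:42:10Z",
     "requestParameters": {"userName": "staging-svc", "policyName": "tmp-prod-test"}},
    {"eventSource": "iam.amazonaws.com", "eventName": "PutRolePolicy",
     "eventTime": "2023-11-02T08:00:00Z",
     "requestParameters": {"roleName": "ci-deploy", "policyName": "deploy"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "eventTime": "2023-12-24T01:15:00Z", "requestParameters": {}},
]
now = datetime(2023, 12, 24, 2, 0, tzinfo=timezone.utc)
for e in recent_inline_policy_changes(events, now=now):
    print(e["eventName"], e["requestParameters"]["policyName"])
# PutUserPolicy tmp-prod-test
```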

What really happened?

  1. An engineer was troubleshooting an issue in production and wanted to validate an assumption before rolling out a change. To save time, the engineer added an inline policy to the staging service account granting it production access, so the assumption could be tested directly in production. The engineer intended to remove the policy once the test was complete, but soon forgot. This is Shadow Access: in this case, unauthorised, invisible and unmonitored. The security team did not catch the policy change.
  2. An intern on the engineering team had his personal laptop stolen a few months after his internship ended. Although he accessed the company's engineering resources through his company laptop, he had also cloned the source code repositories onto his personal laptop for convenience. Here, the Shadow Access risk created by the inline policy propagated to the intern's laptop: the staging credentials cloned there could now reach production.
  3. The stolen laptop led to the compromise of the service account, which was then used to access the production environment and exfiltrate data, evading detection until the canary token was triggered.
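The inline policy in step 1 is the kind of change a simple automated check could flag at creation time. The post does not say how the production access was granted, so this minimal sketch assumes production lives in a separate AWS account reached through `sts:AssumeRole`; the account ID, role name and function name are hypothetical. It only inspects `Allow` statements with `Action`/`Resource` and ignores `NotAction`, `NotResource` and conditions.

```python
from fnmatch import fnmatchcase

# Hypothetical production account IDs.
PRODUCTION_ACCOUNTS = {"111122223333"}


def _as_list(value):
    """IAM allows Statement, Action and Resource to be a single value or a list."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def production_roles_assumable(policy, prod_accounts=PRODUCTION_ACCOUNTS):
    """Return resources in `policy` that let the holder assume a production role."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive; patterns like "sts:*" also match.
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        if not any(fnmatchcase("sts:assumerole", a) for a in actions):
            continue
        for arn in _as_list(stmt.get("Resource", [])):
            parts = arn.split(":")  # arn:aws:iam::<account-id>:role/<name>
            if arn == "*":
                findings.append(arn)  # any role in any account
            elif len(parts) >= 6 and parts[2] == "iam" and parts[4] in prod_accounts:
                findings.append(arn)
    return findings


inline_policy = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::111122223333:role/prod-debug",
    },
}
print(production_roles_assumable(inline_policy))
# ['arn:aws:iam::111122223333:role/prod-debug']
```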

What went wrong?

  1. An engineer troubleshooting in production is not an uncommon scenario, even in environments with strict guardrails. What is missing is the ability to track and monitor access changes between staging and production environments.
  2. The stolen laptop was not the real issue; shadow access was the actual security risk.
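Tracking changes like this largely comes down to comparing what infrastructure-as-code says should exist with what actually exists. A minimal sketch, assuming both sides can be exported as plain dictionaries keyed by principal and policy name (fetching live policies, for example from the IAM API, is out of scope); the function, principals and policy names are hypothetical.

```python
def find_policy_drift(declared, live):
    """Compare inline policies declared in IaC with those found live.

    Both arguments map (principal, policy_name) -> policy document (dict).
    Returns policies that exist only live (e.g. console changes), policies
    missing live, and policies whose documents differ.
    """
    declared_keys, live_keys = set(declared), set(live)
    return {
        "unmanaged": sorted(live_keys - declared_keys),
        "missing": sorted(declared_keys - live_keys),
        "modified": sorted(
            k for k in declared_keys & live_keys if declared[k] != live[k]
        ),
    }


s3_read = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}],
}
prod_test = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                   "Resource": "arn:aws:iam::111122223333:role/prod-debug"}],
}
declared = {("staging-svc", "s3-read"): s3_read}
live = {("staging-svc", "s3-read"): s3_read,
        ("staging-svc", "tmp-prod-test"): prod_test}
print(find_policy_drift(declared, live))
# {'unmanaged': [('staging-svc', 'tmp-prod-test')], 'missing': [], 'modified': []}
```

The comparison is literal, so two semantically equivalent documents with reordered lists will show up as "modified"; normalizing documents before comparing would reduce that noise.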

How To Balance Data Security Risk and Developer Agility?

DevOps and SecOps teams need to be aware that balancing security risks with developer agility starts with a few fundamental tenets.

  1. An over-permissive environment represents the biggest risk for both security and the business.
  2. Real-time visibility into access and changes in high-risk environments is critical. The who, what, when and why of every change has to be recorded and shared among stakeholders.
  3. Continuous monitoring of access is the only way to keep track of exceptions and rightsize access over time.
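Tenets 2 and 3 can be made concrete with a record that captures the who, what, when and why of each access exception, plus an expiry that continuous monitoring can check. A minimal sketch; the `AccessException` type, its fields and the sample entries are hypothetical, not the schema of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessException:
    """One recorded access grant: who, what, when and why, plus an expiry."""
    who: str
    what: str
    granted_at: datetime
    why: str
    expires_at: datetime

    def __post_init__(self):
        # Reject grants with no justification or a non-positive lifetime.
        if not self.why.strip():
            raise ValueError("every exception needs a justification")
        if self.expires_at <= self.granted_at:
            raise ValueError("expiry must be after the grant time")


def overdue(exceptions, now):
    """Return exceptions past their expiry that should be revoked or re-justified."""
    return [e for e in exceptions if e.expires_at <= now]


utc = timezone.utc
ledger = [
    AccessException(
        who="alice",
        what="inline policy: staging-svc may assume prod-debug",
        granted_at=datetime(2023, 12, 18, 10, 0, tzinfo=utc),
        why="validate a fix in production before rollout",
        expires_at=datetime(2023, 12, 18, 12, 0, tzinfo=utc),
    ),
    AccessException(
        who="bob",
        what="ci-deploy: s3:PutObject on the release bucket",
        granted_at=datetime(2023, 12, 1, 9, 0, tzinfo=utc),
        why="new release pipeline",
        expires_at=datetime(2024, 1, 31, 0, 0, tzinfo=utc),
    ),
]
now = datetime(2023, 12, 24, 2, 0, tzinfo=utc)
print([e.who for e in overdue(ledger, now)])  # ['alice']
```

Running a check like `overdue` on a schedule turns a forgotten exception into an alert days earlier than a canary token would.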

Stay tuned: in our next blog, we'll pick up least privilege versus rightsizing, along with a data-driven approach to managing shadow access risks.