The popular platform-as-a-service still demands plenty of expertise and organizational culture change if customers are to actually reap the benefits it promises. Here we look at some common challenges encountered when working with Cloud Foundry, and how to overcome them.
Speaking at the Cloud Foundry Summit in The Hague, Netherlands last week, Colin Simmons, a software engineer at cloud consultancy EngineerBetter, walked through two common misconceptions clients frequently bring to him, and how he has worked with them to change processes and overcome operational challenges.
“This job lets me work with lots of different clients in lots of different industries, and I’ve found that no matter where I go, there are often similar misunderstandings, or similar questions being asked of me,” he told the audience during a breakout session.
Misconception one: Cloud Foundry is difficult to operate
The first misconception he sought to debunk was that Cloud Foundry is difficult to operate.
“There’s a little bit of truth to this. Cloud Foundry is very complicated, and as a complicated distributed system, you need complicated tooling to deploy it,” he admitted.
Cloud Foundry is normally operated using an open source tool chain called BOSH, which has a famously steep learning curve, according to Simmons. “Nowadays you can substitute BOSH with Kubernetes, but I would argue Kubernetes is similar, at least from the operator perspective. That being said, there are steps that can be taken to help.”
The first big one: “If you are still deploying manually, you’re objectively doing it wrong, and please stop,” he said.
The second is not to cut corners on training: “There’s a lot of knowledge that you need to build up to operate Cloud Foundry.”
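For teams still deploying by hand, even a thin wrapper script gets every deploy running the same steps. This is a minimal sketch, assuming a BOSH director alias named `prod` and the open-source cf-deployment manifest; all names here are illustrative, not from the talk:

```shell
#!/bin/sh
# Hypothetical deploy wrapper: every Cloud Foundry deploy runs the same
# scripted steps, so nothing is applied to the director by hand.
set -eu

BOSH_ENV="${BOSH_ENV:-prod}"       # assumed alias created earlier with `bosh alias-env`
DEPLOYMENT="cf"
MANIFEST="cf-deployment.yml"       # from the cloudfoundry/cf-deployment repo

bosh -e "$BOSH_ENV" -d "$DEPLOYMENT" deploy "$MANIFEST" \
  -o operations/use-compiled-releases.yml \
  --non-interactive
```

Running the same script from CI instead of a laptop then removes the manual step entirely.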
Establish a platform team
It is also important to establish a dedicated platform team, whether by hiring externally or bringing together internal staff. However, it is vital that this team does not carry over all the responsibilities of their old jobs on top of the new role if it is to have any chance of success.
Organizations should also try to add a dedicated product manager to this team. “This person is part of the team, they go to all of the meetings and the strategizing and the retros, but their role is to constantly interface with stakeholders and customers, figure out what the priorities are, order the backlog in a way that works, and make sure it all keeps flowing smoothly,” Simmons said.
Treat the platform as a software product
Another answer is to treat the platform as a software product. “If you push things into production and the platform breaks, it doesn’t matter how much you tested your apps, they’re broken too,” Simmons said. “Where you have dedicated teams for deploying applications and managing applications, you should have a dedicated team for managing the platform.”
Before joining EngineerBetter, Simmons worked at Marks and Spencer on the mobile team. “We deployed to Cloud Foundry. We wrote the app ourselves, we deployed the app ourselves, we did on-call support ourselves, we did everything,” he said.
“This was actually empowering for us, because it means that from start to finish, you get to pick up an issue, deploy the fix and see it live in production. Also, if you’re the person that gets paged at three in the morning on a Saturday because some sketchy thing was pushed on Friday night, you’re not going to allow sketchy things to be pushed on Friday night.”
“We had, I think, five minutes of production-impacting issues in a year and a half, which we were quite proud of. From a business point of view, they didn’t need a separate team; they just paid the developers and the website worked.”
Once this team is established, it is important that they actually collaborate with one another.
Simmons gave the example of one client he worked with, a European government agency whose platform team of six people was distributed across the country and only met for half an hour a week.
“The consequence was that there was one person who had pretty good knowledge of BOSH, pretty good knowledge of Cloud Foundry and pretty good knowledge of how their stuff worked, and no one else in the team actually knew anything,” he said.
“What would happen if that one person wasn’t around? Well, then all the learning that person has done is lost, and the platform team has to struggle to relearn everything while there’s a production incident, and that’s no fun for anyone,” he warned.
Simmons also talked about the practice of pairing or mobbing, where you either pair two developers at one computer on one story, or mob three to five developers at one computer on one story.
Simmons added: “Before you discount this, saying this style of working will never work in my industry, and will never work in my firm: I’ve seen this at some of the biggest banks I know of, I have seen it at security firms, pharmaceutical firms, automotive firms, you name the industry, and I can probably name someone that succeeded with pairing. So give it a shot and see if it works for you.”
He advised: “Send someone from your team to pair with them on their backlog, and then invite someone from their team to come to your team and pair on your backlog. This helps foster understanding and empathy between everyone in the business.”
Collaborating and breaking down silos are actually the key answers to nearly every problem in this talk.
Lastly, but perhaps most importantly, it is vital to build automation into your processes. “Normally, problems manifest as: the platform team has no time, or they’re falling behind on updates, or they’re falling behind on feature requests to the platform, and they’re just generally unhappy and there are fires everywhere,” Simmons said. “Normally, the problem is that there’s a lack of automation, or there’s a lack of understanding of the existing automation, or poor implementation.”
The Cloud Foundry Foundation pushes out multiple releases each month on average. He asked: do you think your platform team can keep up with that release cadence if you are doing everything manually?
“What you actually need to do is change the focus of your team from operating the platform to building tooling to manage the platform. With most of the teams I work with, the product we actually end up creating is a Concourse pipeline that deploys Cloud Foundry for us.”
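A Concourse pipeline of the kind Simmons describes might, in minimal form, look something like the sketch below. The resource, job and image names are assumptions for illustration, not taken from the talk:

```yaml
# Illustrative Concourse pipeline: redeploy Cloud Foundry whenever the
# deployment manifest repository changes.
resources:
- name: cf-deployment
  type: git
  source:
    uri: https://github.com/cloudfoundry/cf-deployment.git

jobs:
- name: deploy-cf
  plan:
  - get: cf-deployment
    trigger: true          # run automatically on new commits
  - task: bosh-deploy
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: bosh/cli}   # assumed container image with the bosh CLI
      inputs:
      - name: cf-deployment
      run:
        path: sh
        args:
        - -c
        - bosh -n -d cf deploy cf-deployment/cf-deployment.yml
```

A real pipeline would also need director credentials (typically injected via Concourse’s credential management) and a job per environment, but the shape stays the same.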
Misconception two: Our platform is unreliable
The second misconception Simmons looked to debunk was: our platform is unreliable.
“Cloud Foundry is very reliable,” he stressed. “I’ve seen Cloud Foundries continue to serve apps as if nothing was going on for hours while the underlying IaaS is kind of melting into the ground. A lot of the time, when a client says to me ‘our platform is unreliable’, what they actually mean is: we are seeing lots of downtime during upgrades, or we are seeing lots of downtime when we are not changing anything. Or: the same issues keep recurring and we want to stop them from happening.”
The answer? Testing platform upgrades in a sandbox environment, to avoid disruption to key in-production systems, would be a start.
Next is documenting previous incidents. “If you have a problem and resolve it, and you don’t write it down, you just pretend it never happened, and then the same thing happens next month, but someone else is on call, all the learnings are lost and you have to redo it,” Simmons observed. “Every problem is a new problem, regardless of how many times it has happened.”
Another piece of advice is to avoid blame culture when documenting issues.
“Make sure when you’re writing down what the cause is to focus on process failures instead of personal failures,” Simmons said. “For instance, if you have an incident where someone accidentally deletes the production database, the root cause analysis is not ‘this person deleted the production database’; the analysis is: why was it possible for someone to accidentally delete the production database?”
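As a concrete illustration of turning that analysis into a process fix, the dangerous action can be made to refuse production unless explicitly confirmed. This is a hypothetical sketch, not from the talk; the function and flag names are invented:

```shell
#!/bin/sh
# Hypothetical guard: the destructive command cannot hit production by accident.
drop_database() {
  env_name="$1"
  confirm="${2:-}"
  if [ "$env_name" = "production" ] && [ "$confirm" != "--yes-i-mean-production" ]; then
    echo "refusing: pass --yes-i-mean-production to drop the production database" >&2
    return 1
  fi
  # The real deletion would happen here; this sketch only reports what would be done.
  echo "dropping database in $env_name"
}

drop_database staging                 # → dropping database in staging
drop_database production || true      # refused with a warning, returns 1
```

The point is not this particular script but the shift in the postmortem outcome: the fix is a guard rail, not a reprimand.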
Simmons also noted that many clients tell him they encounter problems because they run snowflake environments.
“This is related to automation,” he said. “If you’re manually creating your environments, or maybe you’re automating but the automation isn’t super strict, and they kind of drift apart, then you’re in a situation where if you test an upgrade, or you test an app, on pre-production, there is no guarantee at all that it’s going to work when you go to production, because they’re different.”
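One low-tech way to catch drift between BOSH-managed environments is to diff the manifests that each director is actually running. A sketch, assuming director aliases `preprod` and `prod`:

```shell
# Download the manifest each director is actually running and compare them.
bosh -e preprod -d cf manifest > preprod-cf.yml
bosh -e prod    -d cf manifest > prod-cf.yml

diff preprod-cf.yml prod-cf.yml \
  && echo "environments match" \
  || echo "environments have drifted"
```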
The answer to this issue is smaller releases. “I’ve seen this a lot in big enterprises, and normally large releases are driven by fear of failure, or by overly strict governance on how changes have to happen,” he said. Instead, by making smaller releases more frequently, it is easier to roll back when issues occur.
“You can blue-green deploy, you can do creative route mapping; if something breaks, you can switch it back really quickly,” he said.
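The blue-green and route-mapping moves Simmons mentions can be sketched with standard cf CLI commands; the app, domain and hostname values here are illustrative:

```shell
#!/bin/sh
# Illustrative blue-green deploy for an app running on Cloud Foundry.
set -eu

cf push myapp-green -f manifest.yml                      # stage the new version beside the old one
cf map-route myapp-green example.com --hostname myapp    # both versions now receive traffic
cf unmap-route myapp-blue example.com --hostname myapp   # shift all traffic to the new version

# If something breaks, reversing the map/unmap pair switches traffic back quickly.
cf delete myapp-blue -f                                  # retire the old version once green is healthy
```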
With the platform itself, on the other hand, rollbacks aren’t really a thing.
“What do you do if you need to roll back? Well, you have to fix forward: you have to figure out what broke, fix it and move forward with it,” he said. “So how easy is that if you’ve mashed together a month of changes and released it all at once, and you have to dig through lines of YAML to figure out what broke? Well, it’s pretty hard, especially with all the alarm bells going off. But if you do one feature at a time and release them one at a time, it becomes pretty easy.”
Issues at the infrastructure level
The final cause Simmons detailed comes from further down the stack.
“I recently worked with a firm that was doing everything right,” he said. “They had a dedicated team, they had a product manager (actually two product managers working in tandem), and they had an ordered backlog that was continuously groomed and updated. They were writing tests for everything they could and they were automating from the outset. However, they kept having downtime, they kept having issues.”
The issue lay with the underlying IaaS, where the infrastructure team was still operating in an old-school, manual, waterfall fashion.
“All of their environments were different, and all of these problems that I’ve just talked about for platforms were happening with the IaaS,” Simmons added. “Ultimately, the IaaS should also be treated as a product, a software product, with its own dedicated team and everything I’ve already said.”
Simmons concluded with a quote from Cloud Foundry Foundation CTO Chip Childers: “Ultimately, you have to buy in to devops.”
“Adopting a devops approach, or a CloudOps or platform-as-a-product approach, all has to do with changing culture,” he added. “What you actually need to do, rather than going out and finding the coolest new technology to bring in-house, is to spend time moulding and improving your existing processes to actually leverage the technology.”
Originally posted 2019-09-21 18:00:00.