My Notes from Software Engineering at Google: Lessons Learned from Programming Over Time
I recently read through Software Engineering at Google: Lessons Learned from Programming Over Time. I expected it to be a Google sales pitch, but to my surprise it is one of the best software engineering books I have read.
There was some really great content in there, so I took chapter-by-chapter notes, which you can find below. I hope they are useful to you.
Chapter 1 What Is Software Engineering?
- Software engineering is programming integrated over time.
- Software engineering is the multiperson development of multiversion programs.
- Hyrum’s Law: with a sufficient number of users of an API, it does not matter what you promise in the contract; all observable behaviours of your system will be depended on by somebody.
- When programming, clever is a compliment. When software engineering, it’s an accusation.
- If a product experiences outages or other problems as a result of an infrastructure change, but the problem wasn’t surfaced by CI tests, then it is not the fault of the infrastructure change.
- “If you liked it, you should have put a CI test on it.” This is the Beyoncé Rule.
- Knowledge sharing is viral. If you have 100 engineers writing Go, a single Go expert answering questions will soon lead to 100 engineers writing expert Go.
- The more frequently you change your infrastructure, the easier it becomes to do so.
- The earlier in the development process an issue is discovered, the cheaper it is to fix. “Shift left on security”
- You should never say “do it this way because I said so”. Instead, weigh the trade-offs: financial costs, resource costs, personnel costs, transaction costs, opportunity costs and societal costs. Aim for consensus, not unanimity.
- Don’t underestimate the value of keeping engineers happy. It can affect productivity significantly, in the region of 10-20%.
- If there isn’t data to drive a decision, there is likely still precedent, evidence and argument.
- Time doesn’t only trigger change in technical dependencies but in the data used to drive decisions too - don’t be afraid to change your mind.
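Hyrum’s Law from the notes above is easy to reproduce in a few lines. The following is only an illustrative sketch: `USERS`, `active_users_v1`, `active_users_v2` and `pick_first` are hypothetical names, not anything from the book. The contract never promises an order, yet a caller ends up depending on the order the first implementation happened to produce.

```python
# Hypothetical data and functions, for illustration only.
USERS = {"alice": True, "carol": True, "bob": False, "dave": True}


def active_users_v1(users):
    """Contract: return the names of active users. Order is NOT promised."""
    return sorted(name for name, active in users.items() if active)


def active_users_v2(users):
    """Same contract, new implementation: names come back in reverse insertion order."""
    return [name for name, active in reversed(list(users.items())) if active]


def pick_first(list_active_users):
    """A caller that quietly relies on alphabetical order, which was never promised."""
    return list_active_users(USERS)[0]


# Both versions honour the documented contract...
same_contract = set(active_users_v1(USERS)) == set(active_users_v2(USERS))
# ...but the caller's result changes, because it depended on observable behaviour.
before = pick_first(active_users_v1)
after = pick_first(active_users_v2)
```

Swapping `v1` for `v2` is a legitimate change from the API owner’s point of view, but `pick_first` now returns a different user. This is the kind of breakage a CI test on the caller’s side would surface.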
Chapter 2 How to Work Well on Teams
- The Genius Myth is the tendency to attribute a team’s great feat to a single individual.
- The majority of work at Google doesn’t need a genius, but it does need social skills.
- What will make or break your career is how you collaborate with others.
- Insecurity is very common among engineers: we don’t want people to see what we have done until it’s finished, as they may judge us (or steal it).
- The more feedback you solicit early on when working on something new, the lower the risk of the project.
- Sharing early increases your bus factor (the number of people who need to get hit by a bus before your project is completely doomed).
- You should have good documentation and at least one primary and one secondary owner for each area you’re responsible for.
- Working with others directly increases the collective wisdom around a problem.
- Many eyes make all bugs shallow.
- Give people a way to signal that they don’t want to be interrupted: a cuddly toy on the desk, headphones, or an agreed spoken signal.
- Software engineering is a team endeavour.
- 3 pillars of social interaction: humility, respect and trust
- Humility: you’re not the centre of the universe and are open to self improvement
- Respect: you care about those you work with. You treat them kindly, appreciate their abilities and celebrate their accomplishments.
- Trust: you believe others are competent and will do the right thing. You let them drive where appropriate.
- Relationships always outlast projects.
- Failure is an option. If you aren’t failing now and again, you’re not taking enough risk.
- The key to learning from your mistakes is a blameless post mortem. A good post mortem should contain a list of what was learned and what will change because of the experience.
Chapter 3 Knowledge Sharing
- You need to create a culture of learning first, as otherwise people won’t feel safe to admit they don’t know something.
- You don’t want knowledge islands to form. They lead to local maxima instead of a global maximum.
- You don’t want people to become a single point of failure.
- Every expert was once a novice. Organisations must invest in people if they want experts.
- Research has shown psychological safety (in this context, being willing to admit you don’t know something) is the most important part of an effective team.
- Some good rules of engagement are:
- No feigned surprise (“What?! You don’t know that?!”).
- No “well, actually” - pedantic corrections that are more about grandstanding than precision.
- No back-seat driving - interrupting a discussion to offer opinions without committing to the conversation.
- No “-isms” (e.g. “it’s so easy my grandma could do it”).
- Your coworkers are the most valuable source of information - don’t be afraid to ask for help!
- There’s always more to learn - never forget this.
- If you are not learning, you should find a new environment.
- Seniority != know everything. The more you know, the more you don’t know.
- Asking questions signals to others that it’s OK for them to do the same.
- Consider the principle of Chesterton’s fence before you rewrite any legacy code or make any changes. Understand why it is there in the first place!
- Once you understand it, if you want to change it then go ahead. Either way, document your findings for future readers.
- If you learn something: WRITE IT DOWN. Then share what you wrote down with others! Future learners will likely have the same question.
- Being an expert and being kind are not mutually exclusive. Don’t be a brilliant jerk; it’s toxic.
- Not all leadership/seniority is technical. Leaders improve the quality of the people around them, create a culture of teamwork and defuse tension.
- If you want to build a culture of knowledge sharing, put your money where your mouth is and ensure that it is rewarded. It should either be part of your career ladder or have bonuses associated with it (Google allows peers to nominate others for awards).
- Static analysis tools are a great way to teach best practices.
- At Google, every PR requires readability approval, which means someone certified with the readability certification for that language has approved the change.
- The Engineering Productivity team at Google said readability had a net positive impact on engineering velocity as readable PRs were of higher quality and merged faster. Engineering satisfaction was higher.
Chapter 4 Engineering for Equity
- Whether you mean to or not, people have biases.
- An easy way to attempt to address unconscious bias is building a diverse, representative work force.
- The example given in the book is Google Photos, which (at least as of 2018) was mislabelling black people as gorillas. This was due to a failure to build a complete data set and poor testing with regard to people of colour.
- A mark of an exceptional engineer is the ability to discern how the product they are building will impact different groups of humans.
- You must dig deep and really understand your own biases - only then can you take action to consciously counter them.
- We won’t fix lack of representation simply by changing the hiring funnel - we must look at inequity in promotions, retention and educational opportunity too.
- Building for the majority as the first step is a flawed methodology: it means those who are already disadvantaged are disadvantaged even more.
- Design for a user who is least like you.
- Don’t assume equity within your system - measure it.
Chapter 5 How to Lead a Team
- At Google they split leadership into people managers and tech leads.
- In experienced teams, it might be the same person.
- An engineering manager is responsible for the performance, productivity and happiness of every person in their team (including the tech lead) whilst still making sure the business needs are met.
- The Tech Lead is responsible for decisions, architecture, priorities, velocity and general project management. They will work with the EM to make sure the team is well staffed.
- Most Tech Leads are still contributors, and often have to pick between doing something themselves or delegating to a team member. Delegating is usually the correct thing to do.
- A Tech Lead Manager combines both of the above. This is rare though, as it’s difficult to do both jobs well without burning out.
- Influence without authority is one of the most important traits either of the above can learn.
- When moving to management, it can often feel like you achieved nothing in a given day. This is because you may not have anything tangible, such as code, to show for your day’s labour.
- “Resist the urge to manage” and become a servant leader.
- A servant leader manages the technical and social health of the team.
- “Traditional managers worry about how to get things done, great managers worry about what gets done (and trust their team to figure out how to do it).”
- As a manager you need to make your team feel psychologically safe enough to take risks, and show them that failure is an option. If you try to do the impossible and fail, chances are you’ll achieve much more than if you had attempted something easy, and the team growth will be huge (as long as you don’t keep failing at the same thing).
- If you do fail, run a postmortem - ensure you fail as a team and do not blame individuals.
- If an individual succeeds, praise them publicly, but if you do need to give some feedback/criticism, do it in private.
- You should always aim to hire people smarter than you that could ultimately replace you.
- You shouldn’t ignore low performers; address the problem immediately. Otherwise, you risk demotivating everyone else and ending up with a whole team of low performers.
- Often low performers just need some coaching and direction. A good way to think about it is to imagine you are helping someone with a limp learn to walk again, then jog, then run alongside the rest of the team.
- It usually requires some level of micromanagement.
- Set up a time frame with them (say, 2 months) and set specific goals for that period. Meet weekly to check progress.
- Don’t ignore human issues - these are often harder to solve for than technical problems.
- Don’t try to be friends with everyone in your team - it can be draining and is incredibly challenging. Having times where you can talk on a more personal level (e.g. team lunches) is a great way to build relationships and trust though.
- Don’t compromise the hiring bar. Steve Jobs said “A people hire A people. B people hire C people”.
- If you treat your team like children/prisoners - they will act like it.
- You need to trust your team - chances are your team understands the work better than you. Drive decisions but trust their judgement. This will lead to greater ownership and accountability.
- Encourage challenge and inquiry.
- Apologise if you make a mistake - it goes a long way.
- “The Leader is always on stage”. Practise Zen: react calmly and coolly to situations, as people in your team will look to you for guidance on how to act.
- Don’t always jump to giving people a solution; ask questions and help them come to a solution on their own. Even if you didn’t know the answer, it will give the team member the impression you did 😉
- Be a catalyst - drive consensus.
- Remove roadblocks. Work to become a valuable point of escalation to solve issues.
- A good mentor needs: experience with your team’s processes and systems, the ability to explain things to someone else and the ability to gauge how much help someone needs (don’t explain too much).
- Set clear goals to ensure the team is working in the same direction. The easiest way to do this is to create a concise mission statement for the team.
- Be honest. This is harder than it seems, as you will often receive information you cannot share. One approach to this is to say “I won’t lie to you, but I will tell you when I cannot tell you something or if I just don’t know”.
- If a team member asks you something, it’s completely OK to say “I know the answer but I am not at liberty to share it right now”.
- Don’t use the compliment sandwich when delivering hard feedback - just give it honestly with empathy.
- Track happiness - make sure your team isn’t working too much, keep track of all the boring work to be done and ensure it’s split evenly. End meetings with “What do you need?”. In 1:1s, ask “How would you rate your happiness on a scale of 1-10?”
- Delegate but keep your hands dirty - don’t be afraid to pick up the boring tasks.
- Seek to replace yourself - hire people smarter than you.
- Know when to make waves - don’t ignore hard problems that are affecting the rest of the team (e.g. letting someone go).
- Shield your team from chaos.
- Let your team know when they do well.
- Say yes to things that are easy to undo - if someone wants to spend a couple of hours/days playing with something new that could increase efficiency, make time to let them do so.
- Extrinsic motivation works for a while, but you need to work on giving your team members intrinsic motivation towards your team’s mission statement.
- Intrinsic motivation can be increased by giving people more autonomy, mastery and purpose.
Chapter 6 Leading at Scale (Leading multiple teams)
- The three Always of leadership: Always Be Deciding, Always Be Leaving, Always Be Scaling.
- Deciding:
- Your job is about identifying and making a decision against the trade offs.
- Identify the blinders - often people who have worked in an area for a long time suffer from “it’s always been that way” syndrome. By identifying the blinders and asking good questions, you can come up with better solutions.
- Identify the key trade offs - ambiguous problems (which all software is) do not have a single solution. You must identify all trade offs, explain them to everyone and help them decide how to balance them.
- You now have everything you need to make a decision you can iterate on later. Next month you might decide to change your approach to the same problem, but that’s OK.
- You need to ensure you frame all decisions as an iterative process to avoid analysis paralysis.
- Always Be Leaving:
- “It’s not your job to solve an ambiguous problem, but to get your organisation to solve it by itself, without you present”.
- This leaves you free to move onto another problem, leaving a self sufficient team in your wake.
- It helps prevent you becoming a SPOF.
- Scaling:
- The most important thing you have to defend is your time and energy.
- Your job is to do the things only you can do. If someone else can do it, you shouldn’t do it.
- Regularly block out 2 hours or more to sit and quietly work on important but not urgent things - team strategy, career paths, collaboration guidelines.
- David Allen’s book Getting Things Done is popular with Engineering Managers and worth a read.
- Don’t be afraid to drop balls deliberately. Much better than doing it accidentally! Aim to only deal with the critical top 20% that only you can deal with.
- Protecting your energy:
- Take meaningful time off (3+ days) where you are disconnected completely from your work. You will come back brighter.
- Disconnect at the weekend too.
- Your brain operates in roughly 90-minute cycles, so ensure you take 10-minute breaks regularly.
- Take mental health days. If you have slept badly, eaten poorly or just don’t feel up to work, don’t be afraid to declare you need a sick day and take the day off to disconnect and recover.
- Management is 95% about observation and listening, 5% about making critical adjustments in the right place.
Chapter 7 Measuring Engineering Productivity
- Measuring is expensive: we need people to track the data, process it, analyse it and disseminate it.
- Before figuring out what you want to measure, you should figure out if it is worth measuring against the trade offs. You can do that with the following questions:
- What results are you expecting and why?
- If the data supports your expected results, what action will be taken?
- If we get a negative result, will appropriate action be taken?
- Who is going to decide to take action on the result, and when would they do it?
- Here are some good reasons not to measure something:
- You can’t afford to change the process right now
- Any results will be invalidated by other factors.
- The results will be used as vanity metrics for something we were going to do anyway.
- The only metrics available are not precise enough to measure the problem.
- Google use Goals/Signals/Metrics (GSM) to guide metrics creation.
- A goal is a desired end result.
- A signal is how you might know you have achieved that result. Signals are things we’d like to measure, but may not be measurable themselves.
- A metric is a proxy for a signal. It’s something we can actually measure.
- Always start with the goal, then the signal, then the metric. If you start with metrics, you risk creating signals and goals based on what you can get easily, and not what you really want.
- It’s important to maintain traceability. For each metric, we should be able to trace it back to a signal, and from that signal to a goal.
- Goals should be written in terms of a desired property with no thought given to how you can measure it (that comes later).
- Productivity can be divided into 5 components which can be remembered with QUANTS:
- Quality of the code - Is the code quality high? Are the test cases good enough to prevent regression? How good is our tooling to mitigate risk?
- Attention from engineers - How frequently do engineers reach a state of flow? How much are they distracted by notifications? Does a tool encourage engineers to context switch?
- iNtellectual complexity - How much cognitive load is required to complete a task? What is the inherent complexity of the problem being solved? Do engineers have to deal with unnecessary complexity?
- Tempo & Velocity - How quickly can engineers accomplish their tasks? How fast can they push their releases out? How many tasks do they complete in a given timeframe?
- Satisfaction - How happy are engineers with their tools? How well does a tool meet engineers’ needs? How satisfied are they with their work and end product? Do engineers feel burnt out?
- Signals are a way we know we have reached our goals. Not all signals are measurable, but that’s ok. There is not a 1:1 relationship between signals and goals.
- Metrics - this is where we finally measure. There are some things that are hard to measure, like code quality. We either accept that it can’t be measured or we look to something more subjective (such as surveys) to do so.
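The goal, signal, metric chain and the traceability requirement described above can be sketched as a small data model. This is only an illustration: the class names, the `trace` helper and the example goal, signal and metric texts are my own assumptions, not taken from the book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    description: str


@dataclass
class Signal:
    description: str
    goal: Goal


@dataclass
class Metric:
    name: str
    signal: Optional[Signal] = None


def trace(metric):
    """Trace a metric back to its signal and goal; fail loudly if the chain is broken."""
    if metric.signal is None:
        raise ValueError(f"metric {metric.name!r} does not trace back to a signal")
    return [metric.name, metric.signal.description, metric.signal.goal.description]


# Start with the goal, then the signal, then the metric (hypothetical examples).
goal = Goal("Engineers can find the code they need quickly")
signal = Signal("Engineers feel that searching the code base is fast", goal)
metric = Metric("Survey: % of engineers satisfied with code search speed", signal)

# A metric picked only because it is easy to collect has nothing to trace back to.
orphan = Metric("Number of searches per day")
```

Calling `trace(metric)` returns the full chain, while `trace(orphan)` raises a `ValueError`, which is the point: a metric that cannot be traced to a signal and a goal probably should not be collected.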
Chapter 8 Style Guides & Rules
- Style guides should be the definitive source to which engineers are held accountable.
- Having a strong style guide allows engineers to concentrate on what their code needs to say rather than how they are saying it.
- Creating a style guide:
- Ask “what goals are we trying to advance?”, not “what rules should we have?”
- Don’t ask what goes into the style guide, but why it goes into the style guide.
- Rules must pull their own weight - remembering rules is hard. Not everything needs to be a rule, as some of it is implicit. For example, it’s rare to use the goto keyword in Go, so we don’t have a rule for it.
- Optimise for the reader - code will be read more often than it’s written. For example, Google restrict the use of ternaries, as they are harder to read, and ask engineers to use long-form if statements instead. “No clever code.”
- Be consistent - You should be able to jump across projects/teams and be productive straight away, as the code should look familiar and “feel like home”. It’s best to be as consistent as possible with the outside world too: engineers will spend more time reading other Go code.
- Avoid error-prone and surprising constructs - things like reflection and “power” features of a language can often cause issues. It is best to not use them.
- Concede to practicalities when necessary - sometimes there will need to be exceptions to the rules. That is OK, but they should be made consciously and agreed with a team member. If a static analysis tool enforces the rule, mark the exception with an annotation.
- Style guide rules tend to be one of three things: rules to avoid danger, rules to enforce best practice, rules to enforce consistency.
- Rules to avoid danger: guidance on how to use static members, lambda expressions, exception handling etc.
- Enforcing best practices: aim to keep the code base healthy. Includes rules for when and where to add comments, and file structure.
- Building in consistency: most of these exist simply so that a decision has been made. Instead of engineers arguing about naming conventions or 2 v 4 spaces, they can just defer to the document and move on.
- Rules can be changed - this usually happens if a new major release of the language comes out or engineers invest time to circumvent rules. You must try and justify why you need to change it. This would be the role of a “Golang chapter”. The process for change at Google involves showing examples from within Google’s code base where errors have happened that would be stopped by the new rule.
- Google has style arbiters who determine which new rules to add. You must also seek exceptions from them.
- Google offer full-day @google 101 classes for each language they use.
Chapter 9 Code Review
- Google use a tool called Critique to do code review.
- Code review at Google looks as follows:
- The author writes a change and uploads it to the code review tool.
- The author does a self review, and evaluates comments that the tool has made automatically.
- Reviewers then look at the code and make comments. Some require explicit resolution; others are just for information/learning.
- The author makes changes which go back to reviewers, this is iterative.
- When a reviewer is happy, they mark it LGTM. By default, 1 LGTM is enough to merge.
- Even though approved by a team member, it may need to be approved by a CODEOWNER for a specific aspect.
- It then will also need to be approved by someone for language readability.
- Sometimes one person can be all three roles.
- A good code review should:
- Check for code correctness - can suggest alternatives that are your opinion but they shouldn’t be blockers.
- Ensure the code change is comprehensible to other engineers - the code review is the first chance for someone unbiased to read it and see if it makes sense to a wider audience.
- Enforce consistency across the code base.
- Promote team ownership - code review reinforces that code isn’t “theirs” and belongs to engineering collectively. It can also help with imposter syndrome, so make sure to praise people!
- Enable knowledge sharing - the most important benefit. Make sure to include lots of FYIs and links to help people learn.
- Provide a historical record of the code review itself.
- Best practices:
- Be polite and professional always - defer to the author on approach and only point out alternatives if the author’s approach is deficient (or as an FYI). If multiple approaches are equally valid, defer to the author.
- Google expects code reviews to be completed within 24 hours.
- Write small changes - Google recommends ~200 lines. 35% of changes at Google are to a single file.
- Write good change descriptions.
- Keep reviewers to a minimum - multiple people will bring more opinion but has diminishing returns.
- Automate where possible - static analysis, pre commit hooks etc.
Chapter 10 Documentation
- Documentation gives no immediate benefit to the author: the cost is paid up front and only pays off over the long term. This is why it is often hard to get people to write it.
- When writing documentation, always prioritise the reader. Different audiences need different things, but you should never assume the reader shares your knowledge.
- Make sure your page has a singular purpose. It’s OK to write more than one document.
- Good documentation should go through code review and be versioned just like code. You might want the following 3 types of review:
- A technical expert review for accuracy
- An audience review for clarity
- A writing review for consistency.
- When writing technical documentation, don’t jump into the how immediately. First, try to address the following in the first couple of paragraphs:
- Who is this for?
- What is the purpose of this document?
- When was this document created?
- Where should this document live?
- Why was this document created? What should the reader be able to do after they’ve read it?
- Good documents are complete, accurate and clear. You will rarely achieve all 3 and you need to balance them.
- Google adds metadata to documentation and notifies the owner if it hasn’t been modified in x amount of time.
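The freshness check in the last point could look roughly like the sketch below. Everything here is an assumption for illustration: the `DocMetadata` fields, the `stale_docs` helper and the 180-day threshold are hypothetical, as the notes only say “x amount of time” and do not describe how Google stores the metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DocMetadata:
    """Hypothetical per-document metadata: where it lives, who owns it, last review date."""
    path: str
    owner: str
    last_reviewed: date


def stale_docs(docs, today, max_age=timedelta(days=180)):
    """Return the documents whose last review is older than max_age (assumed threshold)."""
    return [doc for doc in docs if today - doc.last_reviewed > max_age]


docs = [
    DocMetadata("docs/onboarding.md", "alice", date(2023, 1, 10)),
    DocMetadata("docs/release-process.md", "bob", date(2023, 11, 1)),
]

# In a real setup this list would drive a reminder to each owner.
to_notify = [(doc.owner, doc.path) for doc in stale_docs(docs, today=date(2023, 12, 1))]
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check deterministic and easy to test.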
Chapter 11 Testing Overview
- Testing code leads to less debugging later.
- Gives increased confidence in refactoring. Ideally you should be able to refactor without changing tests.
- Clear focused tests are the best documentation.
- Reviews are simpler.
- Google talk about tests as “small”, “medium” and “large”.
- Small tests:
- Must run in a single process and preferably on a single thread.
- Can’t perform IO operations (can’t access the network or disk).
- All of this is an attempt to stop “flaky” tests and to ensure they run fast.
- Engineers are encouraged to write as many of these as possible.
- Medium tests:
- Can span multiple processes, use threads and make blocking calls.
- Can access the network but only to call something else @ localhost.
- This is where DB integration would be.
- Large tests:
- Can make network calls not just on localhost
- Google engineers tend to isolate this code from small/medium tests and only use it during build/deploy to not impact dev workflow.
- All tests should be hermetic: a test should contain all the information necessary to set up, execute and tear it down.
- All tests should assume nothing about their environments, i.e. they should not care what order they are run in.
- Google aim for 80% small tests, 15% medium and 5% large (test pyramid).
- “Test everything you don’t want to break” - the Beyoncé Rule - “If you liked it, you should have put a test on it”.
- Make sure you test for failure scenarios - much better that a failure happens in a controlled environment than in the wild.
- Automated testing doesn’t work for everything. Humans are better at judging audio/video quality and also finding complex security issues. This is called exploratory testing.
- Exploratory testing is the process of treating an application as a puzzle to be broken.
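As a rough illustration of a “small”, hermetic test as described above, here is a sketch using Python’s standard `unittest` module. `ShoppingCart` is a hypothetical class invented for the example. The tests run in a single process, touch neither network nor disk, build their own state in `setUp`, and include a failure scenario.

```python
import unittest


class ShoppingCart:
    """Hypothetical code under test: pure in-memory logic, no network or disk."""

    def __init__(self):
        self._lines = []

    def add(self, price_pence, quantity=1):
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self._lines.append((price_pence, quantity))

    def total_pence(self):
        return sum(price * quantity for price, quantity in self._lines)


class ShoppingCartTest(unittest.TestCase):
    def setUp(self):
        # Each test builds its own state, so the order tests run in cannot matter.
        self.cart = ShoppingCart()

    def test_empty_cart_totals_zero(self):
        self.assertEqual(self.cart.total_pence(), 0)

    def test_total_is_price_times_quantity(self):
        self.cart.add(price_pence=250, quantity=2)
        self.cart.add(price_pence=100)
        self.assertEqual(self.cart.total_pence(), 600)

    def test_zero_quantity_is_rejected(self):
        # Failure scenarios deserve tests too.
        with self.assertRaises(ValueError):
            self.cart.add(price_pence=250, quantity=0)
```

Saved in a file, these can be run with `python -m unittest`. Because nothing outside the process is involved, they should be fast and not flaky.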
Chapter 12 Unit Testing
- Benefits of unit tests:
- Small and deterministic.
- Can be run frequently with ease.
- Promote high level of test coverage due to their ease.
- Easy to understand when they fail because each test is focused on a single piece of the system.
- Make good documentation.
- Google aims to write 80% unit tests, 20% broader scoped tests.
- As they are so important, it’s important they are maintainable. This means they “just work”, and you don’t have to worry about them until they break. When they do break (due to a code change), they are easy to follow and you can easily fix them.
- A test shouldn’t change unless the requirements within the system change.
- There are four types of change:
- Refactoring - unless the interface changes, the tests should not need to be rewritten here.
- New features - unless the new features modify behaviour of the old features, old tests should not need to be modified. New tests, however, will need to be written.
- Bug fixes - Bugs usually represent a missing test case in the first place and should be treated like a new feature. No modification to old tests, but addition of new ones.
- Behaviour changes - This is where we would expect to have to modify old tests.
- Google don’t really like to use mocking frameworks and prefer to use real objects (as long as they are fast and deterministic).
- Make your tests concise and complete.
- A test is complete when it contains all the information needed for someone to understand the test.
- It’s concise when it contains nothing apart from what is needed to be complete.
- Write tests for behaviours, not for methods - this is to test the guarantees a system should make when in a given state. These are often well framed as given, when, then.
- Try not to include logic in tests. It should be obvious at a glance whether a test is correct, without any mental processing.
- Tests shouldn’t be DRY; they should be DAMP (Descriptive and Meaningful Phrases).
- Use the builder pattern so test writers only have to override the values they care about.
- Establishing exactly which test frameworks to use and standardising as much as possible is hugely beneficial for the team.
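A few of the points above (testing behaviours rather than methods, given/when/then, no logic in tests, and a builder with defaults) can be combined in one short sketch. `Account`, `withdraw` and `an_account` are hypothetical names made up for this example.

```python
import unittest
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int
    locked: bool


def withdraw(account, amount):
    """Hypothetical code under test."""
    if account.locked:
        raise PermissionError("account is locked")
    if amount > account.balance:
        raise ValueError("insufficient funds")
    return replace(account, balance=account.balance - amount)


def an_account(**overrides):
    """Test builder: sensible defaults, so each test overrides only what it cares about."""
    fields = {"owner": "test-user", "balance": 100, "locked": False}
    fields.update(overrides)
    return Account(**fields)


class WithdrawTest(unittest.TestCase):
    # One test per behaviour, named after the behaviour rather than the method.
    def test_withdrawal_reduces_balance(self):
        account = an_account(balance=100)      # given
        updated = withdraw(account, 30)        # when
        self.assertEqual(updated.balance, 70)  # then

    def test_withdrawal_from_locked_account_is_rejected(self):
        account = an_account(locked=True)
        with self.assertRaises(PermissionError):
            withdraw(account, 1)

    def test_withdrawal_larger_than_balance_is_rejected(self):
        account = an_account(balance=10)
        with self.assertRaises(ValueError):
            withdraw(account, 11)
```

Each test repeats a little setup instead of sharing clever helpers, which is the DAMP trade-off: slightly more text, but every test can be read on its own.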
Chapter 13 Test Doubles
- A test double is an object or function that stands in for a real implementation in a test.
- Test doubles aren’t the same as mocks - a mock is one type of test double.
- You may use an in-memory database as a test double for a full SQL server, for example.
- Techniques for using test doubles:
- Faking - a lightweight implementation of an API that behaves like the real one. Using an in-memory DB would be an example of faking.
- Stubbing - giving behaviour to a function that otherwise has none. This is typically what mocking frameworks do when they do things like “WillReturn”.
- Interaction testing - validate how a function is called without actually calling it. This is typically done by making sure the parameters passed to it are the correct ones.
- Fakes should be used over mocks, as they behave much closer to the real implementation.
- Only use fakes if they are high fidelity - close to the actual implementation you want.
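To make the three techniques above concrete, here is a small sketch using Python’s standard `unittest.mock`. The store API (`save`/`load`) and the `greet` and `rename` functions are hypothetical, invented for the example.

```python
from unittest import mock


class FakeUserStore:
    """Fake: a lightweight but working in-memory implementation of the store API."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def load(self, user_id):
        return self._users[user_id]


def greet(store, user_id):
    """Hypothetical code under test that reads from the store."""
    return f"Hello, {store.load(user_id)}!"


def rename(store, user_id, new_name):
    """Hypothetical code under test that writes to the store."""
    store.save(user_id, new_name)


# Faking: the double really stores and returns data.
fake = FakeUserStore()
fake.save(1, "Ada")
greeting_from_fake = greet(fake, 1)

# Stubbing: the double has no behaviour of its own, so we hard-code a return value.
stub = mock.Mock()
stub.load.return_value = "Grace"
greeting_from_stub = greet(stub, 42)

# Interaction testing: we only check how the double was called.
spy = mock.Mock()
rename(spy, 7, "Linus")
spy.save.assert_called_once_with(7, "Linus")
```

The fake would catch a bug where `rename` saved under the wrong key and a later `load` failed. The stub and the interaction check only verify what the test author told them to expect, which is part of why fakes are preferred when a high-fidelity one is available.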
Chapter 14 Larger Testing
- Larger tests may orchestrate a bunch of real dependencies and take a long time to run. Some take hours or days.
- They may be non-deterministic.
- Say you upgraded to the latest version of google maps, how would you know your upgrade worked if you didn’t test the actual integration?
- Can be used to test issues that appear under load.
- Larger tests may be slower, less reliable and flakier. They may also not scale well.
- Prober tests can be used to “probe” production to see if it’s working, but in a less brittle way. For example, you might check you can run a search on google.com but not check the result.
- Google run war games called DiRT (Disaster Recovery Testing). Here they inject huge faults into their infrastructure to see how long it takes them to resolve them.
Chapter 15 Deprecation
- Code is a liability, not an asset - keeping it running, up to date and developing patches for it all takes time. This should constantly be weighed up against the trade off of deprecating it.
- Deprecation != old. The new system may have better security properties, or use fewer resources to achieve a similar outcome.
- When designing a system, you should plan for deprecation up front. Some things to think about:
- How easy will it be for my consumers to migrate from my product to a potential replacement?
- How can parts of the system be replaced incrementally?
- Google believes that for compulsory deprecation, the best thing to do is to build a team of experts who drop into teams and help them move onto a new system.
- “Hope is not a strategy”. If you want people to migrate from a deprecated system, you need a way to alert them early that deprecation is coming.
- Deprecation warnings should be actionable and relevant. Instead of saying “this function is deprecated”, the warning should say “fn x() is deprecated. You can migrate to fn y() by doing z”.
- Deprecations should have process owners responsible for tracking and owning deprecation. Otherwise, to be blunt, it won’t get done.
- You should set milestones around deprecation, e.g. traffic to the old system is down 10%, or we turned off feature x.
- Google uses tooling to figure out who is using specific code they want to deprecate. They use their test tool (which everything is checked into) as a means to interrogate code and look for deprecation usage. In some cases, this tool can actually make a PR for the migration.
- Static analysis is used to stop people using deprecated code in future.
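An actionable deprecation warning of the “x() is deprecated, migrate to y() by doing z” kind can be sketched with Python’s standard `warnings` module. The `deprecated` decorator and the `fetch_user`/`fetch_user_v2` functions are hypothetical names for illustration, not an API from the book.

```python
import functools
import warnings


def deprecated(replacement, how):
    """Decorator (hypothetical helper) that emits an actionable DeprecationWarning."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated. "
                f"You can migrate to {replacement}() by {how}.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def fetch_user_v2(user_id):
    """The replacement API (hypothetical)."""
    return {"id": user_id}


@deprecated(replacement="fetch_user_v2", how="passing the same user_id argument")
def fetch_user(user_id):
    """The old API keeps working while consumers migrate."""
    return fetch_user_v2(user_id)
```

Using `stacklevel=2` makes the warning point at the code that still calls the old function, which is what makes it relevant to the person who has to act on it.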
Chapter 16 Version Control & Branch Management
- I mostly skipped this chapter since it was very basic version control history and usage.
- Dev branches were used as a way of keeping code separate until it had gone through a full round of testing - this makes sense if you didn’t/don’t have automated testing.
- Google’s version control system is called Piper.
- Google enforce every service in production is rebuilt every 6 months or so.
Chapter 17 Code Search
- https://kythe.io - code searching tool created by Google
- Can be used to answer questions such as “how many people are using this API?” or “where does the definition for this function live?”
- You can share a link to code search to allow you to do things like “have you considered using this function instead? [link]”
- It’s version controlled, so you can link to old code for post mortems.
- One of the most frequent use cases is to see how others might be doing something.
- (The rest of the chapter was mostly about how Google built it - useful as reference in future but not what I wanted just now).
Chapter 18 Build System & Philosophy
- 83% of Google engineers are happy with their build system, and it is oft quoted as a reason they love working there.
- Google’s build system is called Blaze. Googlers built and open sourced a version of it called Bazel in 2015.
- Good build systems have 2 good properties:
- They are fast.
- They are correct (given the same input, should get the same output every time)
- Older build systems are task based (Ant, Gradle, Maven). You define a bunch of steps (clean, compile, distribute).
- Engineers can pretty much define whatever they want in these files.
- The system has no context as to what is going on.
- They become very complex very quickly.
- It’s hard for the system to figure out if the step was successful, so the build system becomes something else to debug.
- Task based build systems often can’t parallelize the work.
- Incremental builds are slow as the system rebuilds the whole code base.
- The alternative to task based build systems is artifact based build systems. This is what Bazel is.
- The system defines a limited set of things that can be built, and a lot of the configuration is taken away from the engineer.
- Engineers say what to build, the system says how.
- This allows for much stronger guarantees around the result.
- On Bazel specifically:
- BUILD files define targets.
- To build, you simply run bazel build :myBinary.
- The first time you run it, Bazel would do the following things:
- Parse every BUILD file in the workspace to create a graph of dependencies.
- Use the graph to determine the transitive dependencies of myBinary (everything it depends on, and everything those targets depend on, recursively).
- Build the dependencies in order.
- Build “myBinary” which links all those dependencies together.
- If you ran it again, you’d get an instant response as Bazel would be confident there is no work to do.
- Things like which Java compiler to use are configured globally. If it’s changed, everything will need to be recompiled.
- Bazel uses toolchains. This is to say “given I am going to deploy onto a Linux machine, this is what tools to use”.
- Bazel can be extended by adding rules.
- All builds are sandboxed (using LXC on Linux, the same technology as Docker), which means that different builds cannot interact with each other and cause problems.
- Bazel keeps a manifest file that records a cryptographic hash for every external dependency a piece of software uses. If what gets downloaded from the internet no longer matches the recorded hash, the build fails until an engineer updates the hash.
- It does not store all dependencies locally though. You should do this yourself (mirror upstream deps).
- It makes sense to have small build targets: the system can then rebuild smaller subsections rather than the whole project all the time.
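To check my own understanding of the artifact based idea, here is a toy sketch in Python of the steps above: work out the transitive dependencies, build them in order, and skip anything whose inputs haven't changed. It is not how Bazel is implemented. The target names, the TARGETS table and the in-memory cache are assumptions for illustration, and "building" is just recording a hash.

```python
import hashlib
from graphlib import TopologicalSorter

# Hypothetical targets: name -> (source text, direct dependencies).
TARGETS = {
    "strings_lib": ("def shout(s): return s.upper()", []),
    "greeter_lib": ("def greet(n): return 'hi ' + n", ["strings_lib"]),
    "unrelated_lib": ("def noop(): pass", []),
    "myBinary": ("main()", ["greeter_lib"]),
}


def transitive_deps(target, targets):
    """Everything `target` depends on, directly or indirectly."""
    seen = set()
    stack = list(targets[target][1])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(targets[dep][1])
    return seen


def build(target, targets, cache):
    """Build `target` and its dependencies in order; return what was rebuilt."""
    needed = transitive_deps(target, targets) | {target}
    graph = {name: targets[name][1] for name in needed}
    digests = {}
    rebuilt = []
    # static_order() yields dependencies before the targets that use them.
    for name in TopologicalSorter(graph).static_order():
        source, deps = targets[name]
        # A target's digest covers its own source plus its dependencies' digests,
        # so a change anywhere below it forces a rebuild.
        hasher = hashlib.sha256(source.encode())
        for dep in sorted(deps):
            hasher.update(digests[dep].encode())
        digests[name] = hasher.hexdigest()
        if cache.get(name) != digests[name]:
            cache[name] = digests[name]  # stand-in for running the compiler
            rebuilt.append(name)
    return rebuilt
```

With an empty cache, build("myBinary", TARGETS, cache) rebuilds the three targets in the dependency chain and never touches unrelated_lib. Running it again with the same cache returns an empty list, which is the "instant response" behaviour. It also shows why small targets help: a change only invalidates the targets that sit above it in the graph.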
Chapter 19 - Critique, Google’s Code Review Tool
- Critique emits events which Googlers have built on top of. Some use a Chrome extension that instantly notifies you if there is code you are meant to review. Clicking the icon takes you straight to the change.
- For open source work Google use Gerrit (this is what the Go team uses).
- Mostly skipped the rest of this chapter.
Chapter 20 - Static Analysers
- Key lessons in making them work:
- Focus on developer happiness. They should be used to save developers’ time and you should actively solicit and act on feedback when engineers feel this is not the case.
- You need a low false positive rate for these tools to be effective (unless in regards to security).
- Make it part of code review.
- Make feedback understandable, actionable and have the potential for significant impact on code quality.
- At Google, developers contribute to the static analyser rule set and can write their own analysers.
- Google’s static analyser is called Tricorder.
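As a rough idea of what a small custom analyser with understandable, actionable feedback might look like, here is a sketch using Python's standard ast module. It has nothing to do with how Google's tooling works internally. The rule table and function names are hypothetical.

```python
import ast

# Hypothetical rule set: deprecated function name -> suggested replacement.
DEPRECATED_CALLS = {"fetch_user": "fetch_user_v2"}


def find_deprecated_calls(source: str, filename: str = "<string>") -> list[str]:
    """Return one actionable finding per call to a deprecated function."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both plain calls, f(), and attribute calls, module.f().
        # Matching on the bare name is crude and can give false positives.
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            continue
        if name in DEPRECATED_CALLS:
            message = (
                f"{filename}:{node.lineno}: {name}() is deprecated, "
                f"use {DEPRECATED_CALLS[name]}() instead"
            )
            findings.append((node.lineno, message))
    # Report findings in source order.
    return [message for _, message in sorted(findings)]
```

Each finding says where the problem is and what to do about it, which is the kind of output that could be surfaced as a comment during code review. Keeping the false positive rate low would need smarter name resolution than this sketch has.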
Chapter 21 - Dependency Management
- All else being equal, prefer source control problems over dependency management problems.
- What if you use two libraries that depend on different versions of a third library? You may need to import the third dependency twice with different versions.
- When using a library you should ask yourself:
- Does the project have tests that you can run?
- Do they pass?
- Who is providing the dependency?
- What sort of compatibility is the project aspiring to?
- How popular is the project?
- How long will we be depending on the project?
- How often does the project make breaking changes?
- How hard would it be to build ourselves?
- What incentives would we have to keep it up to date if we build it?
- How difficult will it be to roll out an update?
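The "two libraries depending on different versions of a third" problem is easy to spot mechanically. Below is a minimal sketch in Python, assuming a made-up manifest format where each library pins exact versions of its dependencies. Real package managers deal with version ranges, which this ignores.

```python
from collections import defaultdict

# Hypothetical manifest: library -> {dependency: required version}.
REQUIREMENTS = {
    "app": {"lib_a": "1.0", "lib_b": "2.0"},
    "lib_a": {"lib_base": "1.4"},
    "lib_b": {"lib_base": "2.1"},
    "lib_base": {},
}


def find_version_conflicts(root, requirements):
    """Find dependencies reachable from `root` that are required at more than
    one version. Returns {dependency: {version: [libraries requiring it]}}."""
    wanted = defaultdict(lambda: defaultdict(list))
    seen = set()
    stack = [root]
    while stack:
        lib = stack.pop()
        if lib in seen:
            continue
        seen.add(lib)
        for dep, version in requirements.get(lib, {}).items():
            wanted[dep][version].append(lib)
            stack.append(dep)
    return {
        dep: {version: sorted(users) for version, users in versions.items()}
        for dep, versions in wanted.items()
        if len(versions) > 1
    }
```

For the manifest above, find_version_conflicts("app", REQUIREMENTS) reports that lib_base is wanted at 1.4 by lib_a and at 2.1 by lib_b, which is the situation where you would have to import the third dependency twice or get one side to upgrade.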
(Skipped last few chapters after skimming. These were CI/CD and large scale changes).