architect.dennyzhang.com
architect.dennyzhang.com copied to clipboard
:point_up: Learn system design in a solid way
- [[https://cheatsheet.dennyzhang.com/cheatsheet-systemdesign-A4][CheatSheet: System Design For Job Interview]]
- [[https://cheatsheet.dennyzhang.com/cheatsheet-leetcode-A4][CheatSheet: Leetcode Common Templates & Common Code Problems]]
- [[https://cheatsheet.dennyzhang.com/cheatsheet-behavior-A4][CheatSheet: Behavior Questions For Coder Interview]]
File me [[https://github.com/dennyzhang/architect.dennyzhang.com/issues][Issues]] or star [[https://github.com/dennyzhang/architect.dennyzhang.com][this repo]].
See more challenges from Denny: [[https://github.com/topics/denny-challenges][#denny-challenges]]
- Principles :noexport:
- [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]]: do one thing, and do it well
- Microservice: autonomy, rewrite in 2 weeks
- [#A] Key Value :noexport: | Value | Category | |--------------+---------------------------------------------------------| | Leadership | Make bigger contribution in project technial discussion | | Coordination | Know how to learn from across-teams | |--------------+---------------------------------------------------------| | Defensive | Know the pitfalls in advance | | Defensive | Work with confusing or misleading symptons | |--------------+---------------------------------------------------------| | Experience | Evaluate tools to a deeper level | | Experience | Learn new things in a faster way |
- [#A] Common Questions :noexport: | Question | Comment | |------------------------------------------------------------------------+---------| | Is current puzzle a solved problem in other tools? | | | Whether I'm trying to do is supported by the tools/products? | | | Scale: How big your scale would be? | | | Variants: What parts will be constantly changed? | | | Severity: How critical the data durability or service availability is? | | ** We may not understand the logic deeply. Usually it just works there are some implied convetions, but we have overlook them.
-
--<-------------------------- separator ------------------------>8-- :noexport:
- org-mode configuration :noexport: #+STARTUP: overview customtime noalign logdone hidestars #+DESCRIPTION: #+KEYWORDS: #+AUTHOR: Denny Zhang #+EMAIL: [email protected] #+TAGS: noexport(n) #+PRIORITIES: A D C #+OPTIONS: H:3 num:t toc:nil \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t #+OPTIONS: TeX:t LaTeX:nil skip:nil d:nil todo:t pri:nil tags:not-in-toc #+EXPORT_EXCLUDE_TAGS: exclude noexport BLOG #+SEQ_TODO: TODO HALF ASSIGN | DONE BYPASS DELEGATE CANCELED DEFERRED #+LINK_UP: #+LINK_HOME:
- Contact - Architect :BLOG: :PROPERTIES: :type: life :END:
Hi there
I'm [[https://www.linkedin.com/in/dennyzhang001][Denny Zhang]]. A coder at work.
[[https://cheatsheet.dennyzhang.com/contact][https://cdn.dennyzhang.com/images/brain/denny_intro.jpg]]
[[color:#c7254e][Why I maintain this blog?]] [[https://architect.dennyzhang.com]]
As a software developer with 10 years' experience, I really want to grow myself into [[color:#c7254e][a solid architect]].
Programmers can easily pratice and measure their algorithm skills (See [[https://code.dennyzhang.com][another blog]]).
Could we do the same to the architect skills?
[[color:#c7254e][+Probably not!+]]
It depends on your project experience and your problem domains. I agree. But I would still argue. It should help, if we can organize all related resources and discussion in a good way.
Here comes this blog!
Hope you find it useful, if you're also on the journey to be a future architect. Please leave me comments for any suggestions or feedbacks.
Cheers!
See all blogs I'm actively maintaining:
| Blog | Link | |-------------------------------+-----------------------------------| | DevOps blog | https://www.dennyzhang.com | | Cheatsheet for best practices | https://cheatsheet.dennyzhang.com | | Code tests for interviews | https://code.dennyzhang.com | | Learn system design | https://architect.dennyzhang.com |
- YouTube For System Design :BLOG:Resource: :PROPERTIES: :type: systemdesign, designresource :END:
YouTube Video For System Design Learning
Please leave me comments, if you have better recommendations!
Similar Posts:
- [[https://code.dennyzhang.com/design-books][Books For System Design]]
- Tag: [[https://code.dennyzhang.com/tag/designresource][#designresource]], [[https://code.dennyzhang.com/tag/systemdesign][#systemdesign]]
YouTube: [[url-external:https://www.youtube.com/watch?v=ZgdS0EUmn70&t=11s][Intro to Architecture and Systems Design Interviews]]
My takeaway:
- Why hiring managers tend to ask vague questions in system design
- It's not about memorizing best practice. But highlight your strength.
YouTube: [[url-external:https://www.youtube.com/watch?v=PE4gwstWhmc][How We've Scaled Dropbox]]
YouTube: [[url-external:https://www.youtube.com/watch?v=-W9F__D3oY4][Scalability Harvard Web Development By David Malan]]
- Books For System Design :BLOG:Resource: :PROPERTIES: :type: systemdesign, designresource :END:
Books to learn system design
Please leave me comments, if you have better recommendations!
Similar Posts:
- Tag: [[https://code.dennyzhang.com/tag/designresource][#designresource]], [[https://code.dennyzhang.com/tag/systemdesign][#systemdesign]]
- Design data-intensive application: #+BEGIN_HTML
Please leave me comments, if you have better recommendations!
Similar Posts:
- Tag: [[https://code.dennyzhang.com/tag/designresource][#designresource]], [[https://code.dennyzhang.com/tag/systemdesign][#systemdesign]]
- GitHub Repo: [[url-external:https://github.com/donnemartin/system-design-primer][system-design-primer]]
- GitHub Repo: [[url-external:https://github.com/FreemanZhang/system-design][system-design]]
- Github Repo: [[https://gist.github.com/vasanthk/485d1c25737e8e72759f][System Design Cheatsheet by vasanthk]]
- GitHub Repo: [[url-external:https://github.com/dennyzhang/architect.dennyzhang.com][architect.dennyzhang.com by DennyZhang]]
- Examples from highscalability.com: [[url-external:http://highscalability.com/blog/category/example][here]]
- Web pages: [[url-external:https://www.hanselman.com/blog/NewInterviewQuestionsForSeniorSoftwareEngineers.aspx][link]], [[url-external:https://www.interviewbit.com/courses/system-design/topics/interview-questions/][link]], [[url-external:http://highscalability.com/blog/2009/8/7/the-canonical-cloud-architecture.html][link]], [[url-external:https://hackernoon.com/top-10-system-design-interview-questions-for-software-engineers-8561290f0444][link]], [[url-external:https://www.evernote.com/shard/s576/sh/7e58b450-1abe-43a8-bf82-fbf07f1db13c/049802174415b418a2e65f75b744ab72][link]]
- Blogs For System Design :BLOG:Resource: :PROPERTIES: :type: systemdesign, designresource :END:
Books to learn system design
Please leave me comments, if you have better recommendations!
Similar Posts:
- [[https://code.dennyzhang.com/design-books][Books For System Design]]
- Tag: [[https://code.dennyzhang.com/tag/designresource][#designresource]], [[https://code.dennyzhang.com/tag/systemdesign][#systemdesign]]
- Website: [[url-external:https://discuss.leetcode.com/tags/5/system%20design][leecode system design]]
- Blog: [[url-external:http://blog.gainlo.co/index.php/category/system-design-interview-questions/][http://blog.gainlo.co]]
- Blog: [[url-external:https://www.educative.io/collection/5668639101419520/5649050225344512][Grokking the System Design Interview]]
- [[https://code.dennyzhang.com/tag/oodesign][#oodesign]]: OO design questions in this blog
- Blog: [[url-external:http://highscalability.com][http://highscalability.com]]
https://www.careercup.com/page?pid=system-design-interview-questions http://massivetechinterview.blogspot.com/
- Papers For System Design :BLOG:Resource: :PROPERTIES: :type: systemdesign, designresource :END:
Papers For System Design
Please leave me comments, if you have better recommendations!
Similar Posts:
- Tag: [[https://code.dennyzhang.com/tag/designresource][#designresource]], [[https://code.dennyzhang.com/tag/systemdesign][#systemdesign]]
TODO
- More Resources :noexport: License: Code is licensed under [[https://www.dennyzhang.com/wp-content/mit_license.txt][MIT License]].
- Useful links #+BEGIN_EXAMPLE https://www.jiuzhang.com/qa/?channel=2 https://github.com/jrue/JavaScript-Quiz https://github.com/checkcheckzz/system-design-interview https://github.com/google/html-quiz https://github.com/imujjwal96/prelimQuiz https://github.com/energyapps/quizzer https://github.com/schoettl/regex-quiz https://github.com/MightyJoeW/JavaScript-Quiz https://github.com/rafalratajczyk/QuizJavaScript #+END_EXAMPLE
<img align="bottom"src="https://www.dennyzhang.com/wp-content/uploads/sns/github.png" alt="github" />
#+END_HTML
-
--8<-------------------------- separator ------------------------>8-- :noexport:
- Design Exercise: Marketplace System :noexport:BLOG:Project: :PROPERTIES: :type: project :END:
Design Exercise
[[color:#c7254e][Requirement:]]
Business Case:
You are building a Marketplace for Self-Employed. The marketplace allows employers to post jobs, while perspective self-employed can bid for projects. In this system, you have two actors:
- Seller: Posts a project with detailed project requirements, such as description, maximum budget and last day/time for accepting bids.
- Buyer (Self-Employed): Bids for work on a fixed price.
[[color:#c7254e][High Level Requirements]]:
-
- Design and Implement REST API to support the following requirements: #+BEGIN_EXAMPLE a. Create a Project. b. Get a Project by ID. Returned fields should include the lowest bid amount. c. API to Bid for a Project d. API to Query for all Open Projects. #+END_EXAMPLE
-
- The Buyer with the lowest bid automatically wins the bid when the deadline is reached.
-
- You are welcome to assume unspecified requirements to make it better for the customers.
-
- In-memory database is sufficient. Optionally, you are welcome to use a persistent data store of your choice.
-
- You are encouraged but not required to take advantage of a service code-generation framework of your choice when performing this exercise.
-
- [[color:#c7254e][Describe a cloud hosting plan for this service, incorporating scalability, stability, monitoring and disaster recovery.]]
-
- [[color:#c7254e][Describe an automated, continuous integration and deployment (CICD) process for production rollout.]]
Expectations:
- This is an open-ended exercise. The goal is to demonstrate how well you design a system with limited requirements
- Come prepared with high level Architecture and Design.
- You are expected to explain the rationale for your choice of technologies and architectural and design patterns.
Possible onsite extensions
- Pagination.
- Architectural changes to support 5M users.
- Resilient notification mech
- Decompose Project and Bid into two microservices: data management, communication, etc
Q: Clarity requirements and define scopes.
[[color:#c7254e][Assumptions]]:
- Normally seller may be reluctant to set the budget to be that clear. Either a range or want the providers to negotiate with them. For simplicity, we assume all projects will have a budget as a float number.
- Here we assume an easy security model. All registered buyers can check all projects and bid all projects. In the reality, sellers may want to create projects with RBAC(role based access control) enforced. Or for some projects only some levels of buyers can bid.
- Assume one can only be a seller or a buyer. If he/she want to be both, register a different count. This would simplify the whole design and implementation.
- Assume one buyer can't bid a closed project. And the compensate he/she proposes can only be no bigger than the budget.
- We assume all data can be stored in DB. Thus no data retention will be required in current stage. If they grow too big, we can move outdated data into the secondary DB. Or move the non-critical fields into NoSQL DB.
- For better consistency, we put the core data into RDMBS.
Q: Diagram of OO Design
[[image-blog:Design Exercise: Marketplace System][https://raw.githubusercontent.com/DennyZhang/images/master/design/system-oom-er.png]]
Q: Design and Implement REST API?
[[image-blog:Design Exercise: Marketplace System][https://raw.githubusercontent.com/DennyZhang/images/master/design/market_system_api.png]]
Highlights:
- All data is sent and received as JSON.
- For authorization, use OAuth2 token in header. #+BEGIN_EXAMPLE curl -H "Authorization: token OAUTH-TOKEN" https://XXX.XXX.XXX #+END_EXAMPLE
- protocol version is: 1.0 for all APIs.
- Create a Project Request: #+BEGIN_EXAMPLE POST /api/v1/projects { "name": string, "summary": string, "description": string, "budget": float, "deadline": timestamp }
$protocol_version: v1, v2, etc.
Reject very old client requests, in case of breaking API changes. #+END_EXAMPLE
- For security concern, we'd better avoid asking seller_id in the POST body.
Response: #+BEGIN_EXAMPLE HTTP/1.1 201 OK { "id": int } #+END_EXAMPLE
#+BEGIN_EXAMPLE HTTP/1.1 4XX/5XX ERROR { "message": string } #+END_EXAMPLE
- Get a Project by ID. Returned fields should include the lowest bid amount. Request: #+BEGIN_EXAMPLE GET /api/v1/projects/${id} #+END_EXAMPLE
Response: #+BEGIN_EXAMPLE HTTP/1.1 200 OK { "id": int, "summary": string, "description": string, "budget": float, "deadline": timestamp, "lowest_bid_amount": int # return -1, if no bid at all } #+END_EXAMPLE
#+BEGIN_EXAMPLE HTTP/1.1 4XX/5XX ERROR { "message": string } #+END_EXAMPLE
-
API to Bid for a Project Request: #+BEGIN_EXAMPLE POST /api/v1/projects/${id}/bid { "amount": float } #+END_EXAMPLE
-
For security concern, we'd better avoid asking buyer_id in the POST body.
Response: #+BEGIN_EXAMPLE HTTP/1.1 201 OK { "id": int } #+END_EXAMPLE
#+BEGIN_EXAMPLE HTTP/1.1 4XX/5XX ERROR { "message": string } #+END_EXAMPLE
If the project deadline is ealier than now, return 405 error.
- API to Query for all Open Projects. Request:
#+BEGIN_EXAMPLE GET /api/v1/projects?page=${page}&per_page=${per_page}
page: page numbering is 1-based
per_page: How many bid counts we want to see for each page
Sorted in ascending order. The default is 30. The valid range is [1, 400] (inclusive) #+END_EXAMPLE
Response: #+BEGIN_EXAMPLE HTTP/1.1 200 OK { "per_page": 10, "pages": 1, "page": 1, "total": 4 "projects":[ { "id": int, "summary": string, "description": string, "budget": float, "deadline": timestamp, "lowest_bid_amount": int }, { "id": int, "summary": string, "description": string, "budget": float, "deadline": timestamp, "lowest_bid_amount": int } ] } #+END_EXAMPLE
#+BEGIN_EXAMPLE HTTP/1.1 4XX/5XX ERROR { "message": string } #+END_EXAMPLE
Q: Describe a cloud hosting plan for this service, incorporating scalability, stability, monitoring and disaster recovery.
[[image-blog:Design Exercise: Marketplace System][https://raw.githubusercontent.com/DennyZhang/images/master/design/aws-cloud-basic1.png]]
Estimated cost: $244/month. (See in [[url-external:https://cloudcraft.co/app][https://cloudcraft.co/app]])
The design depends on expectations, budgets, and options we may have.
Let's assume we treat the env as [[color:#c7254e][a critical production system]]. And we want to avoid SPOF(single point of failure) and minimize the downtime.
- Choose which cloud provider?
#+BEGIN_EXAMPLE Need to choose among mature and advanced public cloud providers.
Currently AWS, Azure, GCE are the leading providers. Definitely AWS is the most versatile one.
AWS would be more expensive, compared to its competitors and on-premise ones. When our env is not that big, the difference of cost is not that big.
Hence we choose AWS for further discussion. #+END_EXAMPLE
- What about DB? #+BEGIN_EXAMPLE DB is the most critical part. It will not only impact the system availability but also data integrity.
We use AWS RDS, a hosted RMDBS service.
To avoid SPOF, add one RDS instance with another replica in a different AZ. #+END_EXAMPLE
- About DR: Incremental + full backup with S3+Glacier backend data store #+BEGIN_EXAMPLE
- Enable data incremental backup and weekly full backup. This should be fast and only generate GBs of data for medium-size system.
- Backup is stored in S3. We can keep latest 3 copies as hot backup
- The code backup dataset will be moved to Glacier automatically.
- Enforce data retention in Glacier to save cost. #+END_EXAMPLE
- About service deployment: ECS/EKS preference, EC2 is fine as well. #+BEGIN_EXAMPLE For our application: the logic is relatively simple. Most of the stateful context are saved in RDS. Here we choose container deployment over VM deployment.
ECS/Fargate can be an optional, and EKS is winning. (Note: currently AWS EKS is only in preview mode)
But before jumping into the conclusion, check with local talents. Make sure people are comfortable with container technology. #+END_EXAMPLE
About monitoring: #+BEGIN_EXAMPLE
- Enable AWS cloudwatch for infra level monitoring: disk, RAM, CPU, fd, etc.
- Enable RDS cloudwatch metrics: slow query, insane data growth
- Monitoring application log file for unexpected errors/exceptions
- Application monitoring: integrate healthcheck API
- Enable APM monitoring: It shall depends on programming languages, or work with developers.
- Redirect all alerts to slack. Critical ones to a more public channel. And non-critical to internal channels. #+END_EXAMPLE
Q: Describe an automated, continuous integration and deployment (CICD) process for production rollout.
Nowadays we typically have two standard CI workflows. #+BEGIN_EXAMPLE One is Jenkins/Bamboo/TeamCity, another set is GitLab/TravisCI/Bitbucket Pipeline.
The main difference is in the first set, we setup and maintain powerful server(s). It run lots of tests in a visualized way.
The second set is sort of serverless, or invisible to end users. Developers only need to put some yaml file. After git push, CI will work automatically.
Normally the first set is easier to setup and more intuitive. But if we're with paid plan of GitHub or Bitbucket, the second one takes less effort. #+END_EXAMPLE Here we choose Jenkins for further discussion. This gives us more freedoms with less vendor lock-in issues.
-
- Setup Jenkins service by docker. #+BEGIN_EXAMPLE If we don't have too many concurrent tests, one solo jenkins will work.
Otherwise we need to setup Jenkins master/slave agents. #+END_EXAMPLE
-
- Create Jenkins jobs to run tests. #+BEGIN_EXAMPLE Typically tests would covers below fields:
- Lint check(static check)
- Unit tests
- Deployment tests
- Functional tests
- Behavior and/or UI acceptance tests. #+END_EXAMPLE
-
- Setup the job trigger points. Either by poll or by push mechanism #+BEGIN_EXAMPLE When people git push to certain branch, we trigger tests.
With pull mechanism, we create scheduled Jenkins job to pull git commits. In this way, we don't need admin access of the git repo. No extra setup in Git server(GitHub/Bitbucket/GitLab)
With push mechanism, we need to configure the git hook in git server. Also add git server's IP to the Jenkins firewall. This is not usually that easy. The server ip may change from time to time. Thus the hook actions may fail. Or we need to allowing all public access to Jenkins.
Certainly we can enforce token authentication. But this still compromise security.
Both comes with pros and cons. Here we choose pull mechanism. #+END_EXAMPLE
-
- Define Jenkins pipeline to rollout production #+BEGIN_EXAMPLE When all jenkins tests have passed, jenkins job can trigger the deployment.
It can be fully automated. Or add some approval process.
To add approval process, we can use Jenkins pipeline input step feature.
Or define some git commit convention. Say we only monitor push to master branch. And what's more, the git message should contain patterns like "DEPLOY TO PROD". #+END_EXAMPLE
-
- One button deployment. #+BEGIN_EXAMPLE Typically we may have container deployment or VM deployment.
With container deployment, we can use less of CM(configuration management). Ask Jenkins to build and push latest docker images. Then notify prod env to pull given images and trigger deployment
With VM deployment, we might use ssh+CM tool to run deployment. #+END_EXAMPLE
-
- Online rolling upgrade #+BEGIN_EXAMPLE Nobody wants risky deployment.
With kurbernet, we have built-in rolling upgrade support.
With VM deployments, enforce healthcheck in between of node deployment. #+END_EXAMPLE
-
- Send out notifications. (Slack preferred) #+BEGIN_EXAMPLE Everybody in sync for prod env update
- Who triggers the deployment. (It could be bots or human)
- When it's updated
- How long it takes
- Whether the deployment has passed or failed
Redirect all major monitoring alerts to the same slack channel. #+END_EXAMPLE
Q: Architectural changes to support 5M users
TODO: feel like I'm talking about lots of common sense.
[[image-blog:Design Exercise: Marketplace System][https://raw.githubusercontent.com/DennyZhang/images/master/design/aws-cloud-advanced.png]]
Estimated cost: $5,750/month. (See in [[url-external:https://cloudcraft.co/app][https://cloudcraft.co/app]])
- What 5 million users mean for our capacity planning? #+BEGIN_EXAMPLE With 5M users, the visitors may be geographically located in different areas. Different regions or even different countries.
We might not have strict peak hours and non-business hours.
Let's say 10% are active users. So we have 500K active users.
Users are globally located. Let's say 50% would be at days and 50% at nights. So we assume 250K online users at average.
Apparently most activities would be readonly. Let's say every 30 seconds people perform one action. And here we assume read/write ratio is 20/1.
Then the estimation of write OPS is 396.83 per second. ((25K * (1/21))/30) And the read OPS is 7936.51 per second. #+END_EXAMPLE
#+BEGIN_EXAMPLE Let's assume active users will create 0.5 projects every month. And inactive users will 0.01 projects every month.
So we will have 295K new projects created every month. Let's say each project will generate 50KB data.
So the monthly new data would be 14.75 GB. ((295*50)/1000) #+END_EXAMPLE
-
About Data Store: separate cold data from hot data. #+BEGIN_EXAMPLE
-
Move old data into a secondary data store. e.g, projects/bids which are older than 2 years. So we can assume the live data would be 354 GB. Full DB backup and restore would take several hours.
-
Move non-critical data from RDS into a secondary K/V store. e.g, project descriptions and pictures.
-
Partition data by regions or countries. With this tenant design, DB can better scale out. Easy to manage, and also to support the QPS of 7.9K/second. #+END_EXAMPLE
-
Performance Improvements: #+BEGIN_EXAMPLE
-
Scale out Add more instances for applications.
-
Scale up Upgrade the machine flavor, if it's not too crazy.
-
Add more DB read replica(s) Since ratio of read/write is high, more db read replica(s) help. Probably we shall need no more than #+END_EXAMPLE
-
Capacity planning for DB service
From [[url-external:https://blog.takipi.com/benchmarking-aurora-vs-mysql-is-amazons-new-db-really-5x-faster/][this link]], we know 1 RDS with [[url-external:https://aws.amazon.com/rds/mysql/details/][db.r3.8xlarge]] can provide around 7000 QPS.
#+BEGIN_EXAMPLE We're expecting 396.83 write QPS, and 7936.51 read QPS.
So we can have 3 RDS(db.r3.4xlarge) to support this. 1 master, 2 slaves.
(db.r3.4xlarge: 16 vcpu, 122 GB RAM) #+END_EXAMPLE
#+BEGIN_EXAMPLE
- CloudFront(CDN) Webserver can delegate the effort of serving static files to cloudfront. Deploy Cloudfront to edges close to end users. And use latency-based DNS in AWS Route53. #+END_EXAMPLE
#+BEGIN_EXAMPLE
- AWS Redis(Caching) Load the frequent queries into redis cluster. Thus DB can be less busy. Perfect candidates of caching could be popular projects, active users, etc. #+END_EXAMPLE
#+BEGIN_EXAMPLE
-
DBA improvement for frequent DB actions Build secondary DB indices or db views. #+END_EXAMPLE
-
Avoid Region SPOF #+BEGIN_EXAMPLE
-
For serious envs like 5M users, region outage may happen sooner or later. Setup a mini and mirror system in another region. Configure cross-site async replication. It will serve as a standby system.
-
Visitors may come from US, Asian, Europe, or anywhere Geolocation deployment speed up the performance. #+END_EXAMPLE
-
About Cost Saving #+BEGIN_EXAMPLE
- Add budget monitoring and get alerts if AWS cloud bill is big
- Evaluate the vendor-lock issue(s). For large env, cost will be big if we can have only few options.
- Enable auto-scaling
- Watch service characteristic and machine flavors closely. With suitable machine flavors, we can use less infra. And it saves cost. #+END_EXAMPLE
- About DR #+BEGIN_EXAMPLE Speed up DB bakcup/restore
- Instead of sequential table-by-table backup and restore, do it on parallel.
- Perform backup when traffic is low. More traffic indicates more lockings. #+END_EXAMPLE
Q: Resilient notification mech
TODO: not sure what does this mean
- In what scenarios, we might need notification feature? #+BEGIN_EXAMPLE Notify sellers, when buyers have new bids with their projects. Conversation notification in between of individuals. Notify buyers for projects they are interested. etc. #+END_EXAMPLE
Typical requirement:
- Deliver at-most-once vs at-least-once
- Messages in order
Q: Decompose Project and Bid into two microservices: data management, communication, etc.
** misc :noexport: *** DONE After create project, see inconsistent state CLOSED: [2018-03-22 Thu 11:08] *** DONE [#A] get project id CLOSED: [2018-03-22 Thu 11:08] [2018-03-22 07:23:23 +0000] [45] [INFO] Booting worker with pid: 45 2018-03-22 07:24:47,361 - market_api.endpoints.restplus - WARNING - Traceback (most recent call last): File "/usr/local/lib/python3.4/site-packages/flask/app.py", line 1475, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.4/site-packages/flask/app.py", line 1461, in dispatch_request return self.view_functionsrule.endpoint File "/usr/local/lib/python3.4/site-packages/flask_restplus/api.py", line 313, in wrapper resp = resource(*args, **kwargs) File "/usr/local/lib/python3.4/site-packages/flask/views.py", line 84, in view return self.dispatch_request(*args, **kwargs) File "/usr/local/lib/python3.4/site-packages/flask_restplus/resource.py", line 44, in dispatch_request resp = meth(*args, **kwargs) File "/usr/local/lib/python3.4/site-packages/flask_restplus/marshalling.py", line 101, in wrapper resp = f(*args, **kwargs) File "/opt/market/market_api/endpoints/restplus.py", line 120, in post project = Project.query.filter(Project.id == id).one() File "/usr/local/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2404, in one raise orm_exc.NoResultFound("No row was found for one()") sqlalchemy.orm.exc.NoResultFound: No row was found for one() *** TODO return error message with different type
- Design Exercise: Budget Advising :noexport:BLOG:Project: :PROPERTIES: :type: project :END:
Coding Exercise
[[color:#c7254e][Requirement:]]
I am creating my budget for the next month. Besides regular spending, I also added a list of extra items I want to buy. I added my budget amount and realized that it has exceeded my planned spending amount, so I want to eliminate some items in order to cut down my budget.
You are a developer at Mint. In order to help me manage my personal finance better, you are giving me suggestions of what items I should remove from my budget. What you are given is:
- A list of extra items I want to buy. Each item has a name and an amount. (Ex. Name: "Backpack", amount: 50.00). There are no duplicate items.
- My current total budget amount for next month: n dollars.
- My target total budget amount for next month: m dollars. (m < n)
#+BEGIN_EXAMPLE
Ex. - Name: "Backpack", amount: $55.00
- Name: "Monitor", amount: $100.00
- Name: "Water bottle", amount: $10.00
- Name: "Tent", amount: $150.00
- Name: "Headphone", amount: $123.00
current total budget: $1200.00 target total budget: $1000.00
returning pair: "Backpack", "Tent" #+END_EXAMPLE
If I only want to remove 2 items to lower my budget to target budget, is it possible? If so, which 2 items should I remove?
Q: How to get the biggest number which is smaller than the target, after removing no more than 2 items?
[[color:#c7254e][Clarification/Assumptions]]:
- If the sum is smaller than target, remove nothing.
- If multiple choices, any one would be acceptable.
- If remove one item can make the sum smaller than target, and make the sum biggest, just remove one.
- If remove the 2 biggest items still don't work, return an empty list.
#+BEGIN_SRC python #!/usr/bin/env python3
Complexity: Time O(n*log(n)), Space O(n)
class Solution(object): def budgetAdvising2Items(self, items, prices, target): """ :type items: List[string] :type prices: List[float] :type target: float :rtype: List[str] """ total = sum(prices) # no need to remove items if total <= target: return [] diff = total-target
# sort the list
l = sorted(zip(prices, items))
res = []
min_remove = total
# only need to remove one item
for (price, item) in l:
if price == total-target: return [item]
if price > total-target:
if price < min_remove:
min_remove = price
res = [item]
break
# if removing any two items won't work, we return []
if l[-1][0] + l[-2][0] < total-target: return []
# need to remove two items
left, right = 0, len(l)-1
while left<right:
v = l[left][0] + l[right][0]
if v == total-target:
return [l[left][1], l[right][1]]
if v < total-target:
left += 1
else:
# evaluate the candidate
if v < min_remove:
min_remove = v
res = [l[left][1], l[right][1]]
right -= 1
return sorted(res)
#+END_SRC
Q: How do you want to test your code?
-
- Design testcases for normal cases #+BEGIN_EXAMPLE Normal case with 5 items
Normal cases with huge records, say 100+ items. This may happen for SMB. But it's unlikely that we have tens of thousands of records in this scenario.
Target is bigger than the total
Removing one item instead of two would be the best choice #+END_EXAMPLE
-
- Design testcases for invalid input #+BEGIN_EXAMPLE The list is empty
The counts of of items and prices are not the same.
Some prices are not valid positive float
Duplicate names in the items #+END_EXAMPLE
-
- Enable code check for git push hook. #+BEGIN_EXAMPLE Static lint tests Unit tests #+END_EXAMPLE
Q: What changes you want to make, in order to get your code ready for production?
-
Define exceptions, and throw exceptions for unexpected input or errors. #+BEGIN_EXAMPLE Thus the caller won't get false positive #+END_EXAMPLE
-
Provide lint checks and unit tests for integration. #+BEGIN_EXAMPLE As the code keeps changing, we might bring in regression issues. Unit tests can help. #+END_EXAMPLE
-
Add logging for critical errors.
#+BEGIN_EXAMPLE If any unexpected errors or exceptions have happened, write critical log. Based on that, we can get proper notification via ELK, or even Slack messages. #+END_EXAMPLE
- Provide REST API for people to integrate the function.
#+BEGIN_EXAMPLE People can design end-to-end tests based on the REST API. Monitoring can also be built on top of this. This helps maintenance. #+END_EXAMPLE
-
If you use the functionality as a service, wrap up the solution as a microservice or a [[color:#c7254e][container]]. #+BEGIN_EXAMPLE Much easier to deploy and maintain. Easy to scale, and more reliable. #+END_EXAMPLE
-
Add event notification for business requirements. #+BEGIN_EXAMPLE We might want to do data mining to know more about our customers. Say how often the individuals may run out of budget, by what ratios.
Thus we can send out notifications to another data store or a queue for off-line data analysis. #+END_EXAMPLE
- Do we need to support family shared accounts? If so, we might encounter concurrent writes. #+BEGIN_EXAMPLE Let's say we need to support that. The husband has added many items, which leads to out of budget. When our application try to give suggestions, the wife has deleted some items.
This means our suggestions might be out-of-date. It could be misleading or confusing.
So how we can solve this? (Note: this is very unlikely to happen). #+END_EXAMPLE
Though we might have coflicts, but they are unlikely to happen. #+BEGIN_EXAMPLE
-
So we simply add a validation check, when we propose the suggestions. If the items have changed, we discard our suggestions. Sort of CAS(Compare-And-Set) logic.
-
Or use optimistic locking.
-
Or use lock-free model. The program is a worker thread with its own queue. #+END_EXAMPLE
Q: What if I want to remove 3 items, if there are no 2 items that satisfy the requirement?
[[color:#c7254e][Clarification/Assumptions]]:
- If we have better solutions to remove less then 3 items, remove that one.
- If we have multiple solutions, return any one would be acceptable.
- If we remove 3 largest items and it still doesn't work, return an empty list. #+BEGIN_SRC python #!/usr/bin/env python3
Description :
Basic Ideas: Sort the list. Then use two pointers
Complexity: Time O(n*n), Space O(n)
class Solution(object): def budgetAdvising3Items(self, items, prices, target): """ :type items: List[string] :type prices: List[float] :type target: float :rtype: List[str] """ total = sum(prices) # no need to remove items if total <= target: return []
# sort the list
l = sorted(zip(prices, items))
# remove one or two items
res = self.budgetAdvising2Items(items, prices, target)
if res != []:
min_remove = 0
for item in res:
for x in l:
if x[1] == item:
min_remove += x[0]
break
else:
min_remove = total
for i in range(len(l)-2):
left, right = i+1, len(l) - 1
while left < right:
v = l[i][0] + l[left][0] + l[right][0]
if v == total-target:
return sorted([l[i][1], l[left][1], l[right][1]])
if v < total-target:
# need bigger items
left += 1
else:
if v < min_remove:
min_remove = v
res = [l[i][1], l[left][1], l[right][1]]
# need smaller items
right -= 1
return sorted(res)
def budgetAdvising2Items(self, items, prices, target):
"""
:type items: List[string]
:type prices: List[float]
:type target: float
:rtype: List[str]
"""
total = sum(prices)
# no need to remove items
if total <= target: return []
diff = total-target
# sort the list
l = sorted(zip(prices, items))
res = []
min_remove = total
# only need to remove one item
for (price, item) in l:
if price == total-target: return [item]
if price > total-target:
if price < min_remove:
min_remove = price
res = [item]
break
# if removing any two items won't work, we return []
if l[-1][0] + l[-2][0] < total-target: return []
# need to remove two items
left, right = 0, len(l)-1
while left<right:
v = l[left][0] + l[right][0]
if v == total-target:
return [l[left][1], l[right][1]]
if v < total-target:
left += 1
else:
# evaluate the candidate
if v < min_remove:
min_remove = v
res = [l[left][1], l[right][1]]
right -= 1
return sorted(res)
#+END_SRC
Q: What if I want to remove K items?
[[color:#c7254e][Clarification/Assumptions]]:
- If we have multiple solutions, return any one would be acceptable.
- If we have better solutions to remove less then K items, we still choose K items #+BEGIN_SRC python #!/usr/bin/env python3
Basic Ideas:
Sort the list. Then use the idea of two pointers
Complexity: Time O(pow(n, K-1)), if K>=3.
Time O(n*log(n)), if K == 2
Time O(n), if K == 1
Time O(1), if K == 0
Space O(n)
class Solution(object): def budgetAdvisingKItems(self, items, prices, target, K): """ :type items: List[string] :type prices: List[float] :type target: float :type K: int :rtype: List[str] """ total = sum(prices) # no need to remove items if total <= target: return []
if K <= 0: return []
if K >= len(items): return sorted(items)
if K == 1:
# linear check
res = []
min_remove = total
for i in range(len(items)):
# need bigger item
if prices[i] < target - total: continue
if prices[i] == target - total: return [items[i]]
# find a better candidate
if prices[i] < min_remove:
min_remove = prices[i]
res = [items[i]]
return res
# sort the list
l = sorted(zip(prices, items))
index_list = self.myBudgetAdvisingKItems(l, total-target, K, 0)
return sorted([l[i][1] for i in index_list])
def myBudgetAdvisingKItems(self, l, offset, K, start_index):
"""
:type l: List[(string, float)]
:type offset: float
:type K: int
:type start_index: int
:rtype: List[int]
"""
assert(K>=2)
if start_index == len(l): return []
total = sum([l[i][0] for i in range(start_index, len(l))])
res, min_remove = [], total
if K == 2:
left, right = start_index, len(l)-1
while left<right:
v = l[left][0] + l[right][0]
if v == offset:
return [left, right]
if v < offset:
# too small
left += 1
else:
# evaluate the candidate
if v < min_remove:
min_remove, res = v, [left, right]
right -= 1
return res
# K>=3
for i in range(start_index, len(l)-1):
if l[i][0] >= offset: continue
index_list = self.myBudgetAdvisingKItems(l, offset-l[i][0], K-1, i+1)
if index_list != []:
index_list = [i] + [k for k in index_list]
sum_removed = sum([l[k][0] for k in index_list])
if sum_removed < min_remove:
min_remove, res = sum_removed, index_list
return res
#+END_SRC
Q: I don't have number of item limit, show me all the possible combinations of items I can remove to lower my budget.
[[color:#c7254e][Clarification/Assumptions]]:
- Show all combinations with the optimal values.
- Not showing all combinations whose sum is no bigger than the budget. If we remove everyting, it could work. But it's not what we want.
#+BEGIN_SRC python #!/usr/bin/env python3
Description :
Basic Ideas: Sort the list. Then BFS
The worst case: the budget is so low that we have to remove almost all items
Complexity: Time O(pow(2, n))
Space O(pow(2, n))
import sys class Solution(object): def budgetAdvisingItems(self, items, prices, target): """ :type items: List[string] :type prices: List[float] :type target: float :rtype: List[str] """ import collections if len(items) == 0: return [] total = sum(prices) # no need to remove items if total <= target: return []
min_diff, res = total, []
l = sorted(zip(prices, items))
queue = collections.deque([([], total-target)])
for i in range(len(l)):
(price, item) = l[i]
for j in range(len(queue)):
(item_list, diff) = queue.popleft()
# get the neighbors
# don't select current item
queue.append((item_list, diff))
# select current item
if price < diff:
queue.append((item_list+[item], diff-price))
else:
# we get candidates
if (price-diff) == min_diff: res.append(item_list+[item])
if (price-diff) < min_diff:
res, min_diff = [item_list+[item]], (price-diff)
return res
#+END_SRC
- TODO todelete :noexport: ** Contact - Architect :BLOG:Resource: :PROPERTIES: :type: life :END:
YouTube Video For System Design Learning
Please leave me comments, if you have better recommendations!
Similar Posts:
- [[https://code.dennyzhang.com/design-books][Books For System Design]]
- Tag: [[https://code.dennyzhang.com/tag/designresource][#designresource]], [[https://code.dennyzhang.com/tag/systemdesign][#systemdesign]]
YouTube: [[url-external:https://www.youtube.com/watch?v=ZgdS0EUmn70&t=11s][Intro to Architecture and Systems Design Interviews]]
My takeaway:
- Why hiring managers tend to ask vague questions in system design
- It's not about memorizing best practice. But highlight your strength.
YouTube: [[url-external:https://www.youtube.com/watch?v=PE4gwstWhmc][How We've Scaled Dropbox]]
YouTube: [[url-external:https://www.youtube.com/watch?v=-W9F__D3oY4][Scalability Harvard Web Development By David Malan]]
-
--8<-------------------------- separator ------------------------>8-- :noexport:
- TODO Role model :noexport: ** TODO https://github.com/MindorksOpenSource/android-interview-questions ** https://github.com/ScalableSystemDesign
- TODO Not-to-do list :noexport:
- As architects, we need to worry much less about what happens inside the zone than what happens between the zones.
- TODO What trade-off we haves :noexport:
- TODO 2nd adsense doesn't show up: https://architect.dennyzhang.com/ :noexport:
- TODO Blog: Consensus Algorithm For A Replicated Log :noexport:
- raft(replicated and fault tolerant)
https://raft.github.io/ https://raft.github.io/slides/uiuc2016.pdf
- TODO Differente in between typical k/v db: redis, etcd, dynamo, mongodb, couchbase, elasticsearch, cassandra :noexport: http://techgenix.com/dynamodb-vs-mongodb/
- TODO At a high level: spec information flows "down" and status information flows "up". :noexport:
-
--8<-------------------------- separator ------------------------>8-- :noexport:
- TODO https://github.com/0voice/interview_internal_reference :noexport: