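CMU's 17-637 Web Application Development covers the fundamentals of website development and is fairly comprehensive as an introductory course. Below I summarize and organize the notes from my final exam review, for future reference.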

1. What is HTTP? What problem(s) does it solve? Describe the contents/parts of an HTTP request and response. Know the differences between the various HTTP methods and when each method should be used. Know the parts of a URL.

HTTP, Hypertext Transfer Protocol.
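For the "parts of a URL" item, a small sketch using Python's standard `urllib.parse` may help. The URL below is made up purely for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, used only to label the parts.
url = "https://www.example.com:8443/courses/webapps?term=fall&year=2018#syllabus"

parts = urlsplit(url)
print(parts.scheme)           # protocol, e.g. http or https
print(parts.hostname)         # DNS hostname of the server
print(parts.port)             # TCP port (defaults to 80/443 when omitted)
print(parts.path)             # resource path on the server
print(parse_qs(parts.query))  # query parameters as a dict of lists
print(parts.fragment)         # fragment, handled by the browser only
```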

2. Understand the basic principles of internet networking. What is a MAC address? An IP address? A DNS hostname? Describe the process, in detail, of sending a request from a client computer to a web server and retrieving a response.

MAC address - Media Access Control - unique identifier assigned to a network interface controller (NIC) for communications at the Data Link Layer of a network segment.

IP address - Internet Protocol address - a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communications. - Host/Network Interface identifier + location addressing

DNS hostname - (domain name) a label assigned to a device connected to a computer network; it’s used to identify the device in communications. Hostnames may be simple names consisting of words or phrases, or they may be structured.

3. Understand the basic functions of a web server for both static and dynamic (i.e., generated from web applications) content. For dynamic content, understand the typical division of functionality between the web server, the application server, and the database system.

Web server - whom the client talks to

Application server - who inserts data into templates.

4. What is HTML? What advantages does it have over alternative representations of HTML-like data? Know how to write basic HTML and compare the style and function of relatively sophisticated HTML. Understand the use of the HTML <div> and <span> tags.

Adv: everyone knows html. it can be interpreted easily.

5. What is CSS? What advantages does CSS offer over including style information directly into the HTML document? Know how to write basic CSS rules.

Cascading Style Sheets.

separation of concerns. reusable. easy to control. clearer code.

6. Understand basic principles of CSS libraries such as Twitter Bootstrap. What problems do CSS libraries solve? Know at a high-level how Bootstrap works.

7. What are HTML hidden fields? Cookies? Sessions? What advantages does each method have over the others? For each method give an example of when that method is clearly better than the others. Know how sessions are typically implemented.

Hidden fields - Kept hidden from user. will be gone once I leave.

Cookies - on client side. The server keeps two pieces of information about me: a cookie and a session. The cookie is given to me to carry around; the server uses the cookie to match me to my session. Without HTTPS, cookies travel in plaintext and can be seen by anyone on the network path, whether the request is a GET or a POST.

Sessions - on server side. tracking my state. (so I don’t need to log in many times)
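A minimal sketch of how sessions are typically implemented, assuming an in-memory dict as the server-side store and a cookie named `sessionid` (both are illustrative choices, not how any particular framework does it): the server keeps the state, and the cookie only carries a random id that points to it.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side store: session id -> state (hypothetical, in memory)

def login(username):
    """Create a session and return the value for a Set-Cookie header."""
    session_id = secrets.token_urlsafe(32)  # unguessable random id
    SESSIONS[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["httponly"] = True  # hide the cookie from page scripts
    cookie["sessionid"]["path"] = "/"
    return cookie["sessionid"].OutputString()

def current_user(cookie_header):
    """Match the Cookie request header back to the server-side session."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    if morsel is None:
        return None
    session = SESSIONS.get(morsel.value)
    return session["user"] if session else None
```

The browser only ever holds the id; a forged or unknown id simply matches no session.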

8. What is Model View Controller? Understand the responsibilities of each component of the MVC architecture. What advantages does MVC have over unstructured web applications? Why is MVC particularly well-suited to web applications? Given an MVC implementation (such as the Django framework) what are the disadvantages of that MVC design?

model - data manipulation

view - representational

controller - business logic

separation of concerns. better for maintainability! (e.g. outsourcing to frontend team)

9. Know many in-depth details of Python and the Django framework, including project configuration, URL dispatch, actions, Models, the ORM query API, templates and the template language, static content, Forms, ModelForms, request and response features such as parameters, cookies, sessions, content types, and HTTP header manipulation, reverse URL resolution, transactions, and the built-in authentication system. Understand the data interaction of a typical Django application. If given sufficient time, you should be able to design and write reasonable pseudocode for a complete Django application, end-to-end, using all of the features above. You should also be able to describe, in prose and in reasonable pseudocode, how features of the Django framework are implemented.

10. What is a MIME type? Know how to detect and set the MIME type for HTTP requests and responses, both for the HTTP protocol itself and within Django.

11. What is a multi-part HTML form? Understand high-level details of how multi-part form data is transmitted in an HTTP request and how multi-part form data can be read within a Django application.

12. What is JavaScript? What advantages are there to using JavaScript to manipulate a web page rather than sending all requests to the web server? What problems are there when using JavaScript to manipulate a page? What factors in modern browsers can make writing JavaScript challenging?

13. What is the Document Object Model? Know basic DOM traversal and manipulation using either the standard JavaScript APIs or jQuery.

14. What is Ajax? Describe, in detail, the technologies and operations used to make an Ajax application. What are the advantages and disadvantages of using Ajax as compared to synchronous requests? What are some advantages and disadvantages of using Ajax as compared to Adobe Flash, Java Applets, or other interactive client-side runtime environments? Be able to write pseudocode for simple Ajax applications using either the standard JavaScript APIs or jQuery.

Ajax: a combination of technologies that allows a client-side web app to make asynchronous requests to a web server while providing an interactive UX within a web page. (The client makes requests to the server.)

15. What is jQuery? What problems does it solve? Understand jQuery selectors well enough to write jQuery pseudocode to traverse and manipulate DOM elements and generate and process Ajax requests.

16. What are the advantages of writing an application as a web application, as opposed to a client-side application installed on a user’s computer?

17. What advantages does a server-side web application have over a client-side web application?

server-side (e.g. Django): it has full control of validation, authentication, etc. Safe, because the user can’t see or modify your code.

client-side: the browser does some of the computation. May be very slow if the client doesn’t have a good machine, but the server will be lightly loaded.

18. What is a race condition? What does it mean for code to be “thread-safe”?

19. Understand the advantages of encapsulating data management logic as methods of the data classes (the model), compared to embedding the data management directly into the application’s logic (the actions).

20. What advantages does a database offer over storing data directly in the filesystem? The filesystem over using a database?

DB: security. consistency. somewhat structured (e.g. define max_length).

FS: no security. no consistency promise.

21. What is a B-tree? What is a heap file? What advantages and disadvantages does a B-tree have over other storage methods?

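Not a binary tree: a B-tree is a self-balancing search tree whose nodes hold many keys and have many children, which keeps the tree shallow and fits well with disk pages.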

Adv: faster searching. Dis: takes time to find right place to insert/delete

Heap file = unordered set of records.

Quick insertion (to the end). Slow search.
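A rough sketch of the trade-off. Note the assumption: a sorted list with binary search stands in for the ordered index here. It is not a real B-tree (no nodes or pages), it only shows the same "ordered search vs. cheap append" contrast.

```python
import bisect

class HeapFile:
    """Unordered records: insert appends to the end, search scans everything."""
    def __init__(self):
        self.records = []

    def insert(self, key, value):
        self.records.append((key, value))  # quick: no ordering to maintain

    def search(self, key):
        for k, v in self.records:          # slow: linear scan
            if k == key:
                return v
        return None

class SortedIndex:
    """Keys kept in order: search is a binary search, insert must find its place."""
    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        pos = bisect.bisect_left(self.keys, key)  # find the right place first
        self.keys.insert(pos, key)
        self.values.insert(pos, value)

    def search(self, key):
        pos = bisect.bisect_left(self.keys, key)  # O(log n) comparisons
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos]
        return None
```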

22. Understand the basic structure of a relational database and the relational model.

23. What is SQL? Be able to write basic SQL statements including simple SELECT, INSERT, UPDATE statements.

24. What is a SQL join query? Be able to write basic SQL join queries.

25. Within a database, what is a primary key? What is a secondary index? What are the advantages and disadvantages of keeping a large number of secondary indexes on a relation?

A secondary index is roughly equivalent to creating another table that uses the secondary key as its PK.

Adv: faster querying.

Dis: slower writes, because each write must also keep every index consistent and enforce its constraints.
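To see the "faster querying" side, here is a sketch with Python's built-in `sqlite3`. The `users` table and the index name are invented for the example, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the detail text

lookup = "SELECT name FROM users WHERE email = ?"
before = query_plan(lookup, ("user42@example.com",))  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(lookup, ("user42@example.com",))   # search via the index

print(before)
print(after)
```

Every later INSERT or UPDATE on `users` now has to update `idx_users_email` as well, which is the write cost noted above.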

26. What is a transaction? Understand basic terminology related to transactions (begin, commit, abort/rollback, etc).

27. What is serializability? Understand the basic ideas of transaction isolation levels.


Source: Microsoft Docs, Transaction Isolation Levels - SQL Server. https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/transaction-isolation-levels?view=sql-server-2017

28. What are the ACID properties? Understand how these properties are implemented in a modern database system. (See 31)

29. Understand the relative costs of interacting with a typical database system, such as the cost of initially connecting to a database, the cost of executing a simple query on a database, and the cost of executing a simple update on a database.

Initial connection - probably most expensive. need to ping the server, authentication, establish connection

Query - not changing the content! -> could be faster than updating

Update - changing content -> must maintain consistency, constraints, etc. -> slower

But real costs depend on what the query/update looks like.

30. You should understand some basic principles of relational database design, to be able to compare and contrast two different data layouts proposed for a data design problem.

31. For each of the ACID properties, name a technology or technique commonly used to guarantee that property within a database system.

Atomic - all or nothing - transactions (e.g. Django's @transaction.atomic), write-ahead logging

Consistent - database invariants preserved; always valid state - database constraint (pk, fk, check, cascades, etc), semantics checking

Isolated - like executed sequentially - locking (two-phase locking)

Durable - committed changes survive system failure (crash/power-off) - write-ahead logging (if a node dies, use the logs to rebuild a new node), write to persistent disk
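A small sketch of atomicity and consistency working together, using the standard `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical. The CHECK constraint plays the role of the database invariant, and the `with conn:` block is the transaction: it commits if the block finishes and rolls back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money as one unit; True if committed, False if rolled back."""
    try:
        with conn:  # transaction scope: commit on success, rollback on error
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK violated: the credit is undone too
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the debit would make a balance negative, the earlier credit in the same transaction is rolled back as well, so no money appears out of nowhere.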

32. What is write-ahead logging? Understand the sequence of operations for transactional reads and writes with write-ahead logging, as well as how the log is used to recover the correct database state during system crashes or transaction rollbacks. Explain how write-ahead logging can increase the performance of a transactional system even though it increases the number of writes and amount of data written.

33. What is two-phase locking? Explain the sequence of operations for transactional reads and writes with two-phase locking. How is two-phase locking typically enforced?

34. What is deadlock? Describe two methods to avoid deadlock when two-phase locking is used.

35. Understand fundamental aspects of object-relational mapping tools. What advantages and disadvantages are offered by an object-relational mapping tool over directly writing the SQL? Be able to convert specific Django query API calls into SQL statements and vice-versa, for simple SQL statements such as the kind you are expected to know above.

36. What is unit testing? Understand how to use basic features of the unit testing framework within Django.

37. What problems can be caused by a user embedding HTML tags in an input field? Understand what features of the Django framework usually prevent HTML injection attacks.

38. What is a Cross Site Request Forgery (CSRF) attack? How does the Django framework help prevent CSRF attacks?

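CSRF attack - I log in to website A, which stores a cookie in my browser. Without logging out of A, I then visit website B (which carries attack code). Because I am already logged in (authenticated), A directly accepts requests sent from my browser, even though they may actually have been triggered by B.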

How Django prevents?

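Django gives my browser a secret CSRF token (in a cookie) when it serves a page.

Every form rendered with {% csrf_token %} carries that token in a hidden field.

When I send a POST, Django's CSRF middleware checks that the submitted token matches the one it issued, and rejects the request otherwise.

Site B cannot read my token, so the requests it forges fail the check. => prevents CSRF attacks.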
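A minimal sketch of a token check in this spirit. It is a simplification under my own assumptions, not Django's actual middleware code, and the function names are made up.

```python
import hmac
import secrets

SAFE_METHODS = ("GET", "HEAD", "OPTIONS", "TRACE")

def issue_csrf_token():
    """Server side: generate an unguessable token to hand to the browser."""
    return secrets.token_urlsafe(32)

def is_request_allowed(method, issued_token, submitted_token):
    """Allow safe methods; otherwise require the submitted token to match."""
    if method in SAFE_METHODS:
        return True
    if not issued_token or not submitted_token:
        return False  # a forged request from another site has no token
    # Constant-time comparison, so timing does not leak the token.
    return hmac.compare_digest(
        issued_token.encode("utf-8"), submitted_token.encode("utf-8")
    )
```

The attacking site can make my browser send the cookie, but it cannot read the token and place it in the form body, so its POST is rejected.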

39. What problems can be caused by a user embedding SQL syntax characters in an input field? Understand what features of the Django framework usually prevent SQL injection attacks, and what features of the Django framework require you to manually prevent SQL injection attacks.

Problem - SQL injection.

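Feature - the Django ORM. QuerySet queries are built as parameterized SQL, so user input reaches the database as data, not as SQL text. Form validation additionally rejects malformed input.

Manually - if you write SQL yourself (e.g. raw(), extra(), or cursor.execute() with POST params formatted directly into the query string), you have to protect the code yourself by passing the values as query parameters.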
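The difference between pasting input into SQL text and passing it as a parameter can be sketched with the standard `sqlite3` module. The `users` table and both helper functions are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

def find_unsafe(name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'"
    ).fetchall()

def find_safe(name):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

malicious = "nobody' OR '1'='1"
print(find_unsafe(malicious))  # the OR clause matches every row
print(find_safe(malicious))    # no user has that literal name
```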

40. What is a one-way function? Describe how one-way hash functions can be used to provide security for data.

41. What is a salt? Explain how salts can be used to prevent rainbow table attacks (a.k.a. dictionary attacks) on a password database.

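Random data combined with the password before hashing, and stored next to the hash. The same password then hashes differently for every user, so a precomputed rainbow table no longer matches.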
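A sketch of salted password hashing with the standard library. PBKDF2 and the iteration count are my own illustrative choices, not something the course prescribes.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, an assumption

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Two users with the same password get different digests, so an attacker would need a separate table per salt.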

42. What is network sniffing? Spoofing?

43. Understand the details of public key cryptography, including public and private keys, encryption, and digital signatures.

44. Understand the details of private key cryptography. What are the advantages and disadvantages of public key vs. private key cryptography. How does a hybrid cryptosystem address the shortcomings of public and private key cryptography?

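private key crypto - symmetric (one shared key both encrypts and decrypts; fast, but the key must be shared securely)

public key crypto - asymmetric (public key encrypts, private key decrypts; no shared secret needed, but slow)

hybrid - use public key crypto to exchange a symmetric key, then use the symmetric key for the actual data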

45. What is SSL/TLS? HTTPS? What properties does SSL/TLS provide? Understand, in detail, how SSL/TLS works and each step of the SSL/TLS protocol.

SSL - Secure Sockets Layer.

TLS - Transport Layer Security (the improved successor to SSL)

HTTPS - HTTP over SSL/TLS

Properties? Cryptographic protocols that provide end-to-end secure transport over the network.

(Prevent eavesdropping, tampering (modify), message forgery (pretend).)

step?

SSL/TLS: use the asymmetric key to encrypt the session key, then use the session key to encrypt each session.

Four-way handshake


Copyright IBM Corporation 1999, 2014. https://www.ibm.com/support/knowledgecenter/SSFKSJ_7.1.0/com.ibm.mq.doc/sy10660.htm

  1. client->server: client hello. “I want to use these cipher suites and SSL/TLS versions”
  2. server->client: server hello. “Let’s use this cipher suite and this version. Here’s my certificate, including my Kpub”
  3. client->server: {pre-master key}Kserver-pub
  4. server: decrypt and read the pre-master key.
  5. client & server: use the agreed-upon algorithm + pre-master key to compute a Kshared.
  6. client->server: finish. {“From now on, I’ll use our new Kshared to encrypt.”}
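From the application side the whole handshake hides behind one call. A sketch with Python's standard `ssl` module follows; the helper names are mine, and `fetch_peer_info` needs network access, so it is defined but not run here.

```python
import socket
import ssl

def make_client_context():
    """Client settings: verify the server certificate and its hostname."""
    ctx = ssl.create_default_context()          # loads the system's trusted CAs
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx

def fetch_peer_info(host, port=443):
    """Connect, run the handshake, and report what was negotiated."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        # wrap_socket performs the handshake: hellos, certificate check,
        # key exchange, and the switch to the shared session key.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()

ctx = make_client_context()
print(ctx.verify_mode, ctx.check_hostname)
```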

46. What is an X.509 certificate? What is the most relevant information it contains, and how is this information used to guarantee the authenticity of a certificate?

47. Understand the strengths and weaknesses of SSL/TLS for web security. How can its security be compromised?

Dis: cost of certificate, mixed content (an HTTP resource loaded by an HTTPS page triggers a warning message in the browser), proxy caching, not mobile-friendly.

48. What is a Certificate Authority, and what function do they provide? How does a web client verify the authenticity of a Certificate Authority? Understand the idea of a chain of trust and Public Key Infrastructure.

Sign a certificate. => the public key in this certificate belongs to the entity named in this document, signed by [CA].

49. Describe distinct methods to distribute user requests evenly among a collection of static web servers. For each method, what are the advantage and disadvantage of that method as compared to the other methods?

load balancer:

Dis: single point of failure
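One simple policy a load balancer could use is round-robin. This is a sketch under that assumption; the class and server names are invented, and real balancers also track health and load.

```python
import itertools

class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one request each."""
    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = itertools.cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)

balancer = RoundRobinBalancer(["web1", "web2", "web3"])
for _ in range(3):
    print(balancer.pick())
```

All requests still pass through this one object, which is the single point of failure noted above.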

50. What are the advantages and disadvantages of static web caching as compared to distributing HTTP requests among a collection of static web servers at the content provider?

51. What is a Content Distribution Network (CDN)? What are the advantages and disadvantages of using a CDN as opposed to a traditional collection of web servers and ad-hoc static web caches?

CDN: content is copied to edge servers and served from there.

It causes http requests to be routed to a nearby server by customizing DNS hostname resolution for each client ip!

It maintains a DB/map of {client IP -> server} to select the best web server replica for each IP, and resolves DNS requests for client, route client to the best server replica.

Adv: faster serving. No need to retrieve from the origin server. Dis: not dynamic. Relies on caching; if your content changes a lot, don’t use a CDN. Price is high.

52. Why is scaling web applications harder than scaling static web content? Describe one common method to distribute requests evenly among a collection of web and application servers. What is the problem with this common method?

static web content: is static. just replicate content.

scaling web app: also need to scale db, etc

Common method? load balancing.

Problem? single point of failure.

53. Understand database partitioning, and how it can increase the performance of a database server. What problems are caused by partitioning a database?

54. What is database replication, and how does it increase database scalability? What problem does replication have?

55. What is consistent hashing? Understand how consistent hashing can be implemented by partitioning a ring-like hash space into buckets that store contiguous ranges of hash values.

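Each value m is taken modulo 360. => consistent hash value

Each server is responsible for a certain range; a hash value that falls in that range belongs to that server. When servers are added or removed, only the mapping changes.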
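A sketch of the ring idea, keeping the 360-position picture from these notes. The `HashRing` class, SHA-256 as the hash, and one ring position per server are all my own simplifying assumptions.

```python
import bisect
import hashlib

RING_SIZE = 360  # positions on the ring, as in the "mod 360" picture

def ring_position(value):
    """Hash a string to a position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE

class HashRing:
    def __init__(self, servers=()):
        self._points = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        bisect.insort(self._points, (ring_position(server), server))

    def remove(self, server):
        self._points.remove((ring_position(server), server))

    def lookup(self, key):
        """Walk clockwise from the key's position to the first server."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, (ring_position(key), ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]

ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.lookup("user:42"))
```

Adding a server only takes over the keys on the arc just before its position; every other key keeps its old owner.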

56. What is memcached? Understand the basic architecture of a memcached cluster and the advantages and disadvantages of requiring client-side libraries to manage a cache cluster.

A key-value DB specific for caching. (Redis is more general) Dis: inconsistency

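Client-side library? The memcached servers do not know about each other; the client library inside the web application hashes each key to decide which server in the cluster holds it.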

57. How can increasing the concurrency of requests increase database performance? What problems can increased concurrency cause?

How? concurrently process requests. take full advantage of database ability.

Problem? race conditions. it also has overhead for db to create thread.
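The race-condition side can be sketched with plain threads. The `Counter` class is hypothetical; the lock makes the read-modify-write step atomic, which is what "thread-safe" means for this object.

```python
import threading

class Counter:
    """A shared counter that is safe to increment from several threads."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:     # only one thread at a time may update
            self.value += 1  # read, add, write back as one protected step

def run(n_threads=8, n_increments=10_000):
    """Increment one shared counter from many threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Without the lock, two threads can read the same old value and one update gets lost.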

58. Why should one tune the performance of their web application? Why shouldn’t one tune the performance of their web application?

Why shouldn’t?

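Premature tuning: early in development you don't yet know where the bottleneck is. Tuning blindly makes the code less readable and the logic more complex, which hurts development and maintenance, and what you tune may not be the bottleneck at all.

How? Once a problem is found, analyze it and tune accordingly.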

59. What is a performance profiler?

60. How do you determine which parts of your web application that you should tune? Why is it important to test and tune your web application at high system loads? What are the common resource bottlenecks? Of these, which resource is usually the bottleneck for web applications? What is the 80/20 (or 90/10) rule?

How? use profiler to identify

80/20 rule? Roughly 20% of the code consumes 80% of the running time (in the 90/10 version, 10% of the code consumes 90%).
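A sketch of finding the hot spot with Python's built-in `cProfile`. The three functions are toy stand-ins for request-handling code.

```python
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))  # the hot spot

def fast_part():
    return sum(range(100))

def handle_request():
    fast_part()
    return slow_part()

def profile_report(func, top=5):
    """Run func under the profiler and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)  # most expensive first
    return out.getvalue()

print(profile_report(handle_request))
```

The report shows where the time actually goes, so tuning effort can be spent on that small share of the code.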

61. How might you try to increase the performance of an I/O-bound web application?

caching

distributed system - use more nodes

static edge server

reduce size of data you send (e.g. grumblr, ajax only sends the post which changed)

62. What factors influence how long it takes to read or write a data element in a database?

what kinds of db? (sqlite? -no concurrency)

concurrency level?

server speed / how you write the code?

network conditions?

unnecessary connection?

replica? sharding (horizontal partitioning)? vertical partitioning?

db schema? (indexing? secondary indexing?)

63. How can increasing the concurrency of a web application eliminate I/O as a performance bottleneck? What are the advantages and disadvantages of batching I/O operations together for scalability and performance?

64. Describe several approaches to reduce browser-to-server network I/O.

65. Aside from increasing the number of concurrently-executing threads, how can one increase the effective concurrency of a web application?

  1. DB - usually the bottleneck! No SQLite. Use MySQL etc., which supports concurrency
  2. Multi-node cluster
  3. Async requests + callbacks -> so users don’t wait
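The async point can be sketched with `asyncio`: a single thread overlaps several waits instead of doing them one after another. `fake_io` is a made-up stand-in for a database or network call.

```python
import asyncio
import time

async def fake_io(name, delay):
    await asyncio.sleep(delay)  # stands in for waiting on a DB or network call
    return name

async def sequential(delay):
    # Each call waits for the previous one to finish.
    return [
        await fake_io("a", delay),
        await fake_io("b", delay),
        await fake_io("c", delay),
    ]

async def concurrent(delay):
    # All three waits overlap on the same thread.
    return await asyncio.gather(
        fake_io("a", delay), fake_io("b", delay), fake_io("c", delay)
    )

def timed(coro_fn, delay=0.05):
    """Run the coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(delay))
    return result, time.perf_counter() - start
```

Both versions return the same results; the concurrent one takes about one delay instead of three.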

66. What is accessibility? Understand several basic strategies for making a device-independent web page accessible.

A page is accessible if it can be used regardless of network/device/software conditions, and by people with disabilities, etc.

67. What is a think-aloud study? Understand the usefulness of think-aloud studies for obtaining detailed feedback on the usability of an application, and know how to run a basic think-aloud study.

68. Compare advantages and disadvantages of common modern web application server deployment methods, including local deployment, Heroku, and Google App Engine and/or AWS.

Only knowing basic differences is enough.

Local deployment - simple, but not strong enough for large data sets, heavy queries, fault tolerance, …

Heroku - PaaS.

Google App Engine - PaaS.

AWS - IaaS.

69. What is a version control system? You should know the basic features of Git and how you can use Git to manage a concurrent development process among a large team of developers.

track changes, roll back to past state; multi-programmer dev-> branching.

70. What are WebSockets? Describe the advantages of WebSockets over older technologies for maintaining interactive web-based sessions such as long polling.