CSA Final Note

This is a study and review note for the course GENG0030 Computer Systems and Applications at the University of Southampton. It focuses solely on summarizing the scope of the final exam for the 2024-2025 academic year and does not cover the entire course content.

Computer System and Peripheral Hardware

Overview of computer systems and peripheral hardware: Computer hardware components: CPU, memory, storage devices, input/output devices. Computer software components: operating systems, application software, system software. Network communication fundamentals: local area networks, wide area networks, TCP/IP protocols. Computer applications: business, education, entertainment, research, etc.

History of computers: First-generation computers: vacuum tube technology, ENIAC, etc. Second-generation computers: transistor technology, batch operating systems. Third-generation computers: integrated circuits, high-level programming languages. Fourth-generation computers: microprocessor technology, personal computers and computer networks. Fifth-generation computers: artificial intelligence technology, speech recognition and natural language processing.

Classification of computer systems: Microcomputers: personal computers, laptops, workstations, network servers. Minicomputers: scientific computing, business transaction processing, file processing, database management. Mainframe computers: critical applications for large organizations, such as batch data processing. Supercomputers: climate research, weather forecasting, genome sequencing, etc.

Computer hardware functions: Input devices: keyboard, mouse, scanner, etc. Processing devices: the CPU's arithmetic logic unit and control unit. Output devices: monitors, printers, etc. Storage devices: memory, hard disk, optical disc, etc.

Computer processing speed: Time units: milliseconds, microseconds, nanoseconds, picoseconds. Performance indicators: MIPS, teraflops. Clock speed: MHz, GHz.

Peripherals: Input technologies: keyboard, mouse, touch screen, voice recognition, etc. Output technologies: CRT and LCD monitors, inkjet and laser printers, etc. Auxiliary storage: hard disks, optical discs, USB drives, etc.

Fundamentals of computer storage: Binary representation: bit, byte. Storage capacity units: byte, KB, MB, GB, TB, PB. Storage access methods: direct access, sequential access.

Semiconductor memory: RAM: random access memory, volatile storage. ROM: read-only memory, non-volatile storage. Semiconductor memory advantages: fast, and resistant to shock and temperature changes. Semiconductor memory disadvantages: RAM is volatile and needs a continuous power supply to retain its contents.
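The storage capacity units listed under the storage fundamentals above can be related to each other with a few lines of Python. This is only a sketch: it assumes the binary convention (1 KB = 1024 bytes), whereas some contexts use 1000, and the helper names `to_bytes` and `humanize` are made up for illustration.

```python
# Assumption: binary prefixes, so each unit is 1024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def to_bytes(value, unit):
    """Convert a size expressed in `unit` into a number of bytes."""
    return value * 1024 ** UNITS.index(unit)

def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

print(to_bytes(1, "KB"))      # 1024
print(to_bytes(1, "KB") * 8)  # 8192 bits, since one byte is 8 bits
print(humanize(1536))         # 1.50 KB
```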

System software

Hardware and software

Hardware: the physical components of a computer, such as the monitor, keyboard, and mouse.

Software: programs that run on the computer, divided into system software and application software.

System software: responsible for running the hardware and managing the computer system, for example operating systems, device drivers, and utility software.

Application software: software that helps users accomplish specific tasks, such as word processors and web browsers.

Operating system

Definition: Manages the hardware and software resources of a computer system and serves as the interface between the user and the computer hardware.

Common operating systems: Windows, Mac OS X, Linux, iOS, etc.

Operating system hierarchy: consists of the user interface, applications, the kernel, and hardware resources (e.g., CPU, memory, devices). The kernel is the control center and allocates resources according to the priority of requests.

Functions of the operating system

Human-computer interface: includes the graphical user interface (GUI) and the command line interface (CLI).

Multitasking: the operating system can keep multiple programs running at once, but a single CPU core processes only one program at any given moment while the other programs wait.

Program scheduling: the operating system determines the order in which programs execute. Common scheduling algorithms are:

First-Come-First-Served (FCFS): jobs are processed in order of arrival.

Shortest Job First (SJF): the job with the shortest expected completion time is processed first.

Round Robin (RR): jobs are taken in first-come-first-served order, but each job runs for at most one time slice before the next job gets its turn.

Priority scheduling: jobs are scheduled according to their priority, with higher-priority jobs processed first.
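The four scheduling policies above can be compared by computing how long each job waits. The sketch below is illustrative only and makes several assumptions that are not in the notes: all jobs arrive at time 0, SJF and priority scheduling are non-preemptive, and a lower priority number means higher priority. The function names are hypothetical.

```python
from collections import deque

def fcfs(bursts):
    """Waiting time per job when jobs run in arrival order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)
        clock += burst
    return waits

def run_in_order(bursts, order):
    """Waiting time per job when jobs run (non-preemptively) in `order`."""
    waits, clock = [0] * len(bursts), 0
    for i in order:
        waits[i] = clock
        clock += bursts[i]
    return waits

def sjf(bursts):
    """Shortest job first: run jobs in increasing burst time."""
    return run_in_order(bursts, sorted(range(len(bursts)), key=lambda i: bursts[i]))

def priority(bursts, priorities):
    """Priority scheduling; assumes a lower number means higher priority."""
    return run_in_order(bursts, sorted(range(len(bursts)), key=lambda i: priorities[i]))

def round_robin(bursts, quantum):
    """Each job runs for at most `quantum` time units, then rejoins the queue."""
    remaining, finish = list(bursts), [0] * len(bursts)
    queue, clock = deque(range(len(bursts))), 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)       # not finished: back of the queue
        else:
            finish[i] = clock
    # With arrival at time 0, waiting time = finish time - burst time.
    return [finish[i] - bursts[i] for i in range(len(bursts))]

bursts = [5, 3, 8]
print(fcfs(bursts))                 # [0, 5, 8]
print(sjf(bursts))                  # [3, 0, 8]
print(round_robin(bursts, 2))       # [7, 6, 8]
print(priority(bursts, [3, 1, 2]))  # [11, 0, 3]
```

In this small example SJF gives the lowest total waiting time, while round robin spreads the waiting more evenly across jobs.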

Memory management: the operating system ensures that multiple processes share memory efficiently. It allocates memory through segmentation and paging, and uses virtual memory when RAM is insufficient.
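Paging can be pictured as a lookup from virtual page numbers to physical frames. The following is a minimal sketch, not a description of any real operating system: the 4096-byte page size, the `page_table` contents, and the `translate` helper are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # bytes per page (assumed value)

# Hypothetical page table: virtual page number -> physical frame number.
# None marks a page that is currently not in RAM (held in virtual memory on disk).
page_table = {0: 5, 1: 2, 2: None}

def translate(virtual_address):
    """Map a virtual address to a physical address, or signal a page fault."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        # Page is on disk or not mapped at all; a real OS would load it here.
        raise LookupError(f"page fault on page {page}")
    return frame * PAGE_SIZE + offset

print(translate(10))    # 20490  (frame 5, offset 10)
print(translate(4100))  # 8196   (frame 2, offset 4)
```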

Input/output device control: manages the transfer of data between peripheral devices and the computer through protocols and device drivers.

File management system: the operating system needs to know the location of files, how they are organized, the amount of data, and the protocols used to communicate with the file system.

Interrupt handling: an interrupt is a signal sent by a device or by software to the processor, which temporarily stops its current task and handles the interrupt.

Buffer: temporarily stores data to smooth out the speed difference between the processor and hardware devices.

Types of operating systems

Operating System Type	Description	Examples
Distributed Operating System	Shares the load across multiple interconnected servers to achieve parallel processing.	Multiple interconnected servers
Multitasking Operating System	Used in laptops and personal computers; multiple processes run concurrently, with the processor switching between them.	Laptops, personal computers
Multi-User Multi-Tasking Operating System	Based on time sharing: each user is assigned a time slice, and multiple terminals connect to a single main system.	Multiple terminals connected to a main system
Embedded Operating System	Designed to perform dedicated functions within larger mechanical or electrical systems; provides limited control functions and has no permanent storage.	Embedded systems in mechanical/electrical systems
Real-Time Operating System	Used in critical systems, with fault tolerance and redundancy.	Nuclear reactor temperature control, air traffic control
Mobile and Handheld Device Operating System	Provides a graphical user interface on phones and tablets, with features such as music, films, e-books, games, web browsing, and email.	Windows Phone, Apple iOS, Android

Interface: System software acts as an interface between the computer hardware and the user, translating user instructions into machine-understandable language (binary code).

Types of System Software:

Operating System (OS): Manages all hardware components (CPU, keyboard, mouse, etc.), ensuring they function correctly. It's the first software loaded when the computer starts.
Language Processors: Convert human-readable languages (high-level languages) into machine-level language.

Assembler: Converts assembly language to machine language.
Interpreter: Converts high-level language to machine language line by line.
Compiler: Converts high-level language to machine language all at once.

Device Drivers: Manage devices connected to the computer (printers, etc.), allowing the operating system to interact with them.

Virtual machines

Definition: an environment that emulates one computer on another, for example emulating an Apple iOS environment on a Windows operating system.

Components: the host operating system (Host OS) manages the guest operating system (Guest OS), and an emulation engine (hypervisor) maps the virtual hardware onto the physical hardware of the host computer.

Advantages: an additional operating system can be used on the same computer, which makes it easy to run old programs or test new operating systems without crashing the host OS.

Disadvantages: software performs less well than on the original system, and installation and maintenance costs are high for large companies.

Translators

Classification of programming languages

High-Level Languages

Characteristics: programmers do not need to know the computer's hardware or instruction set, and programs are portable across different systems. Examples include Java, C++ and Python.

Strengths: easy to write and understand, and suitable for a wide range of programming tasks.

Low-Level Languages

Characteristics: closely tied to machine code and hardware. They provide minimal abstraction over the physical architecture of the computer and allow direct manipulation of memory and hardware components, which makes them efficient but complex to program.

Classification:

Assembly Language: uses mnemonics (e.g., MOV, ADD, SUB), which makes it more readable than machine language, but it is still specific to a processor architecture. The code takes up little memory and executes quickly.

Machine Language: consists of binary code (1s and 0s) that the CPU can execute directly. It is specific to a particular processor architecture and requires in-depth knowledge of the hardware.

Translators

Definition: utility programs that translate the programs and assembly language code written by a programmer into a binary form that the computer can understand.

Classification:

Aspect	Compiler	Interpreter	Assembler
Function	Translates the entire high-level program into machine code at once.	Translates and executes the program line by line.	Translates assembly language code into machine code.
Translation Process	The entire code is compiled first, then executed.	Code is translated and executed one line at a time during runtime.	Converts assembly instructions directly into machine code.
Execution	Program execution begins after the entire code is compiled.	Program executes immediately, but requires translation each time it runs.	Generates machine code from assembly, which is then executed directly.
Speed	Compilation is slow, but execution is fast.	Slower execution due to line-by-line translation.	Translation is fast, and execution is efficient.
Error Detection	Detects all errors during compilation; the program does not run if compilation fails.	Detects errors during execution (line by line) and stops at the first one.	Detects assembly syntax and instruction-usage errors during translation.
Example Languages	C, C++, Java (via bytecode).	Python, Ruby, JavaScript.	x86 Assembly, ARM Assembly.

Compiler

Function: translates high-level language code into machine code.

Translates the entire code at once to produce an executable file.

The code is optimized, but errors are reported only after the whole program has been compiled.

Compiled code is difficult to reverse-engineer into source code and difficult to modify.

Different compilers are required for different platforms (e.g. Windows OS + Intel processor, Apple OS + PowerPC).

Interpreter

Function: reads high-level language code line by line, converts it to intermediate code (the course material describes this as usually assembly language), and then executes it, which is slower. When an error is found, the user is prompted to fix it. The code is not optimized. Interpreters suit scenarios where the code is modified frequently.

Assembler

Function: translates assembly language code into machine code.

Translates the entire code at once to generate machine code.

Processors of different architectures have different assembly language instructions.

The code given to the assembler as input is called source code.

Feature	Compiler	Interpreter	Assembler
Execution Method	Translates entire code at once	Translates code line by line	Translates assembly code into machine code
Output	Generates an executable file	Does not generate an executable	Generates machine code directly
Speed of Execution	Faster once compiled	Slower due to line-by-line execution	Depends on hardware
Error Reporting	All errors reported after compilation	Stops at each error and reports	Reports errors after conversion

Stages of Compilation

Lexical Analysis

Function: removes comments and extra spaces from the program, checks that variable names are legal, replaces keywords, constants and variables with unique symbols (called tokens), and replaces identifiers with pointers to variable addresses.

Symbol Table: stores detailed information about the keywords, variables, and constants used in the source code, including type, address, and so on.

Hash Table: used to store the symbol table, giving quick access to its elements through a hash function.
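The lexical analysis step described above can be sketched with a small regular-expression tokenizer. This is a toy illustration under stated assumptions, not how any particular compiler works: the keyword set, the token names, and the `tokenize` function are invented here, and a Python `dict` (itself a hash table) stands in for the symbol table.

```python
import re

KEYWORDS = {"if", "while", "return"}  # keywords of a hypothetical toy language

TOKEN_RE = re.compile(r"""
    (?P<COMMENT>\#[^\n]*)   |
    (?P<NUMBER>\d+)         |
    (?P<NAME>[A-Za-z_]\w*)  |
    (?P<OP>[-+*/=()<>;])    |
    (?P<SPACE>\s+)          |
    (?P<BAD>.)
""", re.VERBOSE)

def tokenize(source):
    """Return (tokens, symbol_table) for a piece of toy source code."""
    tokens, symbols = [], {}
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SPACE"):
            continue                      # comments and whitespace are dropped
        if kind == "BAD":
            raise SyntaxError(f"illegal character {text!r}")
        if kind == "NAME" and text in KEYWORDS:
            tokens.append(("KEYWORD", text))
        elif kind == "NAME":
            # Identifiers become an index into the symbol table.
            index = symbols.setdefault(text, len(symbols))
            tokens.append(("ID", index))
        else:
            tokens.append((kind, text))
    return tokens, symbols

tokens, symbols = tokenize("x = 42 + y  # add\nx = x * 2")
print(symbols)  # {'x': 0, 'y': 1}
print(tokens[:5])
```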

Syntax Analysis

Functions: check whether a sequence of tokens conforms to the syntax rules of the language, use parsing techniques (e.g., stacks) to check whether parentheses are correctly paired, and analyze the precedence of arithmetic operators.

Common errors: mismatched parentheses, missing semicolons, indentation errors, statement structure errors and so on.
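The stack-based check for correctly paired parentheses mentioned under syntax analysis can be sketched as follows. The function name `brackets_balanced` is hypothetical, and a real parser does far more than this; the sketch only shows the stack idea.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}  # closing bracket -> matching opener

def brackets_balanced(text):
    """Return True if every bracket in `text` is correctly opened and closed."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)              # remember each opening bracket
        elif ch in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack                      # leftovers mean something was never closed

print(brackets_balanced("(a + b) * [c]"))  # True
print(brackets_balanced("(a + b"))         # False
```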

Semantic Analysis

Function: Check whether the semantics of the code is correct, such as whether the variables have been declared, whether the type matches, whether the loop counter uses the correct type, and so on.

Common errors: use of undeclared variables, type mismatch errors, etc.

Code Generation & Optimization

Function: Generate machine code and optimize the code to reduce execution time and resource usage, e.g. removing redundant instructions, adjusting the way the program runs.

Optimization example: Move variable assignments inside a loop to outside the loop to avoid repeating the same statement.
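The optimization example above (moving an assignment out of a loop) looks like this when written out by hand. The two functions are hypothetical and only illustrate the kind of transformation an optimizing compiler might apply; both compute the same result.

```python
def before(values, a, b):
    """Unoptimized: `scale` is recomputed on every iteration."""
    total = 0
    for v in values:
        scale = a * b          # same result each time round the loop
        total += v * scale
    return total

def after(values, a, b):
    """Optimized: the loop-invariant assignment is moved outside the loop."""
    scale = a * b              # computed once
    total = 0
    for v in values:
        total += v * scale
    return total

print(before([1, 2, 3], 2, 5))  # 60
print(after([1, 2, 3], 2, 5))   # 60
```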

Analysis Phase	Function	Common Errors	Description
Lexical Analysis	Removes comments and extra spaces, checks variable names for legality, replaces keywords, constants, and variables with unique symbols (tokens), and replaces identifiers with pointers to variable addresses.	-	Symbol table stores detailed information about keywords, variables, and constants (type, address, etc.). Hash table is used for quick access.
Syntax Analysis	Checks whether the sequence of tokens conforms to the language's syntax rules, verifies parentheses pairing, and analyzes operator precedence using parsing techniques (e.g., stacks).	Mismatched parentheses, missing semicolons, indentation errors, incorrect statement structures.	Verifies structural correctness, ensuring the program follows syntactical rules.
Semantic Analysis	Verifies whether the semantics of the code are correct, such as checking if variables are declared, if types match, or if the loop counter has the correct type.	Use of undeclared variables, type mismatch errors.	Ensures logical correctness, focusing on data flow, declarations, and types in the program.
Code Generation & Optimization	Generates machine code and optimizes it to reduce execution time and resource usage (e.g., removing redundant instructions, adjusting the program's structure).	-	Includes improving performance by reordering or moving instructions, like moving variable assignments outside loops.

Bytecode

Definition: an intermediate representation that combines features of compilation and interpretation, making programs portable and platform independent.

Applications: programming languages such as Java, Python and MATLAB use bytecode. Java code is compiled into bytecode (.class files), which the Java Virtual Machine (JVM) translates into native code at run time. Python code is compiled into bytecode (.pyc files), which is executed by the Python interpreter.

Advantages of the Java Virtual Machine (JVM):

Security: bytecode from unknown sources is first run inside the JVM to confirm that it is not malicious before the main program runs.

Platform independence: bytecode can run on any system that has an appropriate virtual machine installed.
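Python's own bytecode step can be observed with the standard library, which gives a concrete picture of the compile-then-interpret model described above. This is a small demonstration using the built-in `compile` and `exec` functions and the `dis` module; the source string and the `<demo>` file name are arbitrary, and the disassembly listing differs between Python versions.

```python
import dis

source = "result = 6 * 7"

# Compile the source text into a code object that holds bytecode.
code = compile(source, "<demo>", "exec")
print(type(code.co_code))   # <class 'bytes'>: the raw bytecode

# Human-readable listing of the bytecode instructions (version dependent).
dis.dis(code)

# The interpreter's virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 42
```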

Dynamic Link Libraries (DLL)

Definition: a shared library of subroutines that programmers can use in their programs when needed.

Advantage: some routines are needed again and again, so a library of such routines is developed and tested once. When a routine is called, its code is fetched from the library and executed.

Linkers and Loaders

Linkers

Function: link compiled subroutines into the machine code, supply the machine addresses for call and return statements, and ensure that the modules are joined together.

Loaders

Function: load the object code into any location in memory, subject to certain conditions, e.g. the program must not contain absolute addresses and must be in a relocatable format.

Computer Architecture

Overview of computer architecture

Definition: computer architecture defines the design and function of a computer system, focusing on how hardware and software interact. It includes the instruction set, memory structure, and data flow, and determines how the CPU handles tasks.

Role: the architecture determines how efficiently, quickly, and reliably a computer operates.

Main components of a computer

Central Processing Unit (CPU)

Function: responsible for processing the instructions received by the computer.

Components:

Arithmetic Logic Unit (ALU): performs arithmetic operations (e.g. addition, subtraction, multiplication) and logical operations (e.g. AND, OR, NOT).

Control Unit (CU): controls the memory, processor and input/output devices. It contains the current instruction register (CIR) and the program counter (PC).

Registers: small, fast storage locations inside the CPU that temporarily hold data, instructions and addresses to increase processing speed.

Common registers:

Accumulator (ACC): stores the results of arithmetic and logical operations performed by the processor.

Program Counter (PC): stores the address of the next instruction to be executed.

Memory Address Register (MAR): stores the memory address from which data is to be read or to which it is to be written.

Memory Data Register (MDR): temporarily stores data read from or to be written to memory.

Status Register (SR): stores bits that are set or cleared depending on the result of an instruction (e.g., overflow or carry in an addition).

Memory Unit

Function: stores data and instructions; each memory location has a unique address.

Read and write operations:

Read operation: the address is placed in the Memory Address Register (MAR), and the data at that address is read into the Memory Data Register (MDR).

Write operation: the data is first placed in the MDR and the address in the MAR; a control signal then writes the data to the specified address.
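The read and write operations above can be modelled with a tiny class in which the MAR and MDR are the only way in and out of memory. This is a simplified sketch: the `Memory` class and its method names are hypothetical, and the control signal is represented simply by calling `read()` or `write()`.

```python
class Memory:
    """Toy memory unit accessed only through the MAR and MDR registers."""

    def __init__(self, size):
        self.cells = [0] * size  # each index is a unique address
        self.mar = 0             # Memory Address Register
        self.mdr = 0             # Memory Data Register

    def read(self):
        """Read signal: copy the cell addressed by MAR into MDR."""
        self.mdr = self.cells[self.mar]

    def write(self):
        """Write signal: copy MDR into the cell addressed by MAR."""
        self.cells[self.mar] = self.mdr

mem = Memory(16)
mem.mdr, mem.mar = 99, 3   # write: data into MDR, address into MAR
mem.write()

mem.mdr = 0                # clear MDR to show the read really fetches the value
mem.mar = 3                # read: address into MAR
mem.read()
print(mem.mdr)             # 99
```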

Input/Output Devices (I/O Devices)

Function: manage communication between the computer and external devices.

Input devices: such as keyboards, mice, and scanners, which send data to the CPU.

Output devices: such as monitors, printers, and speakers, which display or output the results of CPU processing.

I/O controller: coordinates data transfer between the CPU and these devices to ensure smooth interaction.

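Types of computer architecture

von Neumann Architecture

Characteristics: data and instructions (program code) are stored in the same shared memory; buses connect the processor, memory, and input/output devices.

Advantages: simple design, low cost, easy to implement.

Disadvantages: slower processing, because data and instructions share the same memory.

Harvard Architecture

Characteristics: uses separate memory spaces for instructions (program code) and data, allowing the CPU to access instructions and data at the same time.

Advantages: faster processing, because instructions and data can be handled in parallel.

Disadvantages: complex design, high cost, fixed memory spaces.

Architecture comparison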

Feature	von Neumann	Harvard
Execution	Sequential processing	Parallel processing
Speed	Slow	Improved speed
Cost	Inexpensive	Costly
Implementation	Easy to implement	Complex implementation
Memory	Flexible memory	Fixed memory spaces
Memory Type	Single memory	Separate memory

Summary and key review points

CPU: contains the ALU, the CU, and registers, and is responsible for processing instructions.

ALU: performs arithmetic and logical operations.

CU: controls the memory, processor, and I/O devices.

Registers: such as ACC, PC, MAR, MDR, and SR, used for temporary storage of data and addresses.

Memory unit: stores data and instructions; read and write operations go through the MAR and MDR.

I/O devices: manage communication between the computer and external devices.

Von Neumann architecture: data and instructions share the same memory; simple design but slower.

Harvard architecture: data and instructions are stored in separate memories; faster but more complex to design.

The CPU and Execution Cycle

1. CPU structure review

The main components of the CPU:

Arithmetic Logic Unit (ALU): performs arithmetic operations (e.g., addition, subtraction) and logical operations (e.g., AND, OR, NOT).

Control Unit (CU): controls the memory, processor and input/output devices. It contains the current instruction register (CIR) and the program counter (PC).

Registers: temporarily store data, instructions and addresses to improve processing speed.

Common registers:

Program Counter (PC): stores the address of the next instruction to be executed.

Memory Address Register (MAR): stores the memory address from which data is to be read or to which it is to be written.

Memory Data Register (MDR): temporarily stores data read from or to be written to memory.

Current Instruction Register (CIR): stores the instruction currently being executed.

Accumulator (ACC): stores the results of arithmetic and logical operations performed by the processor.

2. Instruction execution: the Fetch-Decode-Execute (FDE) cycle

Fetch:

Steps:

Address in PC is copied to MAR: the program counter (PC) holds the address of the next instruction to be executed, and this address is copied to the memory address register (MAR).

Instruction is read from memory into MDR: the instruction at that address is read into the memory data register (MDR).

Instruction in MDR is copied to CIR: the instruction in the MDR is copied to the current instruction register (CIR).

PC is incremented: the value of the program counter (PC) is increased by 1 so that it points to the next instruction.

Function: fetches the instruction from memory and loads it into the processor's instruction register, ready for decoding and execution.

Decode:

Step: the instruction is decoded to determine the operation it represents and the operands it requires.

Function: lets the processor work out what the instruction means and which data it has to process.

Execute:

Step: based on the decoded instruction, the processor sends the appropriate control signals to the memory unit and input/output devices to carry out the operation.

Function: performs the actual operation specified by the instruction, such as a calculation or a data operation.

Store:

Step: the result of the operation is stored in memory or in a register.

Function: saves the result of the calculation or operation for later use.
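The fetch, decode, execute, and store steps above can be traced with a toy accumulator machine. Everything in this sketch is an assumption made for illustration: the four-instruction set (LOAD, ADD, STORE, HALT), the use of Python tuples as instructions, and the `run` function do not correspond to any real processor. Instructions and data share one memory list, as in a von Neumann design.

```python
def run(memory):
    """Execute a toy program held in `memory` and return the final accumulator."""
    pc, acc = 0, 0
    while True:
        # Fetch
        mar = pc                  # 1. address in PC is copied to MAR
        mdr = memory[mar]         # 2. instruction is read from memory into MDR
        cir = mdr                 # 3. instruction in MDR is copied to CIR
        pc += 1                   # 4. PC now points to the next instruction

        # Decode: split the instruction into operation and operand
        opcode, operand = cir

        # Execute (and store, for STORE)
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# Addresses 0-3 hold the program; addresses 5-7 hold data.
memory = [("LOAD", 5), ("ADD", 6), ("STORE", 7), ("HALT", 0), 0, 20, 22, 0]
print(run(memory))  # 42
print(memory[7])    # 42
```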

3. Interrupts

Definition: an interrupt is a signal sent by a device or by software to the processor, asking it to pause its current task and handle the interrupt.

Role: allows the computer to deal with multiple tasks concurrently, improving the responsiveness and efficiency of the system.

Handling process:

Save the current task state: before the interrupt is handled, the state of the current task is saved, including the contents of the program counter (PC) and the current instruction register (CIR).

Execute the Interrupt Service Routine (ISR): the interrupt service routine that corresponds to the interrupt signal is executed.

Resume the current task: once the interrupt has been handled, the task resumes from the saved state.

Interrupt priority: several interrupts may occur at the same time. Based on their priorities, the processor can decide whether to suspend the interrupt service routine it is currently running in order to handle a higher-priority interrupt.
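The save, service, and restore sequence above, together with priority ordering, can be sketched as follows. This is a deliberately simplified model with several assumptions: a lower number means higher priority, the ISR "addresses" in `handlers` are made up, pending interrupts are serviced one after another rather than preempting each other, and the ISR body is represented only by a log entry.

```python
import heapq

def service_interrupts(cpu, pending, handlers):
    """Service all pending interrupts in priority order, then resume the task."""
    log = []
    saved = (cpu["pc"], cpu["cir"])        # 1. save the current task state
    heapq.heapify(pending)                 # lowest number = highest priority (assumed)
    while pending:
        _, name = heapq.heappop(pending)
        cpu["pc"] = handlers[name]         # 2. jump to the matching ISR
        log.append(name)                   #    (ISR body omitted in this sketch)
    cpu["pc"], cpu["cir"] = saved          # 3. restore state and resume the task
    return log

cpu = {"pc": 10, "cir": "ADD 6"}
pending = [(2, "keyboard"), (1, "timer"), (3, "disk")]
handlers = {"timer": 100, "keyboard": 200, "disk": 300}  # hypothetical ISR addresses

order = service_interrupts(cpu, pending, handlers)
print(order)      # ['timer', 'keyboard', 'disk']
print(cpu["pc"])  # 10
```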

System Bus

1. Definition of a bus

Bus: a communication path that connects two or more devices. It usually consists of multiple channels, e.g. a 32-bit data bus contains 32 individual single-bit channels.

Bus appearance:

Parallel lines on a circuit board: e.g., a bus on a motherboard.

Ribbon cables.

Strip connectors: such as the PCI (Peripheral Component Interconnect) bus on the motherboard.

External cables: such as USB or FireWire.

2. Characteristics of a bus

Bus speed: the transmission speed of the bus, expressed in megahertz (MHz).

Bus width: the number of bits the bus can transmit at one time, e.g. a 64-bit computer has a 64-bit bus width.

Bus direction: a bus can be unidirectional or bidirectional, depending on its function.

3. Types of buses

Address Bus

Function: a unidirectional bus used to send memory addresses from the CPU to memory or I/O controllers so that data can be read or written.

Width: the width of the address bus determines the maximum number of memory locations that can be addressed. For example, a 16-bit address bus can address 64 KB (2^16 = 65,536 addresses, each corresponding to a 1-byte memory location).

Example: a modern CPU may have a 40-bit address bus, which can address 1 TB of memory.
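The two address bus figures above follow directly from the rule that n address lines give 2^n distinct addresses. The short check below assumes one byte per address, as in the 16-bit example; the function name `addressable_bytes` is made up for illustration.

```python
def addressable_bytes(address_bits, bytes_per_location=1):
    """Memory reachable with an address bus of the given width."""
    return 2 ** address_bits * bytes_per_location

print(addressable_bytes(16))                # 65536 bytes
print(addressable_bytes(16) // 1024)        # 64 (KB)
print(addressable_bytes(40) == 1024 ** 4)   # True: 2^40 bytes is 1 TB
```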

数据总线（Data Bus）

功能：用于在CPU、内存和I/O设备之间传输数据和指令，包含8、16、32或64条平行线。

Function: Used to transfer data and instructions between the CPU, memory, and I/O devices and contains 8, 16, 32, or 64 parallel lines.

方向：双向总线，允许数据在系统内部组件之间双向传输。

Direction: A bi-directional bus that allows data to be transferred in both directions between components within the system.

宽度：数据总线宽度表示计算机系统一次可以传输的比特数。

Width: The width of the data bus indicates the number of bits a computer system can transfer at one time.

控制总线（Control Bus）

功能：双向总线，用于在处理器、内存和I/O设备之间传输时序、状态信号和其他命令。

Function: Bidirectional bus for transferring timing, status signals, and other commands between the processor, memory, and I/O devices.

控制信号 Control Signals:

时钟（Clock）：同步计算机上的操作。

Clock: synchronizes operations on the computer.

内存读（Memory Read）：将指定地址的内容复制到数据总线。

Memory Read: copies the contents of a specified address to the data bus.

内存写（Memory Write）：将数据总线的内容复制到指定地址。

Memory Write: copies the contents of the data bus to the specified address.

总线请求（Bus Request）：设备请求使用数据总线以执行读/写操作。

Bus Request: The device requests the use of the data bus to perform a read/write operation.

总线授权（Bus Grant）：处理器发出的信号，表示设备已被授权使用数据总线。

Bus Grant: A signal from the processor indicating that the device has been authorized to use the data bus.

中断请求（Interrupt Request）：表示发生了错误或异常，需要处理器（CPU）的注意。

Interrupt Request: Indicates that an error or exception has occurred and requires the attention of the processor (CPU).

总线类型	地址总线 (Address Bus)	数据总线 (Data Bus)	控制总线 (Control Bus)
功能	单向总线，用于从CPU向内存或I/O控制器发送内存地址，以便读取或写入数据。	用于在CPU、内存和I/O设备之间传输数据和指令，允许双向传输。	双向总线，用于在处理器、内存和I/O设备之间传输时序、状态信号和其他命令。
方向	单向（CPU → 内存/设备）	双向	双向
宽度	决定可寻址内存位置数量。例如16位宽度可寻址64 KB；现代CPU可能有40位宽度，可寻址1 TB内存。	表示系统一次可以传输的比特数，如8、16、32或64位。	宽度不固定，包含多条控制信号线，如时钟、读/写请求、中断等。
示例信号	地址信号（指明内存或I/O设备地址）	数据本身（指令、数值等）	时钟、内存读、内存写、总线请求、总线授权、中断请求等。
作用举例	指定要访问的内存单元地址	传输要读取或写入的数据	控制操作的执行时机和权限，例如同步、读写命令、中断请求等。

Bus Type	Address Bus	Data Bus	Control Bus
Function	A unidirectional bus used to send memory addresses from the CPU to memory or I/O controllers.	Used to transfer data and instructions between the CPU, memory, and I/O devices; allows bidirectional transfer.	A bidirectional bus for transferring timing, status signals, and other commands between processor, memory, and I/O devices.
Direction	Unidirectional (CPU → Memory/Devices)	Bidirectional	Bidirectional
Width	Determines the number of addressable memory locations. For example, a 16-bit width can address 64 KB; modern CPUs may have 40-bit width addressing up to 1 TB.	Indicates the number of bits the system can transfer at one time, such as 8, 16, 32, or 64 bits.	Variable width, includes multiple control signal lines such as clock, read/write requests, interrupts, etc.
Example Signals	Address signals (specify memory or I/O device address)	Data itself (instructions, values, etc.)	Clock, Memory Read, Memory Write, Bus Request, Bus Grant, Interrupt Request, etc.
Role Example	Specifies the memory cell address to access	Transfers data to be read or written	Controls timing and permission for operations, such as synchronization, read/write commands, interrupt requests.

4.冯·诺伊曼架构与哈佛架构的总线系统

冯·诺伊曼架构（von Neumann Architecture）

特点：使用单一总线（数据总线）来获取数据和指令，因为共享内存用于存储数据和指令。

Features: A single bus (data bus) is used to fetch data and instructions as shared memory is used to store both data and instructions.

问题：由于指令和数据不能同时获取，这会导致瓶颈。

Problem: This can lead to bottlenecks as instructions and data cannot be fetched at the same time.

哈佛架构（Harvard Architecture）

Harvard Architecture

特点：使用两条总线，允许同时获取指令和访问数据。

Characteristics: Uses two buses to allow simultaneous fetching of instructions and access to data.

优势：提高性能，但设计复杂。

Advantage: Improved performance but complex design.

Architecture	von Neumann Architecture	Harvard Architecture
Pros	Simple design; inexpensive and easy to implement	Faster performance due to simultaneous instruction and data fetching; efficient for real-time systems and high-speed applications
Cons	Bottleneck due to shared bus for data and instructions; slower performance because instruction and data fetches conflict	Complex and costly design; more difficult to implement and maintain

总结与复习重点

总线的定义：连接设备的通信路径，由多条通道组成。

Definition of a bus: a communication path that connects devices and consists of multiple channels.

总线的特性：速度（MHz）、宽度（位数）和方向（单向或双向）。

Characteristics of a bus: speed (MHz), width (number of bits) and direction (unidirectional or bidirectional).

总线的类型 Types of buses:

地址总线：单向，用于发送内存地址。

Address bus: unidirectional, used to send memory addresses.

数据总线：双向，用于传输数据和指令。

Data bus: Bidirectional, used to transmit data and instructions.

控制总线：双向，用于传输控制信号。

Control bus: Bidirectional, used to transmit control signals.

冯·诺伊曼架构与哈佛架构：

Von Neumann Architecture and Harvard Architecture:

冯·诺伊曼架构：使用单一总线，存在瓶颈。

Von Neumann architecture: Uses a single bus and has bottlenecks.

哈佛架构：使用两条总线，性能更高但设计复杂。

Harvard Architecture: Uses two buses, higher performance but complex design.

特性	冯·诺依曼架构	哈佛架构
存储结构	指令和数据共用同一存储器	指令和数据使用各自独立的存储器
总线结构	只有一组总线（地址/数据共用），取指与访存需抢占总线	指令总线和数据总线分离，可并行取指和访存
并行度	取指与访存互斥，吞吐率受限	取指与访存可同时进行，吞吐率更高
硬件复杂度	简单，成本较低	较复杂，成本和功耗均较高
灵活性	可动态修改程序与数据，适合通用处理	指令/数据存储区固定，灵活性较差，适合专用处理
缓存/优化	标准缓存设计需区分指令/数据访问，存在一致性问题	可为指令和数据分别设计优化缓存，无需一致性维护
典型应用场景	通用计算机、个人电脑、服务器	嵌入式系统、DSP、微控制器、实时系统
安全性	单一存储区易受恶意代码与数据冲突注入	指令存储空间只读，可更好保护代码完整性

Half and full adders

二进制加法器（Binary Adders）

1.二进制加法器的类型

半加器（Half Adder）

定义：半加器是一种组合逻辑电路，用于对两个单比特二进制数进行加法运算。

Definition: A half adder is a combinational logic circuit that adds two single-bit binary numbers.

输入和输出：

输入：两个单比特二进制数A和B。

Input: two single bit binary numbers A and B.

输出 Output：

和（Sum,S）：两个输入位的和。

进位（Carry,C）：如果发生溢出，则将进位传递到更高位。

Sum (S): the sum of the two input bits.

Carry (C): when the addition overflows a single bit, the carry is passed to the next higher bit.

真值表：

A	B	S	C
0	0	0	0
0	1	1	0
1	0	1	0
1	1	0	1

布尔表达式：

$S = A \oplus B$（XOR运算 / XOR operation）

$C = A \cdot B$（AND运算 / AND operation）
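The half adder's two Boolean expressions map directly onto bitwise operators. The sketch below is illustrative only (the function name `half_adder` is an assumption) and reproduces the truth table above.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two single-bit inputs."""
    s = a ^ b   # sum is the XOR of the inputs
    c = a & b   # carry is the AND of the inputs
    return s, c


print("A B S C")
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, s, c)
```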

全加器（Full Adder）

定义：全加器是一种组合逻辑电路，用于对三个比特进行加法运算，包括两个输入比特和来自前一次加法的进位。

Definition: A full adder is a combinational logic circuit that adds three bits: two input bits and the carry from the previous addition.

输入和输出：

输入：两个输入比特A和B，以及一个来自前一次加法的进位输入$C_{in}$。

Input: two input bits A and B, and a carry-in bit $C_{in}$ from the previous addition.

输出：

和（Sum, S）：两个输入比特和进位输入的和。

进位输出（Carry out, $C_{out}$）：传递到更高位的进位。

Sum (S): the sum of the two input bits and the carry-in bit.

Carry output ($C_{out}$): the carry passed to the next higher bit.

真值表：

A	B	C_in	S	C_out
0	0	0	0	0
0	1	0	1	0
1	0	0	1	0
1	1	0	0	1
0	0	1	1	0
0	1	1	0	1
1	0	1	0	1
1	1	1	1	1

布尔表达式：

$S = A \oplus B \oplus C_{in}$（XOR运算 / XOR operation）

$C_{out} = (A \cdot B) + (A \cdot C_{in}) + (B \cdot C_{in})$
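The full adder expressions can be checked against ordinary arithmetic: for every input combination, A + B + C_in should equal S + 2·C_out. This is a minimal sketch; the name `full_adder` is chosen here for illustration.

```python
from itertools import product


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) for two input bits and a carry-in bit."""
    s = a ^ b ^ c_in                            # three-input XOR
    c_out = (a & b) | (a & c_in) | (b & c_in)   # carry when at least two inputs are 1
    return s, c_out


# Verify all eight rows of the truth table against integer addition.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == s + 2 * c_out
    print(a, b, c_in, s, c_out)
```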

2.串行进位加法器（Ripple Carry Adder,RCA）

定义：串行进位加法器是一种用于执行二进制数加法的加法器，称为“串行”是因为一个比特加法生成的进位会“逐位传递”到后续的加法阶段。

Definition: A ripple carry adder is an adder used to add multi-bit binary numbers. It is called “ripple carry” because the carry generated by adding one bit position is passed on (“ripples”) to each subsequent stage of the addition.

工作原理：

二进制加法：从最低有效位（最右边）开始，逐位对两个二进制数的每一位进行加法运算，每个阶段生成一个和和一个进位。

Binary addition: The two binary numbers are added bit by bit, starting with the least significant bit (rightmost); each stage generates a sum bit and a carry bit.

进位传播：一个阶段中两个比特相加生成的进位会传递到下一个更高位的加法阶段。

Carry propagation: The carry generated by adding two bits in one stage is passed on to the next higher stage of the addition.

结构：通常使用全加器为每个比特位置构建加法器，每个全加器计算两个输入比特和前一个阶段的进位的和。

Structure: A ripple carry adder is usually built from one full adder per bit position; each full adder computes the sum of its two input bits and the carry from the previous stage.
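The structure described above can be sketched by chaining full adders, with each stage's carry-out feeding the next stage's carry-in. The helper names (`ripple_carry_add`, `to_bits`, `from_bits`) and the least-significant-bit-first list layout are assumptions made for this example.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c_in, (a & b) | (a & c_in) | (b & c_in)


def ripple_carry_add(x_bits: list[int], y_bits: list[int]) -> tuple[list[int], int]:
    """Add two equal-length bit lists (least significant bit first)."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples into the next stage
        result.append(s)
    return result, carry


def to_bits(value: int, width: int) -> list[int]:
    """Convert an integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Convert a least-significant-bit-first list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)  # 11 + 6 = 17: 4-bit sum is 1 with carry-out 1
```

Because each stage must wait for the carry from the stage below it, the delay of this design grows with the number of bits.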

总结与复习重点

半加器（Half Adder）：

用于对两个单比特二进制数进行加法运算。

Used to add two single bit binary numbers.

有两个输入 A 和 B 和两个输出和 S 和进位 C 。

It has two inputs, A and B, and two outputs: the sum S and the carry C.

使用XOR运算计算和，使用AND运算计算进位。

The sum is calculated using the XOR operation and the carry is calculated using the AND operation.

全加器（Full Adder）：

用于对三个比特（两个输入比特和一个进位输入）进行加法运算。

Used to add three bits (two input bits and a carry input).

有两个输出：和S与进位输出$C_{out}$。

It has two outputs: the sum S and the carry output $C_{out}$.

使用XOR运算计算和，使用复杂的布尔表达式计算进位输出。

The sum is computed using the XOR operation and the carry output is computed using a more complex Boolean expression.

串行进位加法器（Ripple Carry Adder,RCA）：

用于执行二进制数的加法。

Used to perform addition of binary numbers.

进位从一个比特位置逐位传递到下一个更高位。

The carry is passed from each bit position to the next higher one.

通常使用全加器构建，每个全加器处理一个比特位置的加法。

Typically constructed using full adders, each of which handles the addition of one bit position.

Performance Factors in Computer Architecture

影响计算机性能的因素

1.CPU核心数（Cores）

定义：CPU由多个处理单元组成，每个处理单元称为一个核心（Core），每个核心包含一个处理器（ALU和CU）和寄存器。

Definition: A CPU consists of multiple processing units, each of which is called a core; each core contains a processor (ALU and CU) and registers.

影响 Impact：

多任务处理能力：核心数增加，计算机可以同时执行更多程序，提升多任务处理能力。

Multitasking ability: with an increase in the number of cores, the computer can execute more programs at the same time, improving multitasking ability.

性能提升有限：核心数增加并不意味着性能线性提升，因为核心之间的通信也会消耗资源。

Limited performance improvement: an increase in the number of cores does not imply a linear increase in performance, as communication between cores also consumes resources.

核心数与通信通道：

1个核心：无通信通道。

2个核心：需要1个通信通道。

3个核心：需要3个通信通道。

4个核心：需要6个通信通道。

5个核心：需要10个通信通道。

Number of cores and communication channels:

1 core: no communication channel.

2 cores: 1 communication channel is required.

3 cores: 3 communication channels are needed.

4 cores: 6 communication channels are required.

5 cores: 10 communication channels are required.
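The figures listed above follow the pattern n(n-1)/2, i.e. one channel for every pair of cores. A minimal sketch (the function name `channels_needed` is an assumption):

```python
def channels_needed(cores: int) -> int:
    """Channels required so that every pair of cores has a direct link."""
    return cores * (cores - 1) // 2


for n in range(1, 6):
    print(f"{n} core(s): {channels_needed(n)} channel(s)")
```

The count grows roughly with the square of the number of cores, which is one reason performance does not scale linearly.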

2.时钟频率（Clock Rate）

定义：时钟频率表示CPU每秒的时钟周期数，以兆赫兹（MHz）或吉赫兹（GHz）为单位；它决定了CPU每秒大致能处理多少条指令。

Definition: Clock rate is the number of clock cycles the CPU completes per second, measured in megahertz (MHz) or gigahertz (GHz); it determines roughly how many instructions the CPU can process per second.

影响：

指令处理速度：更高的时钟频率意味着CPU每秒可以处理更多指令，从而加快任务执行速度。

Instruction processing speed: A higher clock frequency means that the CPU can process more instructions per second, which speeds up task execution.

功耗和散热：更高的时钟频率需要更多的电力，从而产生更多的热量，可能导致过热和数据损坏。

Power consumption and heat dissipation: Higher clock frequencies require more power, which generates more heat and can lead to overheating and data corruption.

超频（Overclocking）：通过BIOS提高CPU的时钟频率，但存在电路限制，超频过度可能导致指令执行不完全，引发数据损坏和过热。

Overclocking: The CPU's clock frequency can be raised through the BIOS, but the circuitry has limits; excessive overclocking may lead to incomplete execution of instructions, data corruption, and overheating.

3.缓存大小（Cache Size）

定义：缓存是位于CPU附近的小型存储区域，用于临时存储CPU频繁使用的数据和指令。

Definition: A cache is a small storage area located near the CPU for temporary storage of data and instructions that are frequently used by the CPU.

影响：

访问速度：缓存比RAM更靠近CPU，访问速度更快，可以减少CPU等待数据的时间，提高性能。

Access Speed: Caches are closer to the CPU than RAM and have faster access speeds, which can reduce the amount of time the CPU waits for data and improve performance.

缓存级别 Cache level:

L1缓存：每个核心独立拥有，容量小（8 KB到128 KB），访问速度最快。

L1 cache: each core independently owned, small capacity (8 KB to 128 KB), the fastest access speed.

L2缓存：每个核心独立拥有，容量较大（256 KB到2 MB），访问速度稍慢。

L2 cache: each core has its own, with a larger capacity (256 KB to 2 MB) and slightly slower access speed than L1.

L3缓存：多个核心共享，容量最大（8 MB到64 MB或更多），访问速度较慢。

L3 cache: shared by multiple cores, largest capacity (8 MB to 64 MB or more), slowest access speed of the three levels.

缓存级别	归属关系	容量范围	访问速度
L1缓存	每个核心独立拥有	8 KB 到 128 KB	访问速度最快
L2缓存	每个核心独立拥有	256 KB 到 2 MB	访问速度稍慢
L3缓存	多个核心共享	8 MB 到 64 MB 或更多	访问速度较慢

Cache Level	Ownership	Capacity Range	Access Speed
L1 Cache	Independently owned by each core	8 KB to 128 KB	Fastest access speed
L2 Cache	Independently owned by each core	256 KB to 2 MB	Slightly slower access speed
L3 Cache	Shared by multiple cores	8 MB to 64 MB or more	Slower access speed

缓存的局限性：缓存容量有限，如果缓存过小，可能会导致性能下降。

Limitations of caches: Cache capacity is limited and may cause performance degradation if the cache is too small.

4.流水线（Pipelining）

定义：流水线是一种实现技术，通过重叠多个指令的处理阶段，提高CPU的整体吞吐量。

Definition: Pipelining is an implementation technique that increases the overall throughput of the CPU by overlapping the processing stages of multiple instructions.

工作原理：

无流水线：每个指令依次完成取指令（Fetch）、解码（Decode）和执行（Execute）三个阶段，处理三个指令需要9T时间。

Without pipelining: each instruction completes three stages of Fetch, Decode and Execute in sequence, and it takes 9T to process three instructions.

有流水线：多个指令的处理阶段可以重叠，处理三个指令只需要5T时间，提高了CPU的吞吐量。

With pipelining: the processing phases of multiple instructions can overlap, and it takes only 5T to process three instructions, which improves CPU throughput.
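The 9T and 5T figures can be reproduced with a simple timing model. This sketch assumes an ideal pipeline in which every stage takes one time unit T and there are no stalls; the function names are illustrative.

```python
def time_without_pipeline(instructions: int, stages: int = 3) -> int:
    """Each instruction runs all stages before the next one starts."""
    return instructions * stages


def time_with_pipeline(instructions: int, stages: int = 3) -> int:
    """The first instruction fills the pipeline; each later one finishes 1T after the previous."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


print(time_without_pipeline(3), "T without pipelining")  # 9 T
print(time_with_pipeline(3), "T with pipelining")        # 5 T
```

For a long run of instructions the pipelined time approaches one instruction per T, which is where the throughput gain comes from.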

局限性 Limitations: Pipeline hazards 流水线风险

分支指令：如“跳转”指令可能导致流水线失效，需要清空流水线并重新填充，降低流水线的效率。

Branch instructions: instructions such as “jump” can invalidate the instructions already in the pipeline, so the pipeline must be flushed and refilled, which reduces its efficiency.

5.系统总线（System Bus）

定义：系统总线用于在CPU、内存和I/O设备之间传输数据，其性能直接影响整体系统性能。

Definition: The system bus is used to transfer data between the CPU, memory, and I/O devices, and its performance directly affects overall system performance.

影响因素 Influencing factors:

数据总线宽度：数据总线越宽，一次可以传输的数据量越大，例如32位总线比16位总线传输能力高一倍。

Data Bus Width: The wider the data bus, the more data can be transferred at one time, e.g., a 32-bit bus has twice the transfer capability of a 16-bit bus.

总线速度：总线速度越快，数据传输越快，减少瓶颈，例如400 MHz总线比100 MHz总线更快。

Bus Speed: The faster the bus speed, the faster the data transfer, reducing bottlenecks, e.g. a 400 MHz bus is faster than a 100 MHz bus.

地址总线宽度：地址总线越宽，可以访问的内存空间越大，例如32位地址总线可以访问4 GB内存，64位地址总线可以访问16艾字节（EB）内存。

Address Bus Width: The wider the address bus, the more memory space can be accessed, e.g. a 32-bit address bus can access 4 GB of memory and a 64-bit address bus can access 16 exabytes (EB) of memory.
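The effect of data bus width and bus speed can be illustrated with a rough peak-rate estimate. This is a simplified sketch that assumes exactly one transfer per clock cycle, which the notes do not state; real buses may transfer more or less per cycle.

```python
def peak_bytes_per_second(width_bits: int, speed_mhz: float) -> float:
    """Peak transfer rate, assuming one full-width transfer per clock cycle."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = speed_mhz * 1_000_000
    return bytes_per_transfer * transfers_per_second


for width, mhz in ((16, 100), (32, 100), (32, 400)):
    rate = peak_bytes_per_second(width, mhz) / 1_000_000
    print(f"{width}-bit bus at {mhz} MHz: {rate:.0f} MB/s")
```

Under this assumption, doubling the width from 16 to 32 bits doubles the rate, and raising the speed from 100 MHz to 400 MHz quadruples it.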

6.虚拟内存（Virtual Memory）

定义：当RAM不足时，虚拟内存允许系统使用硬盘（HDD/SSD）的一部分作为临时RAM，防止系统崩溃或变慢。

Definition: When RAM runs low, virtual memory lets the system use part of the hard disk (HDD/SSD) as temporary RAM, which helps prevent the system from crashing or grinding to a halt.

影响：

多任务处理：虚拟内存允许同时运行多个程序，即使它们的总内存需求超过可用RAM，提高CPU利用率。

Multitasking: virtual memory allows multiple programs to run simultaneously, even if their total memory requirements exceed available RAM, improving CPU utilization.

性能局限性：虚拟内存比RAM慢，频繁的页面交换（Paging）会降低CPU性能。

Performance Limitations: Virtual memory is slower than RAM, and frequent page swapping (Paging) reduces CPU performance.
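The cost of frequent paging can be illustrated by counting page faults for a sequence of page accesses. The sketch below assumes a first-in, first-out (FIFO) replacement policy, which is chosen here only for simplicity and is not named in the notes; the page numbers are made up.

```python
from collections import deque


def count_page_faults(references: list[int], frame_count: int) -> int:
    """Count page faults under FIFO replacement with a fixed number of RAM frames."""
    frames = deque()  # pages currently held in RAM, oldest first
    faults = 0
    for page in references:
        if page in frames:
            continue              # page already in RAM: no disk access needed
        faults += 1               # page must be fetched from disk
        if len(frames) == frame_count:
            frames.popleft()      # evict the page that has been resident longest
        frames.append(page)
    return faults


accesses = [1, 2, 3, 1, 4, 2, 5]
print(count_page_faults(accesses, frame_count=3))  # 5
print(count_page_faults(accesses, frame_count=2))  # 7
```

With fewer frames available, more accesses miss RAM and go to the much slower disk, which is the slowdown described above.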

总结与复习重点

CPU核心数：核心数增加可以提升多任务处理能力，但性能提升并非线性，核心间的通信也会消耗资源。

CPU core count: Increased core count can improve multitasking, but performance gains are not linear and inter-core communication consumes resources.

时钟频率：更高的时钟频率可以加快指令处理速度，但会增加功耗和热量，可能导致过热和数据损坏。

Clock frequency: Higher clock frequency can speed up instruction processing, but it increases power consumption and heat, which may lead to overheating and data corruption.

缓存大小：缓存可以减少CPU访问数据的时间，提高性能，但缓存容量有限，过大或过小都会影响性能。

Cache size: Cache can reduce the time it takes the CPU to access data and improve performance, but cache capacity is limited, and being too large or too small can affect performance.

流水线：通过重叠指令处理阶段提高CPU吞吐量，但分支指令可能导致流水线失效，降低效率。

Pipeline: CPU throughput is improved by overlapping instruction processing stages, but branch instructions may force the pipeline to be flushed, which reduces efficiency.

系统总线：总线的宽度和速度直接影响数据传输速度，更宽、更快的总线可以减少瓶颈。

System bus: The width and speed of the bus directly affects the data transfer speed; a wider, faster bus can reduce bottlenecks.

虚拟内存：在RAM不足时使用硬盘空间作为临时RAM，防止系统崩溃，但比RAM慢，频繁页面交换会降低性能。

Virtual Memory: Uses hard disk space as temporary RAM when RAM is insufficient to prevent system crashes, but it is slower than RAM and frequent page swapping can reduce performance.

Memory and Storage Systems

Communication and Networking

通信与网络（Communication and Networking）

1.网络的定义与类型

定义：网络是通过通信线路将两台或多台设备连接在一起的系统，文件以数据包的形式在设备之间传输。

Definition: A network is a system that connects two or more devices together over a communication line, where files are transferred between devices in the form of packets.

网络的分类：Classification of Networks:

个人区域网（PAN）：在个人范围内（通常10米内）的网络，如笔记本电脑、手机、媒体播放器等设备之间的连接。

Personal Area Network (PAN): a network within personal range (usually within 10 meters), for example connecting laptops, cell phones, and media players.

无线个人区域网（WPAN）：短距离无线网络，用于连接移动计算设备，如智能手机与桌面电脑。

Wireless Personal Area Network (WPAN): a short-range wireless network for connecting mobile computing devices, such as smartphones to desktop computers.

局域网（LAN）：同一建筑物内的计算机网络，不一定连接到互联网。

Local Area Network (LAN): A network of computers within the same building, not necessarily connected to the Internet.

城域网（MAN）：通过电话交换线连接不同局域网，形成更大范围的网络。

Metropolitan Area Network (MAN): A network that connects different LANs over telephone exchange lines to form a larger network.

广域网（WAN）：连接不同地理位置的局域网，如互联网。

Wide Area Network (WAN): A network that connects LANs in different geographic locations, such as the Internet.

虚拟专用网（VPN）：通过加密算法在互联网上为组织创建安全连接。

Virtual Private Network (VPN): Creates a secure connection for an organization over the Internet through encryption algorithms.

存储区域网（SAN）：为用户大规模存储文件提供服务器。

Storage Area Network (SAN): Provides servers for users to store files on a large scale.

网络类型	描述
个人区域网 (PAN)	在个人范围内（通常10米内）的网络，如笔记本电脑、手机、媒体播放器等设备之间的连接。
无线个人区域网 (WPAN)	短距离无线网络，用于连接移动计算设备，如智能手机与桌面电脑。
局域网 (LAN)	同一建筑物内的计算机网络，不一定连接到互联网。
城域网 (MAN)	通过电话交换线连接不同局域网，形成更大范围的网络。
广域网 (WAN)	连接不同地理位置的局域网，如互联网。
虚拟专用网 (VPN)	通过加密算法在互联网上为组织创建安全连接。
存储区域网 (SAN)	为用户大规模存储文件提供服务器。

Network Type	Description
Personal Area Network (PAN)	A network within personal range (usually within 10 meters) connecting devices such as laptops, cell phones, and media players.
Wireless Personal Area Network (WPAN)	A short-range wireless network for connecting mobile computing devices, such as smartphones to desktop computers.
Local Area Network (LAN)	A network of computers within the same building, not necessarily connected to the Internet.
Metropolitan Area Network (MAN)	A network that connects different LANs over telephone exchange lines to form a larger network.
Wide Area Network (WAN)	A network that connects LANs in different geographic locations, such as the Internet.
Virtual Private Network (VPN)	Creates a secure connection for an organization over the Internet through encryption algorithms.
Storage Area Network (SAN)	Provides servers for users to store files on a large scale.

2.网络连接类型

有线连接（Wired connection）：

铜缆：Copper cable:

同轴电缆（Coaxial）：需要定期更换，因为绝缘层可能会退化。

Coaxial cable: needs to be replaced periodically as the insulation may degrade.

屏蔽双绞线（STP）：有箔屏蔽层，可防止电磁干扰。

Shielded Twisted Pair (STP): has a foil shield to prevent electromagnetic interference.

非屏蔽双绞线（UTP）：由4对颜色编码的电线组成，数据传输速度快，电磁干扰小，安装方便。

Unshielded Twisted Pair (UTP): consists of 4 pairs of color-coded wires for fast data transmission, low EMI and easy installation.

光纤电缆（Fibre-optic cables）：由玻璃制成，数据以光信号形式传输，寿命长，电磁干扰少，但安装成本高。

Fibre-optic cables: made of glass; data is transmitted as light signals. They have a long life and little electromagnetic interference, but installation costs are high.

无线连接（Wireless connection）：

特点：使用无线网卡（NIC）和无线路由器连接设备，数据以无线电波形式传输。

Characteristics: Devices connect through a wireless network interface card (NIC) and a wireless router; data is transmitted as radio waves.

优点：安装便宜且方便，多个设备可以连接。

Advantages: cheap and easy to install, multiple devices can be connected.

缺点：速度较慢，信号受阻时质量下降，存在安全问题。

Disadvantages: slower speeds, degradation of quality when signals are blocked, and security issues.

Feature	Copper Cable	Fiber Optic Cable
Speed	Lower (up to 1 Gbps)	Very high (up to Tbps)
Distance	Short (up to 100 m)	Long (kilometers)
Interference	Prone to interference	Immune to interference
Cost	Low	High
Installation	Easy and cheap	More complex and costly
Durability	Strong and flexible	Fragile
Security	Easy to tap	More secure
Best Use	Short distance, budget	High speed, long distance

3.网络硬件组件

集线器（Hubs）：将数据广播到网络中的所有设备，不检查设备是否需要数据，不使用路由表。

Hubs: broadcasts data to all devices in the network, does not check if a device needs the data, does not use routing tables.

交换机（Switches）：存储网络中设备的MAC地址，根据MAC地址过滤数据包并转发到特定设备，减少不必要的流量。

Switches: Store MAC addresses of devices in the network, filter packets based on MAC addresses and forward them to specific devices, reducing unnecessary traffic.

网桥（Bridges）：连接两个独立的局域网，根据目标MAC地址判断帧是否需要转发到另一侧，避免不必要的数据传输。

Bridges: Connect two separate LANs and use the destination MAC address to decide whether a frame needs to be forwarded to the other side, avoiding unnecessary data transmission.

调制解调器（Modem）：将数字数据转换为电信号，接收端的调制解调器将电信号还原为数字数据。

Modem: Converts digital data into electrical signals; the modem at the receiving end converts the electrical signals back into digital data.

路由器（Routers）：在计算机网络之间转发数据包，根据路由表选择最佳路径，决定数据包的传输路径。

Routers: Forward data packets between computer networks, using the routing table to select the best path for each packet.

网关（Gateway）：在网络之间转换数据包，当数据包跨越使用不同协议的网络时使用。

Gateway: Translates packets between networks, used when packets cross networks that use different protocols.

无线接入点（WAP）：允许无线设备通过Wi-Fi等无线标准连接到有线网络。

Wireless Access Point (WAP): Allows wireless devices to connect to a wired network through a wireless standard such as Wi-Fi.

Network Device	Layer	Key Function	Data Forwarding	Collision Domains	Duplex	Security/Performance	Replaced by
Hub	Layer 1	Repeats data to every port (except the receiving one).	Sends data to all ports.	One collision domain for all ports.	Half-duplex (cannot send and receive simultaneously).	Wastes bandwidth; security risks.	Switch
Bridge	Layer 2	Segments networks into smaller sections, uses MAC addresses.	Forwards/discards based on destination MAC address.	Two collision domains.	Half-duplex.	Reduces traffic, improves security compared to hubs.	Switch
Switch	Layer 2	Combines hub and bridge functionality, uses MAC addresses.	Learns MAC addresses, forwards data based on them.	Each port has its own collision domain.	Full-duplex (can send and receive simultaneously).	Saves bandwidth, improves security compared to hubs.	-
Router	Layer 3	Routes traffic between different networks, uses IP addresses.	Forwards traffic based on IP addresses.	-	Full-duplex, highly configurable.	Acts as gateway, highly configurable with advanced features.	-

网络设备	层级	主要功能	数据转发方式	冲突域	双工	安全性/性能	被替代的设备
集线器 (Hub)	层级 1	将数据重复发送到所有端口（接收端口除外）。	将数据发送到所有端口。	所有端口共享一个冲突域。	半双工（不能同时发送和接收）。	浪费带宽，存在安全隐患。	交换机 (Switch)
桥接器 (Bridge)	层级 2	将网络分段，使用 MAC 地址进行转发。	根据目标 MAC 地址转发或丢弃数据。	两个冲突域。	半双工。	降低流量，提高安全性。	交换机 (Switch)
交换机 (Switch)	层级 2	结合了集线器和桥接器的功能，使用 MAC 地址。	学习 MAC 地址，基于地址转发数据。	每个端口都有自己的冲突域。	全双工（每个端口可以同时发送和接收）。	节省带宽，提高安全性。	-
路由器 (Router)	层级 3	路由不同网络间的流量，使用 IP 地址。	根据 IP 地址转发数据。	-	全双工，功能强大且可配置。	作为网关，提供高度配置的功能和高级特性。	-
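The difference between a hub (send to every port) and a switch (learn MAC addresses, then forward only where needed) can be sketched in a few lines. This is a simplified model under stated assumptions: the class name `LearningSwitch`, the port numbering, and the short placeholder addresses are all hypothetical, and unknown destinations are flooded to all other ports.

```python
class LearningSwitch:
    """Toy model of a Layer 2 switch that learns which MAC address is on which port."""

    def __init__(self, port_count: int) -> None:
        self.port_count = port_count
        self.mac_table: dict[str, int] = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Process one frame and return the list of ports it is sent out on."""
        self.mac_table[src_mac] = in_port        # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Destination unknown: behave like a hub and flood all other ports.
            return [p for p in range(self.port_count) if p != in_port]
        if out_port == in_port:
            return []                            # destination is on the same port: drop
        return [out_port]                        # forward to the one matching port


switch = LearningSwitch(port_count=4)
print(switch.receive(0, "AA", "BB"))  # BB unknown: flooded to ports 1, 2, 3
print(switch.receive(1, "BB", "AA"))  # AA was learned on port 0
print(switch.receive(0, "AA", "BB"))  # BB was learned on port 1
```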

4.数据包与帧

数据包（Data packets）：

定义：文件在网络上传输时被分割成小块，称为数据包。

Definition: files are divided into small pieces called packets when they are transmitted over a network.

组成：Composition:

头部（Header）：包含发送方和接收方的IP地址、协议类型、数据包编号和数据长度。

Header: contains the sender's and receiver's IP addresses, the protocol type, the packet number, and the data length.

负载（Payload）：实际传输的数据。

Payload: the actual data transmitted.

尾部（Trailer）：包含数据包结束标记和错误校验数据。

Trailer: Contains end-of-packet marker and error check data.

帧（Data frames）：数据包在网络接口卡（NIC）和路由器之间传输时被封装成帧。

Data frames: Data packets are encapsulated into frames when they are transmitted between the NIC and the router.
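The header/payload/trailer structure described above can be sketched as follows. This is an illustrative model, not a real protocol format: the dictionary layout, the function names, the 8-byte payload size, and the use of CRC-32 as the error check are all assumptions, and the IP addresses are documentation-only examples.

```python
import zlib


def make_packets(data: bytes, src_ip: str, dst_ip: str, payload_size: int = 8) -> list[dict]:
    """Split data into packets, each with a header, a payload, and a trailer."""
    packets = []
    for number, start in enumerate(range(0, len(data), payload_size)):
        payload = data[start:start + payload_size]
        packets.append({
            "header": {"src": src_ip, "dst": dst_ip, "number": number, "length": len(payload)},
            "payload": payload,
            "trailer": {"checksum": zlib.crc32(payload)},  # error-check data
        })
    return packets


def is_intact(packet: dict) -> bool:
    """Recompute the checksum at the receiver and compare it with the trailer."""
    return zlib.crc32(packet["payload"]) == packet["trailer"]["checksum"]


packets = make_packets(b"hello, network world", "192.0.2.1", "198.51.100.7")
for p in packets:
    print(p["header"]["number"], p["header"]["length"], p["payload"], is_intact(p))
```

If a payload is altered in transit, the recomputed checksum no longer matches the trailer and the receiver can request retransmission.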

5.MAC地址与IP地址

MAC地址（Media Access Control Address）：

特点：由制造商分配，不可更改，48位，用12个十六进制字符表示。

Characteristics: Assigned by the manufacturer, unchangeable, 48 bits, expressed in 12 hexadecimal characters.

用途：在局域网内识别设备。

Purpose: To identify a device on a LAN.

IP地址（Internet Protocol Address）：

特点：由网络分配，可以更改，分为IPv4（32位）和IPv6（128位）。

Characteristics: Assigned by the network, changeable, divided into IPv4 (32 bits) and IPv6 (128 bits).

用途：在互联网上识别设备，允许设备发送和接收数据。

Purpose: Identifies devices on the Internet and allows them to send and receive data.
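The sizes quoted above (48-bit MAC, 32-bit IPv4, 128-bit IPv6) can be confirmed with Python's standard `ipaddress` module and a small formatting helper. The helper name `format_mac` and the example addresses are assumptions made for illustration.

```python
import ipaddress


def format_mac(value: int) -> str:
    """Format a 48-bit integer as 12 hexadecimal characters in six colon-separated pairs."""
    hex_digits = f"{value:012x}"
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


mac = format_mac(0x001A2B3C4D5E)
print(mac, "->", len(mac.replace(":", "")), "hex characters =", 12 * 4, "bits")

v4 = ipaddress.ip_address("192.0.2.10")
v6 = ipaddress.ip_address("2001:db8::1")
print(f"IPv{v4.version}: {v4.max_prefixlen} bits")
print(f"IPv{v6.version}: {v6.max_prefixlen} bits")
```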

6.路由方法

电路交换（Circuit switching）：

特点：两台设备之间建立物理连接，数据以连续流的形式传输，连接结束后释放。

Characteristics: A dedicated physical connection is established between the two devices; data is transmitted as a continuous stream, and the connection is released when communication ends.

优点：数据包按发送顺序到达接收端，简化了消息重建过程。

Advantages: packets arrive at the receiver in the order in which they were sent, simplifying the message reconstruction process.

缺点：不允许多个数据包同时传输，电路故障会导致通信中断。

Disadvantages: Multiple packets are not allowed to be transmitted at the same time, and communication can be interrupted by circuit failure.

分组交换（Packet switching）：

特点：数据包独立传输，通过路由器的路由表选择路径，到达目的地后重新组装。

Characteristics: packets are transmitted independently, the path is selected through the router's routing table, and reassembled when they reach their destination.

优点：允许多个数据包同时传输，提高了网络利用率。

Advantage: Allows multiple packets to be transmitted at the same time, improving network utilization.

缺点：数据包可能按不同路径到达，需要在接收端重新排序。

Disadvantage: packets may arrive on different paths and need to be reordered at the receiving end.

切换方式	特点	优点	缺点
电路交换	建立固定通路，数据连续传输	顺序不变，接收端重组简单	一次只能传一条，电路出问题就中断
分组交换	数据分块单独传输，路由自动选择路径	可同时传多条，提高网络利用率	分组顺序可能乱，需要在接收端重新排序

Switching Method	Characteristics	Advantages	Disadvantages
Circuit Switching	Fixed path is set up, data sent continuously	Packets arrive in order, easy to reassemble	Only one connection at a time, failure breaks it
Packet Switching	Data split into packets, each sent separately	Multiple packets sent at once, efficient usage	Packets may arrive out of order, need reordering
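The reordering step that packet switching requires at the receiver can be sketched as below. The function names and the (number, chunk) tuple layout are assumptions; the shuffle merely simulates packets arriving out of order over different paths.

```python
import random


def split_message(message: str, size: int) -> list[tuple[int, str]]:
    """Split a message into numbered chunks, standing in for packets."""
    return [(number, message[start:start + size])
            for number, start in enumerate(range(0, len(message), size))]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort packets by sequence number and join their contents."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return "".join(chunk for _, chunk in ordered)


message = "packets may take different routes"
packets = split_message(message, size=5)
random.Random(0).shuffle(packets)       # simulate out-of-order arrival
print([number for number, _ in packets])
print(reassemble(packets))
```

With circuit switching this step is unnecessary, because data arrives in the order it was sent.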

7.以太网（Ethernet）

定义：一种有线网络技术，使用双绞线或光纤电缆连接设备，传输速率可达100 Gb/s。

Definition: a wired network technology that uses twisted pair or fiber optic cables to connect devices at transmission rates up to 100 Gb/s.

特点：

网络被划分为多个段，每个段由少数设备共享。

The network is divided into segments, each shared by a small number of devices.

数据包被分割成帧，帧中包含源和目标MAC地址。

Packets are segmented into frames that contain source and destination MAC addresses.

接收端检查帧中的错误，如果发现错误，请求发送端重新发送数据包。

The receiver checks for errors in the frame and requests the sender to resend the packet if an error is found.

帧在以太网中广播，只有目标地址匹配的设备才会接收帧。

Frames are broadcast over the Ethernet segment, and only the device whose address matches the destination address accepts the frame.

局限性：

标准以太网电缆的最大有效范围为100米，超过此距离需要使用光纤，增加成本。

Standard Ethernet cables have a maximum effective range of 100 meters, beyond which fiber optics are required, increasing costs.

半双工以太网（同一时刻只能向一个方向传输数据）可能导致频繁碰撞和延迟，全双工以太网（可同时双向传输数据）减少了这一问题。

Half-duplex Ethernet (data can travel in only one direction at a time) can lead to frequent collisions and delays; full-duplex Ethernet (data can travel in both directions simultaneously) reduces this problem.

随着连接设备数量的增加，以太网的效率可能会降低。

As the number of connected devices increases, the efficiency of Ethernet may decrease.

8.Wi-Fi

定义：一种无线网络技术，允许各种设备（如笔记本电脑、平板电脑、智能手机等）通过无线电波连接到网络。

Definition: a wireless networking technology that allows a variety of devices (e.g., laptops, tablets, smartphones, etc.) to connect to a network via radio waves.

特点：

使用2.4 GHz和5 GHz的无线电波频率。

Uses radio wave frequencies of 2.4 GHz and 5 GHz.

信号范围可达20米，但墙壁厚度会影响信号强度。

Signal range is up to 20 meters, but wall thickness can affect signal strength.

一个Wi-Fi接入点的带宽在多个设备之间共享，可能导致性能下降。

The bandwidth of a Wi-Fi access point is shared between multiple devices, which can lead to performance degradation.

安全性：

无线网络容易受到安全威胁，需要使用加密技术保护数据。

Wireless networks are vulnerable to security threats and require the use of encryption to protect data.

常见的加密方法包括Wi-Fi保护访问（WPA）和Wi-Fi保护访问II（WPA2）。

Common encryption methods include Wi-Fi Protected Access (WPA) and Wi-Fi Protected Access II (WPA2).

WPA2使用高级加密标准（AES）加密数据，每个数据包都生成一个新的128位密钥。

WPA2 encrypts data using the Advanced Encryption Standard (AES), which generates a new 128-bit key for each packet.

网络管理员可以设置基于MAC地址的白名单，控制设备的访问权限。

Network administrators can set up a MAC address-based whitelist to control access to devices.

特性	以太网 (Ethernet)	无线网 (Wi-Fi)
速度	稳定高速，最高可达10Gbps	速度受环境影响，最高约几Gbps
连接稳定性	非常稳定	容易受干扰，波动较大
传输距离	受限于线缆长度，一般100米	通常几十米，受墙壁等阻挡影响
安装复杂度	需要布线，安装较复杂	方便快捷，无需布线
移动性	固定连接	支持移动设备连接
成本	需要布线，成本相对较高	设备成本较低，使用方便
安全性	高，物理隔离好	需要加强加密和认证
适用场景	需要高速稳定连接的场所	适合移动和便捷访问

Feature	Ethernet	Wi-Fi
Speed	Stable and fast (up to 10 Gbps)	Variable, up to several Gbps
Connection	Very stable	Prone to interference and fluctuations
Range	Limited by cable length (~100 m)	Usually tens of meters, affected by walls
Installation	Requires wiring, more complex	Easy and quick, no wiring needed
Mobility	Fixed connection	Supports mobile devices
Cost	Higher due to cabling	Lower equipment cost
Security	High, physical isolation	Needs strong encryption and authentication
Best Use	High-speed, stable needs	Convenience and mobility

总结与复习重点

网络类型：PAN、WPAN、LAN、MAN、WAN、VPN、SAN等。

Network types: PAN, WPAN, LAN, MAN, WAN, VPN, SAN, etc.

连接类型：有线（铜缆、光纤）和无线（无线电波）。

Connection types: wired (copper, fiber) and wireless (radio waves).

网络硬件：集线器、交换机、网桥、调制解调器、路由器、网关、无线接入点。

Network hardware: hubs, switches, bridges, modems, routers, gateways, wireless access points.

数据包与帧：数据包由头部、负载和尾部组成，帧在网络中传输。

Packets and Frames: Packets consist of a header, a payload, and a trailer; frames are transmitted across the network.

MAC地址与IP地址：MAC地址用于局域网内识别设备，IP地址用于互联网上识别设备。

MAC Addresses and IP Addresses: MAC addresses are used to identify devices on the LAN and IP addresses are used to identify devices on the Internet.

路由方法：电路交换和分组交换。

Routing methods: circuit switching and packet switching.

以太网：有线网络技术，使用双绞线或光纤电缆连接设备。

Ethernet: Wired network technology that uses twisted pair or fiber optic cables to connect devices.

Wi-Fi：无线网络技术，使用无线电波连接设备，需要加密保护数据。

Wi-Fi: Wireless network technology, uses radio waves to connect devices, requires encryption to protect data.

Network Topology

网络拓扑（Network Topology）

1.网络拓扑的定义

定义：网络拓扑是指网络中各种设备（如计算机、路由器、交换机等）的物理或逻辑布局，以及它们之间的连接方式。

Definition: Network topology refers to the physical or logical layout of the various devices in a network (e.g., computers, routers, switches, etc.) and how they are connected to each other.

重要性：网络拓扑影响网络的性能、安全性和可扩展性，是网络设计和管理中的关键概念。

Importance: Network topology affects the performance, security, and scalability of a network and is a key concept in network design and management.

2.网络拓扑的类型

点对点拓扑（Point-to-Point Topology）

特点：最简单的基本网络拓扑，由两个节点通过单一链路连接，数据在两个端点之间来回传输。

Characteristics: The simplest basic network topology, consisting of two nodes connected by a single link, with data traveling back and forth between the two endpoints.

优点：易于设置。

Advantages: Easy to set up.

缺点：由于其简单性，在现代网络中的使用受到限制。

Disadvantage: Its simplicity limits its use in modern networks.

总线拓扑（Bus Topology）

特点：所有节点连接到一根主电缆（称为总线或主干），数据沿电缆双向传输。

Characteristics: All nodes are connected to a single main cable (called a bus or backbone), and data is transmitted in both directions along the cable.

优点：易于安装，成本低，因为所需电缆较少。

Advantages: easy to install and low cost because fewer cables are required.

缺点：

主电缆故障会导致整个网络瘫痪。

Failure of the main cable can paralyze the entire network.

随着连接的计算机数量增加，数据碰撞增加，导致连接速度变慢。

As the number of computers connected increases, data collisions increase, causing the connection to slow down.

数据对网络中的所有设备都可见，安全性低。

Data is visible to all devices in the network and is less secure.

环形拓扑（Ring Topology）

特点：节点以闭环配置连接，每个节点恰好有两个邻居，数据在一个方向上流动（双环系统可以双向传输）。

Characteristics: Nodes are connected in a closed-loop configuration where each node has exactly two neighbors and data flows in one direction (a dual-ring system can transmit in both directions).

优点：

数据碰撞减少，因为数据沿一个方向流动。

Data collisions are reduced because data flows in one direction.

双环系统可以提供冗余，防止单点故障。

A dual-ring system provides redundancy and prevents a single point of failure.

缺点：

单个节点故障可能导致整个网络瘫痪。

A single node failure can bring down the entire network.

添加或移除节点较为复杂。

Adding or removing nodes is complicated.

星形拓扑（Star Topology）

特点：所有节点连接到一个中央集线器（或交换机），节点围绕中央集线器排列，形成类似星形的结构。

Characteristics: All nodes are connected to a central hub (or switch), and the nodes are arranged around the central hub to form a star-like structure.

优点：

单个节点故障不会影响其他节点。

Failure of a single node does not affect other nodes.

添加或移除设备相对容易，可扩展性强。

It is relatively easy to add or remove devices and is highly scalable.

易于管理和故障排除，适合局域网（LAN）。

Easy to manage and troubleshoot, suitable for local area networks (LANs).

缺点：

安装成本高，因为需要安装集线器/交换机。

High installation costs because hubs/switches need to be installed.

中央集线器/交换机故障会导致整个网络瘫痪。

Failure of the central hub/switch can bring down the entire network.

网状拓扑（Mesh Topology）

特点：每个节点直接连接到多个其他节点，形成高度互联的网络结构。

Characteristics: Each node is directly connected to multiple other nodes to form a highly interconnected network structure.

数据传输方式：Data transmission method:

广播方式：将数据包发送到所有设备，目标接收器会拾取数据包。

Broadcast method: sends packets to all devices and the target receiver picks up the packets.

路由方式：将数据包路由到特定设备。

Routing method: routes the packet to a specific device.

优点：Advantages:

高度冗余，单点故障不会影响整个网络。

Highly redundant: the failure of a single node or link does not bring down the entire network.

可扩展性强，易于添加更多设备。

Highly scalable, easy to add more devices.

数据传输直接，安全性高。

Data transmission is direct and highly secure.

缺点：Disadvantages:

网络设计和管理复杂。

Complex network design and management.

实施和维护成本高，尤其是大型网络中的全网状拓扑。

High implementation and maintenance costs, especially for full mesh topologies in large networks.
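The cost point above can be made concrete by counting cables. The sketch below assumes idealized textbook layouts (one link per device in a star or ring, one link per pair of devices in a full mesh); the function name `link_count` is hypothetical and not from the course material.

```python
def link_count(topology: str, n: int) -> int:
    """Number of links needed to connect n devices in an idealized layout."""
    if n < 3:
        raise ValueError("this sketch assumes at least 3 devices")
    if topology == "star":
        return n                      # one link from each device to the hub/switch
    if topology == "ring":
        return n                      # each device links to the next, closing the loop
    if topology == "full_mesh":
        return n * (n - 1) // 2       # every pair of devices gets its own link
    raise ValueError(f"unknown topology: {topology}")


for n in (5, 10, 50):
    print(n, link_count("star", n), link_count("ring", n), link_count("full_mesh", n))
```

The full-mesh count grows with the square of the number of devices, which is why full mesh is rarely used for large networks.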

混合拓扑（Hybrid Topology）

特点：结合了两种或多种拓扑结构的元素，以满足特定需求。

Characteristics: Combines elements of two or more topologies to meet specific needs.

优点：可以根据特定用例和业务需求定制高效的网络架构。

Advantage: Efficient network architecture can be customized to meet specific use cases and business requirements.

缺点：创建定制的网络架构可能具有挑战性，需要更多的电缆和网络设备，增加维护成本。

Disadvantage: Creating a customized network architecture can be challenging; it requires more cabling and network equipment, which increases maintenance costs.

3.网络拓扑的比较 Comparison of network topologies

拓扑类型	结构	可靠性	成本	可扩展性	性能
总线拓扑	单根主电缆	低（单点故障）	低	困难（添加设备会降低性能）	慢（共享带宽）
星形拓扑	中央集线器/交换机连接设备	中等（集线器故障影响所有设备）	中等	容易（可以轻松添加更多设备）	好（专用链路）
环形拓扑	设备以闭环配置连接	中等（故障会中断网络，除非是双环）	中等	中等（添加/移除节点困难）	好（数据流动可预测）
网状拓扑	每个设备连接到其他所有设备	高（多条路径防止故障）	高（需要许多电缆/节点）	好（无线网状网络更容易）	优秀（无拥塞，快速路由）
混合拓扑	结合两种或多种拓扑	可变（取决于组合）	可变（取决于拓扑组合）	高（可定制）	高（根据需求优化）

Topology Type	Architecture	Reliability	Cost	Scalability	Performance
Bus Topology	Single main cable	Low (single point of failure)	Low	Poor (performance decreases with added devices)	Slow (shared bandwidth)
Star Topology	Central hub/switch connects devices	Medium (hub failure affects network)	Medium	High (easy device addition)	Good (dedicated links)
Ring Topology	Closed-loop device configuration	Medium (vulnerable unless dual-loop)	Medium	Limited (complex node changes)	Good (predictable data flow)
Mesh Topology	All devices interconnected	High (multiple redundant paths)	High (extensive cabling)	Good (especially wireless mesh)	Excellent (minimal congestion)
Hybrid Topology	Multiple topology combination	Varies by design	Varies by design	High (customizable)	High (needs-optimized)
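The reliability column can be checked with a small simulation. This is a minimal sketch, assuming each topology is modelled as a directed graph (a single ring carries data one way only, a dual ring both ways) and that a network "survives" if every remaining node can still send to every other one. The names `build`, `reachable` and `survives` are hypothetical.

```python
from collections import deque


def build(topology, n):
    """Directed adjacency sets for n devices; the star adds a 'hub' node."""
    adj = {v: set() for v in range(n)}
    if topology == "star":
        adj["hub"] = set(range(n))
        for v in range(n):
            adj[v].add("hub")
    elif topology in ("ring", "dual_ring"):
        for v in range(n):
            nxt = (v + 1) % n
            adj[v].add(nxt)               # data flows from v to the next node
            if topology == "dual_ring":
                adj[nxt].add(v)           # second ring carries the reverse direction
    elif topology == "full_mesh":
        for v in range(n):
            adj[v] = set(range(n)) - {v}
    else:
        raise ValueError(f"unknown topology: {topology}")
    return adj


def reachable(adj, start, failed):
    """Breadth-first search that never passes through the failed node."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w != failed and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def survives(adj, failed):
    """True if every surviving node can still reach every other surviving node."""
    alive = {v for v in adj if v != failed}
    return all(reachable(adj, v, failed) == alive for v in alive)


print("star, device fails:", survives(build("star", 5), 0))
print("star, hub fails:   ", survives(build("star", 5), "hub"))
print("ring, node fails:  ", survives(build("ring", 5), 2))
print("dual ring:         ", survives(build("dual_ring", 5), 2))
print("full mesh:         ", survives(build("full_mesh", 5), 2))
```

Only the hub failure in the star and a node failure in the single ring disconnect the network in this model, matching the table.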

总结与复习重点 Summary and key review points

网络拓扑：网络中设备的物理或逻辑布局。

Network Topology: The physical or logical layout of devices in a network.

总线拓扑：所有设备连接到一根主电缆，易于安装，但可靠性低。

Bus Topology: All devices are connected to a single main cable, easy to install but less reliable.

环形拓扑：设备以闭环配置连接，数据沿一个方向流动，双环系统提供冗余。

Ring topology: Devices are connected in a closed loop, data flows in one direction, and a dual-ring system provides redundancy.

星形拓扑：所有设备连接到一个中央集线器，易于管理和扩展，但成本较高。

Star topology: All devices are connected to a centralized hub, easy to manage and expand, but costly.

网状拓扑：每个设备连接到多个其他设备，高度冗余，但设计和维护复杂。

Mesh topology: Each device is connected to multiple other devices, highly redundant, but complex to design and maintain.

混合拓扑：结合多种拓扑结构，可根据需求定制，但实施和维护成本高。

Hybrid topology: Combines multiple topologies and can be customized to meet your needs, but is costly to implement and maintain.

Big Data

1.大数据的定义 Definition of big data

定义：大数据是指数据集的规模巨大，传统数据处理工具无法存储、操作或分析这些数据。

Definition: big data refers to data sets that are so large that traditional data processing tools cannot store, manipulate, or analyze them.

特点：Characteristics:

体量（Volume）：数据量巨大，单个服务器无法处理，需要多个服务器存储和分析。

Volume: The volume of data is so large that it cannot be handled by a single server and requires multiple servers to store and analyze it.

速度（Velocity）：数据实时更新，特别是在流数据中，数据更新速度极高。

Velocity: data is updated in real time; streaming data in particular arrives at a very high rate.

多样性（Variety）：数据格式多样，通常是非结构化的，如文本、多媒体文件等，难以用传统算法提取有用信息。

Variety: Data formats are diverse, usually unstructured, such as text, multimedia files, etc., which makes it difficult to extract useful information with traditional algorithms.

扩展的5V模型：

Extended 5V model:

真实性（Veracity）：数据的质量和可靠性。

Veracity: the quality and reliability of data.

价值（Value）：从数据中提取的有用信息。

Value: useful information extracted from data.

V	Description
Volume	Extremely large data requiring distributed handling
Velocity	Real‑time, high‑speed data updates
Variety	Diverse, often unstructured formats
Veracity	Quality and reliability of the data
Value	Useful insights and benefits extracted from the data

其他扩展维度：可变性（Variability）、可视化（Visualization）、易失性（Volatility）、脆弱性（Vulnerability）、有效性（Validity）。

Other extended dimensions: Variability, Visualization, Volatility, Vulnerability, Validity.

2.大数据的应用示例 Application examples of big data

医疗研究：分析患者记录、基因数据和临床试验，以改进治疗、预测疾病爆发和个性化医疗。

Medical research: Analyzing patient records, genetic data and clinical trials to improve treatment, predict disease outbreaks and personalize medicine.

金融服务和银行业：用于欺诈检测、风险评估、客户画像和实时交易监控，增强安全性和决策能力。

Financial services and banking: for fraud detection, risk assessment, customer profiling and real-time transaction monitoring to enhance security and decision-making.

电子商务网站：利用大数据进行个性化推荐、客户行为分析、库存管理和目标营销策略。

E-commerce sites: use big data for personalized recommendations, customer behavior analysis, inventory management and targeted marketing strategies.

政府：用于公共安全、交通管理、税务欺诈检测和政策制定，通过分析公民数据、社会趋势和经济模式。

Government: for public safety, traffic management, tax fraud detection and policy development by analyzing citizen data, social trends and economic patterns.

3.大数据的挑战 Challenges of big data

存储与管理：高效处理海量数据需要先进的基础设施。

Storage and management: Efficient handling of massive amounts of data requires advanced infrastructure.

数据质量与准确性：确保数据清洁、一致和可靠是一个挑战。

Data Quality & Accuracy: Ensuring that data is clean, consistent and reliable is a challenge.

处理与速度：实时分析大型数据集需要强大的计算能力。

Processing & Speed: Analyzing large data sets in real time requires powerful computing capabilities.

安全与隐私：保护数据免受网络威胁并确保符合法规至关重要。

Security & Privacy: Protecting data from cyber threats and ensuring regulatory compliance is critical.

集成与可扩展性：结合来自不同来源的数据并随着数据增长保持系统性能是复杂的。

Integration & Scalability: Combining data from disparate sources and maintaining system performance as data grows is complex.

4.事实基础建模（Fact-based Modelling）

定义：在数据仓库中，使用事实基础模型来定位数据，数据存储时带有时间戳，且数据不可删除，持续增长。

Definition: in a data warehouse, data is organized using a fact-based model. Every fact is stored with a timestamp and is never deleted, so the data set grows continuously.

特点：Characteristics:

不可变性：由于使用时间戳，数据一旦存储就不可更改。

Immutability: because every fact carries a timestamp, data is never changed once stored; a change is recorded as a new fact with a later timestamp.

适合大数据：这种模型适合大数据，因为数据变化通过时间戳区分。

Suitable for Big Data: This model is suitable for big data because data changes are distinguished by timestamps.
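A minimal sketch of the idea, assuming each fact is an (entity, attribute, value, timestamp) record kept in an append-only list. The class and method names (`Fact`, `FactStore`, `current`, `history`) and the sample data are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)          # frozen: a stored fact cannot be modified
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime


class FactStore:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self):
        self._facts = []

    def add(self, entity, attribute, value, timestamp):
        self._facts.append(Fact(entity, attribute, value, timestamp))

    def history(self, entity, attribute):
        """All recorded values for one attribute, oldest first."""
        matches = [f for f in self._facts
                   if f.entity == entity and f.attribute == attribute]
        return sorted(matches, key=lambda f: f.timestamp)

    def current(self, entity, attribute):
        """The value carried by the most recent fact, or None if unknown."""
        facts = self.history(entity, attribute)
        return facts[-1].value if facts else None


store = FactStore()
store.add("alice", "city", "London", datetime(2023, 1, 1))
store.add("alice", "city", "Southampton", datetime(2024, 9, 1))
print(store.current("alice", "city"))          # Southampton
print(len(store.history("alice", "city")))     # 2
```

The earlier value is still available through `history`, which is what "changes are distinguished by timestamps" means in practice.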

图模式（Graph Schema）：

定义：由于原始数据量大，信息以图形形式存储，称为图模式，它是一种基于图形的数据类型。

Definition: because the volume of raw data is large, the information is stored as a graph. This graph-based data model is called a graph schema.

关系表示：图模式不仅表示事实，还表示事实之间的关系，节点和边（关系）都有属性。

Relationship representation: a graph schema represents not only facts but also the relationships between them; both nodes and edges (relationships) have attributes.

优势：与在数据库中搜索相比，使用图模式进行数据处理更快，例如，查找某人的朋友的朋友更容易通过遍历图来实现。

Advantage: processing data with a graph schema is faster than searching in a database. For example, finding the friends of someone's friends is easily done by traversing the graph.
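The friends-of-friends example can be sketched as a two-step graph traversal. The graph here is a plain dictionary of sets and the names in it are made up for illustration.

```python
# Each key is a person (node); the set holds that person's friends (edges).
friends = {
    "Ann": {"Bob", "Cara"},
    "Bob": {"Ann", "Dev"},
    "Cara": {"Ann", "Dev", "Eli"},
    "Dev": {"Bob", "Cara"},
    "Eli": {"Cara"},
}


def friends_of_friends(graph, person):
    """People exactly two steps away: friends of friends who are not already friends."""
    direct = graph.get(person, set())
    two_steps = set()
    for friend in direct:
        two_steps |= graph.get(friend, set())   # follow each friend's edges
    return two_steps - direct - {person}


print(sorted(friends_of_friends(friends, "Ann")))   # ['Dev', 'Eli']
```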

5.大数据的分布式处理 Distributed processing of big data

定义：由于数据量大和交易速度高，单个服务器无法处理，需要通过网络连接多台服务器，并将处理过程分配给它们，这种处理方式称为分布式处理。

Definition: because of the large data volume and high transaction rate, a single server cannot cope. Multiple servers are connected through a network and the processing is divided among them; this is called distributed processing.

主从架构：一台主计算机负责操作系统和专业软件，通过专用网络连接其他计算机，该网络也可用于云计算。

Master-Slave Architecture: a master computer runs the operating system and specialized software and is connected to the other computers through a dedicated network, which can also be used for cloud computing.
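The split, process and combine pattern can be sketched on one machine. In this illustration threads stand in for the worker servers (an assumption made so the example is self-contained), and the function names `split`, `worker_sum` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Master step: cut the data into roughly equal chunks."""
    if not data:
        return []
    size = -(-len(data) // parts)            # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker_sum(chunk):
    """Worker step: process one chunk without touching any shared state."""
    return sum(chunk)


def distributed_sum(data, workers=4):
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_sum, chunks))   # chunks processed independently
    return sum(partials)                                 # master combines partial results


print(distributed_sum(list(range(1, 101))))   # 5050
```

Because each worker only reads its own chunk, no synchronization between workers is needed.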

函数式编程：Functional Programming:

特点：Characteristics:

无副作用：函数式编程中，函数的输出仅依赖于输入，不依赖外部状态，避免了分布式处理中的同步问题。

No side effects: in functional programming, the output of a function depends only on the input and does not depend on the external state, avoiding synchronization problems in distributed processing.

高阶函数：支持将函数作为参数传递或返回函数的特性，例如Python中的apply_function。

Higher-order functions: functions can be passed as arguments to other functions or returned as results, e.g., apply_function in Python.

不可变对象：一旦创建，对象的状态不能更改，这有助于分布式处理，因为函数多次调用返回相同结果，且函数执行顺序不影响结果。

Immutable objects: once created, the state of an object cannot be changed, which helps distributed processing because multiple calls to a function return the same result and the order in which functions are executed does not affect the result.
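The three properties above can be illustrated in a few lines. The note names `apply_function` but does not define it, so the signature below is an assumption; `square` and `make_scaler` are hypothetical helpers.

```python
from functools import reduce


def apply_function(func, values):
    """Higher-order function: receives another function as an argument."""
    return tuple(func(v) for v in values)


def square(x):
    """Pure function: the result depends only on x and nothing else is modified."""
    return x * x


def make_scaler(factor):
    """Higher-order function that returns a new function."""
    return lambda x: x * factor


readings = (1, 2, 3, 4)                        # a tuple cannot be changed after creation
squares = apply_function(square, readings)
doubled = apply_function(make_scaler(2), readings)
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)    # (1, 4, 9, 16)
print(doubled)    # (2, 4, 6, 8)
print(total)      # 30
print(readings)   # (1, 2, 3, 4), unchanged
```

Calling `apply_function(square, readings)` again, on any machine and in any order, gives the same result, which is the property distributed processing relies on.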

总结与复习重点 Summary and key review points

大数据的定义：难以存储和分析的大型数据集，具有体量、速度和多样性等特点。

Definition of big data: large data sets that are difficult to store and analyze, characterized by volume, velocity and variety.

大数据的应用：医疗、金融、电商和政府等领域。

Applications of big data: healthcare, finance, e-commerce and government.

事实基础建模：使用时间戳存储数据，适合大数据处理，图模式用于表示数据及其关系。

Fact-based modeling: data is stored with timestamps, which suits big data processing; graph schemas are used to represent data and their relationships.

分布式处理：通过网络连接多台服务器处理大数据，函数式编程因其无副作用、高阶函数和不可变对象的特性，适合分布式处理。

Distributed processing: big data is processed by multiple servers connected over a network. Functional programming suits distributed processing because of its lack of side effects, higher-order functions and immutable objects.

Machine Learning

机器学习的类型 Types of machine learning

• 监督学习（Supervised Learning）：

• 定义：给定标记过的训练数据和期望的输出（标签），模型通过学习这些数据来预测未来的输出。

Definition: given labeled training data, i.e., inputs paired with the desired outputs (labels), the model learns from the data to predict outputs for new inputs.

• 分类（Classification）：使用算法将测试数据准确地分配到特定类别中。

Classification: The use of algorithms to accurately assign test data to specific categories.

• 回归（Regression）：用于理解因变量和自变量之间的关系，常用于预测，如销售预测。

Regression: Used to understand the relationship between dependent and independent variables, often used in forecasting, e.g., sales prediction.
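Both supervised tasks can be sketched in plain Python. The example below assumes ordinary least squares for regression and a 1-nearest-neighbour rule for classification; these are illustrative choices rather than methods named in the note, and the data is made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbour(train, point):
    """Classify a point with the label of the closest labeled training example."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    _, label = min(train, key=lambda item: dist2(item[0], point))
    return label


# Regression: predict sales (output) from the month number (input).
months = [1, 2, 3, 4]
sales = [12, 14, 16, 18]
slope, intercept = fit_line(months, sales)
print(slope * 5 + intercept)                 # forecast for month 5

# Classification: every training example is a (features, label) pair.
train = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
         ((5.0, 5.0), "large"), ((6.0, 5.5), "large")]
print(nearest_neighbour(train, (1.2, 1.4)))  # small
print(nearest_neighbour(train, (5.5, 5.0)))  # large
```

In both cases the training data contains the correct answers, which is what makes the learning supervised.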

• 无监督学习（Unsupervised Learning）：

• 定义：给定未标记的数据，模型从中发现模式，解决聚类或关联问题。

Definition: given unlabeled data, the model discovers patterns from it to solve clustering or association problems.

• 应用示例：Application examples:

• 市场细分。

• 社交网络分析。

• 天文数据分析。

• 组织计算集群。

• 独立成分分析：将混合信号分离成原始信号。

Market segmentation.

Social network analysis.

Astronomical data analysis.

Organizing computing clusters.

Independent Component Analysis: Separation of mixed signals into original signals.
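Clustering, as used in market segmentation, can be sketched with a tiny two-cluster k-means on one-dimensional data. The algorithm choice, the function names `assign` and `two_means`, and the spending figures are assumptions for illustration; no labels are supplied anywhere.

```python
def assign(values, centroids):
    """Put each value into the cluster of its nearest centroid."""
    clusters = [[], []]
    for v in values:
        idx = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
        clusters[idx].append(v)
    return clusters


def two_means(values, iterations=10):
    """k-means with k = 2, started from the smallest and largest value."""
    centroids = [min(values), max(values)]
    for _ in range(iterations):
        clusters = assign(values, centroids)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:             # assignments have stopped changing
            break
        centroids = updated
    return centroids, assign(values, centroids)


spend = [10, 12, 11, 50, 52, 49]             # e.g., monthly spend per customer
centroids, clusters = two_means(spend)
print(clusters)                              # [[10, 12, 11], [50, 52, 49]]
```

The two groups emerge from the data alone, which is the defining feature of unsupervised learning.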

• 强化学习（Reinforcement Learning）：

• 定义：基于奖励期望行为和/或惩罚不期望行为的机器学习训练方法。

Definition: a machine learning training method based on rewarding desired behaviors and/or punishing undesired behaviors.

• 特点：强化学习代理能够感知和解释环境，采取行动并通过试错学习。

Characteristics: Reinforcement learning agents are able to perceive and interpret the environment, take actions and learn through trial and error.

• 应用示例：Application examples:

• 游戏玩法。

• 迷宫中的机器人。

• 用手平衡一根杆。

• 信用分配问题。

Game play.

Robots in a maze.

Balancing a pole on a hand.

Credit assignment problem.
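Learning by reward and trial and error can be sketched with tabular Q-learning on a very small "maze". The setup is an assumption for illustration: a corridor of five cells where the agent starts in cell 0, receives a reward of 1 on reaching cell 4, and can move left or right. Q-learning is one common algorithm, not one named in the note, and all parameter values are arbitrary.

```python
import random


def train_corridor(episodes=500, n_states=5, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]    # q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        for _ in range(200):                      # cap on steps per episode
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)         # explore, or break a tie at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1   # exploit best estimate
            nxt = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if nxt == goal else 0.0
            # No future value beyond the goal; otherwise discount the best next estimate.
            target = reward if nxt == goal else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if state == goal:
                break
    return q


q_table = train_corridor()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

With these settings the learned policy should be to move right in every cell. The reward is given only at the goal, so earlier moves receive credit through the discounted `target`, which is the credit assignment problem in miniature.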

8. 监督学习与无监督学习的对比 Supervised vs. unsupervised learning

• 监督学习：给定输入 X ，预测输出 Y 。训练集是一组带有正确输入输出对的例子，数据带有标签。

• Supervised learning: given input X , predict output Y . The training set is a set of examples with correct input-output pairs, and the data is labeled.

• 无监督学习：只给定输入 X ，没有对应的输出 Y ，模型需要自行发现数据中的结构。数据没有标签，例如生成与给定图像相似的图像，或生成与给定诗歌相似的诗歌。

• Unsupervised learning: given only input X , with no corresponding output Y , the model must discover structure in the data on its own. The data is unlabeled, e.g., generating images similar to a given image, or generating poems similar to a given poem.

Deep Learning

深度学习简介 Introduction to deep learning

定义：深度学习是机器学习的一个分支，使用大量数据来教计算机执行通常需要人类智能才能完成的任务。

Definition: deep learning is a branch of machine learning that uses large amounts of data to teach computers to perform tasks that normally require human intelligence.

目的：使计算机能够解决感知问题，如视觉模式识别。

Purpose: To enable computers to solve perceptual problems, such as visual pattern recognition.

深度学习的原因 Why deep learning

传统机器学习算法的局限性：Limitations of traditional machine learning algorithms:

不擅长处理高维数据。

Not good at handling high dimensional data.

特征提取和对象识别困难。

Difficulty in feature extraction and object recognition.

深度学习的优势：Advantages of Deep Learning:

虽然计算成本高，但能处理高维数据。

Can handle high dimensional data despite high computational cost.

自动进行特征提取。

Automates feature extraction.

深度学习的工作原理 How deep learning works

人工神经网络（ANNs）：模仿人脑功能的计算系统。

Artificial Neural Networks (ANNs): computing systems that mimic the functions of the human brain.

深度神经网络（DNNs）：由多个互连的人工神经元或节点组成，分为输入层、隐藏层和输出层。

Deep Neural Networks (DNNs): consist of multiple interconnected artificial neurons or nodes divided into input, hidden and output layers.

人工神经元的结构：每个神经元为其输入分配权重，输出由输入的加权和决定（通常再经过一个激活函数）。

Structure of an artificial neuron: each neuron assigns a weight to each of its inputs, and its output is determined by the weighted sum of the inputs (usually passed through an activation function).
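A single artificial neuron can be written in a few lines. This sketch assumes a bias term and a sigmoid activation, neither of which is specified in the note; the input and weight values are arbitrary.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# z = 2.0 * 1.0 + (-1.0) * 0.0 - 2.0 = 0.0, and sigmoid(0) = 0.5
print(neuron([1.0, 0.0], [2.0, -1.0], -2.0))   # 0.5
```

A layer is many such neurons reading the same inputs, and a deep network stacks several layers so that the outputs of one become the inputs of the next.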

深度学习的训练过程 The deep learning training process

训练数据：输入和目标输出。

Training data: input and target output.

训练步骤：Training steps:

初始化随机权重。

Initialize random weights.

通过网络获取输出。

Obtain the output through the network.

将预测结果与真实值比较，并根据误差调整权重。

Compare the predicted results with the true values and adjust the weights according to the error.
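The three training steps can be sketched on the smallest possible network: one linear neuron trained by gradient descent on mean squared error. The loss function, learning rate and sample data are assumptions for illustration; real deep networks repeat the same loop over many layers using backpropagation.

```python
import random


def train_linear_neuron(samples, learning_rate=0.05, epochs=2000, seed=0):
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # step 1: random initial weights
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in samples:
            prediction = w * x + b                   # step 2: pass the input through the network
            error = prediction - target              # step 3: compare with the true value
            grad_w += 2 * error * x / n              # gradient of the mean squared error
            grad_b += 2 * error / n
        w -= learning_rate * grad_w                  # adjust weights against the gradient
        b -= learning_rate * grad_b
    return w, b


samples = [(0, 1), (1, 3), (2, 5), (3, 7)]           # inputs and targets from y = 2x + 1
w, b = train_linear_neuron(samples)
print(round(w, 3), round(b, 3))                      # 2.0 1.0
```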

5.深度学习的挑战 Challenges of deep learning

数据需求：需要大量标记数据，难以收集且成本高。

Data Requirements: Requires large amounts of labeled data, which is difficult and costly to collect.

计算资源：需要高性能GPU，资源密集且昂贵。

Computational resources: training requires high-performance GPUs, which makes it resource-intensive and expensive.

模型可解释性：深度学习模型通常被视为“黑盒”，难以理解其决策过程。

Model Interpretability: Deep learning models are often viewed as “black boxes”, making it difficult to understand their decision-making process.

常见的神经网络类型 Common types of neural networks

前馈神经网络（FNN）：应用于基本分类和回归任务。

Feedforward Neural Networks (FNN): applied to basic classification and regression tasks.

卷积神经网络（CNN）：应用于图像识别、对象检测等。

Convolutional Neural Network (CNN): applied to image recognition, object detection, etc.

循环神经网络（RNN）：应用于时间序列预测、语言建模等。

Recurrent Neural Network (RNN): applied to time series prediction, language modeling, etc.

生成对抗网络（GAN）：应用于图像生成、视频生成等。

Generative Adversarial Network (GAN): applied to image generation, video generation, etc.

长短期记忆网络（LSTM）：应用于机器翻译、语音识别等。

Long Short-Term Memory Network (LSTM): applied to machine translation, speech recognition, etc.

变换器网络：应用于机器翻译、文本摘要等。

Transformer networks: applied to machine translation, text summarization, etc.

深度学习的应用示例 Application examples of deep learning

自动为黑白图像着色。

Automatically colorize black and white images.

为无声电影自动添加声音。

Automatically add sound to silent movies.

自动机器翻译。

Automatic machine translation.

照片中的对象分类和检测。

Object classification and detection in photos.

自动手写生成。

Automatic handwriting generation.

Qi Liu

Contact Information