
c programming basics and use cases

#+TITLE: C Lessons
#+AUTHOR: Junjie Mars
#+STARTUP: overview
#+OPTIONS: num:nil toc:nil
#+REVEAL_HLEVEL: 2
#+REVEAL_SLIDE_NUMBER: h
#+REVEAL_THEME: moon
#+BEGIN_COMMENT
#+REVEAL_TRANS: cube
#+REVEAL_MARGIN: 0.1
#+REVEAL_MIN_SCALE: 0.2
#+REVEAL_MAX_SCALE: 1.5
#+END_COMMENT

@@html:clang, gcc and msvc@@

* Quick start :PROPERTIES: :CUSTOM_ID: quick-start :END:

#+ATTR_HTML: :style text-align:left
It is not /C Lessons/ at all :). I programmed in C a long time ago. Sometimes I want to pick something up again, but I cannot find the piece of code anywhere, or I cannot run code that was written on another machine.

#+REVEAL: split
#+ATTR_HTML: :style text-align:left
Sadly, an old dog always needs to learn something new.

  • Access the code from anywhere: GitHub is a good choice
  • Run or write code anywhere: Linux, Darwin, Windows, or a Docker box
  • Easy to try and learn

#+ATTR_HTML: :style text-align:left
Now we have [[https://github.com/junjiemars/nore][Nore]]: some things changed and some did not.

#+REVEAL: split
Let's start ...

#+BEGIN_SRC sh
# bootstrap Nore
curl https://raw.githubusercontent.com/junjiemars/nore/master/bootstrap.sh -sSfL | sh

# configure -> make -> test -> install
./configure --has-hi
make
make test
make install
#+END_SRC

* Language :PROPERTIES: :CUSTOM_ID: language :END:

Run the example under =src/lang=.
#+BEGIN_SRC sh
./configure --has-lang
make clean test
#+END_SRC

** Preprocessor :PROPERTIES: :CUSTOM_ID: language_preprocessor :END:

The /preprocessor/ runs first, as the name implies. It performs some text manipulations, such as:

  • stripping comments
  • resolving =#include= directives and replacing them with the contents of the included file
  • resolving =#include_next= directives (a compiler extension), which do not distinguish between =<file>= and ="file"= inclusion and just look for the file in the rest of the search path
  • evaluating =#if= and =#ifdef= directives
  • evaluating =#define=
  • expanding the macros found in the rest of the code according to those =#define=

#+BEGIN_SRC sh
./configure --lang
make clean lang_preprocessor_test
#+END_SRC

*** =#ident= :PROPERTIES: :CUSTOM_ID: language_preprocessor_ident :END:

*** =#include= :PROPERTIES: :CUSTOM_ID: language_preprocessor_include :END:

The =#include= directive instructs the preprocessor to paste the text of the given file into the current file. Generally, it is necessary to tell the preprocessor where to look for header files if they are not placed in the current directory or a standard system directory.

*** =#define= :PROPERTIES: :CUSTOM_ID: language_preprocessor_define :END:

The =#define= directive takes two forms: defining a /constant/ or creating a /macro/.

  • Defining a /constant/
#+BEGIN_SRC c
#define identifier [value]
#+END_SRC

When defining a /constant/, you may optionally elect not to provide a value for that constant. In this case, the /identifier/ will be replaced with nothing, but will be "defined" for the purposes of =#ifdef= and =#ifndef=. If a value is provided, the identifier will be replaced literally with the remainder of the text on the line. Be careful when using =#define= in this way, because the replacement is purely textual.

  • Defining a /parameterized macro/
#+BEGIN_SRC c
#define identifier(arg [, arg ...]) statement
#define max(a, b) ((a) > (b) ? (a) : (b))
#+END_SRC

*** =#undef= :PROPERTIES: :CUSTOM_ID: language_preprocessor_undef :END:

#+BEGIN_SRC c
#undef identifier
#+END_SRC

The =#undef= directive undefines a constant or macro that was previously defined using =#define=.

For example:
#+BEGIN_SRC c
#define E 2.71828
double e_squared = E * E;
#ifdef E
#  undef E
#endif
#+END_SRC

Usually, =#undef= is used to scope a preprocessor constant into a very limited region: this is done to avoid leaking the constant. =#undef= is the only way to create this scope since the preprocessor does not understand block scope.

*** =#if= vs. =#ifdef= :PROPERTIES: :CUSTOM_ID: language_preprocessor_if_vs_ifdef :END:

=#if= checks the value of a symbol that has been defined; =#ifdef= only checks whether the symbol exists.

Prefer =#if defined(...)=, which is more flexible:
#+BEGIN_SRC c
#if defined(LINUX) || defined(DARWIN)
/* code: when on LINUX or DARWIN platform */
#endif

#if defined(CLANG) && (1 == NM_CPU_LITTLE_ENDIAN)
/* code: when using clang compiler and on a little endian machine */
#endif
#+END_SRC

*** =#ifndef= :PROPERTIES: :CUSTOM_ID: language_preprocessor_ifndef :END:

#+BEGIN_SRC c
#ifndef identifier
/* code: when the identifier had not been defined */
#endif
#+END_SRC

=#ifndef= checks whether the given identifier has been defined with =#define= earlier in the file or in an included file; if not, it includes the code between it and the closing =#else= or, if no =#else= is present, the =#endif= directive. =#ifndef= is often used to make header files idempotent: the top of the file checks that an identifier has not been set, and then defines that identifier once the file has been included.

#+BEGIN_SRC c
#ifndef LANG_H
#  define LANG_H
#endif
#+END_SRC

=#if !defined(identifier)= is equivalent to =#ifndef identifier=

#+BEGIN_SRC c
#if !defined(min)
#  define min(a, b) ((a) < (b) ? (a) : (b))
#endif
#+END_SRC

*** =#error= :PROPERTIES: :CUSTOM_ID: language_preprocessor_error :END:

#+BEGIN_SRC c
#error "[description]"
#+END_SRC

The =#error= directive makes compilation fail and issues a message that appears in the list of compilation errors. It is most useful when combined with =#if/#elif/#else= to fail compilation if some condition is not true. For example:

#+BEGIN_SRC c
#if (1 == ERROR)
#  error "compile failed: because ERROR == 1 is true"
#endif
#+END_SRC

*** =#pragma= :PROPERTIES: :CUSTOM_ID: language_preprocessor_pragma :END:

The =#pragma= directive is used to access compiler-specific preprocessor extensions.

A common use of =#pragma= is the =#pragma once= directive, which asks the compiler to include a header file only a single time, no matter how many times it appears in =#include= directives. It is not part of ISO C, but it is widely supported.

#+BEGIN_SRC c
#pragma once
/* header file code */

/* #pragma once is equivalent to */
#ifndef FILE_NAME_H
#  define FILE_NAME_H
/* header file code */
#endif
#+END_SRC

The =#pragma= directive can also be used for other compiler-specific purposes. =#pragma= is commonly used to suppress warnings.

#+BEGIN_SRC c
#if (MSVC)
#  pragma warning(disable:4706) /* assignment within conditional expression */
#  pragma comment(lib, "Ws2_32.lib") /* link to Ws2_32.lib */
#elif (GCC)
#  pragma GCC diagnostic ignored "-Wstrict-aliasing" /* (unsigned*) &x */
#elif (CLANG)
#  pragma clang diagnostic ignored "-Wparentheses"
#endif
#+END_SRC

*** =__FILE__= :PROPERTIES: :CUSTOM_ID: language_preprocessor_file :END:

  • =__FILE__= expands to the name of the current source file as a string, typically the path as it was given to the compiler
  • =__LINE__= expands to the current line number in the source file, as an integer
  • =__DATE__= expands to the current date at compile time in the form =Mmm dd yyyy= as a string, such as "Oct 26 2021"
  • =__TIME__= expands to the current time at compile time in the form =hh:mm:ss= in 24 hour time as a string, such as "16:08:17"
  • =__TIMESTAMP__= (a compiler extension, not ISO C) expands to the last modification time of the current source file in the form =Ddd Mmm Date hh:mm:ss yyyy= as a string, where the time is in 24 hour time, =Ddd= is the abbreviated day, =Mmm= is the abbreviated month, =Date= is the day of the month (1-31), and =yyyy= is the four digit year, such as "Tue Oct 26 12:42:21 2021"
  • =__func__= is a predefined identifier (not a macro) holding the name of the enclosing function, available since C99

** main :PROPERTIES: :CUSTOM_ID: language_main :END:

** exit :PROPERTIES: :CUSTOM_ID: language_exit :END:

Most C programs call the library routine =exit=, which flushes buffers, closes streams, unlinks temporary files, etc., before calling =_exit=.

** assert :PROPERTIES: :CUSTOM_ID: language_assert :END:

No, there's nothing wrong with =assert= as long as you use it as intended.

  • assert: a failure in the program's logic itself.
  • error: an erroneous input or system state not due to a bug in the program.

Assertions are primarily intended for use during debugging and are generally turned off before code is deployed by defining the =NDEBUG= macro.

#+BEGIN_SRC sh
# with assert
./configure --has-lang
make clean lang_assert_test

# erase assertions: simple way
./configure --has-lang --with-release=yes
make clean lang_assert_test
#+END_SRC

An /assertion/ specifies that a program satisfies certain conditions at particular points in its execution. There are three types of assertion:

  • preconditions: specify conditions at the start of a function.
  • postconditions: specify conditions at the end of a function.
  • invariants: specify conditions over a defined region of a program.

The =static_assert= macro, defined in =<assert.h>=, expands to =_Static_assert=, a keyword added in C11 to provide compile-time assertions.

** enum :PROPERTIES: :CUSTOM_ID: language_enum :END:

#+BEGIN_SRC c
enum [identifier] { enumerator-list };

enumerator = constant-expression;
#+END_SRC

=enumerator-list= is a comma-separated list (a trailing comma is permitted since C99), and =identifier= is optional. If an =enumerator= is followed by a /constant-expression/, its value is the value of that /constant-expression/. Otherwise, its value is one greater than the value of the previous enumerator in the same enumeration. The value of the first enumerator, if it does not use a /constant-expression/, is zero.

Unlike =struct= and =union=, an =enum= cannot be forward-declared in C.

** Error :PROPERTIES: :CUSTOM_ID: language_error :END:

  • /fail safe/ pertaining to a system or component that automatically places itself in a safe operating mode in the event of a failure: a traffic light that reverts to blinking red in all directions when normal operation fails.
  • /fail soft/ pertaining to a system or component that continues to provide partial operational capability in the event of certain failures: a traffic light that continues to alternate between red and green if the yellow light fails. Error indicators such as =errno=, which record the error status of a function call or object, are /fail soft/.
  • /fail hard/ aka fail fast or fail stop. The reaction to a detected fault is to immediately halt the system. Termination is /fail hard/.

*** errno :PROPERTIES: :CUSTOM_ID: language_error_errno :END:

Before C11, =errno= was a global variable, with all the inherent disadvantages:

  • later system calls overwrite the value set by earlier system calls;
  • global map of values to error conditions (=ENOMEM=, =ERANGE=, etc);
  • behavior is underspecified in ISO C and POSIX;
  • technically =errno= is a /modifiable lvalue/ rather than a global variable, so expressions like =&errno= may not be well-defined;
  • thread-unsafe;

In C11, =errno= is thread-local, so it is thread-safe.

Disadvantages of /Function Return Value/:

  • functions that return error indicators cannot use return value for other uses;
  • checking every function call for an error condition increases code volume by 30%-40%;
  • impossible for library function to enforce that callers check for error condition.

*** strerror :PROPERTIES: :CUSTOM_ID: language_error_strerror :END:

=char * strerror(int errnum);=

Interprets the value of /errnum/, generating a string with a message that describes the error condition as if set to =errno= by a function of the library. The returned pointer points to a statically allocated string, which shall not be modified by the program. Further calls to this function may overwrite its content (particular library implementations are not required to avoid data races). The error strings produced by strerror may be specific to each system and library implementation.

*** perror :PROPERTIES: :CUSTOM_ID: language_error_perror :END:

=void perror(const char *str);=

Interprets the value of =errno= as an error message, and prints it to stderr (the standard error output stream, usually the console), optionally preceding it with the custom message specified in /str/. If the parameter str is not a null pointer, /str/ is printed followed by a colon =:= and a space. Then, whether /str/ was a null pointer or not, the generated error description is printed followed by a newline character ='\n'=. =perror= should be called right after the error was produced, otherwise it can be overwritten by calls to other functions.

** Function :PROPERTIES: :CUSTOM_ID: language_function :END:

*** main :PROPERTIES: :CUSTOM_ID: language_function_main :END:

C90 =main()= declarations:
#+BEGIN_SRC c
int main(void);

int main(int argc, char **argv);

/* same as above */
int main(int argc, char *argv[]);

/* classically, Unix systems support a third variant */
int main(int argc, char **argv, char **envp);
#+END_SRC

In C99, for the value returned from =main()=:

  • the =int= return type may not be omitted.
  • the =return= statement may be omitted, if so and =main()= finished, there is an implicit =return 0=.

In arguments:

  • ~argc >= 0~ (the standard only requires it to be nonnegative; hosted systems normally pass at least the program name)
  • ~argv[argc] == 0~
  • ~argv[0]~ through to ~argv[argc-1]~ are pointers to strings whose meaning will be determined by the program.
  • ~argv[0]~ will be a string containing the program's name or a null string if that is not available.
  • ~envp~ is not specified by POSIX but is widely supported; =getenv= is the only one specified by the C standard, while =putenv= and ~extern char **environ~ are POSIX-specific.

*** Forward declaration

  • needed when the call graph is cyclic (mutually recursive functions)
  • needed when a function is used across more than one translation unit

** Macro :PROPERTIES: :CUSTOM_ID: language_macro :END:

*** =#= macro operator :PROPERTIES: :CUSTOM_ID: language_macro_sharp :END:

Prefixing a macro parameter with =#= quotes the argument passed for it, turning bare words in your source code into a string literal. This is particularly useful for writing a macro that converts a member of an =enum= from its =int= value into a string.

#+BEGIN_SRC c
enum COLOR { RED, GREEN, BLUE };
#define COLOR_STR(x) #x
#+END_SRC

*** =##= macro operator :PROPERTIES: :CUSTOM_ID: language_macro_sharp_sharp :END:

The =##= operator takes two separate tokens and pastes them together to form a single identifier. The resulting identifier could be a variable name, or any other identifier.

#+BEGIN_SRC c
#define DEFVAR(type, var, val) type var_##var = val

DEFVAR(int, x, 1);       /* expand to: int var_x = 1; */
DEFVAR(float, y, 2.718); /* expand to: float var_y = 2.718; */
#+END_SRC

*** Expression :PROPERTIES: :CUSTOM_ID: language_macro_expression :END:

An expression-type macro expands to an expression, such as the following macro definition
#+BEGIN_SRC c
#define double_v1(x) 2*x
#+END_SRC

But =double_v1= has a drawback: the call ~double_v1(1+1)*8~ expands to the wrong expression ~2*1+1*8~.

Use parentheses around the arguments and around the final expression:
#+BEGIN_SRC c
#define double_v2(x) (2*(x))
#+END_SRC

Now it expands to ~(2*(1+1))*8~.

However, the =max= macro has a side effect: it evaluates one of its arguments twice
#+BEGIN_SRC c
#define max(a, b) ((a) > (b) ? (a) : (b))
#+END_SRC
when it is called as ~max(a, b++)~.

*** Block :PROPERTIES: :CUSTOM_ID: language_macro_block :END:

If the macro definition includes the =;= statement-ending character, we need to wrap it in a block.

#+BEGIN_SRC c
#define incr(a, b)  \
  (a)++;            \
  (b)++;
#+END_SRC

Call it with
#+BEGIN_SRC c
int a = 2, b = 3;
if (a > b) incr(a, b);
#+END_SRC

Only =b= is incremented: the =if= guards just =(a)++;=, while =(b)++;= always runs. We can wrap the body in braces and convert it to a block-type macro.

#+BEGIN_SRC c
#define incr(a, b) {  \
  (a)++; (b)++;       \
}
#+END_SRC

But the block macro above is not good enough: omitting the =;= is not intuitive, and the trailing =;= is wrong in some cases, such as

#+BEGIN_SRC c
int a = 2, b = 3;
if (a < b) incr(a, b); /* trailing ; */
else a *= 10;

/* expanded code, which should fail to compile */
if (a < b) { (a)++; (b)++; };
else a *= 10;
#+END_SRC

=do { ... } while (0)= resolves those issues.
#+BEGIN_SRC c
#define incr(a, b) do {  \
  (a)++; (b)++;          \
} while (0) /* no trailing ; */

/* expanded code */
if (a < b) do { (a)++; (b)++; } while (0); /* append ; */
else a *= 10;
#+END_SRC

*** Name clash :PROPERTIES: :CUSTOM_ID: language_macro_name_clash :END:

We can use a mechanism similar to Lisp's ~(gensym)~ to rebind the input arguments to new symbols.

*** Nested macro :PROPERTIES: :CUSTOM_ID: language_macro_nested_macro :END:

Using a macro name within the definition of another macro is called nesting of macros.

#+BEGIN_SRC c
#define SQUARE(x) ((x)*(x))
#define CUBE(x) (SQUARE(x)*(x))
#+END_SRC

*** Check expansion :PROPERTIES: :CUSTOM_ID: language_macro_check_expansion :END:

#+BEGIN_SRC sh
cc -E
#+END_SRC

** Pointer :PROPERTIES: :CUSTOM_ID: language_pointer :END:

*** =&= and =*= :PROPERTIES: :CUSTOM_ID: language_pointer_address_of_and_dereference :END:

The =&= operator yields the address of its operand.

The =*= has two distinct meanings in C in relation to pointers, depending on where it is used. When used within a /variable declaration/, it declares a pointer, and the value on the right-hand side of the equals sign should be a /pointer value/, that is, an address in memory. When used with an already /declared variable/, the =*= dereferences the pointer value, following it to the pointed-to place in memory and allowing the value stored there to be assigned or retrieved.

*** =sizeof= Pointer :PROPERTIES: :CUSTOM_ID: language_pointer_sizeof_pointer :END:

The size of a pointer depends on the compiler and the machine. On most platforms, all pointer types built by a given compiler for a given machine have the same size, generally one machine word.

*** =const= Pointer :PROPERTIES: :CUSTOM_ID: language_pointer_const_pointer :END:

There is a technique known as the [[http://c-faq.com/decl/spiral.anderson.html][Clockwise/Spiral Rule]] that enables any C programmer to parse any C declaration in their head.

The first =const= can be on either side of the type.
#+BEGIN_SRC c
const int * == int const *;             /* pointer to const int */
const int * const == int const * const; /* const pointer to const int */
#+END_SRC

  • pointer to =const= object
#+BEGIN_SRC c
int v = 0x11223344;
const int *p = &v;
#+END_SRC

  • =const= pointer to object
#+BEGIN_SRC c
int v1 = 0x11223344;
int *const p1 = &v1;
#+END_SRC

  • =const= pointer to =const= object
#+BEGIN_SRC c
int v1 = 0x11223344;
const int *const p = &v1;
#+END_SRC

  • pointer to pointer to =const= object
#+BEGIN_SRC c
const int **p;
#+END_SRC

  • pointer to =const= pointer to object
#+BEGIN_SRC c
int *const *p;
#+END_SRC

  • =const= pointer to pointer to object
#+BEGIN_SRC c
int **const p;
#+END_SRC

  • pointer to =const= pointer to =const= object
#+BEGIN_SRC c
const int *const *p;
#+END_SRC

  • =const= pointer to pointer to =const= object
#+BEGIN_SRC c
const int **const p;
#+END_SRC

  • =const= pointer to =const= pointer to object
#+BEGIN_SRC c
int *const *const p;
#+END_SRC

Run example:
#+BEGIN_SRC sh
./configure --has-lang
make clean lang_ptr_const_test
#+END_SRC

*** =volatile= Pointer :PROPERTIES: :CUSTOM_ID: language_pointer_volatile_pointer :END:

The =volatile= is to tell the compiler not to optimize the reference, so that every read or write does not use the value stored in register but does a real memory access.

#+BEGIN_SRC c
volatile int v1;
int *p_v1 = &v1;          /* bad */
volatile int *p_v1 = &v1; /* better */
#+END_SRC

*** =restrict= Pointer :PROPERTIES: :CUSTOM_ID: language_pointer_restrict_pointer :END:

  • The =restrict= keyword was introduced in C99.
  • It is a way for the programmer to tell the compiler about an optimization it may make: during the pointer's lifetime, the object it points to is accessed only through that pointer.

*** function Pointer :PROPERTIES: :CUSTOM_ID: language_pointer_function_pointer :END:

#+BEGIN_SRC c
return_type_of_fn (*fn)(type_of_arg1 arg1, type_of_arg2 arg2 ...);
#+END_SRC

  • =void= Pointer: =void *= is a catch-all type for pointers to object types. Through a =void *= you can get some polymorphic behavior; see =qsort= in =<stdlib.h>=.

*** Dangling Pointer

Pointers that point to invalid addresses are sometimes called dangling pointers.

*** Pointer decay :PROPERTIES: :CUSTOM_ID: language_pointer_pointer_decay :END:

Decay refers to the implicit conversion of an expression from an array type to a pointer type. In most contexts, when the compiler sees an array expression it converts the type of the expression from /N-element array of T/ to /const pointer to T/ and set the value of the expression to the address of the first element of the array. The exceptions to this rule are when an array is an operand of either the =sizeof= or =&= operators, or the array is a string literal being used as an initializer in a declaration. More importantly the term decay signifies loss of type and dimension.

*** Pointer aliasing :PROPERTIES: :CUSTOM_ID: pointer-aliasing :END:

In computer programming, aliasing refers to the situation where the same memory location can be accessed using different names.

** Storage :PROPERTIES: :CUSTOM_ID: language_storage :END:

A /storage class/ in C decides where storage is allocated for a variable; it also determines the scope of the variable. Memory and CPU registers are the kinds of location where a variable's value can be stored. There are four storage classes in C: /automatic/, /register/, /static/, and /external/.

Each [[#language_scope_declaration_and_definition][declaration]] can only have one of five /storage class specifiers/: =static=, =extern=, =auto=, =register= and =typedef=.

The =typedef= storage class specifier does not reserve storage and is called a storage class specifier only for syntactic convenience.

The general form of a declaration that uses a /storage class/ is shown here: ~ ~

Living example:
#+BEGIN_SRC sh
./configure --has-lang
make clean lang_storage_test
#+END_SRC

*** Automatic storage class :PROPERTIES: :CUSTOM_ID: language_storage_automatic_storage_class :END:

=auto= storage class specifier denotes that an identifier has /automatic duration/. This means once the scope in which the identifier was defined ends, the object denoted by the identifier is no longer valid.

Since all objects that do not live in global scope and are not declared =static= have /automatic duration/ by default when defined, this keyword is mostly of historical interest and should not be used. =auto= cannot be applied to parameter declarations. It is the default for variables declared inside a function body, and is in fact a historic leftover from C's predecessor, B.

  • /scope/: variable defined with =auto= storage class specifier are local to the [[#language_scope_function_scope][function scope]] or [[#block_scope][block scope]] inside which they are defined.
  • /duration/: [[#language_duration][automatic]], till the end of the [[#language_scope_function_scope][function scope]] or [[#block_scope][block scope]] where the variable is defined
  • /default initial value/: garbage value

*** Register storage class :PROPERTIES: :CUSTOM_ID: language_storage_register_storage_class :END:

Hints to the compiler that access to an object should be as fast as possible. Whether the compiler actually uses the hint is implementation-defined; it may simply treat it as equivalent to =auto=.

The compiler does make sure that you do not take the address of a variable with the register storage class.

The only property that is definitively different for all objects declared with =register= is that they cannot have their address computed. Thereby =register= can be a good tool to ensure certain optimizations:

#+BEGIN_SRC c
/* error: address of register variable requested */
register int i = 0x10;
int *p = &i;
#+END_SRC

=i= can never be aliased, because no code can pass its address to another function where it might be changed unexpectedly.

This property also implies that an array
#+BEGIN_SRC c
void decay(char *a);
register char a[] = { 0x11, 0x22, 0x33, 0x44, };
decay(a);
#+END_SRC

cannot decay into a pointer to its first element (i.e. turning into =&a[0]=). This means that the elements of such an array cannot be accessed and the array itself cannot be passed to a function.

In fact, the only legal usage of an array declared with a =register= storage class is the =sizeof= operator; Any other operator would require the address of the first element of the array. For that reason, arrays generally should not be declared with the =register= keyword since it makes them useless for anything other than size computation of the entire array, which can be done just as easily without =register= keyword.

The =register= storage class is more appropriate for variables that are defined inside a block and are accessed with high frequency.

  • /scope/: [[#language_scope_function_scope][function scope]] or [[#block_scope][block scope]]
  • /duration/: [[#language_duration][automatic]], till the end of [[#language_scope_function_scope][function scope]] or [[#block_scope][block scope]] in which the variable is defined
  • /default initial value/: garbage value

*** Static storage class :PROPERTIES: :CUSTOM_ID: language_storage_static_storage_class :END:

The /static storage class/ serves different purposes, depending on the location of the declaration in the file. Since C99, =static= can also be used in an array parameter declaration to denote that the argument is expected to be non-null and to point to an array with at least the given number of elements.

  • /scope/: [[#language_scope_file_scope][file scope]] (confine the identifier to that /translation unit/ only) or [[#language_scope_function_scope][function scope]] (save data for use with the next call of a function)
  • /duration/: [[#language_duration][static]]
  • /default initial value/: 0

*** External storage class :PROPERTIES: :CUSTOM_ID: language_storage_external_storage_class :END:

The =extern= keyword is used to declare an object or function that is defined elsewhere (and that has [[#language_linkage_external_linkage][external linkage]]). In general, it is used to declare an object or function to be used in a module that is not the one in which the corresponding object or function is defined.

  • /scope/: global
  • /duration/: [[#language_duration][static]]
  • /default initial value/: 0

** Scope :PROPERTIES: :CUSTOM_ID: language_scope :END:

In C, all identifiers are lexically (or statically) scoped.

The scope of a [[#language_scope_declaration_and_definition][declaration]] is the part of the code where the declaration is seen and can be used. Note that this says nothing about whether the object associated to the declaration can be accessed from some other part of the code via another declaration. We uniquely identify an object by its memory: the storage for a variable or the function code.

Finally, note that a [[#language_scope_declaration_and_definition][declaration]] in a nested scope can hide a declaration in an outer scope; but only if one of two has [[#language_linkage_no_linkage][no linkage]].

*** Declarations and Definitions :PROPERTIES: :CUSTOM_ID: language_scope_declaration_and_definition :END:

If neither the =extern= keyword nor an initializer are present, the statement can be either a /declaration/ or a /definition/. It is up to the compiler to analyse the modules of the program and decide.

  • All /declarations/ with [[#language_linkage_no_linkage][no linkage]] are also /definitions/. Other /declarations/ are /definitions/ if they have an initializer.

  • A [[#language_scope_file_scope][file scope]] variable /declaration/ without the [[#language_linkage_external_linkage][external linkage]] storage class specifier or initializer is a tentative /definition/.

  • All /definitions/ are /declarations/ but not vice-versa.

  • A /definition/ of an identifier is a /declaration/ for that identifier that: for an object, causes storage to be reserved for that object.

A /declaration/ specifies the interpretation and attributes of a set of identifiers. A /definition/ of an identifier is a declaration for that identifier that:

  • for an object, causes storage to be reserved for that object;
  • for a function, includes the function body;
  • for an enumeration constant or typedef name, is the only declaration of the identifier.

In the following example we declare a function. The =extern= keyword is optional when declaring a function: if it is omitted, the declaration is treated as if =extern= had been written.
#+BEGIN_SRC c
int add(int, int);
#+END_SRC

*** Block scope :PROPERTIES: :CUSTOM_ID: block_scope :END:

Every variable or function declaration that appears inside a block has block scope. The scope goes from the declaration to the end of the innermost block in which the declaration appears. Function parameter declarations in function definitions (but not in prototypes) also have block scope. The scope of a parameter declaration therefore includes the parameter declarations that appears after it.

*** Function scope :PROPERTIES: :CUSTOM_ID: language_scope_function_scope :END:

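=goto= labels are the only identifiers with /function scope/: a label can be used anywhere in the function in which it appears.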

/function prototype scope/ is the scope for function parameters that appears inside a function prototype. It extends until the end of the prototype. This scope exists to ensure that function parameters have distinct names.

*** File scope :PROPERTIES: :CUSTOM_ID: language_scope_file_scope :END:

All variables and functions defined outside functions have /file scope/. They are visible from their [[#language_scope_declaration_and_definition][declaration]] until the end of the file. Here, the term /file/ should be understood as the source file being compiled, after all includes have been resolved.

** Duration :PROPERTIES: :CUSTOM_ID: language_duration :END:

Indicates whether the object associated to the [[#language_scope_declaration_and_definition][declaration]] persists throughout the program's execution (/static/) or whether it is allocated dynamically when the declaration's scope is entered (/automatic/).

There are two kinds of duration covered here:

  • automatic
  • static

Within functions at [[#block_scope][block scope]], declarations without =extern= or =static= have automatic duration. Any other declaration at [[#language_scope_file_scope][file scope]] has static duration.

** Linkage :PROPERTIES: :CUSTOM_ID: language_linkage :END:

/Linkage/ describes how identifiers can or can not refer to the same entity throughout the whole program or one single translation unit.

Living example:
#+BEGIN_SRC sh
./configure --has-lang
make clean lang_linkage_test
#+END_SRC

*** Translation unit :PROPERTIES: :CUSTOM_ID: language_linkage_translation_unit :END:

A /translation unit/ is the ultimate input to a C compiler from which an object file is generated. In casual usage it is sometimes referred to as a /compilation unit/. A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in =#include= directives are literally included, sections of code within =#ifdef= may be included, and macros have been expanded.

*** No linkage :PROPERTIES: :CUSTOM_ID: language_linkage_no_linkage :END:

A declaration with /no linkage/ is associated to an object that is not shared with any other declaration. All declarations with /no linkage/ happen at [[#block_scope][block scope]]: all block scope declarations without the extern storage class specifier have /no linkage/.

*** Internal linkage :PROPERTIES: :CUSTOM_ID: language_linkage_internal_linkage :END:

/Internal linkage/ means that the variable must be defined in your translation unit scope, which means it should either be defined in any of the included libraries, or in the same file scope. Within the translation unit, all declarations with /internal linkage/ for the same identifier refer to the same object.

*** External linkage :PROPERTIES: :CUSTOM_ID: language_linkage_external_linkage :END:

/External linkage/ means that the variable could be defined somewhere else outside the file you are working on, which means you can define it inside any other translation unit rather than your current one. Within the whole program, all declarations with /external linkage/ for the same identifier refer to the same object.

*** Size type and Pointer difference types :PROPERTIES: :CUSTOM_ID: language_type_size_type_and_pointer_difference_type :END:

The C language specification includes the /typedefs/ =size_t= and =ptrdiff_t= to represent memory-related quantities. Their size is defined according to the target processor's arithmetic capabilities, not the memory capabilities, such as available address space. Both of these types are defined in the =<stddef.h>= header.

  • =size_t= is an unsigned integral type used to represent the size of any object in the particular implementation. The =sizeof= operator yields a value of the type =size_t=. The maximum value of =size_t= is provided via =SIZE_MAX=, a macro constant which is defined in the =<stdint.h>= header.

  • =ptrdiff_t= is a signed integral type used to represent the difference between pointers. It is only guaranteed to be valid against pointers of the same type.

  • =ssize_t= is defined by POSIX, not by the C standard.
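A small sketch of where the two types show up in practice. The helper name =distance= is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements between two pointers into the same array. */
ptrdiff_t distance(const int *first, const int *last)
{
  return last - first;
}

void size_types_demo(void)
{
  int a[10];
  size_t n = sizeof a / sizeof a[0];     /* sizeof yields a size_t */

  assert(n == 10);
  assert(distance(&a[2], &a[7]) == 5);
  assert(distance(&a[7], &a[2]) == -5);  /* ptrdiff_t is signed */
  assert(SIZE_MAX == (size_t)-1);        /* largest value a size_t can hold */
}
```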

*** Literal suffix :PROPERTIES: :CUSTOM_ID: language_type_literal_suffix :END:

  • =l= or =L= for =long= on integer literals, such as =123l=, and for =long double= on floating literals, such as =3.14L=
  • =f= for =float=, such as =2.718f=
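The effect of a suffix can be checked with =sizeof=. The =UL= and =LL= suffixes are not listed above; they are included here only for comparison.

```c
#include <assert.h>

void literal_suffix_demo(void)
{
  assert(sizeof(123)    == sizeof(int));
  assert(sizeof(123L)   == sizeof(long));
  assert(sizeof(123UL)  == sizeof(unsigned long));
  assert(sizeof(123LL)  == sizeof(long long));
  assert(sizeof(3.14)   == sizeof(double));       /* unsuffixed: double */
  assert(sizeof(3.14L)  == sizeof(long double));  /* L on a floating literal */
  assert(sizeof(2.718f) == sizeof(float));
}
```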

** struct :PROPERTIES: :CUSTOM_ID: language_struct :END:

A =struct= is a type consisting of a sequence of members whose storage is allocated in the order in which the members were declared.

#+BEGIN_SRC c
struct optional_name { declaration_list; };
struct name;
#+END_SRC

Initialization, =sizeof=, and the assignment operator ~=~ ignore the flexible array member.

Run example
#+BEGIN_SRC sh
./configure --has-lang
make clean lang_struct_test
#+END_SRC

*** Padding :PROPERTIES: :CUSTOM_ID: struct_padding :END:

There may be unnamed padding between any two members of a struct or after the last member, but not before the first member. The size of a struct is at least as large as the sum of the sizes of its members.
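The padding rules can be observed with =offsetof=. The sketch below uses a made-up =struct record= and only asserts what the rules guarantee, since the exact amount of padding depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

struct record {
  char tag;    /* 1 byte, usually followed by padding */
  int  value;  /* typically aligned to a multiple of sizeof(int) */
  char flag;   /* 1 byte, usually followed by trailing padding */
};

void padding_demo(void)
{
  size_t sum = sizeof(char) + sizeof(int) + sizeof(char);

  /* never any padding before the first member */
  assert(offsetof(struct record, tag) == 0);
  /* members are laid out in declaration order, possibly with gaps */
  assert(offsetof(struct record, value) >= sizeof(char));
  assert(offsetof(struct record, flag)
         >= offsetof(struct record, value) + sizeof(int));
  /* the whole struct is at least the sum of its members */
  assert(sizeof(struct record) >= sum);
}
```

On a typical platform with 4-byte =int= the struct occupies 12 bytes rather than 6, but that figure is an observation, not a guarantee.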

#+BEGIN_SRC c
extern char a[]; /* the type of a is incomplete */
char a[4];       /* the type of a is now complete */

struct node {
  struct node *next; /* struct node is an incomplete type at this point */
}; /* struct node is now complete at this point */
#+END_SRC

** union :PROPERTIES: :CUSTOM_ID: language_union :END:

A union is a type consisting of a sequence of members whose storage overlaps.

#+BEGIN_SRC c
union optional_name { declaration_list; };
union name;
#+END_SRC
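Overlapping storage means every member starts at the same address. A minimal sketch, with a made-up =union number= and the assumption that =unsigned int= has no padding bits:

```c
#include <assert.h>
#include <limits.h>

union number {
  unsigned int  u;
  float         f;
  unsigned char bytes[sizeof(unsigned int)];
};

void union_demo(void)
{
  union number n;

  /* every member starts at the address of the union itself */
  assert((void *)&n.u == (void *)&n);
  assert((void *)&n.f == (void *)&n);
  assert((void *)n.bytes == (void *)&n);

  /* the union is at least as large as its largest member */
  assert(sizeof n >= sizeof n.u && sizeof n >= sizeof n.f);

  /* a store through one member is visible through the others */
  n.u = 0u;
  assert(n.bytes[0] == 0);
  n.u = ~0u;  /* all value bits set; assumes no padding bits */
  assert(n.bytes[0] == UCHAR_MAX);
}
```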

** Type :PROPERTIES: :CUSTOM_ID: language_type :END:

*** Basic types :PROPERTIES: :CUSTOM_ID: basic_types :END:

**** Integer :PROPERTIES: :CUSTOM_ID: basic_types_integer :END:

All C types are represented as binary numbers in memory; the type determines how those numbers are interpreted.

C provides the four basic /arithmetic type specifiers/ =char=, =int=, =float= and =double=, and the /modifiers/ =signed=, =unsigned=, =short= and =long=.

=long= and =long int= are identical. So are =long long= and =long long int=. In both cases, the =int= is optional.

| specifier       | type            |
|-----------------+-----------------|
| =long long int= | =long long int= |
| =long long=     | =long long int= |
| =long=          | =long int=      |

*** Incomplete type :PROPERTIES: :CUSTOM_ID: incomplete_type :END:

An incomplete type is an object type that lacks sufficient information to determine the size of objects of that type. An incomplete type may be completed at some point in the translation unit.

  • =void= cannot be completed.
  • =[]= array type of unknown size, it can be completed by a later declaration that specifies the size.

** typedef :PROPERTIES: :CUSTOM_ID: language_typedef :END:

#+BEGIN_SRC c
typedef type_specifier declarator;
typedef type_specifier declarator1, *declarator2, (*declarator3)(void);
#+END_SRC

A /typedef/ creates an alias name for another type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of /struct/ and /union/ types, but is just as common in providing specific descriptive type names for integer types of varying lengths. The C standard library and POSIX reserve the suffix =_t=, for example as in =size_t= and =time_t=.

=#define= is a C directive which is also used to define the aliases for various data types similar to =typedef= but with the following differences:

  • =typedef= is limited to giving symbolic names to types only, whereas =#define= can be used to define aliases for values as well.
  • =typedef= interpretation is performed by the compiler whereas =#define= statements are processed by the preprocessor.
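Because =#define= is plain text substitution, the two differ visibly when several names are declared at once. The names =int_ptr= and =INT_PTR= below are made up for this sketch.

```c
#include <assert.h>

typedef int *int_ptr;  /* a real type alias, handled by the compiler */
#define INT_PTR int *  /* text substitution, handled by the preprocessor */

void typedef_demo(void)
{
  int x = 0;
  int_ptr a = &x, b = &x;  /* both a and b are `int *` */
  INT_PTR c = &x, d = 5;   /* expands to `int *c = &x, d = 5;` so d is an int */

  assert(*a == 0 && *b == 0 && *c == 0);
  assert(d == 5);
  assert(sizeof b == sizeof(int *));
  assert(sizeof d == sizeof(int));
}
```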

Using =typedef= to hide =struct= is considered a bad idea in the [[https://www.kernel.org/doc/html/latest/process/coding-style.html#typedefs][Linux kernel coding style]].

Run =typedef= example
#+BEGIN_SRC sh
./configure --has-lang
make clean lang_typedef_test
#+END_SRC

** typeof

The =typeof= operator was a compiler extension (for example in GCC and Clang) and not part of the C standard until C23 added it.

Run =typeof= example
#+BEGIN_SRC sh
./configure --has-lang
make clean lang_typeof_test
#+END_SRC

** cdecl :PROPERTIES: :CUSTOM_ID: cdecl :END:

A declaration can have exactly one basic type. The [[#basic_types][basic types]] are augmented with /derived types/, and C has three of them:

  • ~function [(decl-list)] returning~: ()
  • ~array [number] of~: []
  • ~[const | volatile | restrict] pointer to~: *

The /array of []/ and /function returning ()/ type operators have higher precedence than /pointer to */.
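The precedence rule decides how a declarator is read. The following sketch spells out four declarations in the "array of / pointer to / function returning" vocabulary; =forty_two= is a made-up helper.

```c
#include <assert.h>

static int forty_two(void) { return 42; }

void cdecl_demo(void)
{
  int x = 7;
  int grid[3] = { 1, 2, 3 };

  int *a[3] = { &x, &x, &x };  /* a: array [3] of pointer to int */
  int (*p)[3] = &grid;         /* p: pointer to array [3] of int */
  int (*f)(void) = forty_two;  /* f: pointer to function (void) returning int */
  int (*t[2])(void) = { forty_two, forty_two };
                               /* t: array [2] of pointer to function returning int */

  assert(sizeof a == 3 * sizeof(int *));  /* [] binds before *, so a is an array */
  assert(sizeof *p == 3 * sizeof(int));   /* parentheses make p a pointer */
  assert(*a[1] == 7);
  assert((*p)[2] == 3);
  assert(f() == 42 && t[1]() == 42);
}
```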

** alloc :PROPERTIES: :custom_id: alloc :END:

*** malloc :PROPERTIES: :custom_id: alloc-malloc :END:

Don't cast the result of =malloc=:

  • It is unnecessary, as =void *= is automatically and safely converted to any other object pointer type in this case.
  • It adds clutter to the code; casts are not very easy to read (especially if the pointer type is long).
  • It makes you repeat yourself, which is generally bad.
  • It can hide an error if you forgot to include =<stdlib.h>=. This can cause crashes (or, worse, not cause a crash until way later in some totally different part of the code). Consider what happens if pointers and integers are differently sized; then you're hiding a warning by casting and might lose bits of your returned address. Note: as of C99 implicit function declarations are gone from C, and this point is no longer relevant since there's no automatic assumption that undeclared functions return =int=.

Furthermore, repeating the type information (=int=) in the call is needless and can cause errors. It's better to dereference the pointer being used to store the return value, to lock the two together: =int *x = malloc(length * sizeof *x);= This also moves =length= to the front for increased visibility, and drops the redundant parentheses with =sizeof()=; they are only needed when the argument is a type name. Many people seem to not know or ignore this, which makes their code more verbose. Remember: =sizeof= is not a function!

While moving =length= to the front may increase visibility in some rare cases, in the general case it is better to write the expression as =int *x = malloc(sizeof *x * length);= because keeping =sizeof= first ensures the multiplication is done in =size_t=. Compare =malloc(sizeof *x * length * width)= with =malloc(length * width * sizeof *x)=: the second may overflow in =length * width= when =length= and =width= are of smaller types than =size_t=.
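Putting the advice together: no cast, =sizeof *ptr= instead of a type name, and =sizeof= first. The helper =grid_alloc= is hypothetical, and the overflow guard is an addition that goes beyond what the text above asks for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a length x width grid of int; returns NULL on failure. */
int *grid_alloc(size_t length, size_t width)
{
  int *grid;

  /* refuse sizes whose byte count would not fit in a size_t */
  if (width != 0 && length > SIZE_MAX / sizeof *grid / width) {
    return NULL;
  }
  grid = malloc(sizeof *grid * length * width);  /* no cast, sizeof first */
  return grid;
}

void grid_demo(void)
{
  int *g = grid_alloc(3, 4);

  assert(g != NULL);
  g[11] = 5;                                /* last of the 12 elements */
  assert(g[11] == 5);
  free(g);
  assert(grid_alloc(SIZE_MAX, 2) == NULL);  /* would overflow: rejected */
}
```

If the element type ever changes from =int= to something else, only the declaration of =grid= needs to be edited.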

*** calloc :PROPERTIES: :custom_id: alloc-calloc :END:

=calloc= zero-initializes the allocated memory. Calling =calloc= is not necessarily more expensive than calling =malloc=.

*** realloc :PROPERTIES: :custom_id: alloc-realloc :END:

** libc :PROPERTIES: :CUSTOM_ID: language_standard_libraries :END:

The /C standard library/ is a standardized collection of header files and library routines used to implement common operations.

** std :PROPERTIES: :CUSTOM_ID: language_std :END:

There is a good answer to [[http://stackoverflow.com/questions/17206568/what-is-the-difference-between-c-c99-ansi-c-and-gnu-c-a-general-confusion-reg][What is the difference between C, C99, ANSI C and GNU C]]:

  • Everything before standardization is generally called "K&R C", after the famous book, with Dennis Ritchie, the inventor of the C language, as one of the authors. This was "the C language" from 1972-1989.
  • The first C standard was released 1989 nationally in USA, by their national standard institute ANSI. This release is called C89 or ANSI-C. From 1989-1990 this was "the C language".
  • The year after, the American standard was accepted internationally and published by ISO (ISO 9899:1990). This release is called C90. Technically, it is the same standard as C89/ANSI-C. Formally, it replaced C89/ANSI-C, making them obsolete. From 1990-1999, C90 was "the C language".
  • Please note that since 1989, ANSI haven't had anything to do with the C language. Programmers still speaking about "ANSI C" generally haven't got a clue about what it means. ISO "owns" the C language, through the standard ISO 9899.
  • In 1999, the C standard was revised, lots of things changed (ISO 9899:1999). This version of the standard is called C99. From 1999-2011, this was "the C language". Most C compilers still follow this version.
  • In 2011, the C standard was again changed (ISO 9899:2011). This version is called C11. It is currently the definition of "the C language".

*** headers

| name          | std | intro                                                             |
|---------------+-----+-------------------------------------------------------------------|
| assert.h      | C90 | conditionally compiled macro that compares its argument to zero   |
| ctype.h       | C90 | functions to determine the type contained in character data       |
| errno.h       | C90 | macros reporting error conditions                                 |
| float.h       | C90 | limits of float types                                             |
| limits.h      | C90 | sizes of basic types                                              |
| locale.h      | C90 | localization utilities                                            |
| math.h        | C90 | common mathematics functions                                      |
| setjmp.h      | C90 | nonlocal jumps                                                    |
| signal.h      | C90 | signal handling                                                   |
| stdarg.h      | C90 | variable arguments                                                |
| stddef.h      | C90 | common macro definitions                                          |
| stdio.h       | C90 | input/output                                                      |
| stdlib.h      | C90 | general utilities: memory, program, string, random, algorithms    |
| string.h      | C90 | string handling                                                   |
| time.h        | C90 | time/date utilities                                               |
|---------------+-----+-------------------------------------------------------------------|
| iso646.h      | C95 | alternative operator spellings                                    |
| wchar.h       | C95 | extended multibyte and wide character utilities                   |
| wctype.h      | C95 | functions to determine the type contained in wide character data  |
|---------------+-----+-------------------------------------------------------------------|
| complex.h     | C99 | complex number arithmetic                                         |
| fenv.h        | C99 | floating-point environment                                        |
| inttypes.h    | C99 | format conversion of integer types                                |
| stdbool.h     | C99 | macros for boolean types                                          |
| stdint.h      | C99 | fixed-width integer types                                         |
| tgmath.h      | C99 | type-generic math                                                 |
|---------------+-----+-------------------------------------------------------------------|
| stdalign.h    | C11 | alignas and alignof convenience macros                            |
| stdatomic.h   | C11 | atomic types                                                      |
| stdnoreturn.h | C11 | noreturn convenience macros                                       |
| threads.h     | C11 | thread library                                                    |
| uchar.h       | C11 | UTF-16/32 character utilities                                     |

** References :PROPERTIES: :CUSTOM_ID: language_references :END:

  • [[https://en.cppreference.com/w/c/language/history][History of C]]
  • [[https://en.cppreference.com/w/c/language/basic_concepts][Basic concepts]]
  • [[https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html][Preprocessor Output]]
  • [[http://c-faq.com/decl/spiral.anderson.html][Clockwise/Spiral Rule]]
  • [[http://norswap.com/c_scope_duration_linkage/][C: Scope, Duration & Linkage]]
  • [[http://stackoverflow.com/documentation/c/1108/pointers#t=201702060822544818513][Pointers]]
  • [[http://stackoverflow.com/questions/1461432/what-is-array-decaying][What is array decaying?]]
  • [[http://stackoverflow.com/questions/2524611/how-can-one-print-a-size-t-variable-portably-using-the-printf-family][printf size_t]]
  • [[http://unixwiz.net/techtips/reading-cdecl.html][Steve Friedl's Unixwiz.net Tech Tips: Reading C type declarations]]
  • [[https://cdecl.org/][cdecl]]
  • [[https://en.wikibooks.org/wiki/C_Programming/Standard_libraries][wikibooks: C Programming/Standard libraries]]
  • [[https://en.wikipedia.org/wiki/C11_(C_standard_revision)][wikipedia: C11 (C standard revision)]]
  • [[https://en.wikipedia.org/wiki/C99][wikipedia: C99]]
  • [[https://en.wikipedia.org/wiki/C_data_types][wikipedia: C data types]]
  • [[https://en.wikipedia.org/wiki/Linkage_(software)][wikipedia: Linkage]]
  • [[https://en.wikipedia.org/wiki/Maximal_munch][wikipedia: Maximal munch]]
  • [[https://en.wikipedia.org/wiki/Pointer_aliasing][wikipedia: Pointer aliasing]]
  • [[https://en.wikipedia.org/wiki/Translation_unit_(programming)][wikipedia: Translation unit]]
  • [[https://en.wikipedia.org/wiki/Typedef][wikipedia: typedef]]
  • [[https://github.com/nodejs/http-parser][http parser]]
  • [[https://ptolemy.eecs.berkeley.edu/~johnr/tutorials/assertions.html][How to use assertions in C]]
  • [[https://resources.sei.cmu.edu/asset_files/Presentation/2016_017_101_484207.pdf][Beyond errno Error Handling in C]]
  • [[https://stackoverflow.com/questions/204476/what-should-main-return-in-c-and-c][What should main() return in C and C++?]]
  • [[https://stackoverflow.com/questions/252780/why-should-we-typedef-a-struct-so-often-in-c][Why should we typedef a struct so often in C?]]
  • [[https://stackoverflow.com/questions/3323445/what-is-the-difference-between-asm-asm-and-asm][What is the difference between 'asm', '__asm' and '__asm__'?]]
  • [[https://www.bell-labs.com/usr/dmr/www/chist.html][The Development of the C Language]]
  • [[https://www.kernel.org/doc/html/latest/process/coding-style.html][Linux kernel coding style]]
  • [[https://www.cs.rit.edu/~kar/pointers.on.c/index.html][Kenneth A.Reek: Pointers on C]]
  • Compiler

** flex

*** References

  • [[https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/050%20Flex%20In%20A%20Nutshell.pdf][flex In A Nutshell]]
  • x86 :PROPERTIES: :CUSTOM_ID: x86 :END:

While memory stores the program and data, the /Central Processing Unit/ does all the work. The CPU has two parts: /registers/ and the /Arithmetic Logic Unit (ALU)/. The ALU performs the actual computations such as addition and multiplication along with comparison and other logical operations.

** Load :PROPERTIES: :CUSTOM_ID: load :END:

/Load/ instructions read bytes into a register. The source may be a constant value, another register, or a location in memory.

#+BEGIN_SRC asm
;; load the constant 23 into register 4
R4 = 23

;; copy the contents of register 2 into register 3
R3 = R2

;; load char (one byte) starting at memory address 244 into register 6
R6 = .1 M[244]

;; load R5 with the word whose memory address is in R1
R5 = M[R1]

;; load the word that begins 8 bytes after the address in R1.
;; this is known as constant offset mode and is about the fanciest
;; addressing mode a RISC processor will support
R4 = M[R1+8]
#+END_SRC

** Store :PROPERTIES: :CUSTOM_ID: store :END:

/Store/ instructions are basically the reverse of /load/ instructions: they move values from registers back out to memory.

#+BEGIN_SRC asm
;; store the constant number 37 into the word beginning at 400
M[400] = 37

;; store the value in R6 into the word whose address is in R1
M[R1] = R6

;; store lower half-word from R2 into 2 bytes starting at address 1024
M[1024] = .2 R2

;; store R7 into the word whose address is 12 more than the address in R1
M[R1+12] = R7
#+END_SRC

** ALU :PROPERTIES: :CUSTOM_ID: ALU :END:

#+BEGIN_SRC asm
;; add 6 to R3 and store the result in R1
R1 = 6 + R3

;; subtract R3 from R2 and store the result in R1
R1 = R2 - R3
#+END_SRC

** Branching :PROPERTIES: :CUSTOM_ID: branching :END:

By default, the /CPU/ fetches and executes instructions from memory in order, working from low memory to high. Branch instructions alter this order. Branch instructions test a condition and possibly change which instruction should be executed next by changing the value of the /PC/ register. The operands in the test of a branch statement must be in registers or constant values. Branches are used to implement control structures like =if= as well as loops like =for= and =while=.

#+BEGIN_SRC asm
;; begin executing at address 344 if R1 equals 0
BEQ R1, 0, 344

;; begin executing at address 8 past current instruction if R2 less than R3
BLT R2, R3, PC+8

;; The full set of branch variants:
BLT ... ;; branch if first argument is less than second
BLE ... ;; less than or equal
BGT ... ;; greater than
BGE ... ;; greater than or equal
BEQ ... ;; equal
BNE ... ;; not equal

;; unconditional jump that has no test, but just immediately
;; diverts execution to new address
;; begin executing at address 2000 unconditionally: like a goto
JMP 2000

;; begin executing at address 12 before current instruction
JMP PC-12
#+END_SRC

** Type Conversion :PROPERTIES: :CUSTOM_ID: type_convertion :END:

The types =char=, =short=, =int=, and =long= are all in the same family, and use the same binary polynomial representation. C allows you to freely assign between these types.

  • broaden: when assigning from a smaller-sized type to a larger, there is no problem. All of the source bytes are copied and the remaining upper bytes in the destination are filled using what is called /sign extension/: the sign bit is extended across the extra bytes. For unsigned sources the upper bytes are filled with zeros instead.
  • narrow: only the lower bytes are copied; the upper bytes are ignored.

Remember that a floating point 1.0 has a completely different arrangement of bits than the integer 1, and instructions are required to do those conversions.

#+BEGIN_SRC asm
;; take bits in R2 that represent integer, convert to float, store in R1
R1 = ItoF R2

;; take bits in R4, convert from float to int, and store back in same.
;; Note that converting in this direction loses information: the fractional
;; component is truncated and lost
R4 = FtoI R4
#+END_SRC
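The same conversions seen from C, as a rough sketch. The function names are made up, and the fixed-width types from =<stdint.h>= are used so that the byte counts are explicit.

```c
#include <assert.h>
#include <stdint.h>

int32_t  widen_signed(int8_t v)    { return v; }           /* sign-extended */
uint32_t widen_unsigned(uint8_t v) { return v; }           /* zero-extended */
uint8_t  narrow(uint32_t v)        { return (uint8_t)v; }  /* keeps the low byte */

void conversion_demo(void)
{
  /* broaden: the value is preserved, upper bytes copy the sign bit */
  assert(widen_signed(-2) == -2);
  assert((uint32_t)widen_signed(-2) == 0xFFFFFFFEu);
  assert(widen_unsigned(0xFE) == 0xFEu);

  /* narrow: only the lower byte survives */
  assert(narrow(0x12345678u) == 0x78);

  /* integer <-> floating point: a real conversion, the fraction is truncated */
  assert((int)2.9 == 2 && (int)-2.9 == -2);
  assert((double)3 == 3.0);
}
```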

** Typecast :PROPERTIES: :CUSTOM_ID: typecast :END:

A /typecast/ is a compile-time entity that instructs the compiler to treat an expression differently than its declared type when generating code for that expression.

  • casting a /pointer/ from one type to another can change the factor by which an offset is multiplied in pointer arithmetic, or how many bytes are copied on a pointer dereference.
  • some typecasts are actually [[#type_convertion][type conversions]]. A type conversion is required when the data needs to be converted from one representation to another, such as when changing an integer to floating point representation or vice versa.
  • most often, a cast does affect the generated code, since the compiler will be treating the expression as a different type.

#+BEGIN_SRC c int i; ((struct binky *)i)->b = 'A'; #+END_SRC

What does this code actually do at runtime? Why would you ever want to do such a thing? The typecast is one of the reasons C is a fundamentally unsafe language.

** Data Sizes :PROPERTIES: :CUSTOM_ID: data_sizes :END:

| Unit (16-bit word) | Size (bytes) | Size (bits) |
|--------------------+--------------+-------------|
| Word               | 2            | 16          |
| Doubleword         | 4            | 32          |
| Quadword           | 8            | 64          |
| Paragraph          | 16           | 128         |
| Kilobyte           | 1024         | 8192        |
| Megabyte           | 1048576      | 8388608     |

In computing, a /word/ is the natural unit of data used by a particular processor design. A /word/ is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word is an important characteristic of any specific processor design or computer architecture.

** Registers

*** rsp

*** rbp

*** callq

#+BEGIN_SRC asm pushq #+END_SRC

*** retq

#+BEGIN_SRC asm jmp <address-of-$rsp> #+END_SRC

*** cmp

=cmp dst, src= performs a subtraction like =sub dst, src=, but it only sets the flags and does not store the result.

| cmp dst, src                             | CF | PF | AF | ZF | SF | OF |
|------------------------------------------+----+----+----+----+----+----|
| unsigned src < unsigned dst              | 1  |    |    |    |    |    |
| parity of LSB is even                    |    | 1  |    |    |    |    |
| carry in the low nibble of (src-dst)     |    |    | 1  |    |    |    |
| 0, (i.e src == dst)                      |    |    |    | 1  |    |    |
| if MSB of (src-dst) == 1                 |    |    |    |    | 1  |    |
| sign bit of src != sign bit of (src-dst) |    |    |    |    |    | 1  |

*** jmp

| Jump | Description              | signed-ness | Flags              |
|------+--------------------------+-------------+--------------------|
| je   | jump if equal            |             | ZF = 1             |
| jg   | jump if greater          | signed      | ZF = 0 and SF = OF |
| jge  | jump if greater or equal | signed      | SF = OF            |
| jl   | jump if less             | signed      | SF != OF           |
| jle  | jump if less or equal    | signed      | ZF = 1 or SF != OF |

*** rflags

RFLAGS Register

| Bit(s) | Label | Description                                                       |
|--------+-------+-------------------------------------------------------------------|
| 0      | CF    | Carry Flag                                                        |
| 1      | 1     | Reserved                                                          |
| 2      | PF    | Parity Flag, set if the LSB contains an even number of 1 bits     |
| 3      | 0     | Reserved                                                          |
| 4      | AF    | Auxiliary Carry Flag                                              |
| 5      | 0     | Reserved                                                          |
| 6      | ZF    | Zero Flag, set if result is zero                                  |
| 7      | SF    | Sign Flag, set to the MSB of the result                           |
| 8      | TF    | Trap Flag                                                         |
| 9      | IF    | Interrupt Enable Flag                                             |
| 10     | DF    | Direction Flag                                                    |
| 11     | OF    | Overflow Flag                                                     |
| 12-13  | IOPL  | I/O Privilege Level                                               |
| 14     | NT    | Nested Task                                                       |
| 15     | 0     | Reserved                                                          |
| 16     | RF    | Resume Flag                                                       |
| 17     | VM    | Virtual-8086 Mode                                                 |
| 18     | AC    | Alignment Check / Access Control                                  |
| 19     | VIF   | Virtual Interrupt Flag                                            |
| 20     | VIP   | Virtual Interrupt Pending                                         |
| 21     | ID    | ID Flag                                                           |
| 22-63  | 0     | Reserved                                                          |

** Addressing

** References :PROPERTIES: :CUSTOM_ID: x86_references :END:

  • [[https://wiki.osdev.org][OS Dev]]
  • [[http://asm.sourceforge.net/articles/linasm.html][Using Assembly Language in Linux]]
  • [[https://www.cs.yale.edu/flint/cs421/papers/x86-asm/asm.html][Yale: x86 Assembly Guide]]
  • [[https://www.cs.virginia.edu/~evans/cs216/guides/x86.html][Virginia: x86 Assembly Guide]]
  • [[https://wiki.osdev.org/CPU_Registers_x86-64][CPU Registers x86-64]]
  • [[https://software.intel.com/content/www/us/en/develop/articles/introduction-to-x64-assembly.html][Introduction to x64 Assembly]]
  • [[https://www.amd.com/system/files/TechDocs/24594.pdf][AMD64 Architecture Programmer's Manual]]
  • [[https://www.csee.umbc.edu/~chang/cs313.s02/stack.shtml][C Function Call Conventions and the Stack]]
  • [[https://cs61.seas.harvard.edu/site/2019/Asm/][Harvard CS 61-2019]]
  • [[https://wiki.osdev.org/Calling_Conventions#:~:text=The%20System%20V%20ABI%20is%20one%20of%20the,C%2C%20it%20must%20have%20a%20correct%20C%20prototype.][System V ABI Calling Conventions]]
  • Memory :PROPERTIES: :CUSTOM_ID: memory :END:

Run the examples under =src/memory=.
#+BEGIN_SRC sh
./configure --has-memory
make clean test
#+END_SRC

** Bits and Bytes :PROPERTIES: :CUSTOM_ID: memory-bits-and-bytes :END:

*** Bits :PROPERTIES: :CUSTOM_ID: memory-bits-and-bytes-bits :END:

The smallest unit of memory is the /bit/. A bit can be in one of two states: =on= vs. =off=, or alternately, =1= vs. =0=.

Most computers don't work with bits individually, but instead group eight bits together to form a /byte/. Each byte maintains one eight-bit pattern. A group of N bits can be arranged in 2^N different patterns.

Strictly speaking, a program can interpret a bit pattern any way it chooses.

*** Bytes :PROPERTIES: :CUSTOM_ID: memory-bits-and-bytes-bytes :END:

The byte is sometimes defined as the /smallest addressable unit/ of memory. Most computers also support reading and writing larger units of memory: 2-byte /half-words/ (sometimes known as a /short/ word) and 4-byte /words/.

Most computers restrict half-word and word accesses to be /aligned/: a half-word must start at an even address and a word must start at an address that is a multiple of 4.

*** Shift :PROPERTIES: :CUSTOM_ID: memory-bits-and-bytes-shift :END:

A logical shift always fills the vacated bits with 0s. An arithmetic shift fills them with 0s only for a left shift; for a right shift it copies the most significant bit, thereby preserving the sign of the operand.

Left shift on unsigned integers, =x << y=

  • shift bit-vector =x= left by =y= positions
  • throw away extra bits on left
  • fill with 0s on right

Right shift on unsigned integers, =x >> y=

  • shift bit-vector =x= right by =y= positions
  • throw away extra bits on right
  • fill with 0s on left

Left shift, =x << y=

  • equivalent to multiplying by 2^y
  • if resulting value fits, no 1s are lost

Right shift, =x >> y=

  • logical shift for unsigned values, fill with 0s on left
  • arithmetic shift for signed values
    • replicate most significant bit on left
    • maintains sign of =x=
  • equivalent to =floor(x / 2^y)=
    • correct rounding towards 0 requires some care with signed numbers.
    • an arithmetic shift of a negative =x= can be emulated with logical shifts: =(unsigned)x >> y | ~(~0u >> y)=
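The shift rules above can be checked directly. In this sketch =asr_bits= is a made-up helper that builds an arithmetic right shift out of logical shifts; it returns the bit pattern as =unsigned= to stay clear of implementation-defined conversions.

```c
#include <assert.h>

/* Bit pattern of an arithmetic right shift, built from logical shifts.
   Requires y to be smaller than the width of unsigned int. */
unsigned asr_bits(int x, unsigned y)
{
  unsigned u = (unsigned)x >> y;  /* logical shift: fills with 0s on the left */

  if (x < 0) {
    u |= ~(~0u >> y);             /* set the top y bits to copy the sign */
  }
  return u;
}

void shift_demo(void)
{
  assert((1u << 4) == 16u);                 /* left shift multiplies by 2^4 */
  assert((0xF0u >> 4) == 0x0Fu);            /* unsigned right shift fills with 0s */
  assert((13u >> 2) == 3u);                 /* floor(13 / 4) */
  assert(asr_bits(40, 3) == 5u);            /* floor(40 / 8) */
  assert(asr_bits(-8, 1) == (unsigned)-4);  /* floor(-8 / 2) */
  assert(asr_bits(-7, 1) == (unsigned)-4);  /* floor(-3.5) is -4, not -3 */
}
```

The last assertion shows the rounding caveat: shifting rounds towards negative infinity, while C integer division =-7 / 2= rounds towards zero and gives =-3=.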

** Basic Types :PROPERTIES: :CUSTOM_ID: memory-basic-types :END:

*** Character :PROPERTIES: :CUSTOM_ID: memory-basic-types-character :END:

The ASCII code defines 128 characters and a mapping of those characters onto the numbers 0..127. The letter 'A' is assigned 65 in the ASCII table. Expressed in binary, that's 2^6 + 2^0 (64 + 1). All standard ASCII characters have zero in the uppermost bit (the most significant bit) since they only span the range 0..127.

*** Short Integer :PROPERTIES: :CUSTOM_ID: memory-basic-types-short-integer :END:

2 bytes or 16 bits. 16 bits provide 2^16 = 65536 patterns. This number is known as /64k/, where /1k/ of something is 2^10 = 1024. For non-negative numbers these patterns map to the numbers 0..65535. Systems that are /big-endian/ store the most-significant byte at the lower address. A /little-endian/ (Intel x86) system arranges the bytes in the opposite order. This means when exchanging data through files or over a network between different endian machines, there is often a substantial amount of /byte-swapping/ required to rearrange the data.
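A sketch of detecting the host byte order and of the byte-swapping mentioned above. All four function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian host, 0 otherwise: look at the byte stored first. */
int is_little_endian(void)
{
  uint16_t probe = 0x0102;
  unsigned char first;

  memcpy(&first, &probe, 1);
  return first == 0x02;
}

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
  return (uint16_t)((v << 8) | (v >> 8));
}

/* Write and read a value in big-endian order, whatever the host order is. */
void put_be16(unsigned char out[2], uint16_t v)
{
  out[0] = (unsigned char)(v >> 8);    /* most significant byte first */
  out[1] = (unsigned char)(v & 0xFF);
}

uint16_t get_be16(const unsigned char in[2])
{
  return (uint16_t)((in[0] << 8) | in[1]);
}

void endian_demo(void)
{
  unsigned char wire[2];

  assert(swap16(0x1234) == 0x3412);
  put_be16(wire, 0x1234);
  assert(wire[0] == 0x12 && wire[1] == 0x34);
  assert(get_be16(wire) == 0x1234);
  assert(is_little_endian() == 0 || is_little_endian() == 1);
}
```

Serializing byte by byte, as =put_be16= does, avoids the need to know the host byte order at all.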

*** Long Integer :PROPERTIES: :CUSTOM_ID: memory-basic-types-long-integer :END:

4 bytes or 32 bits. 32 bits provide 2^32 = 4294967296 patterns. 4 bytes is the contemporary default size for an integer. Also known as a /word/.

*** Fixed-point :PROPERTIES: :CUSTOM_ID: memory-basic-types-fixed-point :END:

*** Floating-point :PROPERTIES: :CUSTOM_ID: memory-basic-types-floating-point :END:

4, 8, or 16 bytes. Almost all computers use the standard IEEE-754 representation for floating point numbers, a system much more complex than the scheme for integers. The important thing to note is that the bit pattern for the floating point number 1.0 is not the same as the pattern for integer 1. IEEE floats are in a form of scientific notation. A 4-byte float uses 23 bits for the mantissa, 8 bits for the exponent, and 1 bit for the sign. Some processors have a special hardware Floating Point Unit, FPU, that substantially speeds up floating point operations. With separate integer and floating point processing units, it is often possible that an integer and a floating point computation can proceed in parallel to an extent. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision. The mantissa is typically assumed to have the form 1.f, where f is the field of fraction bits; the leading 1 is implicit and not stored.

|                  | sign   | exponent        | mantissa              |
|                  |        | (base 2 + 127)  | (base 2, 1/2, 1/4...) |
|                  |        | (base 2 + 1023) |                       |
|------------------+--------+-----------------+-----------------------|
| single precision | 1 [31] | 8 [30-23]       | 23 [22-00]            |
| double precision | 1 [63] | 11 [62-52]      | 52 [51-00]            |
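The single precision layout can be taken apart in C. This is a sketch that assumes =float= is a 4-byte IEEE-754 value, which the C standard does not require; =float_fields= and =float_split= are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct float_fields {
  unsigned sign;      /* bit 31 */
  unsigned exponent;  /* bits 30-23, stored with a bias of 127 */
  uint32_t fraction;  /* bits 22-0, the f in 1.f */
};

/* Assumes float is IEEE-754 single precision and exactly 4 bytes wide. */
struct float_fields float_split(float f)
{
  struct float_fields out;
  uint32_t bits;

  assert(sizeof f == sizeof bits);
  memcpy(&bits, &f, sizeof bits);  /* copy the bit pattern, no conversion */
  out.sign     = bits >> 31;
  out.exponent = (bits >> 23) & 0xFFu;
  out.fraction = bits & 0x7FFFFFu;
  return out;
}
```

For example, =float_split(1.0f)= gives sign 0, exponent 127 (true exponent 0) and fraction 0, which is a very different pattern from the integer 1.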

**** References

  • [[https://codedocs.org/what-is/ieee-754][CodeDocs.org: IEEE754]]
  • [[http://mathcenter.oxford.emory.edu/site/cs170/ieee754/][Emory University: The IEEE 754 Format]]

*** Record :PROPERTIES: :custom_id: memory-basic-types-record :END:

The size of a record is equal to at least the sum of the size of its component fields. The record is laid out by allocating the components sequentially in a contiguous block, working from low memory to high. Sometimes a compiler will add invisible pad fields in a record to comply with processor alignment restrictions.

*** Array :PROPERTIES: :custom_id: memory-basic-types-array :END:

The size of an array is at least equal to the size of each element multiplied by the number of components. The elements in the array are laid out consecutively starting with the first element and working from low memory to high. Given the base address of the array, the compiler can generate constant-time code to figure the address of any element. As with records, there may be pad bytes added to the size of each element to comply with alignment restrictions.

*** Pointer :PROPERTIES: :custom_id: memory-basic-types-pointer :END:

A pointer is an address. The size of the pointer depends on the range of addresses on the machine. A 32-bit machine uses 4 bytes to store an address, creating a 4GB addressable range; a 64-bit machine uses 8 bytes. There is actually very little distinction between a pointer and an unsigned integer of the same size. They both just store integers; the difference is in whether the number is /interpreted/ as a number or as an address.

*** Instruction :PROPERTIES: :custom_id: memory-basic-types-instruction :END:

Machine instructions themselves are also encoded using bit patterns, most often using the same 4-byte native word size. The different bits in the instruction encoding indicate things such as what type of instruction it is (load, store, multiply, etc) and registers involved.

** Pointer Basics :PROPERTIES: :custom_id: memory-pointer-basics :END:

*** Pointers and Pointees :PROPERTIES: :custom_id: memory-pointer-basics-pointers-and-pointees :END:

We use the term pointee for the thing that the pointer points to, and we stick to the basic properties of the pointer/pointee relationship which are true in all languages.

Allocating a pointer and allocating a pointee for it to point to are two separate steps. You can think of the pointer/pointee structure as operating at two levels. Both levels must be set up for things to work.

*** Dereferencing :PROPERTIES: :custom_id: memory-pointer-basics-dereferencing :END:

The dereference operation starts at the pointer and follows its arrow over to access its pointee. The goal may be to look at the pointee state or to change the state.

The dereference operation on a pointer only works if the pointer has a pointee: the pointee must be allocated and the pointer must be set to point to it.

*** Pointer Assignment :PROPERTIES: :custom_id: memory-pointer-basics-pointer-assignment :END:

/Pointer assignment/ between two pointers makes them point to the same pointee. Pointer assignment does not touch the pointees. It just changes one pointer to have the same reference as another pointer. After pointer assignment, the two pointers are said to be /sharing/ the pointee.

** C Array :PROPERTIES: :custom_id: memory-c-array :END:

A C array is formed by laying out all the elements contiguously in memory from low to high. The array as a whole is referred to by the address of the first element.

The programmer can refer to elements in the array with the simple =[]= syntax such as =intArray[1]=. This scheme works by combining the base address of the array with simple arithmetic. Each element takes up a fixed number of bytes known at compile-time. So the nth element in the array (0-based indexing) will be at an offset of =(n * element_size)= bytes from the base address of the whole array.

*** [] Operator :PROPERTIES: :custom_id: memory-c-array-[]-operator :END:

The square bracket syntax =[]= deals with this address arithmetic for you, but it's useful to know what it's doing. The =[]= multiplies the integer index by the element size, adds the resulting offset to the array base address, and finally dereferences the resulting pointer to get to the desired element.

#+BEGIN_SRC c
a[3] == *(a + 3);
a+3 == &a[3];

a[b] == b[a];
#+END_SRC

The C standard defines the =[]= operator as follows: =a[b] => *(a+b)=, and =b[a] => *(b+a) => *(a+b)=, so =a[b] == b[a]=.

In a closely related piece of syntax, adding an integer to a pointer does the same offset computation, but leaves the result as a pointer. The square bracket syntax dereferences that pointer to access the /nth/ element while the =+= syntax just computes the pointer to the /nth/ element.

Any =[]= expression can be written with the =+= syntax instead. We just need to add in the pointer dereference. For most purposes, it's easiest and most readable to use the =[]= syntax. Every once in a while the =+= is convenient if you need a pointer to the element instead of the element itself.
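The equivalences between =[]= and =+= can be verified with a few assertions. The helper =element_ptr= is a made-up name for the "pointer to the element" case.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to the nth element: the `+` form without the dereference. */
int *element_ptr(int *base, size_t n)
{
  return base + n;
}

void subscript_demo(void)
{
  int a[5] = { 10, 20, 30, 40, 50 };

  assert(a[3] == *(a + 3));            /* [] is + followed by a dereference */
  assert(a + 3 == &a[3]);              /* + alone yields the element's address */
  assert(3[a] == a[3]);                /* *(3 + a) is the same as *(a + 3) */
  assert(*element_ptr(a, 4) == 50);

  /* the offset in bytes is index * element size */
  assert((char *)(a + 3) - (char *)a == (ptrdiff_t)(3 * sizeof a[0]));
}
```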

*** Pointer++ :PROPERTIES: :custom_id: memory-c-array-pointer++ :END:

If =p= is a pointer to an element in an array, then =(p+1)= points to the next element in the array. Code can exploit this using the construct =p++= to step a pointer over the elements in an array. It doesn't help readability any.

*** Pointer Type Effects :PROPERTIES: :custom_id: memory-c-array-pointer-type-effects :END:

Both =[]= and =++= implicitly use the compile-time type of the pointer to compute the element size, which affects the offset arithmetic.

#+BEGIN_SRC c
int *p;
p = p + 12;                  /* p + (12 * sizeof(int)) */

p = (int*) ((char*)p + 12);  /* add 12 * sizeof(char) */
#+END_SRC

Assuming each =int= takes 4 bytes, the first assignment effectively increments the address in =p= by 48 at run time, while the second increments it by only 12. The compiler figures all this out based on the type of the pointer.
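To make the effect of the pointer type visible, here is a hedged sketch with two hypothetical helpers that report how many bytes the address moves. It assumes =p= points into an array of at least 12 =int= elements, so that =p + 12= is still a valid address to form.

```c
#include <stddef.h>

/* Bytes the address moves when 12 is added to an int pointer. */
size_t step_as_int(int *p)
{
    return (size_t)((char *)(p + 12) - (char *)p);
}

/* Bytes the address moves when 12 is added after casting to char *. */
size_t step_as_char(int *p)
{
    return (size_t)(((char *)p + 12) - (char *)p);
}
```

The first helper returns =12 * sizeof(int)= and the second returns 12, whatever the size of =int= is on the platform.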

*** Arithmetic on a void pointer :PROPERTIES: :custom_id: memory-c-array-arithmetic-on-a-void-pointer :END:

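What is =sizeof(void)=? Unknown! =void= is an incomplete type, so standard C does not define arithmetic on a =void*=. Some compilers (GCC, as an extension) treat it like a =char*=, but if you were to depend on this you would be creating non-portable code. The portable approach is to cast the pointer to =char*= before doing the arithmetic.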

Note that you do not need to cast the result back to =(void*)=: a =(void*)= is the /universal recipient/ of pointer types and can be freely assigned any type of object pointer.
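A portable way to do byte arithmetic on a =void*= is to go through =char*=. The helper below is a sketch; the name =elem_addr= is hypothetical.

```c
#include <stddef.h>

/* Address of element i in an array whose element size is only known
   at run time. */
void *elem_addr(void *base, size_t i, size_t elem_size)
{
    /* Cast to char * so the arithmetic is done in bytes; the result
       converts back to void * implicitly on return. */
    return (char *)base + i * elem_size;
}
```

For a =double d[3]=, =elem_addr(d, 2, sizeof d[0])= gives the same address as =&d[2]=.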

*** Arrays and Pointers :PROPERTIES: :custom_id: memory-c-array-arrays-and-pointers :END:

One effect of the C array scheme is that the compiler does not meaningfully distinguish between arrays and pointers: in most expressions, an array name is converted to a pointer to its first element.

*** Array Names are const :PROPERTIES: :custom_id: memory-c-array-array-names-are-const :END:

One subtle distinction between an array and a pointer is that the pointer which represents the base address of an array cannot be changed in the code. Technically, the array base address behaves like a =const= pointer. The constraint applies to the name of the array where it is declared in the code.

*** Dynamic Arrays :PROPERTIES: :custom_id: memory-c-array-dynamic-arrays :END:

Since arrays are just contiguous areas of bytes, you can allocate your own arrays in the heap using =malloc=. And you can change the size of the allocated array at will at run time using =realloc=.
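A minimal sketch of a heap-allocated array that is later resized. The helper names =iota_alloc= and =int_array_resize= are invented for this example.

```c
#include <stdlib.h>

/* Allocate n ints filled with 0, 1, ..., n-1; returns NULL on failure. */
int *iota_alloc(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL) return NULL;
    for (size_t i = 0; i < n; i++) a[i] = (int)i;
    return a;
}

/* Resize to new_n ints. On failure the old block is left intact and
   NULL is returned, so the caller should check the result before
   overwriting its only pointer to the old block. */
int *int_array_resize(int *a, size_t new_n)
{
    return realloc(a, new_n * sizeof *a);
}
```

After a successful =realloc=, the elements that fit in the new size keep their values, but the block may have moved, so only the returned pointer should be used from then on.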

*** Passing multidimensional arrays to a function :PROPERTIES: :custom_id: memory-c-array-passing-multidimensional-arrays-to-a-function :END:

*** Iteration :PROPERTIES: :custom_id: memory-c-array-iteration :END:

C stores multidimensional arrays in row-major order, so loading =a[0][0]= will likely bring =a[0][1]= into the cache as well, whereas loading =a[1][0]= may cause a second cache miss.
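The sketch below passes a two-dimensional array to a function and iterates in memory order. The dimensions =ROWS= and =COLS= and both function names are assumptions made for the example.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 8 };  /* hypothetical dimensions for the sketch */

/* Sum with the column index in the inner loop, i.e. in memory order.
   All dimensions except the first must be known to the callee. */
long sum_row_major(int a[ROWS][COLS])
{
    long total = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}

/* Distance, in ints, of a[i][j] from the start of the array.
   Row-major layout means this equals i * COLS + j. */
size_t flat_index(int a[ROWS][COLS], size_t i, size_t j)
{
    return (size_t)((const char *)&a[i][j] - (const char *)a) / sizeof(int);
}
```

Swapping the two loops gives the same sum but strides through memory =COLS= elements at a time, which is the access pattern that tends to miss the cache.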

** Stack Implementation :PROPERTIES: :custom_id: memory-stack-implementation :END:

Writing a generic container in pure C is hard, and it's hard for two reasons:

The language doesn't offer any real support for /encapsulation/ or /information hiding/. That means that the data structures expose information about /internal representation/ right there in the interface file for everyone to see and manipulate. The best we can do is document that the data structure should be treated as an abstract data type, and that the client shouldn't directly manage the fields. Instead, the client should just rely on the functions provided to manage the internals.

C doesn't allow data types to be passed as parameters. That means a generic container needs to manually manage memory in terms of the client element size, not client data type. This translates to a bunch of =malloc=, =realloc=, =free=, =memcpy=, and =memmove= calls involving =void*=.
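The two constraints above can be seen in a small sketch of a generic stack. This is not the implementation under =src/memory=; the type =stack= and the =stack_*= functions are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* The fields are visible to clients, but are meant to be treated as
   private: C offers no way to hide them in the interface. */
typedef struct {
    void  *elems;      /* heap block holding cap * elem_size bytes */
    size_t elem_size;  /* client element size, fixed at creation   */
    size_t len;        /* elements currently stored                */
    size_t cap;        /* elements the block can hold              */
} stack;

int stack_init(stack *s, size_t elem_size)
{
    s->elem_size = elem_size;
    s->len = 0;
    s->cap = 4;
    s->elems = malloc(s->cap * elem_size);
    return s->elems != NULL ? 0 : -1;
}

void stack_free(stack *s)
{
    free(s->elems);
    s->elems = NULL;
    s->len = s->cap = 0;
}

/* Copy *elem onto the top of the stack, doubling the block when full. */
int stack_push(stack *s, const void *elem)
{
    if (s->len == s->cap) {
        void *grown = realloc(s->elems, 2 * s->cap * s->elem_size);
        if (grown == NULL) return -1;
        s->elems = grown;
        s->cap *= 2;
    }
    memcpy((char *)s->elems + s->len * s->elem_size, elem, s->elem_size);
    s->len++;
    return 0;
}

/* Copy the top element into *out and remove it; -1 if the stack is empty. */
int stack_pop(stack *s, void *out)
{
    if (s->len == 0) return -1;
    s->len--;
    memcpy(out, (const char *)s->elems + s->len * s->elem_size, s->elem_size);
    return 0;
}
```

The client passes =sizeof(double)= (or whatever the element type is) to =stack_init= and then always pushes and pops through pointers; the container never learns the element type, only its size.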

** Endian :PROPERTIES: :custom_id: memory-endian :END:

Endianness refers to the sequential order used to numerically interpret /a range of bytes/ in /computer memory/ as a larger, composed word value. It also describes the order of byte transmission over a digital link.

However, if you have a 32-bit register storing a 32-bit value, it makes no sense to talk about endianness. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.

*** Big Endian :PROPERTIES: :custom_id: memory-endian-big-endian :END:

#+CAPTION: Big Endian
#+NAME: fig:big-endian
[[file:src/memory/big-endian.png]]

*** Little Endian :PROPERTIES: :custom_id: memory-endian-little-endian :END:

#+CAPTION: Little Endian
#+NAME: fig:little-endian
[[file:src/memory/little-endian.png]]

The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses. For example, a 32-bit memory location with content 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value.
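The property can be observed with a short sketch. Both function names are hypothetical, and only the 8-, 16- and 32-bit widths are covered because C has no native 24-bit integer type.

```c
#include <stdint.h>
#include <string.h>

/* 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    uint32_t word = 1;
    unsigned char first;
    memcpy(&first, &word, 1);   /* inspect the byte at the lowest address */
    return first == 1;
}

/* Read the first `width` bytes (1, 2 or 4) at p as an unsigned value. */
uint32_t read_prefix(const void *p, size_t width)
{
    uint8_t b;
    uint16_t h;
    uint32_t w;
    switch (width) {
    case 1:  memcpy(&b, p, 1); return b;
    case 2:  memcpy(&h, p, 2); return h;
    default: memcpy(&w, p, 4); return w;
    }
}
```

On a little-endian machine, reading a =uint32_t= holding =0x4A= through =read_prefix= with widths 1, 2 and 4 yields =0x4A= each time; on a big-endian machine the 1- and 2-byte reads at the same address yield 0.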

*** Bit Swapping :PROPERTIES: :custom_id: memory-endian-bit-swapping :END:

Some CPU instruction sets provide native support for endian byte swapping, such as /bswap/ (x86, 486 and later), and /rev/ (ARMv6 and later).

Unicode text can optionally start with a /byte order mark/ (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32, for example, a big-endian file should start with =00 00 FE FF=; a little-endian file should start with =FF FE 00 00=.

Endianness doesn't apply to everything. If you do bitwise or bitshift operations on an int you don't notice the endianness.
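Because shifts and masks operate on values rather than on memory layout, a byte swap written with them behaves the same on any host. A minimal sketch, with the hypothetical name =swap32=:

```c
#include <stdint.h>

/* Reverse the order of the four bytes in x. */
uint32_t swap32(uint32_t x)
{
    return  (x >> 24)                    /* byte 3 -> byte 0 */
         | ((x >>  8) & 0x0000FF00u)     /* byte 2 -> byte 1 */
         | ((x <<  8) & 0x00FF0000u)     /* byte 1 -> byte 2 */
         |  (x << 24);                   /* byte 0 -> byte 3 */
}
```

Compilers commonly recognize this pattern and may emit a single swap instruction for it, though that is an optimization detail and not something the code depends on.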

The TCP/IP protocols are defined to be big-endian. The multi-byte integer representation used by the TCP/IP protocols is sometimes called /network byte order/.

In =<arpa/inet.h>=:

  • =htons()= reorders the bytes of a 16-bit unsigned value from processor order to network order; the macro name can be read as "host to network short."
  • =htonl()= reorders the bytes of a 32-bit unsigned value from processor order to network order; the macro name can be read as "host to network long."
  • =ntohs()= reorders the bytes of a 16-bit unsigned value from network order to processor order; the macro name can be read as "network to host short."
  • =ntohl()= reorders the bytes of a 32-bit unsigned value from network order to processor order; the macro name can be read as "network to host long."
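A short sketch of how these conversions are typically used when serializing an integer into a byte buffer. The helper names =put_u32_net= and =get_u32_net= are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value into buf in network byte order (big-endian). */
void put_u32_net(unsigned char buf[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Load a 32-bit value that was stored in network byte order. */
uint32_t get_u32_net(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

Whatever the host's byte order, =put_u32_net(buf, 0x01020304)= leaves the bytes =01 02 03 04= in =buf=, and =get_u32_net= reverses the process.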

*** Tools :PROPERTIES: :custom_id: memory-endian-tools :END:

  • =hexdump= on Unix-like systems

** Memory Model :PROPERTIES: :custom_id: memory-memory-model :END:

The only thing that C must care about is the type of the object which a pointer addresses. Each pointer type is derived from another type, its base type, and each such derived type is a distinct new type.

** Memory Copy

** References :PROPERTIES: :custom_id: memory-references :END:

  • [[http://cslibrary.stanford.edu/106/][Pointer Basics]]
  • [[http://mjfrazer.org/mjfrazer/bitfields/][How Endianness Effects Bitfield Packing]]
  • [[http://stackoverflow.com/documentation/c/322/arrays#t=20170207121645271737][Arrays]]
  • [[http://steve.hollasch.net/cgindex/coding/ieeefloat.html][IEEE Standard 754 Floating Point Numbers]]
  • [[http://www.catb.org/esr/structure-packing/][The Lost Art of C Structure Packing]]
  • [[https://betterexplained.com/articles/understanding-big-and-little-endian-byte-order/)][Understanding Big and Little Endian Byte Order]]
  • [[https://clang.llvm.org/docs/AddressSanitizer.html][Clang: Address Sanitizer]]
  • [[https://en.wikipedia.org/wiki/Arithmetic_shift][Arithmetic shift]]
  • [[https://en.wikipedia.org/wiki/Endianness][Endianness]]
  • [[https://en.wikipedia.org/wiki/Logical_shift][Logical shift]]
  • [[https://see.stanford.edu/Course/CS107][Programming Paradigms]]
  • [[https://see.stanford.edu/materials/icsppcs107/07-Arrays-The-Full-Story.pdf][The Ins and Outs of C Arrays]]
  • [[https://stackoverflow.com/questions/4306186/structure-padding-and-packing][Structure padding and packing]]
  • [[https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc][Do I cast the result of malloc]]
  • [[https://stackoverflow.com/questions/7622/are-the-shift-operators-arithmetic-or-logical-in-c][Are the shift operators arithmetic or logical in C?]]
  • [[https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html][Big and Little Endian]]
  • [[https://www.embedded.com/optimizing-memcpy-improves-speed/][Optimizing Memcpy improves speed]]
  • [[https://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs-)][Writing endian-independent code in C]]
  • CPU :PROPERTIES: :CUSTOM_ID: cpu :END:

** cpuid

** Cache

*** Check cache line

  • Linux
    #+BEGIN_SRC sh
    ls -l /sys/devices/system/cpu/cpu0/cache/
    cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
    #+END_SRC
  • Windows
    #+BEGIN_SRC cmd
    wmic cpu list
    wmic cpu get
    wmic cpu get L2CacheSize, L2CacheSpeed
    #+END_SRC

*** References

  • [[https://www.linuxjournal.com/article/7105][Understanding Caching]]
  • [[https://software.intel.com/en-us/articles/efficient-use-of-tiling][Efficient use of Tiling]]

** Timing

#+BEGIN_SRC sh
time ls /tmp

...

ls -G /tmp  0.00s user 0.00s system 73% cpu 0.003 total
#+END_SRC

=real= refers to actual elapsed time; =user= and =sys= refer to CPU time used only by the process.

  • =real= is wall clock time.
  • =user= is the amount of CPU time spent in user-mode code within the process.
  • =sys= is the amount of CPU time spent in the kernel within the process.

=user+sys= is the total CPU time the process actually used.
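The same user and sys figures are available to a running program. The sketch below assumes a POSIX system that provides =getrusage()=; the function name =cpu_seconds= is hypothetical.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Total CPU seconds (user + sys) consumed so far by the calling
   process, or -1.0 if getrusage() fails. */
double cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1.0;
    double user = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    double sys  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    return user + sys;
}
```

Calling it before and after a piece of work and subtracting gives the CPU time of that work, which can differ from the wall clock time when the process sleeps or runs on several cores.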

  • POSIX
  • Library :PROPERTIES: :CUSTOM_ID: library :END:

** Static Library

** Shared Library

** Library References :PROPERTIES: :CUSTOM_ID: library_references :END:

  • [[https://en.wikipedia.org/wiki/Dynamic-link_library][Dynamic-link library]]
  • [[https://en.wikipedia.org/wiki/Static_library][Static library]]
  • ELF :PROPERTIES: :CUSTOM_ID: elf :END:

** References

  • [[http://www.skyfree.org/linux/references/ELF_Format.pdf][Executable and Linkable Format (ELF)]]
  • [[https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/#:~:text=ELF%2520is%2520the%2520abbreviation%2520for%2520Executable%2520and%2520Linkable,compiler%2520or%2520linker%2520and%2520are%2520a%2520binary%2520format.][The 101 of ELF files on Linux: Understanding and Analysis]]
  • [[https://developer.apple.com/library/archive/documentation/Performance/Conceptual/CodeFootprint/Articles/MachOOverview.html][Apple: Overview of the Mach-O Executable Format]]
  • OS :PROPERTIES: :CUSTOM_ID: os :END:

** References

  • [[https://0xax.gitbooks.io/linux-insides/content/][gitbooks: Linux Inside]]
  • [[http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/][Time stamp counter]]
  • [[https://github.com/mit-pdos/xv6-public.git][xv6-public]]
  • [[https://github.com/rstallman/UNIX-System-V-Release-4-source-code.git][xv4-3b2]]
  • Flex & Bison :PROPERTIES: :CUSTOM_ID: flex_and_bison :END:

#+BEGIN_QUOTE
The asteroid to kill this dinosaur is still in orbit. -- Lex Manual Page
#+END_QUOTE

** References

  • [[http://dinosaur.compilertools.net][The Lex & Yacc Page]]
  • Unicode :PROPERTIES: :CUSTOM_ID: unicode :END:

** References

  • [[http://www.ibm.com/developerworks/library/l-linuni/][Linux Unicode programming]]
  • [[http://www.joelonsoftware.com/articles/Unicode.html][The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Set]]
  • [[https://en.wikipedia.org/wiki/UTF-8][Wikipedia: UTF-8]]
  • IO :PROPERTIES: :CUSTOM_ID: io :END:

** Stream :PROPERTIES: :CUSTOM_ID: stream :END:

Streams are a portable way of reading and writing data. They provide a flexible and efficient means of I/O.

A Stream is a file or a physical device (e.g. printer or monitor) which is manipulated with a pointer to the stream.

There exists an internal C data structure, =FILE=, which represents all streams and is defined in =stdio.h=.

Stream I/O is /buffered/: That is to say a fixed /chunk/ is read from or written to a file via some temporary storage area (the buffer).

*** Predefined streams :PROPERTIES: :CUSTOM_ID: predefined-streams :END:

There are three predefined streams: =stdin=, =stdout=, and =stderr=.

*** Redirection :PROPERTIES: :CUSTOM_ID: redirection :END:

  • =>=: redirect =stdout= to a file;
  • =<=: redirect =stdin= from a file to a program;
  • =|=: puts =stdout= from one program to =stdin= of another.

** Buffered vs. Unbuffered :PROPERTIES: :CUSTOM_ID: buffered-vs-unbuffered :END:

All =stdio.h= functions for reading from =FILE= may exhibit either /buffered/ or /unbuffered/ behavior, and either /echoing/ or /non-echoing/ behavior.

The standard library function =setvbuf= can be used to enable or disable buffering of IO by the C library. There are three possible modes: /block buffered/, /line buffered/, and /unbuffered/.
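The three modes correspond to the standard constants =_IOFBF=, =_IOLBF= and =_IONBF=. A minimal sketch, with the hypothetical wrapper name =set_buffering=:

```c
#include <stdio.h>

/* Switch fp to the given mode (_IOFBF, _IOLBF or _IONBF), letting the
   library allocate the buffer itself. This should be called after the
   stream is opened and before any other operation on it.
   Returns 0 on success, nonzero on failure. */
int set_buffering(FILE *fp, int mode)
{
    return setvbuf(fp, NULL, mode, BUFSIZ);
}
```

For example, =set_buffering(stdout, _IOLBF)= at the start of =main= asks for output to be flushed at each newline.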

*** Buffered :PROPERTIES: :CUSTOM_ID: buffered :END:

Buffered output streams accumulate written data in an intermediate buffer, sending it to the OS file system only when enough data has accumulated (or =fflush()= is requested).

Data may pass through several layers of buffering on its way to the device: C RTL buffers, OS buffers, and disk buffers.

The function =fflush()= forces a write of all buffered data for the given output or update stream via the stream's underlying write function. The open status of the stream is unaffected.

The function =fpurge()= (a BSD extension, not part of standard C) erases any input or output buffered in the given stream. For output streams this discards any unwritten output. For input streams this discards any input read from the underlying object but not yet obtained via =getc()=; this includes any text pushed back via =ungetc()=.

*** Unbuffered :PROPERTIES: :CUSTOM_ID: unbuffered :END:

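Unbuffered output has nothing to do with ensuring your data reaches the disk. =fflush()= only hands buffered data to the OS, and it works on both buffered and unbuffered streams; on POSIX systems, =fsync()= is what asks the kernel to commit the data to the storage device. Unbuffered IO writes don't guarantee the data has reached the physical disk.

=fclose()= flushes the stream before closing it, as if by =fflush()=.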

The =open= system call is used for opening an unbuffered file.

** ASCII vs. Binary :PROPERTIES: :CUSTOM_ID: ascii-vs-binary :END:

*** ASCII

Terminals, keyboards, and printers deal with character data. When you want to write a number like =1234= to the screen, it must be converted to four characters ={'1', '2', '3', '4'}= and written. Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. This is done by the =sscanf= routine.

*** Binary

Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer.
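The difference between the two representations can be sketched as follows. All three function names are hypothetical, and =binary_round_trip= assumes =fp= is an empty stream opened for update in binary mode and positioned at its start.

```c
#include <stdio.h>

/* Number of bytes needed to write value as decimal text (no terminator). */
int text_size(int value)
{
    return snprintf(NULL, 0, "%d", value);
}

/* Parse decimal text back into an int; returns 1 on success, 0 otherwise. */
int parse_int(const char *text, int *out)
{
    return sscanf(text, "%d", out) == 1;
}

/* Write the in-memory bytes of value to fp and read them back unchanged. */
int binary_round_trip(FILE *fp, int value, int *out)
{
    if (fwrite(&value, sizeof value, 1, fp) != 1) return 0;
    rewind(fp);
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

The value =1234567890= needs ten characters as text, but only =sizeof(int)= bytes in binary form. The binary form is not portable across machines with a different =int= size or byte order.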

** References :PROPERTIES: :CUSTOM_ID: io-references :END:

  • [[https://en.wikipedia.org/wiki/ASCII][ASCII]]
  • [[https://stackoverflow.com/questions/20342772/buffered-and-unbuffered-inputs-in-c][Buffered and Unbuffered inputs in C]]
  • [[https://users.cs.cf.ac.uk/Dave.Marshall/C/node18.html][Input and Output:stdio.h]]
  • [[https://en.wikipedia.org/wiki/Printf_format_string][printf format string]]
  • Network

** DNS :PROPERTIES: :CUSTOM_ID: network-dns :END:

=simple.c= uses the =getaddrinfo()= API call to query a name.

=query.c= uses the domain name protocol to query a name directly, without the =-lresolv= library.
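For orientation, a minimal =getaddrinfo()= lookup might look like the sketch below. It is not the code in =simple.c=; the function name =first_ipv4= is hypothetical, and defining =_POSIX_C_SOURCE= is one assumed way of exposing the declarations when compiling with =-std=c99=.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* assumption: exposes getaddrinfo() under -std=c99 */
#endif

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/* Resolve host to its first IPv4 address, written as dotted-quad text
   into buf. Returns 0 on success, or a getaddrinfo() error code. */
int first_ipv4(const char *host, char *buf, size_t len)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) return rc;

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)len);
    freeaddrinfo(res);
    return ok != NULL ? 0 : EAI_SYSTEM;
}
```

A nonzero return value can be turned into a message with =gai_strerror()=. A numeric host such as ="127.0.0.1"= is parsed without any DNS traffic, which makes it handy for a quick check.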

** TIL

  • =getaddrinfo()= is a POSIX.1g extension and is not available in pure C99. On Linux, we need =-D_GNU_SOURCE= if =-std=c99= is specified (see [[https://github.com/droe/sslsplit/issues/2][c99 does not define getaddrinfo]]).
  • =socklen_t= represents the size of an address structure, see [[https://yarchive.net/comp/linux/socklen_t.html][Linus Torvalds talk about socklen_t]].

** HTTP

** References :PROPERTIES: :CUSTOM_ID: dns-refrences :END:

  • [[https://www.ietf.org/rfc/rfc1034.txt][RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES]]
  • [[https://www.ietf.org/rfc/rfc1035.txt][RFC 1035: DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION]]
  • [[https://tools.ietf.org/html/rfc1536][RFC 1536: Common DNS Implementation Errors and Suggested Fixes]]
  • [[http://www.linuxhowtos.org/C_C++/socket.htm][Sockets Tutorial]]
  • [[https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html][RFC 2616: HTTP Response]]
  • Parallel

** OpenMP

*** References

  • [[https://en.wikipedia.org/wiki/OpenMP][wikipedia]]
  • [[https://www.openmp.org/][Official site]]

** Pthread

*** References

  • [[https://computing.llnl.gov/tutorials/pthreads/][POSIX Threads Programming]]
  • Algorithm

** Hash

** Algorithm References

  • [[http://www.cse.yorku.ca/~oz/hash.html][Hash Functions]]
  • Regex :PROPERTIES: :CUSTOM_ID: regex :END:

In POSIX-Extended regular expressions, all characters match themselves except for the following special characters: =.[{}()\*+?|^$=
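A minimal sketch of matching with POSIX extended syntax through =<regex.h>=. The wrapper name =ere_match= is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* 1 if text matches the POSIX extended regex pattern, 0 if it does
   not, -1 if the pattern fails to compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, =^a+b$= matches =aaab= but not the literal text =a+b=; to match a literal =+=, the pattern has to escape it, which in a C string literal is written ="^a\\+b$"=.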

  • WebAssembly

Run example in browser:
#+BEGIN_SRC javascript
// direct call, short version
Module._sum(10, 0);
// ccall
Module.ccall('sum', 'number', ['number', 'number'], [10, 0]);
#+END_SRC

  • [[https://developer.mozilla.org/en-US/docs/WebAssembly][MDN: WebAssembly]]
  • [[https://github.com/mdn/webassembly-examples][MDN: webassembly-examples]]
  • [[https://emscripten.org/index.html][emscripten]]
  • Tools :PROPERTIES: :CUSTOM_ID: tools :END:

** Display Dependents of Executable :PROPERTIES: :CUSTOM_ID: display_dependents_of_exeutable :END:

| OS      | name    | command line        |
|---------+---------+---------------------|
| MacOS   | otool   | otool -L            |
| Linux   | objdump | objdump -p          |
|         | ldd     | ldd                 |
| Windows | dumpbin | dumpbin -dependents |

** Read ELF Format :PROPERTIES: :CUSTOM_ID: read-elf-format :END:

/readelf/ displays information about one or more ELF format object files.

This /readelf/ program performs a similar function to /objdump/ but it goes into more detail and it exists independently of the BFD library, so if there is a bug in BFD then /readelf/ will not be affected.

On Darwin, there is no /readelf/, but /otool/ can do the trick.

| OS      | name    | command line |
|---------+---------+--------------|
| MacOS   | otool   | otool -l     |
| Linux   | readelf | readelf      |
| Windows |         |              |

** Metainformation about Libraries

=pkg-config=

** Display Symbol Table :PROPERTIES: :CUSTOM_ID: display-symbol-table :END:

On Unix-like platforms, the /nm/ program can view the symbol table in an executable.

| OS    | name | command line |
|-------+------+--------------|
| MacOS | nm   | nm           |
|       |      | nm -m        |
| Linux | nm   | nm           |

** Remove symbols :PROPERTIES: :CUSTOM_ID: remove-symbols :END:

| OS    | name  | command line |
|-------+-------+--------------|
| MacOS | strip | strip        |
| Linux | strip | strip        |

** Disassembly

| OS    | name    | command line |
|-------+---------+--------------|
| MacOS | otool   | otool -tV    |
| Linux | objdump | objdump -d   |

** Hex Dump

| OS      | name      | command line |
|---------+-----------+--------------|
| MacOS   | hexdump   | hexdump      |
| Linux   | hexdump   | hexdump      |
| Windows |           |              |
| Emacs   | hexl-mode |              |

** Trace System Call :PROPERTIES: :CUSTOM_ID: trace-system-call :END:

| OS    | name   | command line |
|-------+--------+--------------|
| MacOS | dtruss | dtruss       |
| Linux | strace | strace -o -C |

** Kernel Trace

  • MacOSX: =ktrace=

** Memory Leak Detection :PROPERTIES: :CUSTOM_ID: memory-leak-detection :END:

*** =valgrind=

*** =sanitize=

**** References

  • [[https://clang.llvm.org/docs/AddressSanitizer.html][Clang: AddressSanitizer]]

** Debugger :PROPERTIES: :CUSTOM_ID: debugger :END:

*** Environment

| example                     | command                                             |
|-----------------------------+-----------------------------------------------------|
| set working directory       | (lldb) platform settings -w                         |
|                             | (gdb) cd                                            |
|                             |                                                     |
| list /env/ vars             | (lldb) =env=                                        |
|                             | (lldb) =settings show target.env-vars=              |
|                             | (gdb) =show env=                                    |
|                             |                                                     |
| set /env/ var               | (lldb) =env XXX=zzz=                                |
|                             | (lldb) =settings set target.env-vars XXX=aa YYY=bb= |
|                             | (gdb) =set env XXX=zzz=                             |
|                             |                                                     |
| unset /env/ var             | (lldb) =settings remove target.env-vars XXX=        |
|                             | (gdb) =unset env XXX=                               |
|                             |                                                     |
| set /argv/ for /main/ entry | (lldb) =r arg1 arg2 arg3=                           |
|                             | (lldb) =settings set target.run-args arg1 arg2=     |
|                             | (gdb) =r arg1 arg2 arg3=                            |
|                             | (gdb) =set args arg1 arg2=                          |
|                             | 0:000> =.kill;= =.create arg1 arg2=                 |
|                             | 0:000> =.exepath+ =                                 |

*** Process

| example                    | command                                        |
|----------------------------+------------------------------------------------|
| run /process/              | (lldb) process launch                          |
|                            | (gdb) r                                        |
|                            | 0:000> g                                       |
|                            |                                                |
| attach /process/ with pid  | (lldb) =process attach --pid 123=              |
|                            | (gdb) =attach 123=                             |
|                            |                                                |
| attach /process/ with name | (lldb) =process attach --name a.out=           |
|                            | (lldb) =attach a.out=                          |
|                            |                                                |
| wait for /process/         | (lldb) =process attach --name a.out --waitfor= |
|                            | (gdb) =attach -waitfor a.out=                  |

*** Image

| example                                       | command                                         |
|-----------------------------------------------+-------------------------------------------------|
| list dependents of executable                 | (lldb) =image list=                             |
|                                               | (gdb) =info sharedlibrary=                      |
|                                               | 0:000> =lm=                                     |
|                                               |                                                 |
| lookup /main/ entry address in the executable | (lldb) =image lookup -a main -v=                |
|                                               | (gdb) =info symbol main=                        |
|                                               |                                                 |
| lookup /fn/ or /symbol/ by regexp             | (lldb) =image lookup -r -n'[fsv]printf'=        |
|                                               |                                                 |
| lookup /type/                                 | (lldb) =image lookup -t'FILE'=                  |
|                                               |                                                 |
| add /module/                                  | (lldb) =image add /opt/local/lib/libgeo.dyld=   |
|                                               | 0:000> =.reload -f -i libcffix.dll=             |
|                                               |                                                 |
| unload /module/                               | (lldb)                                          |
|                                               | 0:000> =.reload -u libcffix.dll=                |

*** Breakpoint

| example                          | command                                              |
|----------------------------------+------------------------------------------------------|
| list /breakpoint/                | (lldb) =b=                                           |
|                                  | (lldb) =breakpoint list=                             |
|                                  | (gdb) =info break=                                   |
|                                  | 0:000> =bl=                                          |
|                                  |                                                      |
| breakpoint at /fn/               | (lldb) =b main=                                      |
|                                  | (lldb) =b -nmain=                                    |
|                                  | (gdb) =b main=                                       |
|                                  | 0:000> =bu !main=                                    |
|                                  |                                                      |
| breakpoint at /line/             | (lldb) =b -ftest.c -l32=                             |
|                                  | (gdb) =b test.c:32=                                  |
|                                  |                                                      |
| breakpoint at /fn/ by regexp     | (lldb) =b -rm[a-z]in=                                |
|                                  |                                                      |
| breakpoint at /source/ by regexp | (lldb) =b -p'm[a-z]in' -ftest.c=                     |
|                                  |                                                      |
| conditional breakpoint           | (lldb) =breakpoint set -fvar.c -l23 -c'2 == argc'=   |
|                                  |                                                      |
| delete breakpoint                | (lldb) =breakpoint delete 1.1=                       |
|                                  | (lldb) =breakpoint delete 2=                         |
|                                  | 0:000> =bc 1 2=                                      |

*** Memory

| example                             | command                                                    |
|-------------------------------------+------------------------------------------------------------|
| print /argv/ in /main/ entry        | (lldb) =p -Zargc -- argv=                                  |
|                                     | 0:000>                                                     |
|                                     | (gdb) =p -- argv[0]@argc=                                  |
|                                     |                                                            |
| examine /argv/ in /main/ entry      | (lldb) =x -t'char*' -cargc argv=                           |
|                                     | 0:000> =dp @@(argv)=                                       |
|                                     | (gdb)                                                      |
|                                     |                                                            |
| examine array of /char*/ of /argv/  | (lldb) =x -ssizeof(char*) -cargc -fx argv=                 |
|                                     |                                                            |
| examine /&argc/ in /main/ entry     | (lldb) =x -ssizeof(int) -fx -c1 &argc=                     |
|                                     | (gdb) =x/1xw &argc=                                        |
|                                     |                                                            |
| memory read                         | (lldb) memory read -o/tmp/x.out -s1 -fu -c10 -- &argv[0]   |

*** Frame

| example               | command                 |
|-----------------------+-------------------------|
| check stack /frame/   | (lldb) =frame info=     |
|                       | 0:000> =k=              |
| list frame /variable/ | (lldb) =frame variable= |
|                       | 0:000> =dv=             |

*** Evaluate

| example                         | command                     |
|---------------------------------+-----------------------------|
| evaluate /argc/ in /main/ entry | (lldb) =e -- argc=          |
|                                 | (lldb) =e -fx -- argc=      |
|                                 | 0:000> =?? argc=            |
|                                 | 0:000> =.formats poi(argc)= |

*** Disassemble

| example              | command             |
|----------------------+---------------------|
| disassemble          | 0:000> =u=          |
| disassemble function | 0:000> =uf main=    |
|                      |                     |
| disassemble          | (lldb) =d=          |
| disassemble function | (lldb) =d -nmain=   |
| disassemble flavor   | (lldb) =d -Fatt=    |
|                      |                     |
| disassemble          | (gdb) =disassemble= |

*** Step

| example   | command    |
|-----------+------------|
| quit      | (lldb) =q= |
|           | (gdb) =q=  |
|           | 0:000> =q= |
| continue  | (lldb) =c= |
|           | 0:000> =g= |
| step over | (lldb) =n= |
|           | 0:000> =p= |
| step into | (lldb) =s= |
|           | (gdb) =s=  |
|           | 0:000> =t= |

*** Thread

| example      | command    |
|--------------+------------|
| list threads | 0:000> =~= |

*** Tools References

  • [[https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2008/e5ewb1h3(v=vs.90)][Microsoft: Memory Leak Detection Enabling]]
  • [[https://lldb.llvm.org/use/map.html][GDB to LLDB command map]]

** CPU Features

  • Linux:
    #+BEGIN_SRC sh
    lscpu
    #+END_SRC
  • Darwin:
    #+BEGIN_SRC sh
    sysctl -a | grep machdep.cpu.features
    #+END_SRC
  • Making the Best Use of C
  • [[https://www.gnu.org/prep/standards/standards.html#Writing-C][GNU Coding Standards]]
  • [[http://nginx.org/en/docs/dev/development_guide.html][Nginx Development guide]]
  • [[https://open-std.org/JTC1/SC22/WG14/][C standard]]
  • [[https://pubs.opengroup.org/onlinepubs/9699919799/functions/contents.html][POSIX standard]]