Git English
创建日期:2014-04-28 23:31

Getting Started

# For a complete list of git subcommands
git help --all

# document for git subcommand
git help <subcommand>
git <subcommand> --help

double dash in command #todo# P19

git show :

# show the details of the most recent commit
git show

# To see more detail about a particular commit
git show <commit number>

git show-branch :

git show-branch --more=10

Basic Git Concepts

NOTE: This chapter is the most important section in this book.

Within a repository, Git maintains two primary data structures, the object store and the index.

At the heart of Git’s repository implementation is the object store. It contains your original data files and all the log messages, author information, dates, and other infor- mation required to rebuild any version or branch of the project.

Object Store

Git places only four types of objects in the object store: blob, tree, commit, and tag.

In the four objects, the tag is optional.

Index

The index is a temporary and dynamic binary file, captures a version of the project’s overall structure at some moment in time.

It records and retains changes, keeping them safe until ready to commit them.

Index also called stage area.

Content-Addressable Names

Git object store is organized and implemented as as content-addressable storage system.

Each object in the object store has a unique name produced by applying SHA1 to the contents of the object, yielding an SHA1 hash value. SHA1 values are 160-bit values that are usually represented as a 40-digit hexadecimal number.

Git Tracks Content

Git is a content tracking system

Git’s object store is based on the hashed computation of the contents of its objects, not on the file or directory names from the user’s original file layout.

If two separate files have exactly the same content, whether in the same or different directories, Git stores a single copy of that content as a blob within the object store.

# An initial git repo
TankyWoo@Mac::test/ (master) » find .git/objects
.git/objects
.git/objects/info
.git/objects/pack

Add a file a.txt which content is 'hello' to the index, the SHA1 value is ce013625030ba8dba906f756967f9e9ca394464a

TankyWoo@Mac::test/ (master) » echo 'hello' > a.txt
TankyWoo@Mac::test/ (master*) » git add a.txt
TankyWoo@Mac::test/ (master*) » find .git/objects
.git/objects
.git/objects/ce
.git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
.git/objects/info
.git/objects/pack
TankyWoo@Mac::test/ (master*) » git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

Write this change to the tree object

TankyWoo@Mac::test/ (master*) » git write-tree
2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1
TankyWoo@Mac::test/ (master*) » git cat-file -p 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    a.txt

Now add another file b.txt, which is the same content with a.txt

TankyWoo@Mac::test/ (master*) » echo 'hello' > b.txt
TankyWoo@Mac::test/ (master*) » git add b.txt
TankyWoo@Mac::test/ (master*) » git write-tree
b5b0cccf7401633f12e0fafc6b85731251b86850
TankyWoo@Mac::test/ (master*) » git cat-file -p b5b0cccf7401633f12e0fafc6b85731251b86850
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    a.txt
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    b.txt

a.txt and b.txt point to the same blob object.

Now change the content of a.txt file.

TankyWoo@Mac::test/ (master*) » echo 'world' >> a.txt
TankyWoo@Mac::test/ (master*) » git add a.txt
TankyWoo@Mac::test/ (master*) » git write-tree
579c3877d5f450e34ea642b3a29d2d01dcf8e392
TankyWoo@Mac::test/ (master*) » git cat-file -p 579c3877d5f450e34ea642b3a29d2d01dcf8e392
100644 blob 94954abda49de8615a048f8d2e64b5de848e27a1    a.txt
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    b.txt

b.txt still point to the old blob, and a.txt point to the new blob

Pathname Versus Content

Git’s physical data layout isn’t modeled after the user’s file directory structure. Instead, it has a completely different structure that can, nonetheless, reproduce the user’s orig- inal layout. Git’s internal structure is a more efficient data structure for its own internal operations and storage considerations.

Practise

Every object store under .git/objects:

TankyWoo@Mac::test-git2/ (master*) » find .git/objects
.git/objects
.git/objects/3b
.git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
.git/objects/info
.git/objects/pack

TankyWoo@Mac::test-git2/ (master*) » git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
hello world

Git inserts a / after the first two digits to improve filesystem efficiency.

Such as .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad, the hash id is 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

Use git cat-file to see the content of object store:

TankyWoo@Mac::test-git2/ (master*) » git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
hello world

Tip:

Use git rev-parse can parse short hash to completely hash:

TankyWoo@Mac::test-git2/ (master*) » git rev-parse 3b18
3b18e512dba79e4c8300dd08aeb37f8e728b8dad

As mentioned before, Git tracks the pathnames of files through another kind of object called a tree.

When you use git add, Git creates an object(blob) for the contents of each file you add(in .git/objects/), but it doesn’t create an object for your tree right away. Instead, it updates the index.

The index is found in .git/index and keeps track of file pathnames and corre-sponding blobs.

git ls-files can show information about files in the index and the working tree

git write-tree create a tree object from current index by capturing a snapshot of its current information

TankyWoo@Mac::test/ (master*) » git ls-files -s
100644 ce013625030ba8dba906f756967f9e9ca394464a 0       a.txt

TankyWoo@Mac::test/ (master*) » git write-tree
2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1

Tree Hierarchies

TankyWoo@Mac::test/ (master*) » tree .git/objects
.git/objects
├── 2e
│   └── 81171448eb9f2ee3821e3d447aa6b2fe3ddba1
├── ce
│   └── 013625030ba8dba906f756967f9e9ca394464a
├── info
└── pack

The tree object 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1:

TankyWoo@Mac::test/ (master*) » git cat-file -p 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    a.txt

Create a sub directory, and copy a.txt in it:

TankyWoo@Mac::test/ (master*) » git add subdir/a.txt
TankyWoo@Mac::test/ (master*) » git write-tree
ec518d6bb3cabb8e88b5458cf18d862aa0514622

In the new tree object, subdir object is a tree object, and point to 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1.
which is the same as the top tree object.

TankyWoo@Mac::test/ (master*) » git cat-file -p ec518d6bb3cabb8e88b5458cf18d862aa0514622
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    a.txt
040000 tree 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1    subdir

The new tree for subdir contains only one file, a.txt, and that file contains the same old “hello” content. So the subdir tree is exactly the same as the older, top-level tree! And of course it has the same SHA1 object name as before.

Commits

Use the low-level command git commit-tree to commit the tree object, and generate a commit object:

TankyWoo@Mac::test/ (master*) » echo -n 'Init site\n' | git commit-tree ec518d6bb3cabb8e88b5458cf18d862aa0514622
5c5e63c0ee9a9c51304f352ec0581704411003ad


TankyWoo@Mac::test/ (master*) » git cat-file -p 5c5e63c0ee9a9c51304f352ec0581704411003ad
tree ec518d6bb3cabb8e88b5458cf18d862aa0514622
author Tanky Woo <wtq1990@gmail.com> 1406774332 +0800
committer Tanky Woo <wtq1990@gmail.com> 1406774332 +0800

Init site

author vs committer(from Pro Git)

The author is the person who originally wrote the patch, whereas the committer is the person who last applied the patch. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit — you as the author and the core member as the committer.

A more detailed explanation see this

Tag

Create an annotated, unsigned tag:

TankyWoo@Mac::test/ (master*) » git tag -m 'version 1.0 tag' v1.0 5c5e63c0ee9a9c51304f352ec0581704411003ad

Get the SHA1 id by tag name:

TankyWoo@Mac::test/ (master*) » git rev-parse v1.0
606d5478f68648e14de7b204d5484e4b83b2a3a0

The tag object:

TankyWoo@Mac::test/ (master*) » git cat-file -p 606d5478f68648e14de7b204d5484e4b83b2a3a0
object 5c5e63c0ee9a9c51304f352ec0581704411003ad
type commit
tag v1.0
tagger Tanky Woo <wtq1990@gmail.com> 1406811935 +0800

version 1.0 tag

NOTE: In this section, most of the commands are the low-level comamnds. In real life, should not use this commands!

File Management and the Index

与其它版本管理系统不同,Git在工作目录(working directory)与已提交历史(history)之间,增加了一个Index(Stage)层,称为索引(暂存)目录。

Linus Torvalds 在git mailing list里提到,如果不先理解Index的目的,就无法理解和领会Git的强大。

Git 把文件分为三个大类: Tracked, UntrackedIgnored,其中Tracked又可以分为StagedUnstaged,在工作目录下修改的文件是Unstaged,经过git add后变为Staged。

Git 的Index不存放任何文件的内容,它只简单的记录准备提交的文件,当运行git commit时,git 会检查Index而不是工作目录。

git ls-files --stage 可以查看stage中的文件的SHA1值:

TankyWoo@Mac::git-test/ (master*) » git ls-files --stage
100644 8d0e41234f24b6da002d962a26c2495ea16a425f 0       fa

git hash-object可以计算文件的SHA1值并输出:

TankyWoo@Mac::git-test/ (master*) » git hash-object fa
8d0e41234f24b6da002d962a26c2495ea16a425f

简单的说是文件fa已经在Index中了,本质是文件在object store中,Index指向它。

关于 --allgit addgit commit中有点区别,文件必须要经过add才会被tracked,git add --all 会把所有tracked 和 untracked的文件都add,但是git commit --all只会add所有tracked的文件并提交。

关于git rm,如果tracked中的文件被修改了,则可以通过git rm --cached来从git库中移除,并在本地保存为untracked的,也可以通过-f强制删除。

对于被误删的文件,如果在Index中,如下:

TankyWoo@Mac::git-test/ (master*) » gst
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       deleted:    fa
#

可以通过 git checkout HEAD -- fagit reset HEAD fa && git checkout -- fa 来返回。

如果是移除被提交,则可以通过git reset HEAD^或rebase来取回。

关于git mv,等价于:

mv fa fb
git rm fa
git add fb

Git 把文件fa改为fb,会在object store中保存原始的文件内容,然后把文件名(路径名path)重新关联到这个内容:

TankyWoo@Mac::git-test/ (master) » git ls-files --stage
100644 15acaeb140c2805acdbb2d0dbdedeeea6bb73b06 0       fa
TankyWoo@Mac::git-test/ (master) » git mv fa fb
TankyWoo@Mac::git-test/ (master*) » git ls-files --stage
100644 15acaeb140c2805acdbb2d0dbdedeeea6bb73b06 0       fb

把fa改为fb后,SHA1值并没变。

经过mv操作后,使用git log fb只会看到变更后的提交(包括变更的那个提交),即关联内容并为fb文件的历史,可以通过git log --follow fb 来查看关联这段内容的完整历史。

Commits

To identify commits, there are two ways: explicit references and a few implied references.

The explicit reference to commit is its hash ID(SHA1).

Git also provides mechanisms for identifying a commit relative to another reference.

such as master^ and master~2 etc.

The caret(^) is used to select a different parent.

Given a commit C, C^1 is the first parent, C^2 is the second parent, C^3 is the third parent, and so on, as shown in Figure 6-1(see on book).

The tilde(~) is used to go back before an ancestral parent and select a preceding generation. Again, given the commit C, C~1 is the first parent, C~2 is the first grandparent, and C~3 is the first great-grandparent. as shown in Figure 6-2.(see on book)

For example, a simple repo log:

*   75b09c2 - (HEAD, master) Merge branch 'dev' (4 seconds ago) <Tanky Woo>
|\
| * 0aab100 - (dev) Add d (26 seconds ago) <Tanky Woo>
| * 6a9379e - Add c (31 seconds ago) <Tanky Woo>
* | 015b5b9 - Add f (14 seconds ago) <Tanky Woo>
|/
* 545851d - Add b (59 seconds ago) <Tanky Woo>
* 1509ece - Add a (80 seconds ago) <Tanky Woo>

Choose the first parent:

$ git log -1 --pretty=oneline --abbrev-commit -p master^1
015b5b9 Add f

Choose the second parent, this is the dev branch commit merged into master:

$ git log -1 --pretty=oneline --abbrev-commit -p master^1
0aab100 Add d

such as master^ refers to the penultimate commit on the master branch.

Use tilde:

$ git log -1 --pretty=oneline --abbrev-commit -p master~1
015b5b9 Add f

master^1 is the same as master~1

If the number is not specified, as master^ or master~, it's default to master^1 or master~1, also, master^^ is the same as master^1^1, and it's the same as master~2.

See the parent of the second parent:

$ git log -1 --pretty=oneline --abbrev-commit -p master^2~1
6a9379e Add c

TODO
2. git show-branch

refs and symrefs

  1. A ref is an SHA1 hash ID that refers to an object within the Git object store. Although a ref may refer to any Git object, it usually refers to a commit object.
  2. A symbolic reference, or symref, is a name that indirectly points to a Git object. It is still just a ref.

refs(reference, 引用), 一般指向某个commit.

Local topic branch names, remote tracking branch names, and tag names are all refs.

本地分支名, 远程分支名, tag名都是refs.

所以如本地分支master, 全名就是.git/refs/heads/master

存放在.git/refs 目录下:

TankyWoo@Mac::simiki/ (master) » tree .git/refs
.git/refs
├── heads
│   ├── dev
│   ├── jinja-extensions
│   └── master
├── remotes
│   └── origin
│       ├── HEAD
│       ├── dev
│       └── master
└── tags
    └── v1.2.1

master这个refs存放的就是master分支的最后一次commit id:

TankyWoo@Mac::simiki/ (master) » more .git/refs/heads/master
569898602add495da34fb8684e39f60d26176a19

tags记录的是最新的一个tag

symrefs(symbol reference, 符号引用), 是一个指向引用的引用(指针).存放在.git/目录下

HEAD: .git/HEAD, 总是指向当前分支的最后一次提交, 当分支改变,HEAD也会变

TankyWoo@Mac::simiki/ (master) » more .git/HEAD
ref: refs/heads/master

ORIG_HEAD: .git/ORIG_HEAD, 一些操作, 如mergereset, 会记录操前的commit(HEAD). 作为一个保护措施,使操作可以回溯.

比如最近三个commits:

* d46546a - (HEAD, master) update d (42 seconds ago) <Tanky Woo>
* 8ed2d79 - update f (76 seconds ago) <Tanky Woo>
* 75b09c2 - (tag: v0.1) Merge branch 'dev' (3 days ago) <Tanky Woo>

ORIG_HEAD 存储的是之前某一个commit:

TankyWoo@Mac::test-git/ (master) » more .git/ORIG_HEAD
015b5b99f5c9973e840f29c9f6e6b936c99b92a5

做一次reset操作:

TankyWoo@Mac::test-git/ (master) » git reset --soft HEAD^

查看ORIG_HEAD, 会指向之前的HEAD:

TankyWoo@Mac::test-git/ (master) » more .git/ORIG_HEAD
d46546a5192b7e1c834947b612e3401a6f7729c7

这样就可以回溯到reset之前的版本:

git reset ORIG_HEAD

HEAD vs ORIG_HEAD HEAD and ORIG_HEAD in Git

FETCH_HEAD: TODO

MERGE_HEAD: TODO

git symbolic-ref TODO

详细可以参考progit-9.3

Viewing Old Commits

Specify a commit range using the form since..until, this will show the commit from since(exclude) to until(include)

$ git log --pretty=oneline --abbrev-commit master~3..master~1
015b5b9 Add f
545851d Add b

Use -p|--patch option to print the patch(changes):

$ git log -1 -p master

This is the same as:

$ git show master

git show can also display blob in remote branch. TODO

$ git show origin/master:setup.py

Notice the -1 to restricts the output to a single commit, otherwise will display all commits in master. type -n to limit output to at most n commits.

--stat option enumerates the files changed in a commit and tallies how many lines were modified in each file.

Commit Ranges

TODO

Finding Commits

git bisect
git blame

TODO

Diffs

four fundamental comparisons:

git diff : between working directory and the index.

git diff <commit> : between working directory and the given commit.

git diff --cached <commit> : between the staged changes in the index and the given commit. --staged is the new synonym of --cached.

git diff <commit1> <commit2> : between the arbitrary two commits.

Options:

-M : #todo#

The --M option detects renames and generates a simplified output that simply re- cords the file rename rather than the complete removal and subsequent addition of the source file. If the rename is not a pure rename but also has some additional content changes, Git calls those out.

-w or --ignore-all-space: compare without considering changes in whitespace as significant.

--stat : show statistics about the difference between any two tree states.

git diff with Path Limiting:

# limit changes in specify directory
git diff some_directory

# limit changes in specify file
git diff some_file

search changes containing string with -S:

# search the past 50 commits to the master branch for changes containing string "octopus"
git diff -S "octopus" master~50