Objective oriented programming
Knuth-Morris-Pratt algorithm
In computer science, the Knuth-Morris-Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re0examination of previously matched characters.
The algorithm was conceived in 1970 by Donald Knuth and Vaughan Pratt, and independelty by James H.Morris.The most straightforward algorithm is to look for a character match at successive values of the index m, the posisiotn in the string being searched, ie. S[m]. If the index m reaches the end of the string then there is no match, in which case the search is said to fial. At each position m the algorithm first checks for equality of the first character in the word being searched, I.e. S[m] = ? W[0]. If a match is found, the algorithm tests the other characters in the word being searched by checking successive values of the word position index, I. The algorithm retrieves the character W[I] in the word being searched and checks for equality of the expression S[m+i] = ? W[I]. If all successive characters match in W at position m, then a match is found at that position in the search string.
Usually, the trail check will quickly reject the trail match. If the strings are uniformly distributed random letters, then the chance that charaters match is 1 in 26. In most cases, the tral check will reject the match at the initial letter. The chance that the first two letters will match is 1 in 26^2 ( 1 in 676). So if the characters are random, then the expected complexity of searching string S[] of length k is on the order of k comparisons or O(k). The expected performance is very good. If S[] is 1 billion characters and W[] is 100 characters, then the string search should complete after about one billion character comparisons.The expected peroframnce is not guaranteed. If the strings are not random, then checking a trial m may take many character comparisons. The worst case is if the two strings match in all but the last letter. Imagine that the string S[] consists of 1 billion characters that are all A, and that the word W[] is 999 A characters terminating in a final B character. The simple string matching algorithm will now examine 1000 characters at each trail position before rejecting the match and advancing the trail position. The simple string search example would now take about 1000 character comparisons times 1 billion positions for 1 trillion character comparisons. If the length of W[] is n, then the worst-case performance is O(kn)The KMP algorithm has a better worst-case performance than the straightforward algorithm. KMP spends a little time precomputing a table, and then it uses that table to do an efficient search of the string in O(k)
The difference is that KMP makes use of previous match information that the straightforward algorithm does not.
Description of pseudocode for the search algorithm
We assume the existence of a "partial match" table T, described below, which indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. The entries of T are constructed so that if we have a match starting at S[m] that fails when comparing S[m + I] to W[I], then the next possible match will start at index m + I - T[i] in S. This has two implications: first, T[0] = -1, which indicates that if W[0] is a mismatch, we cannot backtrack and must simply check the next character. and second, although the next possible match will begin at index m + I - T[i], as in the example above, we need not actually check any of the T[I] characters after that, so that we continue searching from W[T[I]]. The following is a sample pseudocode implementation of the KMP search algorithm.
algorithm kmp_search:
input: an array of characters, S (the text to be searched) an array of characters, W (the word sought)output: an integer (the zero-based position in S at which W is found)define variables: an integer, m <- 0 (the beginning of the current match in S)
Lecture 3: egrep | permissions| Scripts
egrep cs246 index.html
[]:= pattern|:= or, h|c matches h or c. cat|dog matches cat or dogusage: regexpattern file* := 0 or more repetitions of the preceding subexpression(cs)*246 := ex. 246, cs246, cscs246cs*246 := ex. c246, cs246, css246for * to not have a special meaning, escape it by usingcs\*246?:= 0 or 1 of proceeding subexpressionex. cs ?246:= cs246, cs 246 (cs )?246:= 246, cs 246+:= 1 or more of the proceeding subexpression.:= any one characters.*: any number of any charactersex. cs.*246:= matches lines that contain a string starting with cs & ending with 246use ^ to indicate pattern should match start of line
use $ to indicate pattern must end with end of line"^cs.*246$" ^(cs.*)246$:= 246, csb246, csbcsa246e.g. Fetch lines of even length
(..)* is not enough as you need to say the pattern start and end with (..)*correct: ^(..)*$e.g. files in the current directory that contain exactly one a ls| egrep "^[^a]a[^a]*$" if "" is not added then will be interpreted as a glopping itemwords in the dictionary containing exactly 5 charactersegrep ".{5}" filePermissions
ls -l:= long listing of pwd
--rwxr--xr-- 1 naneem staff 28 Jan10 abcd.txttype|user bits| group bits | others bits shortcuts owners group size last modified filenamew: writex: execute as a programr: read group bits: permissions all others members of group haveuser bits: permissions that the owner hasothers bits: permissions that all others(people not in group staff in this case) hasr for directory:= read contents
w for directory:= write contentsx for directory:= enters a directoryChanging Permissions
only the owner can change permissions
chmod mode file(s)mode:= ownership class:= U_users, g_group, o_others, a_allmode:= operator:= +, -, =mode:= r, w, x425
100, 010, 101r--, -w-, r-xProgramming using shell
vairables
$> x=1variable=valueaccess the value
usage: $x (beter ${x})
In comparison: $(...) is the embedded command linee.x. x=1echo $xy = 1, not working because of spacevalues in shell are stored as stringsdir=~/1171/a1cd $direcho '$dir': single quote won't let embedded code to excecuteScript
Script: text file with Linux commands that are executed as a program
#!/bin/bash:= shebang line change this file into command line filee.x. File basicchmod a+x basicUsage: basic, doesn't work, tell the shell where to find it for example ./basictells the shell that basic is in the current directoryArguments to scripts: e.g. is a word in the dictionary usage: ./isItAWord($0) hello($1)cat isItAWord:= egrep "^$1$" /usr/share/dict/words
!/bin/bash
Answers whetehr a candidate word might be a good password
usage () {
echo "Usage: $0 password" >&2 (find in piazza| redirecting)exit 1
}
Using $0 so that if someone change the file name, it still work${#} := number of args to scriptif [${#} -ne 1]; thenusage
fi
syntax for if: if [condition]; then statement elif [condition]; then statement else statement /n fi
egrep "^$1$" /usr/share/dict/words > /dev/nullLoopse.g. Print numbers 1 to $1x = 1while [$x -le $1]; doecho ${x}x = $((x+1))
done
e.x. for x in a b c d; do echo $x; done
File: RenameCchange all .C files to .ccmv Hello.C Hello.cc
filename = Hello.C mv ${filename} ${filename%C}cc
for name in *.C; do
mv ${name} ${name%C}cc
done
eg. count the number of occurrence of $1 in file $2count = 0for word in $(cat $2); doif [${word}] == $1]; then count = $((count + 1))fi
done
echo ${count}usage: ./countWords "To be" ../sample.txt "" prevent the shell from treat to be as two arguments