12/25
2015

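Calling a generator function does not run its body; it returns a generator object instead. The body runs only when next() is called on that generator object.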

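With a list comprehension we can create a list directly. However, memory is limited, so the capacity of a list is limited too. Moreover, a list of one million elements takes up a lot of storage, and if we only need the first few elements, the space occupied by all the rest is wasted.

So, if the elements can be computed by some algorithm, can we compute each subsequent element as the loop goes along? That way there is no need to build the complete list, which saves a great deal of space. In Python, this mechanism of computing while looping is called a generator.

There are several ways to create a generator. The first is very simple: change the [] of a list comprehension to (), and you get a generator: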

>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fecaa22fc60>

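The only difference between creating L and g is the outermost [] versus (): L is a list, while g is a generator.

We can print every element of a list directly, but how do we print every element of a generator?

To print them one at a time, pass the generator to the built-in next() function: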

>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16
>>> next(g)
25
>>> next(g)
36
>>> next(g)
49
>>> next(g)
64
>>> next(g)
81
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

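As mentioned, a generator stores the algorithm. Each call to next() computes the value of the next element, until the last element has been produced; when there are no more elements, a StopIteration exception is raised.

Of course, calling next() over and over like this is far too tedious. The right way is to use a for loop, because a generator is also an iterable object: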

>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
... 
0
1
4
9
16
25
36
49
64
81
>>> 

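So once we have created a generator, we almost never call next() on it directly; we iterate over it with a for loop instead.

Generators are very powerful. If the algorithm is too complex to express with a comprehension-style for loop, it can be implemented with a function instead.

Take the famous Fibonacci sequence, in which every number except the first two is the sum of the two before it:

1, 1, 2, 3, 5, 8, 13, 21, 34, ...

The Fibonacci sequence cannot readily be written as a list comprehension, but printing it with a function is easy: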

>>> def fib(max):
...     n, a, b = 0, 0, 1
...     while n < max:
...         print(b)
...         a, b = b, a + b
...         n = n + 1
... 
>>> fib(6)
1
1
2
3
5
8
>>> 

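Look closely and you can see that the fib function actually defines the rule for computing the Fibonacci sequence: starting from the first element, it can compute any subsequent element. This logic is very similar to a generator.
In other words, the function above is only one step away from a generator. To turn fib into a generator, just change print(b) to yield b: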

>>> def fib(max):
...     n, a, b = 0, 0, 1
...     while n < max:
...         yield b
...         a, b = b, a + b
...         n = n + 1
... 

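This is the other way to define a generator. If a function definition contains the yield keyword, the function is no longer an ordinary function but a generator function, and calling it returns a generator: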

>>> fib(6)
<generator object fib at 0x104feaaa0>

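The hardest part to understand here is that a generator's flow of execution differs from a function's. A function runs sequentially and returns when it reaches a return statement or the last line of its body. A function turned into a generator runs each time next() is called, returns when it reaches a yield statement, and on the next call resumes from the yield where it last stopped.

As a simple example, define a generator that returns the numbers 1, 3, 5 in turn: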

>>> def odd():
...     print('step 1')
...     yield 1
...     print('step 2')
...     yield 3
...     print('step 3')
...     yield 5
... 
>>> o = odd()
>>> next(o)
step 1
1
>>> next(o)
step 2
3
>>> next(o)
step 3
5
>>> next(o)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

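As you can see, odd is not an ordinary function but a generator: during execution it pauses at each yield and resumes on the next call. After three yields there is no yield left to run, so the fourth call to next() raises StopIteration.

Back to the fib example: each yield inside the loop suspends execution. The loop needs an exit condition, of course, otherwise it produces an infinite sequence.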
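
An infinite generator is not necessarily a problem as long as the consumer stops asking for values. A minimal sketch, assuming Python 3; the name fib_forever is made up for this illustration, and itertools.islice takes a bounded slice of the endless stream:

```python
from itertools import islice

def fib_forever():
    """Yield Fibonacci numbers without end (hypothetical helper)."""
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

# islice stops after six items, so the infinite loop never runs away.
first_six = list(islice(fib_forever(), 6))
print(first_six)  # [1, 1, 2, 3, 5, 8]
```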

Likewise, once a function has been turned into a generator, we almost never call next() on it; we iterate over it directly with a for loop:

>>> for n in odd():
...     print(n)
... 
step 1
1
step 2
3
step 3
5
>>> 

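Summary

Generators are a very powerful tool. In Python, you can simply turn a list comprehension into a generator, or implement a generator with complex logic as a function.

To understand how a generator works: it computes the next element on each pass of a for loop and ends the loop when the appropriate condition is met. For a generator made from a function, reaching a return statement or the last line of the function body is the signal that ends the generator, and the for loop ends with it.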
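
The role of return can be sketched as follows (Python 3 syntax; countdown is a made-up example, not part of the tutorial). A bare return inside a generator function ends the generator, which the caller sees as StopIteration:

```python
def countdown(n):
    """Yield n, n-1, ..., 1; a bare return ends the generator."""
    while True:
        if n == 0:
            return  # ends the generator; the caller sees StopIteration
        yield n
        n -= 1

values = list(countdown(3))
print(values)  # [3, 2, 1]

g = countdown(1)
print(next(g))  # 1
try:
    next(g)
except StopIteration:
    finished = True
    print("generator finished")
```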

12/25
2015

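As a program is developed, the code keeps growing; kept in a single file, it gets longer and longer and harder to maintain.
To write maintainable code, we group functions and put them into separate files, so that each file contains relatively little code. Many programming languages organize code this way. In Python, a .py file is called a module.

What are the benefits of using modules?

The biggest benefit is much better maintainability. Second, you do not have to write code from scratch: once a module is finished, it can be referenced from elsewhere. When writing programs we also routinely reference other modules, including Python's built-in modules and third-party modules.

Modules also help avoid clashes between function names and variable names. Functions and variables with the same name can live in different modules, so when writing our own module we need not worry about clashing with names in other modules. Do take care, though, not to clash with the names of built-in functions.

You may also wonder: what if modules written by different people have the same name? To avoid module name clashes, Python also lets you organize modules by directory, which is called a package.
For example, a file abc.py is a module named abc, and a file xyz.py is a module named xyz.

Now suppose the names of our abc and xyz modules clash with other modules. We can avoid the clash by organizing the modules into a package: choose a top-level package name, such as mycompany, and store the files in the following directory layout: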

mycompany
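Once packages are introduced, as long as the top-level package name does not clash with anyone else's, none of the modules will clash. Now the name of the abc.py module becomes mycompany.abc and, similarly, the name of the xyz.py module becomes mycompany.xyz.

Note that every package directory contains an __init__.py file. For a regular package this file must exist; without it, Python 2 treats the directory as an ordinary directory rather than a package (since Python 3.3 such a directory can only be imported as a namespace package). __init__.py can be empty or can contain Python code, because __init__.py is itself a module, and its module name is mycompany.
Similarly, directories can be nested to form a multi-level package hierarchy. In such a layout, the file www.py has the module name mycompany.web.www, and the two utils.py files have the module names mycompany.utils and mycompany.web.utils.
When creating your own modules, take care that the name does not clash with a module that ships with Python. For example, Python ships with the sys module, so do not name your own module sys.py; one of the two would become impossible to import.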

12/25
2015

I am studying Alex Martelli’s Python in a Nutshell and the book suggests that any object that has a next() method is (or at least can be used as) an iterator. It also suggests that most iterators are built by implicit or explicit calls to a method called iter.

After reading this in the book, I felt the urge to try it. I fired up a python 2.7.3 interpreter and did this:

>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> for number in range(0, 10):
...     print x.next()

However the result was this:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'list' object has no attribute 'next'

In confusion, I tried to study the structure of the x object via dir(x) and I noticed that it had a __iter__ function object. So I figured out that it can be used as an iterator, so long as it supports that type of interface.

So when I tried again, this time slightly differently, attempting to do this:

>>> _temp_iter = next(x)

I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator

But how can a list NOT be an iterator, since it appears to support this interface, and can be certainly used as one in the following context:

>>> for number in x:
...     print number

They are iterable, but they are not iterators. They can be passed to iter() to get an iterator for them either implicitly (e.g. via for) or explicitly, but they are not iterators in and of themselves.

Note that all iterators (which are well-behaved) are also iterable – their __iter__ simply returns self, so you can call iter(iter(iter(iter(x)))) and get the same thing as iter(x). This is why for works with both iterables and iterators without type sniffing (well, disregarding performance optimizations).

You need to convert list to an iterator first using iter():

In [7]: x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [8]: it=iter(x)

In [9]: for i in range(10):
   ....:     it.next()
   ....:
Out[10]: 0
Out[10]: 1
Out[10]: 2
Out[10]: 3
Out[10]: 4
Out[10]: 5
Out[10]: 6
Out[10]: 7
Out[10]: 8
Out[10]: 9

In [12]: 'next' in dir(it)
Out[12]: True

In [13]: 'next' in dir(x)
Out[13]: False

Checking whether an object is an iterator or not (assuming import collections has been run):

In [17]: isinstance(x,collections.Iterator)
Out[17]: False

In [18]: isinstance(x,collections.Iterable)
Out[18]: True

In [19]: isinstance(it,collections.Iterable) 
Out[19]: True

In [20]: isinstance(it,collections.Iterator)
Out[20]: True
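
The session above is Python 2. As a rough Python 3 equivalent (a sketch, not part of the original answer), the abstract base classes live in collections.abc, and the same checks can be collected like this:

```python
from collections.abc import Iterable, Iterator

x = [0, 1, 2, 3]
it = iter(x)

checks = {
    "list is Iterable": isinstance(x, Iterable),
    "list is Iterator": isinstance(x, Iterator),
    "iter(list) is Iterable": isinstance(it, Iterable),
    "iter(list) is Iterator": isinstance(it, Iterator),
    "iter(it) is it": iter(it) is it,  # an iterator returns itself
}
for label, result in checks.items():
    print(label, "->", result)
```

Only the second check should come out False: a list is iterable but is not itself an iterator.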

	
answered Oct 24 '12 at 17:03
Ashwini Chaudhary

Just in case you are confused about what the difference between iterables and iterators is. An iterator is an object representing a stream of data. It implements the iterator protocol:

    __iter__ method
    next method

Repeated calls to the iterator’s next() method return successive items in the stream. When no more data is available the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.

On the other side, iterable objects implement the __iter__ method that, when called, returns an iterator, which allows for multiple passes over their data. Iterable objects are reusable; once exhausted they can be iterated over again. They can be converted to iterators using the iter function.

So if you have a list (iterable) you can do:

>>> l = [1,2,3,4]
>>> for i in l:
...     print i,
1 2 3 4
>>> for i in l:
...     print i,
 1 2 3 4

If you convert your list into an iterator:

>>> il = l.__iter__()  # equivalent to iter(l)
>>> for i in il:
...     print i,
 1 2 3 4
>>> for i in il:
...     print i,
>>> 
12/24
2015

Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.

Simplified Code

The simplification of code is a result of generator function and generator expression support provided by Python.

To illustrate this, we will compare different implementations of a function, “firstn”, that represents the first n non-negative integers, where n is a really big number, and assume (for the sake of the examples in this section) that each integer takes up a lot of space, say 10 megabytes each.

Note: in real life, integers do not take up that much space, unless they are really, really, really big integers. For instance you can represent a 309 digit number with 128 bytes (add some overhead, it will still be less than 150 bytes).

First, let us consider the simple example of building a list and returning it.

# Build and return a list

def firstn(n):
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums

sum_of_first_n = sum(firstn(1000000))

The code is quite simple and straightforward, but it builds the full list in memory. This is clearly not acceptable in our case, because we cannot afford to keep all n “10 megabyte” integers in memory.

So, we resort to the generator pattern. The following implements the generator as an iterable object.

# Using the generator pattern (an iterable)
class firstn(object):
    def __init__(self, n):
        self.n = n
        self.num, self.nums = 0, []

    def __iter__(self):
        return self

    # Python 3 compatibility
    def __next__(self):
        return self.next()

    def next(self):
        if self.num < self.n:
            cur, self.num = self.num, self.num + 1
            return cur
        else:
            raise StopIteration()


sum_of_first_n = sum(firstn(1000000))

This will perform as we expect, but we have the following issues:

  • there is a lot of boilerplate
  • the logic has to be expressed in a somewhat convoluted way

Furthermore, this is a pattern that we will use over and over for many similar constructs. Imagine writing all that just to get an iterator.
Python provides generator functions as a convenient shortcut to building iterators. Let us rewrite the above iterator as a generator function:

# a generator that yields items instead of returning a list
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(firstn(1000000))
print(sum_of_first_n)

So calling a generator function returns a generator; it does not execute the function body:

def firstn(n):
    print("call the firstn()")
    num = 0
    while num < n:
        yield num
        num += 1

print(firstn(10))

"call the firstn()" is not printed, because calling a generator function only returns a generator object.

Note that the expression of the number generation logic is clear and natural. It is very similar to the implementation that built a list in memory, but has the memory usage characteristic of the iterator implementation.

Note: the above code is perfectly acceptable for expository purposes, but remember that in Python 2 firstn() is equivalent to the built-in xrange() function, and in Python 3 to the built-in range(). (Strictly speaking, range() is not a generator; it returns a lazy iterable range object.) The built-ins will always be much faster.

Generator expressions provide an additional shortcut to build generators out of expressions similar to that of list comprehensions.

In fact, we can turn a list comprehension into a generator expression by replacing the square brackets (“[ ]”) with parentheses. Alternately, we can think of list comprehensions as generator expressions wrapped in a list constructor.

Consider the following example:

	# list comprehension
	doubles = [2 * n for n in range(50)]

	# same as the list comprehension above
	doubles = list(2 * n for n in range(50))

Notice how a list comprehension looks essentially like a generator expression passed to a list constructor.

By allowing generator expressions, we don’t have to write a generator function if we do not need the list. If only list comprehensions were available, and we needed to lazily build a set of items to be processed, we would have to write a generator function.

This also means that we can use the same syntax we have been using for list comprehensions to build generators.

Keep in mind that generators are a special type of iterator, and that containers like list and set are also iterables. The uniform way in which all of these are handled adds greatly to the simplification of code.
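
To make the difference between the two forms concrete, here is a small sketch (Python 3; the variable names are invented for this example). Note that sys.getsizeof measures only the container object itself, not the integers it refers to, so the comparison is indicative rather than exact:

```python
import sys

as_list = [2 * n for n in range(10000)]   # all 10000 items exist now
as_gen = (2 * n for n in range(10000))    # items are produced on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size > gen_size)  # True

# Both forms feed sum() in exactly the same way.
same_total = sum(as_gen) == sum(as_list)
print(same_total)  # True
```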

Improved Performance

The performance improvement from the use of generators is the result of the lazy (on demand) generation of values, which translates to lower memory usage. Furthermore, we do not need to wait until all the elements have been generated before we start to use them. This is similar to the benefits provided by iterators, but the generator makes building iterators easy.

This can be illustrated by comparing the range and xrange built-ins of Python 2.x.

Both range and xrange represent a range of numbers, and have the same function signature, but range returns a list while xrange returns a generator (at least in concept; the implementation may differ).

Say, we had to compute the sum of the first n, say 1,000,000, non-negative numbers.

	# Note: Python 2.x only
	# using a non-generator
	sum_of_first_n = sum(range(1000000))

	# using a generator
	sum_of_first_n = sum(xrange(1000000))

Note that both lines are identical in form, but the one using range is much more expensive.

When we use range we build a 1,000,000 element list in memory and then find its sum. This is a waste, considering that we use these 1,000,000 elements just to compute the sum.

This waste becomes more pronounced as the number of elements (our n) becomes larger, the size of our elements becomes larger, or both.

On the other hand, when we use xrange, we do not incur the cost of building a 1,000,000 element list in memory. The generator created by xrange will generate each number, which sum will consume to accumulate the sum.

In the case of the “range” function, using it as an iterable is the dominant use-case, and this is reflected in Python 3.x, which makes the range built-in return a lazy range object instead of a list.

Note: a generator will provide performance benefits only if we do not intend to use that set of generated values more than once.

Consider the following example:

	# Note: Python 2.x only
	# product() is assumed to be a user-defined helper that multiplies the items
	s = sum(xrange(1000000))
	p = product(xrange(1000000))

Imagine that making an integer is a very expensive process. In the above code, we just performed the same expensive process twice. In cases like this, building a list in memory might be worth it (see example below):

	# Note: Python 2.x only
	nums = list(xrange(1000000))
	s = sum(nums)
	p = product(nums)

However, a generator might still be the only way, if the storage of these generated objects in memory is not practical, and it might be worth paying the price of duplicated expensive computations.
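
One related point worth sketching: a generator object can be consumed only once, so reusing the same object silently yields nothing the second time. The example below is an illustration in Python 3; numbers() is a made-up stand-in for an expensive generator:

```python
def numbers():
    """Stand-in for an expensive generator (hypothetical)."""
    for i in range(1, 6):
        yield i

gen = numbers()
first_sum = sum(gen)    # consumes the generator
second_sum = sum(gen)   # nothing left: sum of an empty stream

cached = list(numbers())  # materialise once, reuse many times
total = sum(cached)
largest = max(cached)
print(first_sum, second_sum, total, largest)  # 15 0 15 5
```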

Examples

For example, the RangeGenerator can be used to iterate over a large number of values, without creating a massive list (like range would)

	# the for loop will generate each i (i.e. 1,2,3,4,5, ...), add it to total, and throw it away
	# before the next i is generated. This is opposed to iterating through range(...), which creates
	# a potentially massive list and then iterates through it.
	total = 0
	for i in irange(1000000):
	    total += i

Generators can be composed. Here we create a generator on the squares of consecutive integers.

	#square is a generator
	square = (i*i for i in irange(1000000))
	#add the squares
	total = 0
	for i in square:
	    total += i

Here, we compose a square generator with the takewhile generator, to generate squares less than 100

	# add squares less than 100
	from itertools import count, takewhile

	square = (i*i for i in count())
	bounded_squares = takewhile(lambda x: x < 100, square)
	total = 0
	for i in bounded_squares:
	    total += i

Discussion

I once saw MikeOrr demonstrate Before and After examples. But, I forget how they worked.
Can someone demonstrate here?
He did something like: Show how a normal list operation could be written to use generators. Something like:

def double(L):
    return [x*2 for x in L]

eggs = double([1, 2, 3, 4, 5])

…he showed how that, or something like that, could be rewritten using iterators, generators.

It’s been a while since I’ve seen it, I may be getting this all wrong.

# explicitly write a generator function
def double(L):
    for x in L:
        yield x*2

# eggs will be a generator
eggs = double([1, 2, 3, 4, 5])

# the above is equivalent to ("generator comprehension"?)
eggs = (x*2 for x in [1, 2, 3, 4, 5])

# need to do this if you need a list
eggs = list(double([1, 2, 3, 4, 5]))

# the above is equivalent to (list comprehension)
eggs = [x*2 for x in [1, 2, 3, 4, 5]]

For the above example, a generator comprehension or list comprehension is sufficient unless you need to apply that in many places.

Also, a generator function will be cleaner and more clear, if the generated expressions are more complex, involve multiple steps, or depend on additional temporary state.

Consider the following example:

def unique(iterable, key=lambda x: x):
        seen = set()
        for elem, ekey in ((e, key(e)) for e in iterable):
            if ekey not in seen:
                yield elem
                seen.add(ekey)

Here, the temporary keys collector, seen, is a temporary storage that will just be more clutter in the location where this generator will be used.

Even if we were to use this only once, it is worth writing a function (for the sake of clarity; remember that Python allows nested functions).
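
A short usage sketch for unique may help (Python 3). The function is restated in a slightly simplified form so the snippet runs on its own, and the sample words are invented:

```python
def unique(iterable, key=lambda x: x):
    """Yield elements whose key has not been seen before."""
    seen = set()
    for elem in iterable:
        ekey = key(elem)
        if ekey not in seen:
            yield elem
            seen.add(ekey)

words = ["Spam", "eggs", "spam", "EGGS", "ham"]
exact = list(unique(words))
folded = list(unique(words, key=str.lower))
print(exact)   # ['Spam', 'eggs', 'spam', 'EGGS', 'ham']
print(folded)  # ['Spam', 'eggs', 'ham']
```

The caller never sees the seen set; it lives only for as long as the generator does.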

12/24
2015

Iterables are objects that iter can be used on to obtain an iterator. Iterators are objects that can be iterated through using next. Generators are a category of iterators (generator functions and generator expressions).

range is a class of immutable iterable objects. Their iteration behavior can be compared to lists: you can’t call next directly on them; you have to get an iterator by using iter.
So no, range is not a generator.
You may be thinking, “why didn’t they make it an iterator directly”? Well, ranges have some useful properties that wouldn’t be possible that way:

  • They are immutable, so they can be used as dictionary keys.
  • They have the start, stop and step attributes (since Python 3.3), count and index methods and they support in, len and __getitem__ operations.
  • You can iterate over the same range multiple times.

    >>> myrange = range(1, 21, 2)
    >>> myrange.start
    1
    >>> myrange.step
    2
    >>> myrange.index(17)
    8
    >>> myrange.index(18)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: 18 is not in range
    >>> it = iter(myrange)
    >>> it
    <range_iterator object at 0x7f504a9be960>
    >>> next(it)
    1
    >>> next(it)
    3
    >>> next(it)
    5

Another nice feature of range objects is that they have a __contains__ method which can be used to test whether a value is in a range: 5 in range(10) => True
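
The properties listed above can be checked in a few lines. This is a sketch for Python 3 only, since Python 2's range returns a plain list:

```python
r = range(1, 21, 2)

length = len(r)
fourth = r[3]
has_17 = 17 in r
repeatable = list(r) == list(r)   # a range can be iterated repeatedly
print(length, fourth, has_17, repeatable)  # 10 7 True True

try:
    next(r)                       # a range is not an iterator
except TypeError as exc:
    error = str(exc)
    print(error)
```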

12/23
2015

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.
In Python, iterable and iterator have specific meanings.
An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
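
The __getitem__ route mentioned above is less well known, so here is a minimal sketch (Python 3; Squares is a made-up class). It defines no __iter__ at all, yet iter() can still build an iterator for it by asking for index 0, 1, 2, ... until IndexError is raised:

```python
class Squares:
    """Iterable through __getitem__ only (hypothetical example class)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # tells iter() the sequence has ended
        return index * index

squares = list(Squares(5))
print(squares)  # [0, 1, 4, 9, 16]
```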

12/23
2015

An ITERABLE is:

  • anything that can be looped over (i.e. you can loop over a string or file)
  • anything that can appear on the right-side of a for-loop: for x in iterable: ...
  • anything you can call iter() on and have it return an ITERATOR: iter(obj)
  • an object that defines __iter__ that returns a fresh ITERATOR, or it may have a __getitem__ method suitable for indexed lookup.

An ITERATOR is:

  • an object with state that remembers where it is during iteration
  • an object with a __next__ method (Python 3; next before) that:
    • returns the next value in the iteration
    • updates the state to point at the next value
    • signals when it is done by raising StopIteration
  • an object that is self-iterable (meaning that it has an __iter__ method that returns self). (Does an iterator really need this method? Doesn't that make it an iterable as well? The most confusing part is that an iterator can be used in a for loop only once.)

The builtin function next calls the __next__ (Python 3) or next (Python 2) method on the object passed to it.

For example:

>>> s = 'cat'      # s is an ITERABLE
                   # s is a str object that is immutable
                   # s has no state
                   # s has a __getitem__() method

>>> t = iter(s)    # t is an ITERATOR
                   # t has state (it starts by pointing at the "c")
                   # t has a next() method and an __iter__() method

>>> next(t)        # the next() function returns the next value and advances the state
'c'
>>> next(t)        # the next() function returns the next value and advances
'a'
>>> next(t)        # the next() function returns the next value and advances
't'
>>> next(t)        # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration

>>> iter(t) is t   # the iterator is self-iterable
True

12/23
2015

In natural language,

iteration is the process of taking one element at a time in a row of elements.

In Python,

  • iterable is an object that is, well, iterable, which simply put, means that it can be used in iteration, e.g. with a for loop. How? By using iterator.
  • … while iterator is an object that defines how to actually do the iteration–specifically what is the next element. That’s why it must have next() method. Iterator is not iterable (a contentious point: even with an __iter__ method added, an iterator can still be traversed only once), though.

So what does the Python interpreter think when it sees a for x in obj: statement?

  • Look, a for loop. Looks like a job for an iterator… Let’s get one. … There’s this obj guy, so let’s ask him.
  • “Mr. obj, do you have your iterator?” (… calls iter(obj), which calls obj.__iter__(), which happily hands out a shiny new iterator _i.)
  • OK, that was easy… Let’s start iterating then. (x = _i.next() ... x = _i.next()...)

Since Mr. obj succeeded in this test (by having certain method returning a valid iterator), we reward him with adjective: you can now call him “iterable Mr. obj”.
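
The thought process above can be spelled out by hand. The following is a sketch of roughly what a for loop does, written for Python 3; manual_for is a name invented here and is not how the interpreter is actually implemented:

```python
def manual_for(iterable):
    """Collect items the way a for loop would, step by step."""
    collected = []
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__() in Python 3
        except StopIteration:
            break                  # a for loop ends silently here
        collected.append(item)
    return collected

print(manual_for("cat"))  # ['c', 'a', 't']
```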

However, in simple cases, you don’t normally benefit from having iterator and iterable separately. So you define only one object, which is also its own iterator. (Python does not really care that _i handed out by obj wasn’t all that shiny, but just the obj itself.)
This is why in most examples I’ve seen (and what had been confusing me over and over), you can see:

class IterableExample(object):

    def __iter__(self):
        return self

    def next(self):
        pass

instead of

class Iterator(object):
    def next(self):
        pass

class Iterable(object):
    def __iter__(self):
        return Iterator()

There are cases, though, when you can benefit from having iterator separated from the iterable, such as when you want to have one row of items, but more “cursors” (one row of data traversed several times means several cursors). For example when you want to work with “current” and “forthcoming” elements, you can have separate iterators for both. Or multiple threads pulling from a huge list: each can have its own iterator to traverse over all items.

Imagine what you could do:

class SmartIterableExample(object):

    def create_iterator(self):
        # An amazingly powerful yet simple way to create arbitrary
        # iterator, utilizing object state (or not, if you are fan
        # of functional), magic and nuclear waste--no kittens hurt.
        pass    # don't forget to add the next() method

    def __iter__(self):
        return self.create_iterator()
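
One simple way to get a fresh iterator on every call is to write __iter__ as a generator function. This is only a sketch of that idea (Python 3; Countdown is a made-up class), not the approach the answer above had in mind:

```python
class Countdown:
    """Reusable iterable: __iter__ is a generator function (sketch)."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start          # state lives in the generator, not in self
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
first_pass = list(c)
second_pass = list(c)           # a new iterator each time
pairs = [(a, b) for a in c for b in c]
print(first_pass, second_pass, len(pairs))  # [3, 2, 1] [3, 2, 1] 9
```

Because each loop gets its own generator, the nested loops act as independent cursors over the same object.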

Notes:
I’ll repeat again: iterator is not iterable. Iterator cannot be used as a “source” in for loop. What for loop primarily needs is __iter__() (that returns something with next()).
Of course, for is not the only iteration loop, so above applies to some other constructs as well (while…).
Iterator’s next() can throw StopIteration to stop iteration. Does not have to, though, it can iterate forever or use other means.
In the above “thought process”, _i does not really exist. I’ve made up that name.
There’s a small change in Python 3.x: next() method (not the built-in) now must be called __next__(). Yes, it should have been like that all along.
You can also think of it like this: iterable has the data, iterator pulls the next item

Disclaimer: I’m not a developer of any Python interpreter, so I don’t really know what the interpreter “thinks”. The musings above are solely demonstration of how I understand the topic from other explanations, experiments and real-life experience of a Python newbie.

I think all iterators are themselves iterable. When the built-in function iter is called on an iterable, it adds both __iter__ and next methods on the object that it returns. An iterator is an iterable. From the PEP on iterators: “a class that wants to be an iterator should implement two methods: a next() method…and an __iter__() method that returns self.” An iterator is iterable because it returns a valid iterator when iter() is called on it - in this case, the iterator returns itself, and we know it is a valid iterator because it defines the next() method.
I realize that my answer is incorrect at that point. I tried to fix it but could not come up with something without me breaking it
You do define the iterator code next(), or __next__() but the actual iterator is an object that will be created each time something wants to iterate over the object. That way you can have e.g. two passes over the same list at the same time without interfering with each other; each will have a new iterator initially set to the first item, then returning it and moving one item further, etc… (In fact, those items may not really exist in the object; they may be fetched or created on the go.)
Actually you can try it with any simple list: lst = ['a', 'b', 'c']; we want to iterate? let’s get an iterator: ir1 = iter(lst)… first item: next(ir1) will be ‘a’, calling next(ir1) gives ‘b’ etc. This is what python does behind the scenes with each for. At any point, though, you can always get a new iterator: ir2 = iter(lst) that will start from scratch: next(ir2) gives you ‘a’ again when next(ir1) already gives ‘c’.

Every iterator is an iterable, but an iterable is not necessarily an iterator.

12/21
2015

Iterator objects in Python conform to the iterator protocol, which basically means they provide two methods: __iter__() and next() (named __next__() in Python 3). The __iter__ method returns the iterator object and is implicitly called at the start of loops. The next() method returns the next value and is implicitly called at each loop increment. next() raises a StopIteration exception when there are no more values to return, which is implicitly captured by looping constructs to stop iterating.
Here’s a simple example of a counter:

class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def next(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

for c in Counter(3, 8):
    print c

This will print:

[test@Master python]$ python counter.py
3
4
5
6
7
8

The Python 3 version:

class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

for c in Counter(3, 8):
    print(c)

Output:

[test@Master python]$ python3 counter.py
3
4
5
6
7
8

The class above can be iterated over, but not more than once. A first attempt at fixing that resets the state in __iter__:

class Counter:
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __iter__(self):
        self.current = self.low
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

counter = Counter(3, 8)

for c in counter:
    print(c)

print('==========')

for c in counter:
    print(c)

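I thought this rewrite was rather neat, but it is actually wrong: the class is an iterator, not a reusable iterable, so resetting the state in __iter__ is a mistake (two nested loops over the same object would interfere with each other, for example). A proper rewrite separates the two roles:

class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

class IterableCounter:
    def __iter__(self):
        return Counter(3, 9)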

for c in IterableCounter():
    print(c)

print('==========')

for c in IterableCounter():
    print(c)

This is easier to write using a generator:

def counter(low, high):
    current = low
    while current <= high:
        yield current
        current += 1

for c in counter(3, 8):
    print(c)

The printed output will be the same. Under the hood, the generator object supports the iterator protocol and does something roughly similar to the class Counter.

12/20
2015
L = [1,2,3,4,5,6,7,8,9]

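To build a dictionary like this:

{1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60, 7: 70, 8: 80, 9: 90}

you previously had to write:

dict([(v, v*10) for v in L])

Now (since Python 2.7) you can write a dict comprehension directly: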

{v: v*10 for v in L}
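
Dict comprehensions accept the same extras as list comprehensions. A brief sketch with invented variable names; the printed key order assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
L = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A condition filters which keys are kept.
evens = {v: v * 10 for v in L if v % 2 == 0}
print(evens)  # {2: 20, 4: 40, 6: 60, 8: 80}

# zip pairs two sequences into keys and values.
names = ['x', 'y', 'z']
by_name = {name: value for name, value in zip(names, L)}
print(by_name)  # {'x': 1, 'y': 2, 'z': 3}
```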

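Appendix:

I. What is a dictionary?

The dictionary is the only built-in mapping type in Python.
In a mapping object, each hashed key points to exactly one value object (several keys may point to equal values); a dictionary is usually thought of as a mutable hash table.
A dictionary object is mutable. It is a container type that can store any number of Python objects, including other container types.

Differences between dictionaries and sequence types:
Data is stored and accessed differently.
Sequence types use only numeric keys (indexed in numeric order from the start of the sequence);
mapping types can use other object types as keys (numbers, strings, tuples; strings are the most common). Unlike sequence keys, mapping keys are directly or indirectly associated with the stored values.
Data in a mapping type is unordered in the Python 2 interpreter used here (since Python 3.7, dict preserves insertion order), whereas sequence types are ordered by numeric index.
Mapping types use the key to "map" directly to the value.

The dictionary is one of the most powerful data types in Python.

II. Creating dictionaries and assigning to them

Simply put, a dictionary is a collection of key-value pairs wrapped in curly braces. (A key-value pair is also called an item.)

General form:

adict = {}
adict = {key1:value1, key2:value2, …}

Or use the dict() function, for example adict = dict() or adict = dict((['x',1],['y',2])).
Or create a dictionary from keyword arguments, for example: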

>>> adict = dict(name = 'allen', age = 40)
>>> adict
{'age': 40, 'name': 'allen'}

Or use the fromkeys() method, for example:

>>> adict = {}.fromkeys(('x', 'y'), -1)
>>> adict
{'y': -1, 'x': -1}
>>> 

>>> adict = {}.fromkeys(('x', 'y', -1))
>>> adict
{'y': None, 'x': None, -1: None}

Characteristics:
1. A key and its value are separated by a colon (:);
2. Items are separated by commas (,);
3. Keys in a dictionary must be unique, while values need not be.

>>> adict = {'name': 'allen', 'name': 'lucy', 'age': '40'}
>>> adict
{'age': '40', 'name': 'lucy'}
>>> bdict = {'name': 'allen', 'name2': 'allen', 'age': 32}
>>> bdict
{'age': 32, 'name2': 'allen', 'name': 'allen'}

III. Basic dictionary operations

1. How do you access a value in a dictionary?

>>> bdict['age']
32
>>> bdict['age2']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'age2'

2. How do you check whether a key is in a dictionary?

a. The has_key() method (Python 2 only; removed in Python 3)

>>> bdict.has_key('age')
True
>>> bdict.has_key('age2')
False

b. in and not in

>>> 'age' in bdict
True
>>> 'age2' in bdict
False

3. How do you update a dictionary?

a. Add an item (a new element, i.e. a key-value pair)

>>> bdict
{'age': 32, 'name2': 'allen', 'name': 'allen'}
>>> bdict['address'] = 'address'
>>> bdict
{'age': 32, 'name2': 'allen', 'name': 'allen', 'address': 'address'}

b. Update an item (key-value pair)

>>> bdict
{'age': 32, 'name2': 'allen', 'name': 'allen', 'address': 'address'}
>>> bdict['age'] = 40
>>> bdict
{'age': 40, 'name2': 'allen', 'name': 'allen', 'address': 'address'}

c. Delete an item (key-value pair)

>>> bdict
{'age': 40, 'name2': 'allen', 'name': 'allen', 'address': 'address'}
>>> del bdict['age']
>>> bdict
{'name2': 'allen', 'name': 'allen', 'address': 'address'}
>>> bdict.pop('name2')
'allen'
>>> bdict
{'name': 'allen', 'address': 'address'}

>>> del bdict
>>> bdict
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bdict' is not defined

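IV. Mapping type operators

Standard type operators (+, -, *, <, >, <=, >=, ==, !=, and, or, not)
a. Dictionaries do not support the concatenation and repetition operators (+ and *). The ordering comparisons below (<, >, <=, >=) work between dictionaries only in Python 2; Python 3 raises TypeError.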

>>> adict = {'age' : 30}
>>> bdict = {'age' : 40}
>>> adict < bdict
True
>>> adict > bdict
False
>>> adict <= bdict
True
>>> adict >= bdict
False
>>> adict and bdict
{'age': 40}
>>> adict or bdict
{'age': 30}
>>> not adict 
False

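b. Dictionary comparison (cmp() exists only in Python 2)
1. First compare the lengths of the dictionaries, i.e. the number of items
2. Then compare the keys
3. Then compare the values
Example: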

>>> adict = {}
>>> bdict = {'age': 30}
>>> cmp(adict, bdict)
-1
>>> adict = {'age' : 30}
>>> cmp(adict, bdict)
0
>>> adict = {'age' : 30, 'name' : 'name'}
>>> cmp(adict, bdict)
1
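
For comparison, here is a sketch of how the same dictionaries behave in Python 3, where cmp() is gone and only equality is defined between dictionaries. The sorted-items comparison at the end is just one possible explicit ordering, not a replacement that Python itself prescribes:

```python
adict = {'age': 30}
bdict = {'age': 40}

equal = adict == bdict
same_content = adict == {'age': 30}
print(equal, same_content)  # False True

try:
    adict < bdict                # ordering is not defined for dicts
except TypeError as exc:
    message = str(exc)
    print(message)

# An explicit ordering, if one is really needed: compare sorted items.
smaller = sorted(adict.items()) < sorted(bdict.items())
print(smaller)  # True
```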

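V. Mapping-related functions

1. len() returns the length of the dictionary
2. hash() returns the hash value of an object; it can be used to test whether an object can serve as a dictionary key
3. dict() is the factory function that creates a dictionary

VI. Dictionary methods

1. adict.keys() returns a list of all the keys in the dictionary;
2. adict.values() returns a list of all the values in the dictionary;
3. adict.items() returns a list of all (key, value) tuples (these three return lists in Python 2; in Python 3 they return view objects);
4. adict.clear() removes all items from the dictionary;
5. adict.copy() returns a shallow copy of the dictionary;
6. adict.fromkeys(seq, val=None) creates and returns a new dictionary whose keys are the elements of seq, with val as the initial value for every key (None by default);
Unfortunately this method has a limitation. Suppose I want to give each key its own distinct value: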

>>> adict.fromkeys([1,2,3], ['1', '2', '3'])
{1: ['1', '2', '3'], 2: ['1', '2', '3'], 3: ['1', '2', '3']}

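That cannot be done: every key gets the same val.
7. adict.get(key, default=None) returns the value for key, or default if key is not in the dictionary (default is None unless given);
8. adict.has_key(key) returns True if key is in the dictionary and False otherwise. Use in and not in instead nowadays;
9. adict.iteritems(), adict.iterkeys() and adict.itervalues() work like their non-iterator counterparts, except that they return an iterator rather than a list;
10. adict.pop(key[, default]) is similar to get(). If key is in the dictionary, it removes the key and returns its value; if key is missing and no default is given, it raises KeyError;
11. adict.setdefault(key, default=None) is similar to get(), but if key is not in the dictionary it also assigns adict[key] = default;
12. adict.update(bdict) adds the key-value pairs of bdict to adict, replacing the values of keys that already exist.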
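
The fromkeys limitation in item 6 can be worked around by pairing keys with values. The following is only a minimal sketch: the variable names are made up for illustration, and the code is written so that it should run on Python 3 as well as the Python 2.7 used in these notes.

```python
keys = [1, 2, 3]
values = ['1', '2', '3']

# fromkeys gives every key the very same value object.
shared = dict.fromkeys(keys, values)

# zip pairs each key with its own value; dict() builds the mapping.
paired = dict(zip(keys, values))

# get() never raises and leaves the dict unchanged;
# setdefault() also stores the default when the key is missing.
scores = {'name': 'allen'}
missing = scores.get('age')
stored = scores.setdefault('age', 32)

# update() copies key-value pairs from another dict,
# replacing the value of any key that already exists.
scores.update({'age': 40, 'address': 'address'})
```

Here paired maps each key to a different string, while every value in shared is the same list object.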

VII. Iterating over a dictionary

1. Iterate over the keys

>>> adict = {'name': 'name', 'age': 20}
>>> for key in adict.keys():
...     print key
... 
age
name

2. Iterate over the values

>>> for value in adict.values():
...     print value
... 
20
name

3. Iterate over the items

>>> for item in adict.items():
...     print item
... 
('age', 20)
('name', 'name')

4. Iterate over keys and values together

>>> for item, value in adict.items():
...     print 'key = %s, value = %s' %(item, value)
... 
key = age, value = 20
key = name, value = name
>>> for item, value in adict.iteritems():
...     print 'key = %s, value = %s' %(item, value)
... 
key = age, value = 20
key = name, value = name

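VIII. Things to keep in mind when using dictionaries

1. One key cannot map to more than one value; assigning to an existing key replaces its value.
2. Keys must be hashable.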
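
Both rules can be checked directly. The sketch below uses hash(), mentioned in section V, to test whether an object could serve as a key; is_hashable is a hypothetical helper name introduced only for this illustration.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


candidates = ['name', 42, (1, 2), [1, 2], {'a': 1}]
results = [is_hashable(c) for c in candidates]

# Assigning to an existing key replaces the old value;
# the key itself is never duplicated.
d = {'age': 30}
d['age'] = 40
```

Strings, numbers and tuples of hashable items pass the check; lists and dicts do not.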

12/20
2015

Output

Follow print with a string and that text is printed to the screen. For example, to print 'hello, world':

>>> print 'hello, world'     

The print statement can also take several strings separated by commas, which are joined into a single line of output:

>>> print 'The quick brown fox', 'jumps over', 'the lazy dog'
The quick brown fox jumps over the lazy dog

print outputs each string in turn and emits one space wherever it meets a comma, so the pieces end up joined into a single space-separated line.

What if you leave out the commas?

>>> print 'The quick brown fox' 'jumps over' 'the lazy dog'
The quick brown foxjumps overthe lazy dog

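No spaces are output: adjacent string literals are concatenated into one string before print ever sees them.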

print can also print integers or the result of a calculation:

>>> print 300
300
>>> print 100 + 200
300

So we can print the result of 100 + 200 a little more nicely:

>>> print '100 + 200 =', 100 + 200
100 + 200 = 300

Note that the Python interpreter evaluates 100 + 200 to 300, whereas '100 + 200 =' is a string rather than a formula, so Python prints it as is.

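Input

What if you want the user to type in some characters? Python provides raw_input, which reads a string from the user and stores it in a variable. For example, to read the user's name: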

>>> name = raw_input()
Michael
>>> name
'Michael'

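After you type name = raw_input() and press Enter, the interactive prompt waits for your input. You can type any characters and press Enter to finish.
Nothing is shown when the input is complete; the prompt simply returns to >>>. So where did the text go? It was stored in the variable name. Type name to see its contents: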

>>> name
'Michael'

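What is a variable? Think back to the basic algebra you learned in junior high school:
Let the side of a square be a; then its area is a x a. Treating the side a as a variable, we can compute the area from the value of a, for example:
if a = 2, the area is a x a = 2 x 2 = 4;
if a = 3.5, the area is a x a = 3.5 x 3.5 = 12.25.
In a computer program a variable can hold not only an integer or a floating-point number but also a string, so name here is a variable holding a string.
To print the contents of name, besides typing name and pressing Enter, you can also use the print statement: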

>>> print name
Michael

With input and output available, we can turn the earlier 'hello, world' program into something slightly more useful:

name = raw_input()
print 'hello,', name

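When the program runs, the first line lets the user type any characters as a name and stores them in the variable name; the second line says hello to the user by name. For example, entering Michael: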

C:\Workspace> python hello.py
Michael
hello, Michael

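However, the running program gives no hint that tells the user "hey, type your name now", which is not very friendly. Fortunately raw_input can display a string as a prompt, so we change the code to: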

name = raw_input('please enter your name: ')
print 'hello,', name

Run the program again and it first prints please enter your name:, so the user can follow the prompt, type a name, and get hello, xxx as output:

C:\Workspace> python hello.py
please enter your name: Michael
hello, Michael

Each run produces different output depending on what the user types.
On the command line, input and output are that simple.
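
One way to keep such a program easy to check is to separate building the greeting from reading and printing. The sketch below is an illustration only: make_greeting and main are hypothetical names, and the raw_input/input fallback is there because Python 3 renamed raw_input() to input().

```python
def make_greeting(name):
    """Build the line the program prints, without doing any I/O."""
    return 'hello, ' + name.strip()


def main():
    # Python 2 spells the function raw_input(); Python 3 calls it input().
    try:
        read_line = raw_input   # Python 2
    except NameError:
        read_line = input       # Python 3
    name = read_line('please enter your name: ')
    print(make_greeting(name))
```

Calling main() runs the interactive version; make_greeting('Michael') alone returns the greeting string.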

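Summary

Every computer program exists to carry out some task. Input is how the user tells the program what it needs to know, and output is how the program reports the result.
Input and output together are called Input/Output, or IO for short.
raw_input and print are the most basic input and output on the command line, but users can also work through richer graphical interfaces, for example typing a name into a text box on a web page, clicking "OK", and reading the output on the page.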

12/20
2015

Taking part of a list or tuple is a very common operation. For example, given this list:

>>> L = ['Michael', 'Sarah', 'Tracy', 'Bob', 'Jack']

How do you take the first 3 elements?
The clumsy way:

>>> [L[0], L[1], L[2]]
['Michael', 'Sarah', 'Tracy']

It is clumsy because it does not extend to taking the first N elements.
The first N elements are those with indexes 0 to N-1, so a loop works:

>>> r = []
>>> n = 3
>>> for i in range(n):
...     r.append(L[i])
... 
>>> r
['Michael', 'Sarah', 'Tracy']

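Looping is tedious for something as common as taking a range of indexes, so Python provides the slice operator, which simplifies it greatly.
For the problem above, one line of slicing takes the first 3 elements: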

>>> L[0:3]
['Michael', 'Sarah', 'Tracy']

L[0:3] means: start at index 0 and go up to, but not including, index 3. That is indexes 0, 1 and 2, exactly 3 elements.
If the first index is 0 it can be omitted:

>>> L[:3]
['Michael', 'Sarah', 'Tracy']

You can also start at index 1 and take 2 elements:

>>> L[1:3]
['Sarah', 'Tracy']

Similarly, since Python supports L[-1] for the last element, it also supports slicing from the end. Try it:

>>> L[-2:]
['Bob', 'Jack']
>>> L[-2:-1]
['Bob']

Remember that the index of the last element is -1.
Slicing is very useful. Let us first create the sequence 0 to 99:

>>> L = range(100)
>>> L
[0, 1, 2, 3, ..., 99]

Slicing makes it easy to pick out any stretch of the sequence. The first 10 numbers:

>>> L[:10]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The last 10 numbers:

>>> L[-10:]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

The 11th through 20th numbers:

>>> L[10:20]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Every second number among the first 10:

>>> L[:10:2]
[0, 2, 4, 6, 8]

Every fifth number of the whole list:

>>> L[::5]
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

You can even write nothing but [:] to copy a list as it is:

>>> L[:]
[0, 1, 2, 3, ..., 99]

A tuple is also a kind of list; the only difference is that it is immutable. So a tuple can be sliced too, and the result is still a tuple:

>>> (0, 1, 2, 3, 4, 5)[:3]
(0, 1, 2)

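A string 'xxx' or a Unicode string u'xxx' can also be seen as a kind of list in which each element is a character. So strings can be sliced as well, and the result is still a string: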

>>> 'ABCDEFG'[:3]
'ABC'
>>> 'ABCDEFG'[::2]
'ACEG'

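Many programming languages provide a variety of substring functions whose purpose is really just to slice strings. Python has no dedicated substring function; the single slice operation covers it all, which keeps things simple.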
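
The slice forms covered in this entry can be gathered into one short script. This is a sketch for illustration: it goes slightly beyond the text in showing out-of-range bounds, a negative step, and the built-in slice() object, and it wraps range() in list() so that it also runs on Python 3.

```python
L = list(range(100))   # list() makes this a real list on Python 3 too

first_three = L[:3]
last_ten = L[-10:]
every_fifth = L[::5]

# A slice never raises IndexError; out-of-range bounds are clipped.
clipped = L[95:200]

# [:] copies the list, so changing the copy leaves the original alone.
copy = L[:]
copy[0] = -1

# A negative step walks backwards; [::-1] reverses a sequence.
reversed_text = 'ABCDEFG'[::-1]

# slice() builds the same start:stop:step object the bracket syntax uses.
evens_of_first_ten = L[slice(0, 10, 2)]
```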

Summary

With slicing, many loops are no longer needed. Python slices are very flexible: one line can do what would otherwise take a multi-line loop.

12/20
2015

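Sublime Text
Notepad++
Either one is fine, but never use Word or the Notepad that ships with Windows. Word does not save plain text, and Notepad helpfully adds a few special characters (a UTF-8 BOM) at the start of the file, which leads to baffling errors when the program runs.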

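Summary

Write a Python program in a text editor, save it with the .py extension, and Python can run it directly.

What is the difference between Python's interactive mode and running a .py file?

Typing python enters interactive mode: the interpreter starts and then waits for you to enter source code line by line, executing each line as you enter it.
Running a .py file also starts the interpreter, but it executes the whole file in one go and gives you no chance to type in source code.
When developing in Python you can write code in the editor while keeping an interactive window open, pasting pieces of code into it to check them as you go. It saves a lot of effort.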

12/20
2015

list

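One of Python's built-in data types is the list. A list is an ordered collection whose elements can be added and removed at any time.
(In Java, by contrast, List is just a type in a library package, not a built-in data type.)
For example, the names of all the classmates in a class can be held in a list: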

>>> classmates = ['Michael', 'Bob', 'Tracy']
>>> classmates
['Michael', 'Bob', 'Tracy']

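list is a built-in type with syntactic sugar for literals, and this concise notation really is nice.

The variable classmates is a list. The len() function returns the number of elements in a list: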

>>> len(classmates)
3

Use an index to access the element at each position of a list; remember that indexes start at 0:

>>> classmates[0]
'Michael'
>>> classmates[1]
'Bob'
>>> classmates[2]
'Tracy'
>>> classmates[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

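When the index is out of range Python raises an IndexError, so make sure indexes stay in bounds; the index of the last element is len(classmates) - 1.
Being a built-in type makes a difference in Python: a list can be indexed directly with [].
To get the last element, besides computing its index you can use -1 as the index: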

>>> classmates[-1]
'Tracy'

In the same way you can get the second-to-last and third-to-last elements:

>>> classmates[-2]
'Bob'
>>> classmates[-3]
'Michael'
>>> classmates[-4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Naturally, the fourth-to-last is out of range.
A list is a mutable ordered sequence, so you can append an element to the end:

>>> classmates.append('Adam')
>>> classmates
['Michael', 'Bob', 'Tracy', 'Adam']

You can also insert an element at a given position, for example at index 1:

>>> classmates.insert(1, 'Jack')
>>> classmates
['Michael', 'Jack', 'Bob', 'Tracy', 'Adam']

To remove the element at the end of a list, use pop():

>>> classmates.pop()
'Adam'
>>> classmates
['Michael', 'Jack', 'Bob', 'Tracy']

To remove the element at a given position, use pop(i), where i is the index:

>>> classmates.pop(1)
'Jack'
>>> classmates
['Michael', 'Bob', 'Tracy']

To replace an element with another, assign directly to the corresponding index:

>>> classmates[1] = 'Sarah'
>>> classmates
['Michael', 'Sarah', 'Tracy']

The elements of a list may have different data types, for example:

>>> L = ['Apple', 123, True]

An element of a list can also be another list, for example:

>>> s = ['python', 'java', ['asp', 'php'], 'scheme']
>>> len(s)
4

Note that s has only 4 elements, and s[2] is itself a list. It is easier to see when written in two steps:

>>> p = ['asp', 'php']
>>> s = ['python', 'java', p, 'scheme']

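To get 'php' you can write p[1] or s[2][1], so s can be seen as a two-dimensional array. Three- and four-dimensional arrays work the same way but are rarely used.
A list with no elements at all is an empty list, and its length is 0: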

>>> L = []
>>> len(L)
0

tuple

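Another kind of ordered sequence is the tuple. A tuple is very similar to a list, but once initialized it cannot be modified. For example, listing the classmates' names again: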

>>> classmates = ('Michael', 'Bob', 'Tracy')

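Now the tuple classmates cannot change, and it has no methods such as append() or insert(). Reading elements works just as with a list: you can use classmates[0] and classmates[-1] as usual, but you cannot assign a different element to them.
What is the point of an immutable tuple? Because it cannot change, code that uses it is safer. Wherever a tuple can replace a list, prefer the tuple.
A tuple pitfall: the elements of a tuple must be fixed at the moment you define it, for example: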

>>> t = (1, 2)
>>> t
(1, 2)

To define an empty tuple, write ():

>>> t = ()
>>> t
()

But to define a tuple with just 1 element, if you write it like this:

>>> t = (1)
>>> t
1

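what you get is not a tuple but the number 1! Parentheses () can denote a tuple and can also group a mathematical expression, which is ambiguous, so Python treats this case as grouping, and the result is naturally 1.
Therefore a one-element tuple must be written with a trailing comma to remove the ambiguity: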

>>> t = (1,)
>>> t
(1,)

When Python displays a one-element tuple it also adds the comma, so that you do not mistake the parentheses for mathematical grouping.
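
The one-element pitfall and the immutability of tuples can both be checked in a few lines. This is a sketch only; the variable names are chosen for the illustration.

```python
not_a_tuple = (1)     # parentheses only group; this is the int 1
one_tuple = (1,)      # the trailing comma makes it a tuple
also_tuple = 1,       # the comma, not the parentheses, creates the tuple

classmates = ('Michael', 'Bob', 'Tracy')
try:
    classmates[0] = 'Adam'      # item assignment is not allowed on a tuple
except TypeError as exc:
    error_message = str(exc)
```

After the failed assignment classmates[0] is still 'Michael'.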

12/20
2015

dict

Python has built-in support for the dictionary type dict (short for dictionary), known as a map in other languages. It stores key-value pairs and offers extremely fast lookup.

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95

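Why is dict lookup so fast? Because a dict works the way you look up a word in a dictionary. Suppose the dictionary holds 10,000 Chinese characters and we want one of them. One way is to page through from the first page until we find it; that is how searching a list works, and the bigger the list the slower the search.
The second way is to look the character up in the dictionary's index (the radical table, say) to get its page number and then turn straight to that page. This is fast whichever character we want and does not slow down as the dictionary grows.
A dict uses the second approach. Given a name such as 'Michael', the dict computes internally the "page number" where Michael's score is kept, that is, the memory location holding the number 95, and fetches it directly, which is why it is so fast.
As you can guess, this kind of key-value storage must compute where to put the value from the key when storing it, so that the value can later be fetched directly from the key.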
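
The "page number" idea can be made concrete with a toy hash table. This is only a sketch of the general technique, not how CPython's dict is actually implemented: TinyDict, its fixed bucket count and its method names are all hypothetical, and it is written to run on Python 2.7 and Python 3.

```python
class TinyDict(object):
    """Toy hash table: a fixed number of buckets, each a list of [key, value] pairs."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # hash(key) plays the role of looking up the "page number".
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value     # same key: overwrite the old value
                return
        bucket.append([key, value])

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


scores = TinyDict()
scores.put('Michael', 95)
scores.put('Bob', 75)
scores.put('Bob', 80)   # replaces 75
```

Both put() and get() go straight to one bucket computed from the key instead of scanning every stored pair.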
Besides specifying data when the dict is created, you can also put data in by key:

>>> d['Adam'] = 67
>>> d['Adam']
67

Since a key can map to only one value, storing a value under the same key several times makes each later value overwrite the earlier one:

>>> d['Jack'] = 90
>>> d['Jack']
90
>>> d['Jack'] = 88
>>> d['Jack']
88

If the key does not exist, the dict raises an error:

>>> d['Thomas']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Thomas'

There are two ways to avoid the missing-key error. One is to test for the key with in:

>>> 'Thomas' in d
False

The other is the dict's get method, which returns None, or a value you specify, when the key does not exist:

>>> d.get('Thomas')
>>> d.get('Thomas', -1)
-1

Note: when the result is None, the interactive prompt displays nothing.
To delete a key, use pop(key); the corresponding value is removed from the dict as well:

>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}

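Be sure to note that the order in which a dict stores its items has nothing to do with the order in which the keys were inserted.
Compared with a list, a dict has these characteristics:
Lookup and insertion are extremely fast and do not slow down as the number of keys grows;
It needs a lot of memory and wastes a fair amount of it.
A list is the opposite:
Lookup and insertion time grows with the number of elements;
It takes little space and wastes very little memory.
So a dict trades space for time.
A dict is useful wherever fast lookup is needed and is almost everywhere in Python code. Using it correctly matters a great deal, and the first rule to remember is that a dict key must be an immutable object.
This is because the dict computes where a value is stored from its key; if computing the same key gave a different result each time, the inside of the dict would be complete chaos. The algorithm that computes the position from the key is called a hash algorithm.
For the hash to stay correct, the object used as a key must not change. In Python, strings, integers and the like are immutable, so they are safe to use as keys. A list is mutable and cannot be a key: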

>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
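
When a list-like key is really needed, a common workaround is to convert it to a tuple first. The sketch below illustrates this under the assumption that the list holds only hashable items; the variable names are made up for the example.

```python
d = {}
key = [1, 2, 3]

# tuple(key) is an immutable snapshot of the list, so it can be hashed.
d[tuple(key)] = 'a tuple built from a list'

try:
    d[key] = 'a list'
except TypeError as exc:
    list_error = str(exc)

# A tuple is only hashable if everything inside it is hashable too.
try:
    d[(1, [2, 3])] = 'tuple holding a list'
except TypeError as exc:
    nested_error = str(exc)
```

Only the first assignment succeeds, so d ends up with the single key (1, 2, 3).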

set

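A set is similar to a dict: it is also a collection of keys, but it stores no values. Since keys cannot repeat, a set contains no duplicate keys.
To create a set, pass in a list (or any other iterable) as the input collection: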

>>> s = set([1, 2, 3])
>>> s
set([1, 2, 3])

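Note that the argument [1, 2, 3] is a list, while the displayed set([1, 2, 3]) only tells you that the set contains the 3 elements 1, 2 and 3; the [] shown does not mean it is a list.
Duplicate elements are filtered out of a set automatically: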

>>> s = set([1, 1, 2, 2, 3, 3])
>>> s
set([1, 2, 3])

The add(key) method adds an element to a set; adding the same element again is allowed but has no effect:

>>> s.add(4)
>>> s
set([1, 2, 3, 4])
>>> s.add(4)
>>> s
set([1, 2, 3, 4])

The remove(key) method deletes an element:

>>> s.remove(4)
>>> s
set([1, 2, 3])

A set can be seen as a mathematical set, unordered and without duplicates, so two sets support mathematical operations such as intersection and union:

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
set([2, 3])
>>> s1 | s2
set([1, 2, 3, 4])

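The only difference between a set and a dict is that a set stores no values. A set works on the same principle as a dict, so it likewise cannot hold mutable objects: a mutable object cannot be hashed reliably, so the set could not guarantee that it "contains no duplicate elements". Try putting a list into a set and see whether it raises an error.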
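
The experiment suggested above, together with the set operations from this entry, might look like the following sketch. The variable names are illustrative, and the set('hello') line assumes nothing beyond the fact that set() accepts any iterable.

```python
s1 = set([1, 2, 3])
s2 = set([2, 3, 4])

common = s1 & s2        # intersection
combined = s1 | s2      # union
only_in_s1 = s1 - s2    # difference

# Any iterable works as input, not only a list; duplicates collapse.
letters = set('hello')

# An immutable tuple can be a member; a mutable list cannot.
s = set([(1, 2)])
try:
    s.add([3, 4])
except TypeError as exc:
    add_error = str(exc)
```

The failed add() leaves s with its single tuple member.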

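Having to build a set from a list looks needlessly roundabout at first, although set() in fact accepts any iterable, and Python 2.7 also supports the literal form {1, 2, 3}.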

12/20
2015

❝ Our imagination(n. [心理] 想象力;空想;幻想物) is stretched(v. 伸直,伸展;舒展) to the utmost(n. 极限;最大可能), not, as in fiction, to imagine things which are not really there, but just to comprehend(vt. 理解;包含;由…组成) those things which are. ❞
— Richard Feynman

Diving In

Every programming language has that one feature, a complicated thing intentionally(adv. 故意地,有意地) made simple. If you’re coming from another language, you could easily miss it, because your old language didn’t make that thing simple (because it was busy making something else simple instead). This chapter will teach you about list comprehensions, dictionary comprehensions, and set comprehensions: three related concepts centered around one very powerful technique. But first, I want to take a little detour(vt. 使…绕道而行) into two modules that will help you navigate your local file system.

Working With Files And Directories

Python 3 comes with a module called os, which stands for “operating system.” The os module contains a plethora(n. 过多;过剩;[医] 多血症) of functions to get information on — and in some cases, to manipulate — local directories, files, processes, and environment variables. Python does its best to offer a unified API across all supported operating systems so your programs can run on any computer with as little platform-specific code as possible.

The Current Working Directory

When you’re just getting started with Python, you’re going to spend a lot of time in the Python Shell. Throughout this book, you will see examples that go like this:

  1. Import one of the modules in the examples folder
  2. Call a function in that module
  3. Explain the result

If you don’t know about the current working directory, step 1 will probably fail with an ImportError. Why? Because Python will look for the example module in the import search path, but it won’t find it because the examples folder isn’t one of the directories in the search path. To get past this, you can do one of two things:

  • Add the examples folder to the import search path
  • Change the current working directory to the examples folder

The current working directory is an invisible property that Python holds in memory at all times. There is always a current working directory, whether you’re in the Python Shell, running your own Python script from the command line, or running a Python CGI script on a web server somewhere.

The os module contains two functions to deal with the current working directory.
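
Those two functions are os.getcwd() and os.chdir(). Here is a minimal sketch of how they might be used; the temporary directory created with tempfile merely stands in for the examples folder, and the variable names are illustrative.

```python
import os
import tempfile

original_dir = os.getcwd()          # where the interpreter started

# A throwaway directory stands in for the examples folder.
examples_dir = tempfile.mkdtemp()

os.chdir(examples_dir)              # change the current working directory
inside = os.getcwd()

os.chdir(original_dir)              # go back so later code is unaffected
os.rmdir(examples_dir)
```

After os.chdir(), imports and relative file paths are resolved against the new directory.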

12/20
2015

Diving In

Iterators are the “secret sauce(n. 酱油;沙司;调味汁)” of Python 3. They’re everywhere, underlying everything, always just out of sight. Comprehensions are just a simple form of iterators. Generators are just a simple form of iterators. A function that yields values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that.
Remember the Fibonacci generator? Here it is as a built-from-scratch iterator:

class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib
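
To see how a for loop drives such a class, here is a short usage sketch. The class is repeated so the block runs on its own under Python 3, and the limit 10 is an arbitrary value chosen for illustration.

```python
class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


# A for loop calls iter() once, then next() until StopIteration is raised.
from_loop = [n for n in Fib(10)]

# The same protocol driven by hand.
it = iter(Fib(10))
by_hand = [next(it), next(it), next(it)]
```
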
12/06
2015

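At the >>> prompt of the interactive environment, type some code, press Enter, and you get the result immediately. Try entering 100+200 and see whether the result is 300: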

>>> 100+200
300

To make Python print some text, use the print statement and enclose the text in single or double quotes; the opening and closing quotes must be of the same kind:

>>> print 'hello, world'
hello, world

Finally, leave Python with exit(). Our first Python program is done!

Summary

At the Python interactive prompt you can type code, run it, and get the result at once.

12/06
2015

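Python is a computer programming language.
Programs in any language are written to make the computer do work, such as downloading an MP3 or writing a document. The CPU that does the work understands only machine instructions, so however much programming languages differ, in the end they all have to be "translated" into machine instructions the CPU can execute. Languages also differ greatly in how much code the same job takes.
For example, a task that takes 1000 lines of C might take only 100 lines of Java and perhaps just 20 lines of Python.
So Python is a fairly high-level language.
The price of less code is slower execution: where a C program runs for 1 second, a Java program might need 2 seconds and a Python program might need 10.
Does that mean lower-level languages are harder to learn and higher-level ones easier? On the surface, yes. But at very high levels of abstraction, programming in Python is also hard to learn, so a high-level language is not the same thing as a simple one.
Still, for beginners and for ordinary tasks Python is very easy to use. Even Google uses Python on a large scale, so there is no need to worry that learning it will be useless.
What can you do with Python? Everyday chores such as backing up your MP3s automatically; websites, many well-known ones including YouTube being written in Python; the back ends of online games, many of which are developed in Python. In short, a great many things.
Of course there are also jobs Python is poorly suited to, such as writing an operating system, which is normally done in C; writing mobile apps, which at the time of writing meant Objective-C (for iPhone) or Java (for Android); and writing 3D games, where C or C++ is the better choice.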

12/06
2015

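When we write Python code, what we get is a text file with the .py extension containing that code. Running it requires a Python interpreter to execute the .py file.
Because the whole language, from specification to interpreter, is open source, in theory anyone skilled enough can write a Python interpreter (though it is very hard). In fact several Python interpreters do exist.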

CPython

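After downloading and installing Python 2.7 from the official Python website, we have the official interpreter: CPython. It is written in C, hence the name. Running python on the command line starts the CPython interpreter.
CPython is the most widely used Python interpreter, and all the code in this tutorial runs under CPython.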

IPython

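IPython is an interactive interpreter built on top of CPython. It only enhances the interactive experience; executing Python code works exactly as in CPython. It is like the many Chinese browsers that look different on the outside but all call IE as their engine.
CPython uses >>> as its prompt, while IPython uses In [n]: as its prompt.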

PyPy

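PyPy is another Python interpreter whose goal is execution speed. PyPy uses JIT technology to compile Python code dynamically (note: compile, not interpret), so it can speed up Python code significantly.
Most Python code runs under PyPy, but PyPy and CPython differ in some respects, so the same code may produce different results under the two interpreters. If your code is going to run under PyPy, you need to know where PyPy and CPython differ.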

Jython

Jython is a Python interpreter that runs on the Java platform and compiles Python code directly to Java bytecode for execution.

IronPython

IronPython is similar to Jython, except that it runs on Microsoft's .Net platform and compiles Python code directly to .Net bytecode.

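Summary

There are many Python interpreters, but CPython remains the most widely used. If you need to interact with the Java or .Net platform, the best approach is not Jython or IronPython but calls over the network, which keeps the programs independent of one another.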

12/06
2015

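Python is a programming language that the famous "Uncle Turtle" Guido van Rossum wrote over Christmas 1989 to pass a boring holiday.
Python is a high-level programming language for writing applications.
When you start real software development in a language, you need more than the code you write yourself: you need plenty of basic, ready-made components to speed up development. To write an email client, for example, starting from the lowest level with the network protocol code would probably take the better part of a year. High-level languages usually provide a fairly complete base library that you can call directly, such as an SMTP library for the email protocol and a GUI library for the desktop environment. Building on such libraries, an email client can be developed in a few days.
Python gives us a very complete base library covering networking, files, GUI, databases, text and much more, vividly described as "batteries included". When developing in Python, many features need not be written from scratch; you just use what is already there.
Besides the built-in library, Python has a huge number of third-party libraries, that is, things other people developed for you to use directly. And if your own code is well packaged, it too can be offered to others as a third-party library.
Many large websites are developed in Python, for example YouTube, Instagram and Douban in China. Many large companies, including Google and Yahoo, and even NASA use Python heavily.
Guido positioned Python as "elegant", "explicit" and "simple", so Python programs tend to look simple and readable. Beginners find it easy to get started, and going deeper later they can write very, very complex programs.
All in all, Python's philosophy is simplicity and elegance: write code that is easy to understand, and write as little of it as possible. If a senior programmer shows off obscure code running to tens of thousands of lines, feel free to laugh at him.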

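So what kinds of applications is Python suited to?

First, network applications, including websites and back-end services;
Second, the many small tools needed day to day, including system administrators' scripts;
Third, wrapping programs written in other languages to make them easier to use.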

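Finally, Python's drawbacks.

The first drawback is speed: Python is very slow compared with C. Python is an interpreted language: CPython compiles your source to bytecode and then executes that bytecode instruction by instruction in a virtual machine rather than running native machine code, and this extra layer costs a lot of time. A C program is compiled to machine code before it runs, so it is very fast.
But a great many applications do not need that speed, because the user cannot tell the difference. Take a network application that downloads MP3s: the C program needs 0.001 seconds and the Python program 0.1 seconds, 100 times slower, but the network is slower still and takes 1 second. Can a user feel the difference between 1.001 seconds and 1.1 seconds? It is like an F1 car and an ordinary taxi on Beijing's Third Ring Road: the F1 car can in theory reach 400 km/h, but in traffic the road moves at 20 km/h, so as a passenger the speed you feel is always 20 km/h.
The second drawback is that the code cannot be encrypted. Releasing a Python program in practice means releasing its source. C is different: you need not release the source, only the compiled machine code (the xxx.exe files you commonly see on Windows). Recovering the original C source from machine code is impractical, so compiled languages do not have this problem, whereas interpreted languages generally have to ship the source.
This drawback matters only when the software you write is meant to be sold. The good news is that in the internet era the business model of selling software licences is becoming rarer, while selling services through websites and mobile apps is becoming more common, and the latter does not require handing source code to anyone.
Besides, the booming open-source movement fits the free and open spirit of the internet. There is a vast amount of excellent open-source code out there, Linux for one, and we should not overestimate the "commercial value" of the code we write ourselves. The more important reason big companies are unwilling to open their code is that it is written so badly that nobody would dare use their products once it was public.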