Becoming a Junior Data Analyst | Python and Data Science Applications

Common Python Programming Techniques

郭耀仁

Readability counts.

The Zen of Python, Tim Peters

Outline

  • User-defined functions
  • Errors and exceptions
  • Flexible arguments
  • Anonymous functions
  • Iterator functions (Iterators)
  • List Comprehensions
  • Generators
  • Common string methods

User-defined functions

Code block structure of a user-defined function

def function_name(input, parameter, ...):
    """
    Docstrings
    """
    # do something
    return output
In [1]:
# Define
def get_abs(x):
    """
    Return the absolute value of x.
    """
    if x < 0:
        return -x
    else:
        return x
In [2]:
# Use
help(get_abs)
print(get_abs(-5556))
print(get_abs(5566))
Help on function get_abs in module __main__:

get_abs(x)
    Return the absolute value of x.

5556
5566

Errors and exceptions

A common runtime error: ZeroDivisionError

In [3]:
def divide(x, y):
    """
    Divide the two input numbers.
    """
    return x / y

print(divide(5566, 0))
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-3-400a2a8f876f> in <module>
      5     return x / y
      6 
----> 7 print(divide(5566, 0))

<ipython-input-3-400a2a8f876f> in divide(x, y)
      3     Divide the two input numbers.
      4     """
----> 5     return x / y
      6 
      7 print(divide(5566, 0))

ZeroDivisionError: division by zero

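Handling errors and exceptions with try...except...

In [4]:
def safe_divide(x, y):
    """
    Divide the two input numbers; return a message instead of raising when y is 0.
    """
    try:
        return x / y
    except ZeroDivisionError:
        return "Something went wrong..."

print(safe_divide(5566, 0))
Something went wrong...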
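
Catching a specific exception class, rather than everything, keeps unrelated bugs visible. The following is a minimal sketch of that idea, not the course's own solution; the function name safe_divide_verbose and its messages are hypothetical and chosen only for illustration.

```python
def safe_divide_verbose(x, y):
    """
    Divide x by y and report which kind of error occurred.
    (Hypothetical helper, for illustration only.)
    """
    try:
        result = x / y
    except ZeroDivisionError:
        # Raised when y is 0
        return "Cannot divide by zero."
    except TypeError:
        # Raised when x or y is not a number, for example a string
        return "Both inputs must be numbers."
    else:
        # Runs only when the try block raised nothing
        return result

print(safe_divide_verbose(5566, 2))
print(safe_divide_verbose(5566, 0))
print(safe_divide_verbose(5566, "0"))
```

Each except branch handles one failure mode, and the else branch makes it explicit which code runs on success.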

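Flexible arguments

Sometimes we cannot know in advance how many arguments a user will pass to our function

*args: collects any number of positional arguments into a tuple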

In [5]:
def get_fahrenheit(c):
    return c*9/5 + 32

get_fahrenheit(18)
Out[5]:
64.4
In [6]:
get_fahrenheit(18, 20, 22)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-07d7bb62e8ca> in <module>
----> 1 get_fahrenheit(18, 20, 22)

TypeError: get_fahrenheit() takes 1 positional argument but 3 were given
In [7]:
def get_fahrenheits(*args):
    fahrenheits = []
    for c in args:
        fahrenheits.append(c*9/5 + 32)
    return fahrenheits

print(get_fahrenheits(18))
print(get_fahrenheits(18, 20, 22))
[64.4]
[64.4, 68.0, 71.6]
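
The same star can also be used on the calling side, which helps when the values already live in a list. This is a small sketch under that assumption; it redefines get_fahrenheits so the block runs on its own, and the variable temp_c is introduced here for illustration.

```python
def get_fahrenheits(*args):
    # args arrives as a tuple holding every positional argument
    fahrenheits = []
    for c in args:
        fahrenheits.append(c*9/5 + 32)
    return fahrenheits

temp_c = [18, 20, 22]

# The * in the call spreads the list into three separate arguments,
# so this is equivalent to get_fahrenheits(18, 20, 22)
unpacked = get_fahrenheits(*temp_c)
print(unpacked)
```

Without the star in the call, the whole list would arrive as a single argument and the arithmetic inside the loop would fail with a TypeError.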

Write a function get_mean(*args) that returns the mean of the numbers passed in as *args

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
In [10]:
def get_mean(*args):
    summation = sum(args)
    length = len(args)
    x_bar = summation / length
    return x_bar
In [11]:
print(get_mean(1, 3, 5, 7, 9))
print(get_mean(3, 4, 5, 6, 7))
print(get_mean(3))
5.0
5.0
3.0

In-class exercise: write a function get_std(*args) that returns the sample standard deviation of the numbers passed in as *args

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

https://en.wikipedia.org/wiki/Standard_deviation

In [13]:
print(get_std(1, 3, 5, 7, 9))
print(get_std(3, 4, 5, 6, 7))
print(get_std(3))
3.1622776601683795
1.5811388300841898
Please input at least 2 numbers.
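
One possible solution to the exercise is sketched below. It is an assumption about how get_std could be written, not necessarily the version that produced the output above; it follows the formula term by term and returns the same message when fewer than two numbers are given.

```python
def get_std(*args):
    """
    Return the sample standard deviation of the numbers in args.
    """
    n = len(args)
    if n < 2:
        # n - 1 would be zero, so the formula is undefined
        return "Please input at least 2 numbers."
    x_bar = sum(args) / n
    # Sum of squared deviations from the mean
    squared_error_sum = 0
    for x in args:
        squared_error_sum += (x - x_bar)**2
    # Divide by n - 1 (sample variance), then take the square root
    return (squared_error_sum / (n - 1))**0.5

print(get_std(1, 3, 5, 7, 9))
print(get_std(3, 4, 5, 6, 7))
print(get_std(3))
```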

Anonymous functions

Sometimes we want a more concise syntax than def for defining a function

In [14]:
def squared(x):
    return x**2

squared(2)
Out[14]:
4

Anonymous functions are also known as lambda functions

FUNCTION_NAME = lambda arg0, arg1, ...: USING arg0, arg1
In [15]:
squared = lambda x: x**2

squared(2)
Out[15]:
4
In [16]:
my_abs = lambda x: -x if x < 0 else x

print(my_abs(-2))
print(my_abs(2))
2
2
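
A lambda is most useful where a function is needed only once, as an argument to another function. The sketch below assumes a hypothetical list of (title, year) tuples and uses the built-in sorted() with its key parameter; the variable names are made up for illustration.

```python
# Hypothetical sample data: (title, release year) pairs
movies = [
    ("The Avengers", 2012),
    ("Avengers: Endgame", 2019),
    ("Avengers: Age of Ultron", 2015),
]

# key receives each tuple and returns the value to sort by (the year)
sorted_by_year = sorted(movies, key=lambda movie: movie[1])

for title, year in sorted_by_year:
    print(year, title)
```

Writing the key inline avoids defining and naming a one-line helper with def.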

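Anonymous functions are most often needed when working with iterator functions (Iterators)

Iterator functions (Iterators)

Iterator functions that often appear together with anonymous functions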

  • map()
  • filter()
In [17]:
def get_fahrenheit(c):
    return c*9/5 + 32

temp_c = [18, 20, 22]
temp_f = map(get_fahrenheit, temp_c)
list(temp_f)
Out[17]:
[64.4, 68.0, 71.6]
In [18]:
# map()
temp_c = [18, 20, 22]
temp_f = map(lambda x: x*9/5 + 32, temp_c)
list(temp_f)
Out[18]:
[64.4, 68.0, 71.6]
In [19]:
# filter()
temp_c = [-10, 18, 20, -5, -3]
below_zero = filter(lambda x: x < 0, temp_c)
list(below_zero)
Out[19]:
[-10, -5, -3]
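
Because filter() and map() both accept any iterable, one can feed the other. A minimal sketch, reusing the temperatures above and the same conversion formula; the name below_zero_f is an assumption introduced here.

```python
temp_c = [-10, 18, 20, -5, -3]

# Step 1: keep only the temperatures below zero
below_zero = filter(lambda x: x < 0, temp_c)

# Step 2: convert the remaining values to Fahrenheit
below_zero_f = list(map(lambda x: x*9/5 + 32, below_zero))

print(below_zero_f)
```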

Other commonly used iterator functions

  • enumerate(): access the index and the value of an iterable at the same time
  • zip(): access the values of several iterables at the same time
In [20]:
# enumerate(): access the index and the value of an iterable at the same time
the_avenger_movies = ["The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War", "Avengers: Endgame"]
for i, val in enumerate(the_avenger_movies):
    print("Avengers movie {}: {}".format(i+1, val))
Avengers movie 1: The Avengers
Avengers movie 2: Avengers: Age of Ultron
Avengers movie 3: Avengers: Infinity War
Avengers movie 4: Avengers: Endgame
In [21]:
# zip(): access the values of several iterables at the same time
release_years = [2012, 2015, 2018, 2019]
the_avenger_movies = ["The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War", "Avengers: Endgame"]
for y, movie in zip(release_years, the_avenger_movies):
    print("{} was released in {}".format(movie, y))
The Avengers was released in 2012
Avengers: Age of Ultron was released in 2015
Avengers: Infinity War was released in 2018
Avengers: Endgame was released in 2019
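
Since zip() yields pairs, passing it to dict() is a common way to build a lookup table from two parallel lists. A short sketch using the same two lists; the name movie_years is introduced here for illustration.

```python
release_years = [2012, 2015, 2018, 2019]
the_avenger_movies = ["The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War", "Avengers: Endgame"]

# Each (movie, year) pair produced by zip() becomes one key-value entry
movie_years = dict(zip(the_avenger_movies, release_years))

print(movie_years["Avengers: Endgame"])
```

Note that zip() stops at the shortest input, so lists of unequal length are silently truncated.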

List Comprehensions

A concise, single-line way to build a list that would otherwise need a loop

In [22]:
# loop construction
squared_list = []
for i in range(10):
    squared_list.append(i**2)
print(squared_list)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
In [23]:
# list comprehension
squared_list = [i**2 for i in range(10)]
print(squared_list)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
In [24]:
# list comprehension with if
even_numbers = [i for i in range(10) if i % 2 == 0]
print(even_numbers)
[0, 2, 4, 6, 8]
In [25]:
# list comprehension with if-else
is_even_numbers = [True if i % 2 == 0 else False for i in range(10)]
print(is_even_numbers)
[True, False, True, False, True, False, True, False, True, False]

In-class exercise: convert all of the distances from kilometers to miles

$$1 \text{ kilometer} = 0.62137 \text{ mile}$$
In [26]:
kilometers = [1.6, 3, 5, 10, 21.095, 42.195]
In [28]:
print(miles)
[0.994192, 1.86411, 3.1068499999999997, 6.213699999999999, 13.10780015, 26.21870715]
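
One way the exercise could be solved with a list comprehension is sketched below. It assumes the conversion factor 0.62137 given above and is not necessarily the cell that produced the printed output.

```python
kilometers = [1.6, 3, 5, 10, 21.095, 42.195]

# Multiply every distance by the kilometer-to-mile factor
miles = [km * 0.62137 for km in kilometers]

print(miles)
```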

Generators

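Generators are objects that produce data on demand, one value at a time

Common generator-like built-ins (each returns a lazy iterator that is exhausted after one pass)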

  • map()
  • filter()
  • enumerate()
  • zip()
In [29]:
# map()
temp_c = [18, 20, 22]
temp_f = map(lambda x: x*9/5 + 32, temp_c)
print(type(temp_f))
print(temp_f)
<class 'map'>
<map object at 0x10aaa3610>
In [30]:
list(temp_f)
Out[30]:
[64.4, 68.0, 71.6]
In [31]:
list(temp_f)
Out[31]:
[]
In [32]:
# filter()
temp_c = [-10, 18, 20, -5, -3]
below_zero = filter(lambda x: x < 0, temp_c)
print(type(below_zero))
print(below_zero)
<class 'filter'>
<filter object at 0x10aaa1950>
In [33]:
list(below_zero)
Out[33]:
[-10, -5, -3]
In [34]:
list(below_zero)
Out[34]:
[]

In-class exercise: convert all of the distances from kilometers to miles lazily, with map() or a generator expression

$$1 \text{ kilometer} = 0.62137 \text{ mile}$$
In [35]:
kilometers = [1.6, 3, 5, 10, 21.095, 42.195]
In [37]:
print(miles)
print(list(miles))
<generator object <genexpr> at 0x10aac90d0>
[0.994192, 1.86411, 3.1068499999999997, 6.213699999999999, 13.10780015, 26.21870715]
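
The printed object above is a generator expression (genexpr), which is written like a list comprehension but with parentheses. The sketch below shows both that form and the map() form side by side; it is an illustration under the same 0.62137 factor, and the names miles_map and miles_gen are assumptions.

```python
kilometers = [1.6, 3, 5, 10, 21.095, 42.195]

# map() applies the lambda lazily and returns a map object
miles_map = map(lambda km: km * 0.62137, kilometers)

# A generator expression does the same work and returns a generator object
miles_gen = (km * 0.62137 for km in kilometers)

print(miles_map)
print(miles_gen)

# Nothing is computed until the values are consumed, here by list()
miles_from_map = list(miles_map)
miles_from_gen = list(miles_gen)
print(miles_from_map == miles_from_gen)
```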
In [38]:
# enumerate()
the_avenger_movies = ["The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War", "Avengers: Endgame"]
enumerate_generator = enumerate(the_avenger_movies)
print(type(enumerate_generator))
print(enumerate_generator)
<class 'enumerate'>
<enumerate object at 0x10aac1a00>
In [39]:
list(enumerate_generator)
Out[39]:
[(0, 'The Avengers'),
 (1, 'Avengers: Age of Ultron'),
 (2, 'Avengers: Infinity War'),
 (3, 'Avengers: Endgame')]
In [40]:
list(enumerate_generator)
Out[40]:
[]
In [41]:
# zip()
release_years = [2012, 2015, 2018, 2019]
the_avenger_movies = ["The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War", "Avengers: Endgame"]
zip_generator = zip(release_years, the_avenger_movies)
print(type(zip_generator))
print(zip_generator)
<class 'zip'>
<zip object at 0x10aadc3c0>
In [42]:
list(zip_generator)
Out[42]:
[(2012, 'The Avengers'),
 (2015, 'Avengers: Age of Ultron'),
 (2018, 'Avengers: Infinity War'),
 (2019, 'Avengers: Endgame')]
In [43]:
list(zip_generator)
Out[43]:
[]
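
The built-ins above return iterators; a generator in the strict sense is created by a function that uses yield (or by a generator expression). The following is a minimal sketch, assuming a hypothetical function name fahrenheit_generator, that shows the same one-pass behavior.

```python
def fahrenheit_generator(temps_c):
    """
    Yield each Celsius temperature converted to Fahrenheit.
    (Hypothetical example, for illustration only.)
    """
    for c in temps_c:
        # Execution pauses here until the next value is requested
        yield c*9/5 + 32

gen = fahrenheit_generator([18, 20, 22])
print(type(gen))

first_value = next(gen)   # consumes the first value only
first_pass = list(gen)    # consumes the two remaining values
second_pass = list(gen)   # the generator is now exhausted

print(first_value)
print(first_pass)
print(second_pass)
```

This is why the second list() call in each example above returns an empty list: the values were already consumed by the first pass.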

Common string methods

Formatting strings

.format()

In [47]:
pi = 3.14159
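print("The value of pi is: {}".format(pi))
The value of pi is: 3.14159
In [48]:
pi_str = "pi"
pi = 3.14159

print("{} rounded to two decimal places is: {:.2f}".format(pi_str, pi))
print("{} rounded to a whole number is {:.0f}".format(pi_str, pi))
pi rounded to two decimal places is: 3.14
pi rounded to a whole number is 3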
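
The same format specifications also work inside f-strings, available since Python 3.6. This is an alternative sketch, not part of the original notebook; the variable names two_decimals and whole_number are introduced here.

```python
pi_str = "pi"
pi = 3.14159

# The part after the colon is the same format spec that .format() accepts
two_decimals = f"{pi_str} rounded to two decimal places is: {pi:.2f}"
whole_number = f"{pi_str} rounded to a whole number is {pi:.0f}"

print(two_decimals)
print(whole_number)
```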

Methods for changing letter case

  • .title()
  • .upper()
  • .lower()
In [49]:
use_the_force = "Luke, use the Force!"

print(use_the_force.title())
print(use_the_force.upper())
print(use_the_force.lower())
Luke, Use The Force!
LUKE, USE THE FORCE!
luke, use the force!

Methods for removing extra whitespace and newline characters

  • .rstrip()
  • .lstrip()
  • .strip()
In [50]:
use_the_force = """
     
Luke, use the Force!
     
"""

use_the_force
Out[50]:
'\n     \nLuke, use the Force!\n     \n'
In [51]:
print(use_the_force.rstrip())
print(use_the_force.lstrip())
print(use_the_force.strip())
     
Luke, use the Force!
Luke, use the Force!
     

Luke, use the Force!

Method for replacing text

.replace()

In [52]:
skywalker = "Anakin Skywalker"
print(skywalker)
print(skywalker.replace("Anakin", "Luke"))
Anakin Skywalker
Luke Skywalker

Method for splitting text

.split()

In [53]:
use_the_force = "Luke, use the Force!"
print(use_the_force.split())
print(use_the_force.split(","))
['Luke,', 'use', 'the', 'Force!']
['Luke', ' use the Force!']

A simple way to count word frequency

In [54]:
def get_word_frequency(long_str):
    words = long_str.split()
    word_frequency = {}
    for word in words:
        if word not in word_frequency:
            word_frequency[word] = 1
        else:
            word_frequency[word] += 1
    return word_frequency
In [55]:
episode_ix_opening_crawl = """
A long time ago in a galaxy far, far away....

The dead speak! The galaxy has heard a mysterious broadcast, a threat of REVENGE in the sinister voice of the late EMPEROR PALPATINE. 

GENERAL LEIA ORGANA dispatches secret agents to gather intelligence, while REY, the last hope of the Jedi, trains for battle against the diabolical FIRST ORDER.

Meanwhile, Supreme Leader KYLO REN rages in search of the phantom Emperor, determined to destroy any threat to his power....
"""
In [56]:
print(get_word_frequency(episode_ix_opening_crawl))
{'A': 1, 'long': 1, 'time': 1, 'ago': 1, 'in': 3, 'a': 3, 'galaxy': 2, 'far,': 1, 'far': 1, 'away....': 1, 'The': 2, 'dead': 1, 'speak!': 1, 'has': 1, 'heard': 1, 'mysterious': 1, 'broadcast,': 1, 'threat': 2, 'of': 4, 'REVENGE': 1, 'the': 6, 'sinister': 1, 'voice': 1, 'late': 1, 'EMPEROR': 1, 'PALPATINE.': 1, 'GENERAL': 1, 'LEIA': 1, 'ORGANA': 1, 'dispatches': 1, 'secret': 1, 'agents': 1, 'to': 3, 'gather': 1, 'intelligence,': 1, 'while': 1, 'REY,': 1, 'last': 1, 'hope': 1, 'Jedi,': 1, 'trains': 1, 'for': 1, 'battle': 1, 'against': 1, 'diabolical': 1, 'FIRST': 1, 'ORDER.': 1, 'Meanwhile,': 1, 'Supreme': 1, 'Leader': 1, 'KYLO': 1, 'REN': 1, 'rages': 1, 'search': 1, 'phantom': 1, 'Emperor,': 1, 'determined': 1, 'destroy': 1, 'any': 1, 'his': 1, 'power....': 1}
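
In the result above, 'The' and 'the', or 'far,' and 'far', are counted as different words because .split() keeps case and punctuation. A possible refinement is sketched below, assuming the standard-library collections.Counter and string.punctuation; the function name get_normalized_word_frequency and the short sample sentence are hypothetical.

```python
from collections import Counter
import string

def get_normalized_word_frequency(long_str):
    """
    Count words after stripping surrounding punctuation and lowercasing.
    (Hypothetical helper, for illustration only.)
    """
    words = []
    for token in long_str.split():
        # Remove punctuation at both ends, then ignore case
        word = token.strip(string.punctuation).lower()
        if word:
            words.append(word)
    # Counter is a dict subclass that tallies hashable items
    return Counter(words)

sample = "The dead speak! The galaxy has heard a mysterious broadcast, a threat of REVENGE."
freq = get_normalized_word_frequency(sample)
print(freq["the"])
print(freq.most_common(2))
```

Counter replaces the manual if-else bookkeeping, and most_common() returns the entries ordered by count.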