beautifulsoup4
!pip install -U beautifulsoup4
Collecting beautifulsoup4
Downloading https://files.pythonhosted.org/packages/3b/c8/a55eb6ea11cd7e5ac4bacdf92bac4693b90d3ba79268be16527555e186f0/beautifulsoup4-4.8.1-py3-none-any.whl (101kB)
|████████████████████████████████| 102kB 6.0MB/s
Collecting soupsieve>=1.2
Downloading https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Found existing installation: beautifulsoup4 4.6.3
Uninstalling beautifulsoup4-4.6.3:
Successfully uninstalled beautifulsoup4-4.6.3
Successfully installed beautifulsoup4-4.8.1 soupsieve-1.9.5
<!DOCTYPE html>
<html>
<head>
</head>
<body>
</body>
</html>
Cascading Style Sheets (CSS) is a computer language for adding styles (fonts, spacing, colors, and so on) to HTML. It is defined and maintained by the W3C.
Source: https://zh.wikipedia.org/zh-tw/%E5%B1%82%E5%8F%A0%E6%A0%B7%E5%BC%8F%E8%A1%A8
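Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents such as HTML. It was designed for communication between browsers and servers, but it can be used for other purposes as well. HTTP follows the standard client-server model: the client opens a connection to send a request, then waits for the response.
The requests package
Chrome DevTools is a set of web development and testing tools built into Google Chrome.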
Source: https://developers.google.com/web/tools/chrome-devtools/?hl=zh-TW
Use the Network tab to inspect requests and downloaded files
Quickly toggle JavaScript on and off
Source: https://chrome.google.com/webstore/detail/quick-javascript-switcher/geddoclleiomckbhadiaipdggiiccfje
requests functions
requests.get(): makes a GET request (downloads a resource), often used together with Query String Parameters
requests.post(): makes a POST request (uploads data), used together with Form Data
import requests
request_url = "https://www.imdb.com/"
response = requests.get(request_url)
request_url = "https://mops.twse.com.tw/mops/web/t05st10_ifrs"
response = requests.post(request_url)
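To make the difference between Query String Parameters and Form Data concrete, here is a minimal sketch that uses only Python's standard library rather than requests itself. The example.com URL and the parameter names `q` and `page` are hypothetical, and the form fields are illustrative values.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only
base_url = "https://example.com/search"
query_params = {"q": "nba", "page": 2}
form_data = {"commandid": "GetTown", "cityid": "01"}

# GET: the parameters are appended to the URL after a "?"
get_url = base_url + "?" + urlencode(query_params)

# POST: the same key=value encoding is used, but it travels in the request body
post_body = urlencode(form_data).encode("utf-8")

print(get_url)    # https://example.com/search?q=nba&page=2
print(post_body)  # b'commandid=GetTown&cityid=01'
```

With requests, the same dictionaries would normally be passed as `requests.get(url, params=query_params)` and `requests.post(url, data=form_data)`, and the library does this encoding for you.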
response.status_code: check the status code
response.json(): convert the response directly into a Python data structure (list or dict)
response.content: get the response as bytes
response.text: get the response as str
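How these three views of a response body relate can be sketched without making a network call. The `raw` bytes below are a made-up stand-in for `response.content`. Decoding them gives roughly what `.text` returns (requests picks the encoding itself; UTF-8 is assumed here), and parsing that string with the standard `json` module gives roughly what `.json()` returns.

```python
import json

# Made-up stand-in for response.content: a UTF-8 encoded JSON document
raw = '{"city": "台北市", "towns": ["松山區", "信義區"]}'.encode("utf-8")

text = raw.decode("utf-8")  # comparable to response.text (str)
data = json.loads(text)     # comparable to response.json() (dict or list)

print(type(raw), type(text), type(data))
print(data["towns"][0])
```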
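After calling the .json() method, parse the result directly as Python data structures
After reading the .content attribute, parse it with lxml and XPath
After reading the .text attribute, parse it with bs4 and CSS selectors
JavaScript Object Notation (JSON) is a standard format for representing structured data as JavaScript objects, commonly used for presenting and transmitting data on websites.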
Source: mozilla.org
json as the parsing medium
The dict class
A list of dict
Request the data with requests
Call the .json() method, for example response.json()
request_url = "http://data.nba.net/prod/v2/2019/teams.json"
response = requests.get(request_url)
teams = response.json()
print(type(teams))
print(teams)
<class 'dict'> {'_internal': {'pubDateTime': '2019-06-26 06:00:23.891 EDT', 'igorPath': 'cron,1561543218800,1561543218800|router,1561543218800,1561543218922|domUpdater,1561543219144,1561543219858|feedProducer,1561543221917,1561543224371', 'xslt': 'NBA/xsl/league/roster/marty_teams_list.xsl', 'xsltForceRecompile': 'true', 'xsltInCache': 'false', 'xsltCompileTimeMillis': '1545', 'xsltTransformTimeMillis': '540', 'consolidatedDomKey': 'qamanual__transform__marty_teams_list__5498140551604', 'endToEndTimeMillis': '5571'}, 'league': {'standard': [{'isNBAFranchise': False, 'isAllStar': False, 'city': 'Croatia', 'altCityName': 'Croatia', 'fullName': 'Team Croatia', 'tricode': 'CRO', 'teamId': '70', 'nickname': 'Croatia', 'urlName': 'croatia', 'teamShortName': 'Croatia', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'China', 'altCityName': 'China', 'fullName': 'Team China', 'tricode': 'CHN', 'teamId': '45', 'nickname': 'China', 'urlName': 'china', 'teamShortName': 'China', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Adelaide', 'altCityName': 'Adelaide', 'fullName': 'Adelaide 36ers', 'tricode': 'ADL', 'teamId': '15019', 'nickname': '36ers', 'urlName': '36ers', 'teamShortName': 'Adelaide', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Atlanta', 'altCityName': 'Atlanta', 'fullName': 'Atlanta Hawks', 'tricode': 'ATL', 'teamId': '1610612737', 'nickname': 'Hawks', 'urlName': 'hawks', 'teamShortName': 'Atlanta', 'confName': 'East', 'divName': 'Southeast'}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'Away', 'altCityName': 'Away', 'fullName': 'Away Away', 'tricode': 'AWY', 'teamId': '1610616840', 'nickname': 'Away', 'urlName': 'away', 'teamShortName': 'Away', 'confName': 'East', 'divName': 'East'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Beijing', 'altCityName': 'Beijing', 'fullName': 'Beijing Ducks', 'tricode': 'BJD', 
'teamId': '15021', 'nickname': 'Ducks', 'urlName': 'ducks', 'teamShortName': 'Beijing', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Boston', 'altCityName': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': '1610612738', 'nickname': 'Celtics', 'urlName': 'celtics', 'teamShortName': 'Boston', 'confName': 'East', 'divName': 'Atlantic'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Brooklyn', 'altCityName': 'Brooklyn', 'fullName': 'Brooklyn Nets', 'tricode': 'BKN', 'teamId': '1610612751', 'nickname': 'Nets', 'urlName': 'nets', 'teamShortName': 'Brooklyn', 'confName': 'East', 'divName': 'Atlantic'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Charlotte', 'altCityName': 'Charlotte', 'fullName': 'Charlotte Hornets', 'tricode': 'CHA', 'teamId': '1610612766', 'nickname': 'Hornets', 'urlName': 'hornets', 'teamShortName': 'Charlotte', 'confName': 'East', 'divName': 'Southeast'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Buenos Aires', 'altCityName': 'Buenos Aires', 'fullName': 'San Lorenzo de Almagro', 'tricode': 'SLA', 'teamId': '12330', 'nickname': 'San Lorenzo', 'urlName': 'san_lorenzo', 'teamShortName': 'San Lorenzo', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Chicago', 'altCityName': 'Chicago', 'fullName': 'Chicago Bulls', 'tricode': 'CHI', 'teamId': '1610612741', 'nickname': 'Bulls', 'urlName': 'bulls', 'teamShortName': 'Chicago', 'confName': 'East', 'divName': 'Central'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Cleveland', 'altCityName': 'Cleveland', 'fullName': 'Cleveland Cavaliers', 'tricode': 'CLE', 'teamId': '1610612739', 'nickname': 'Cavaliers', 'urlName': 'cavaliers', 'teamShortName': 'Cleveland', 'confName': 'East', 'divName': 'Central'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Dallas', 'altCityName': 'Dallas', 'fullName': 'Dallas Mavericks', 'tricode': 'DAL', 'teamId': '1610612742', 
'nickname': 'Mavericks', 'urlName': 'mavericks', 'teamShortName': 'Dallas', 'confName': 'West', 'divName': 'Southwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Denver', 'altCityName': 'Denver', 'fullName': 'Denver Nuggets', 'tricode': 'DEN', 'teamId': '1610612743', 'nickname': 'Nuggets', 'urlName': 'nuggets', 'teamShortName': 'Denver', 'confName': 'West', 'divName': 'Northwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Detroit', 'altCityName': 'Detroit', 'fullName': 'Detroit Pistons', 'tricode': 'DET', 'teamId': '1610612765', 'nickname': 'Pistons', 'urlName': 'pistons', 'teamShortName': 'Detroit', 'confName': 'East', 'divName': 'Central'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Franca', 'altCityName': 'Franca', 'fullName': 'SESI/Franca', 'tricode': 'FRA', 'teamId': '12332', 'nickname': 'Franca', 'urlName': 'franca', 'teamShortName': 'Franca', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Golden State', 'altCityName': 'Golden State', 'fullName': 'Golden State Warriors', 'tricode': 'GSW', 'teamId': '1610612744', 'nickname': 'Warriors', 'urlName': 'warriors', 'teamShortName': 'Golden State', 'confName': 'West', 'divName': 'Pacific'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Guangzhou', 'altCityName': 'Guangzhou', 'fullName': 'Guangzhou Long-Lions', 'tricode': 'GUA', 'teamId': '15018', 'nickname': 'Long-Lions', 'urlName': 'long-lions', 'teamShortName': 'Guangzhou', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Haifa', 'altCityName': 'Haifa', 'fullName': 'Haifa Maccabi Haifa', 'tricode': 'MAC', 'teamId': '93', 'nickname': 'Maccabi Haifa', 'urlName': 'maccabi_haifa', 'teamShortName': 'Maccabi Haifa', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'Home', 'altCityName': 'Home', 'fullName': 'Home Home', 'tricode': 'HME', 'teamId': '1610616839', 'nickname': 'Home', 'urlName': 'home', 
'teamShortName': 'Home', 'confName': 'East', 'divName': 'East'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Houston', 'altCityName': 'Houston', 'fullName': 'Houston Rockets', 'tricode': 'HOU', 'teamId': '1610612745', 'nickname': 'Rockets', 'urlName': 'rockets', 'teamShortName': 'Houston', 'confName': 'West', 'divName': 'Southwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Indiana', 'altCityName': 'Indiana', 'fullName': 'Indiana Pacers', 'tricode': 'IND', 'teamId': '1610612754', 'nickname': 'Pacers', 'urlName': 'pacers', 'teamShortName': 'Indiana', 'confName': 'East', 'divName': 'Central'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'LA', 'altCityName': 'LA Clippers', 'fullName': 'LA Clippers', 'tricode': 'LAC', 'teamId': '1610612746', 'nickname': 'Clippers', 'urlName': 'clippers', 'teamShortName': 'LA Clippers', 'confName': 'West', 'divName': 'Pacific'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Los Angeles', 'altCityName': 'Los Angeles Lakers', 'fullName': 'Los Angeles Lakers', 'tricode': 'LAL', 'teamId': '1610612747', 'nickname': 'Lakers', 'urlName': 'lakers', 'teamShortName': 'L.A. 
Lakers', 'confName': 'West', 'divName': 'Pacific'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Melbourne', 'altCityName': 'Melbourne', 'fullName': 'Melbourne United', 'tricode': 'MEL', 'teamId': '15016', 'nickname': 'United', 'urlName': 'united', 'teamShortName': 'Melbourne', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Memphis', 'altCityName': 'Memphis', 'fullName': 'Memphis Grizzlies', 'tricode': 'MEM', 'teamId': '1610612763', 'nickname': 'Grizzlies', 'urlName': 'grizzlies', 'teamShortName': 'Memphis', 'confName': 'West', 'divName': 'Southwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Miami', 'altCityName': 'Miami', 'fullName': 'Miami Heat', 'tricode': 'MIA', 'teamId': '1610612748', 'nickname': 'Heat', 'urlName': 'heat', 'teamShortName': 'Miami', 'confName': 'East', 'divName': 'Southeast'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Milwaukee', 'altCityName': 'Milwaukee', 'fullName': 'Milwaukee Bucks', 'tricode': 'MIL', 'teamId': '1610612749', 'nickname': 'Bucks', 'urlName': 'bucks', 'teamShortName': 'Milwaukee', 'confName': 'East', 'divName': 'Central'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Minnesota', 'altCityName': 'Minnesota', 'fullName': 'Minnesota Timberwolves', 'tricode': 'MIN', 'teamId': '1610612750', 'nickname': 'Timberwolves', 'urlName': 'timberwolves', 'teamShortName': 'Minnesota', 'confName': 'West', 'divName': 'Northwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'New Orleans', 'altCityName': 'New Orleans', 'fullName': 'New Orleans Pelicans', 'tricode': 'NOP', 'teamId': '1610612740', 'nickname': 'Pelicans', 'urlName': 'pelicans', 'teamShortName': 'New Orleans', 'confName': 'West', 'divName': 'Southwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'New York', 'altCityName': 'New York', 'fullName': 'New York Knicks', 'tricode': 'NYK', 'teamId': '1610612752', 'nickname': 'Knicks', 'urlName': 'knicks', 'teamShortName': 'New York', 
'confName': 'East', 'divName': 'Atlantic'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'New Zealand', 'altCityName': 'New Zealand', 'fullName': 'New Zealand Breakers', 'tricode': 'NZB', 'teamId': '15020', 'nickname': 'Breakers', 'urlName': 'breakers', 'teamShortName': 'New Zealand', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Oklahoma City', 'altCityName': 'Oklahoma City', 'fullName': 'Oklahoma City Thunder', 'tricode': 'OKC', 'teamId': '1610612760', 'nickname': 'Thunder', 'urlName': 'thunder', 'teamShortName': 'Oklahoma City', 'confName': 'West', 'divName': 'Northwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Orlando', 'altCityName': 'Orlando', 'fullName': 'Orlando Magic', 'tricode': 'ORL', 'teamId': '1610612753', 'nickname': 'Magic', 'urlName': 'magic', 'teamShortName': 'Orlando', 'confName': 'East', 'divName': 'Southeast'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Perth', 'altCityName': 'Perth', 'fullName': 'Perth Wildcats', 'tricode': 'PER', 'teamId': '104', 'nickname': 'Wildcats', 'urlName': 'wildcats', 'teamShortName': 'Perth', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Philadelphia', 'altCityName': 'Philadelphia', 'fullName': 'Philadelphia 76ers', 'tricode': 'PHI', 'teamId': '1610612755', 'nickname': '76ers', 'urlName': 'sixers', 'teamShortName': 'Philadelphia', 'confName': 'East', 'divName': 'Atlantic'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Phoenix', 'altCityName': 'Phoenix', 'fullName': 'Phoenix Suns', 'tricode': 'PHX', 'teamId': '1610612756', 'nickname': 'Suns', 'urlName': 'suns', 'teamShortName': 'Phoenix', 'confName': 'West', 'divName': 'Pacific'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Portland', 'altCityName': 'Portland', 'fullName': 'Portland Trail Blazers', 'tricode': 'POR', 'teamId': '1610612757', 'nickname': 'Trail Blazers', 'urlName': 'blazers', 'teamShortName': 'Portland', 
'confName': 'West', 'divName': 'Northwest'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Rio de Janeiro', 'altCityName': 'Rio de Janeiro', 'fullName': 'Rio de Janeiro Flamengo', 'tricode': 'FLA', 'teamId': '12325', 'nickname': 'Flamengo', 'urlName': 'flamengo', 'teamShortName': 'Flamengo', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Sacramento', 'altCityName': 'Sacramento', 'fullName': 'Sacramento Kings', 'tricode': 'SAC', 'teamId': '1610612758', 'nickname': 'Kings', 'urlName': 'kings', 'teamShortName': 'Sacramento', 'confName': 'West', 'divName': 'Pacific'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'San Antonio', 'altCityName': 'San Antonio', 'fullName': 'San Antonio Spurs', 'tricode': 'SAS', 'teamId': '1610612759', 'nickname': 'Spurs', 'urlName': 'spurs', 'teamShortName': 'San Antonio', 'confName': 'West', 'divName': 'Southwest'}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Shanghai', 'altCityName': 'Shanghai', 'fullName': 'Shanghai Sharks', 'tricode': 'SDS', 'teamId': '12329', 'nickname': 'Sharks', 'urlName': 'shanghai_sharks', 'teamShortName': 'Shanghai', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Sydney', 'altCityName': 'Sydney', 'fullName': 'Sydney Kings', 'tricode': 'SYD', 'teamId': '15015', 'nickname': 'Kings', 'urlName': 'sydkings', 'teamShortName': 'Sydney', 'confName': 'Intl', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'Team', 'altCityName': 'Team', 'fullName': 'All-Stars', 'tricode': 'EST', 'teamId': '1699999999', 'nickname': 'All-Stars', 'urlName': 'assn_away', 'confName': 'East', 'divName': 'East'}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'Team', 'altCityName': 'Team', 'fullName': 'All-Stars', 'tricode': 'WST', 'teamId': '1699999998', 'nickname': 'All-Stars', 'urlName': 'assn_home', 'confName': 'West', 'divName': 'West'}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'Team 
Giannis', 'altCityName': 'Team Giannis', 'fullName': 'Team Giannis', 'tricode': 'GNS', 'teamId': '1610616833', 'nickname': 'Team Giannis', 'urlName': 'team_giannis', 'teamShortName': 'Team Giannis', 'confName': 'East', 'divName': 'East'}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'Team LeBron', 'altCityName': 'Team LeBron', 'fullName': 'Team LeBron', 'tricode': 'LBN', 'teamId': '1610616834', 'nickname': 'Team LeBron', 'urlName': 'team_lebron', 'teamShortName': 'Team LeBron', 'confName': 'West', 'divName': 'West'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Toronto', 'altCityName': 'Toronto', 'fullName': 'Toronto Raptors', 'tricode': 'TOR', 'teamId': '1610612761', 'nickname': 'Raptors', 'urlName': 'raptors', 'teamShortName': 'Toronto', 'confName': 'East', 'divName': 'Atlantic'}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'USA', 'altCityName': 'USA', 'fullName': 'USA', 'tricode': 'USA', 'teamId': '1610616843', 'nickname': 'USA', 'urlName': 'usa', 'teamShortName': 'USA', 'confName': 'East', 'divName': 'East'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Utah', 'altCityName': 'Utah', 'fullName': 'Utah Jazz', 'tricode': 'UTA', 'teamId': '1610612762', 'nickname': 'Jazz', 'urlName': 'jazz', 'teamShortName': 'Utah', 'confName': 'West', 'divName': 'Northwest'}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Washington', 'altCityName': 'Washington', 'fullName': 'Washington Wizards', 'tricode': 'WAS', 'teamId': '1610612764', 'nickname': 'Wizards', 'urlName': 'wizards', 'teamShortName': 'Washington', 'confName': 'East', 'divName': 'Southeast'}, {'isNBAFranchise': False, 'isAllStar': True, 'city': 'World', 'altCityName': 'World', 'fullName': 'World', 'tricode': 'WLD', 'teamId': '1610616844', 'nickname': 'World', 'urlName': 'world', 'teamShortName': 'World', 'confName': 'East', 'divName': 'East'}], 'africa': [{'isNBAFranchise': False, 'isAllStar': False, 'city': 'Team', 'altCityName': 'Team', 'fullName': 'Team USA', 'tricode': 
'USA', 'teamId': '22', 'nickname': 'USA', 'urlName': 'nhs_usa', 'teamShortName': 'USA', 'confName': '', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Team', 'altCityName': 'Team', 'fullName': 'Team World', 'tricode': 'WLD', 'teamId': '21', 'nickname': 'World', 'urlName': 'nhs_world', 'teamShortName': 'World', 'confName': '', 'divName': ''}], 'sacramento': [{'isNBAFranchise': True, 'isAllStar': False, 'city': 'Golden State', 'altCityName': 'Golden State', 'fullName': 'Golden State Warriors', 'tricode': 'GSW', 'teamId': '1610612744', 'nickname': 'Warriors', 'urlName': 'warriors', 'teamShortName': 'Golden State', 'confName': 'Sacramento', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Los Angeles', 'altCityName': 'Los Angeles Lakers', 'fullName': 'Los Angeles Lakers', 'tricode': 'LAL', 'teamId': '1610612747', 'nickname': 'Lakers', 'urlName': 'lakers', 'teamShortName': 'L.A. Lakers', 'confName': 'Sacramento', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Miami', 'altCityName': 'Miami', 'fullName': 'Miami Heat', 'tricode': 'MIA', 'teamId': '1610612748', 'nickname': 'Heat', 'urlName': 'heat', 'teamShortName': 'Miami', 'confName': 'Sacramento', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Sacramento', 'altCityName': 'Sacramento', 'fullName': 'Sacramento Kings', 'tricode': 'SAC', 'teamId': '1610612758', 'nickname': 'Kings', 'urlName': 'kings', 'teamShortName': 'Sacramento', 'confName': 'Sacramento', 'divName': ''}], 'vegas': [{'isNBAFranchise': True, 'isAllStar': False, 'city': 'Atlanta', 'altCityName': 'Atlanta', 'fullName': 'Atlanta Hawks', 'tricode': 'ATL', 'teamId': '1610612737', 'nickname': 'Hawks', 'urlName': 'hawks', 'teamShortName': 'Atlanta', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Boston', 'altCityName': 'Boston', 'fullName': 'Boston Celtics', 'tricode': 'BOS', 'teamId': '1610612738', 'nickname': 'Celtics', 
'urlName': 'celtics', 'teamShortName': 'Boston', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Brooklyn', 'altCityName': 'Brooklyn', 'fullName': 'Brooklyn Nets', 'tricode': 'BKN', 'teamId': '1610612751', 'nickname': 'Nets', 'urlName': 'nets', 'teamShortName': 'Brooklyn', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Charlotte', 'altCityName': 'Charlotte', 'fullName': 'Charlotte Hornets', 'tricode': 'CHA', 'teamId': '1610612766', 'nickname': 'Hornets', 'urlName': 'hornets', 'teamShortName': 'Charlotte', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Chicago', 'altCityName': 'Chicago', 'fullName': 'Chicago Bulls', 'tricode': 'CHI', 'teamId': '1610612741', 'nickname': 'Bulls', 'urlName': 'bulls', 'teamShortName': 'Chicago', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'China', 'altCityName': 'China', 'fullName': 'Team China', 'tricode': 'CHN', 'teamId': '45', 'nickname': 'China', 'urlName': 'china', 'teamShortName': 'China', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Cleveland', 'altCityName': 'Cleveland', 'fullName': 'Cleveland Cavaliers', 'tricode': 'CLE', 'teamId': '1610612739', 'nickname': 'Cavaliers', 'urlName': 'cavaliers', 'teamShortName': 'Cleveland', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': False, 'isAllStar': False, 'city': 'Croatia', 'altCityName': 'Croatia', 'fullName': 'Team Croatia', 'tricode': 'CRO', 'teamId': '70', 'nickname': 'Croatia', 'urlName': 'croatia', 'teamShortName': 'Croatia', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Dallas', 'altCityName': 'Dallas', 'fullName': 'Dallas Mavericks', 'tricode': 'DAL', 'teamId': '1610612742', 'nickname': 'Mavericks', 'urlName': 'mavericks', 'teamShortName': 'Dallas', 'confName': 'summer', 'divName': ''}, 
{'isNBAFranchise': True, 'isAllStar': False, 'city': 'Denver', 'altCityName': 'Denver', 'fullName': 'Denver Nuggets', 'tricode': 'DEN', 'teamId': '1610612743', 'nickname': 'Nuggets', 'urlName': 'nuggets', 'teamShortName': 'Denver', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Detroit', 'altCityName': 'Detroit', 'fullName': 'Detroit Pistons', 'tricode': 'DET', 'teamId': '1610612765', 'nickname': 'Pistons', 'urlName': 'pistons', 'teamShortName': 'Detroit', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Golden State', 'altCityName': 'Golden State', 'fullName': 'Golden State Warriors', 'tricode': 'GSW', 'teamId': '1610612744', 'nickname': 'Warriors', 'urlName': 'warriors', 'teamShortName': 'Golden State', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Houston', 'altCityName': 'Houston', 'fullName': 'Houston Rockets', 'tricode': 'HOU', 'teamId': '1610612745', 'nickname': 'Rockets', 'urlName': 'rockets', 'teamShortName': 'Houston', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Indiana', 'altCityName': 'Indiana', 'fullName': 'Indiana Pacers', 'tricode': 'IND', 'teamId': '1610612754', 'nickname': 'Pacers', 'urlName': 'pacers', 'teamShortName': 'Indiana', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'LA', 'altCityName': 'LA Clippers', 'fullName': 'LA Clippers', 'tricode': 'LAC', 'teamId': '1610612746', 'nickname': 'Clippers', 'urlName': 'clippers', 'teamShortName': 'LA Clippers', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Los Angeles', 'altCityName': 'Los Angeles Lakers', 'fullName': 'Los Angeles Lakers', 'tricode': 'LAL', 'teamId': '1610612747', 'nickname': 'Lakers', 'urlName': 'lakers', 'teamShortName': 'L.A. 
Lakers', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Memphis', 'altCityName': 'Memphis', 'fullName': 'Memphis Grizzlies', 'tricode': 'MEM', 'teamId': '1610612763', 'nickname': 'Grizzlies', 'urlName': 'grizzlies', 'teamShortName': 'Memphis', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Miami', 'altCityName': 'Miami', 'fullName': 'Miami Heat', 'tricode': 'MIA', 'teamId': '1610612748', 'nickname': 'Heat', 'urlName': 'heat', 'teamShortName': 'Miami', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Milwaukee', 'altCityName': 'Milwaukee', 'fullName': 'Milwaukee Bucks', 'tricode': 'MIL', 'teamId': '1610612749', 'nickname': 'Bucks', 'urlName': 'bucks', 'teamShortName': 'Milwaukee', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Minnesota', 'altCityName': 'Minnesota', 'fullName': 'Minnesota Timberwolves', 'tricode': 'MIN', 'teamId': '1610612750', 'nickname': 'Timberwolves', 'urlName': 'timberwolves', 'teamShortName': 'Minnesota', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'New Orleans', 'altCityName': 'New Orleans', 'fullName': 'New Orleans Pelicans', 'tricode': 'NOP', 'teamId': '1610612740', 'nickname': 'Pelicans', 'urlName': 'pelicans', 'teamShortName': 'New Orleans', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'New York', 'altCityName': 'New York', 'fullName': 'New York Knicks', 'tricode': 'NYK', 'teamId': '1610612752', 'nickname': 'Knicks', 'urlName': 'knicks', 'teamShortName': 'New York', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Oklahoma City', 'altCityName': 'Oklahoma City', 'fullName': 'Oklahoma City Thunder', 'tricode': 'OKC', 'teamId': '1610612760', 'nickname': 'Thunder', 'urlName': 'thunder', 'teamShortName': 'Oklahoma City', 'confName': 
'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Orlando', 'altCityName': 'Orlando', 'fullName': 'Orlando Magic', 'tricode': 'ORL', 'teamId': '1610612753', 'nickname': 'Magic', 'urlName': 'magic', 'teamShortName': 'Orlando', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Philadelphia', 'altCityName': 'Philadelphia', 'fullName': 'Philadelphia 76ers', 'tricode': 'PHI', 'teamId': '1610612755', 'nickname': '76ers', 'urlName': 'sixers', 'teamShortName': 'Philadelphia', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Phoenix', 'altCityName': 'Phoenix', 'fullName': 'Phoenix Suns', 'tricode': 'PHX', 'teamId': '1610612756', 'nickname': 'Suns', 'urlName': 'suns', 'teamShortName': 'Phoenix', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Portland', 'altCityName': 'Portland', 'fullName': 'Portland Trail Blazers', 'tricode': 'POR', 'teamId': '1610612757', 'nickname': 'Trail Blazers', 'urlName': 'blazers', 'teamShortName': 'Portland', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Sacramento', 'altCityName': 'Sacramento', 'fullName': 'Sacramento Kings', 'tricode': 'SAC', 'teamId': '1610612758', 'nickname': 'Kings', 'urlName': 'kings', 'teamShortName': 'Sacramento', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'San Antonio', 'altCityName': 'San Antonio', 'fullName': 'San Antonio Spurs', 'tricode': 'SAS', 'teamId': '1610612759', 'nickname': 'Spurs', 'urlName': 'spurs', 'teamShortName': 'San Antonio', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Toronto', 'altCityName': 'Toronto', 'fullName': 'Toronto Raptors', 'tricode': 'TOR', 'teamId': '1610612761', 'nickname': 'Raptors', 'urlName': 'raptors', 'teamShortName': 'Toronto', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': 
True, 'isAllStar': False, 'city': 'Utah', 'altCityName': 'Utah', 'fullName': 'Utah Jazz', 'tricode': 'UTA', 'teamId': '1610612762', 'nickname': 'Jazz', 'urlName': 'jazz', 'teamShortName': 'Utah', 'confName': 'summer', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Washington', 'altCityName': 'Washington', 'fullName': 'Washington Wizards', 'tricode': 'WAS', 'teamId': '1610612764', 'nickname': 'Wizards', 'urlName': 'wizards', 'teamShortName': 'Washington', 'confName': 'summer', 'divName': ''}], 'utah': [{'isNBAFranchise': True, 'isAllStar': False, 'city': 'Cleveland', 'altCityName': 'Cleveland', 'fullName': 'Cleveland Cavaliers', 'tricode': 'CLE', 'teamId': '1610612739', 'nickname': 'Cavaliers', 'urlName': 'cavaliers', 'teamShortName': 'Cleveland', 'confName': 'Utah', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Memphis', 'altCityName': 'Memphis', 'fullName': 'Memphis Grizzlies', 'tricode': 'MEM', 'teamId': '1610612763', 'nickname': 'Grizzlies', 'urlName': 'grizzlies', 'teamShortName': 'Memphis', 'confName': 'Utah', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'San Antonio', 'altCityName': 'San Antonio', 'fullName': 'San Antonio Spurs', 'tricode': 'SAS', 'teamId': '1610612759', 'nickname': 'Spurs', 'urlName': 'spurs', 'teamShortName': 'San Antonio', 'confName': 'Utah', 'divName': ''}, {'isNBAFranchise': True, 'isAllStar': False, 'city': 'Utah', 'altCityName': 'Utah', 'fullName': 'Utah Jazz', 'tricode': 'UTA', 'teamId': '1610612762', 'nickname': 'Jazz', 'urlName': 'jazz', 'teamShortName': 'Utah', 'confName': 'Utah', 'divName': ''}]}}
import requests
request_url = "http://data.nba.net/prod/v2/2019/teams.json"
response = requests.get(request_url)
response_json = response.json()
teams = response_json["league"]["standard"]
# Continue from here...
print("The 2019-2020 NBA season has {} teams".format(n_nba_teams))
The 2019-2020 NBA season has 30 teams
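One possible way to fill in the step left open after `# Continue from here...` is to count only the entries whose `isNBAFranchise` flag is true. Whether this is the intended solution is an assumption. The sketch runs on a small hand-made sample shaped like the entries of `teams`, not on the live response.

```python
# Hand-made sample with the same keys as entries of response_json["league"]["standard"]
teams = [
    {"fullName": "Team Croatia", "isNBAFranchise": False},
    {"fullName": "Atlanta Hawks", "isNBAFranchise": True},
    {"fullName": "Beijing Ducks", "isNBAFranchise": False},
    {"fullName": "Boston Celtics", "isNBAFranchise": True},
]

# True counts as 1 and False as 0, so summing the flags counts the franchises
n_nba_teams = sum(t["isNBAFranchise"] for t in teams)
print(n_nba_teams)  # 2 for this sample
```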
team_dict = {}
for t in teams:
    div = t["divName"]
    full_name = t["fullName"]
    if div in team_dict:
        team_dict[div].append(full_name)
    else:
        team_dict[div] = [full_name]
# Continue from here...
print("There are {} teams in the Atlantic and Southwest divisions:".format(n_as_teams))
print("Atlantic: {}".format(team_dict["Atlantic"]))
print("Southwest: {}".format(team_dict["Southwest"]))
There are 10 teams in the Atlantic and Southwest divisions:
Atlantic: ['Boston Celtics', 'Brooklyn Nets', 'New York Knicks', 'Philadelphia 76ers', 'Toronto Raptors']
Southwest: ['Dallas Mavericks', 'Houston Rockets', 'Memphis Grizzlies', 'New Orleans Pelicans', 'San Antonio Spurs']
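A possible completion of the open step, again on a small hand-made sample: the requested count is the sum of the lengths of the two division lists. The sketch also swaps the `if`/`else` grouping for `collections.defaultdict`, which is a stylistic alternative and not the approach used in the original cell.

```python
from collections import defaultdict

# Hand-made sample with the same keys as the real entries
teams = [
    {"fullName": "Boston Celtics", "divName": "Atlantic"},
    {"fullName": "Dallas Mavericks", "divName": "Southwest"},
    {"fullName": "Brooklyn Nets", "divName": "Atlantic"},
    {"fullName": "Chicago Bulls", "divName": "Central"},
]

# defaultdict(list) creates an empty list the first time a division is seen
team_dict = defaultdict(list)
for t in teams:
    team_dict[t["divName"]].append(t["fullName"])

n_as_teams = len(team_dict["Atlantic"]) + len(team_dict["Southwest"])
print(n_as_teams)  # 3 for this sample
```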
Extensible Markup Language (XML) is a format designed to be easy for people to read and easy for programs to parse. Like JSON, it is commonly used for presenting and transmitting data on websites.
Source: https://www.w3schools.com/xml/
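Request the data with requests
Read the .content attribute, for example response.content
Parse with lxml and XPath
XML Path Language (XPath) is a language for locating specific information within an XML document.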
import requests
# A POST request needs to carry data along with it
form_data = {
    "commandid": "GetTown",
    "cityid": "01"
}
request_url = "https://emap.pcsc.com.tw/EMapSDK.aspx"
response = requests.post(request_url, data=form_data)
print(response.status_code)
200
The .content attribute
response_content = response.content
print(response_content)
b'<?xml version="1.0" encoding="utf-8"?><iMapSDKOutput><MessageID>00000</MessageID><CommandID>GetTown</CommandID><Status>\xe9\x80\xa3\xe7\xb7\x9a\xe6\x88\x90\xe5\x8a\x9f</Status><TimeStamp>2020/1/15 \xe4\xb8\x8b\xe5\x8d\x88 02:56:33</TimeStamp><GeoPosition><TownID>01</TownID><TownName>\xe6\x9d\xbe\xe5\xb1\xb1\xe5\x8d\x80</TownName><X>121577218</X><Y>25049837</Y></GeoPosition><GeoPosition><TownID>02</TownID><TownName>\xe4\xbf\xa1\xe7\xbe\xa9\xe5\x8d\x80</TownName><X>121567161</X><Y>25033147</Y></GeoPosition><GeoPosition><TownID>03</TownID><TownName>\xe5\xa4\xa7\xe5\xae\x89\xe5\x8d\x80</TownName><X>121534593</X><Y>25026482</Y></GeoPosition><GeoPosition><TownID>04</TownID><TownName>\xe4\xb8\xad\xe5\xb1\xb1\xe5\x8d\x80</TownName><X>121533655</X><Y>25064427</Y></GeoPosition><GeoPosition><TownID>05</TownID><TownName>\xe4\xb8\xad\xe6\xad\xa3\xe5\x8d\x80</TownName><X>121518245</X><Y>25032251</Y></GeoPosition><GeoPosition><TownID>06</TownID><TownName>\xe5\xa4\xa7\xe5\x90\x8c\xe5\x8d\x80</TownName><X>121515830</X><Y>25066142</Y></GeoPosition><GeoPosition><TownID>07</TownID><TownName>\xe8\x90\xac\xe8\x8f\xaf\xe5\x8d\x80</TownName><X>121499745</X><Y>25034807</Y></GeoPosition><GeoPosition><TownID>08</TownID><TownName>\xe6\x96\x87\xe5\xb1\xb1\xe5\x8d\x80</TownName><X>121570280</X><Y>24989800</Y></GeoPosition><GeoPosition><TownID>09</TownID><TownName>\xe5\x8d\x97\xe6\xb8\xaf\xe5\x8d\x80</TownName><X>121607043</X><Y>25054684</Y></GeoPosition><GeoPosition><TownID>10</TownID><TownName>\xe5\x85\xa7\xe6\xb9\x96\xe5\x8d\x80</TownName><X>121589471</X><Y>25069353</Y></GeoPosition><GeoPosition><TownID>11</TownID><TownName>\xe5\xa3\xab\xe6\x9e\x97\xe5\x8d\x80</TownName><X>121525380</X><Y>25090430</Y></GeoPosition><GeoPosition><TownID>12</TownID><TownName>\xe5\x8c\x97\xe6\x8a\x95\xe5\x8d\x80</TownName><X>121503066</X><Y>25132054</Y></GeoPosition></iMapSDKOutput>'
/iMapSDKOutput/GeoPosition/TownName or //TownName
/iMapSDKOutput/RoadName/rd_name_1 or //rd_name_1
/iMapSDKOutput/RoadName/section_1 or //section_1
/iMapSDKOutput/GeoPosition/POIName or //POIName
Parsing district information with lxml
from lxml import etree
from io import BytesIO
file = BytesIO(response_content)
tree = etree.parse(file)
town_names = [t.text for t in tree.xpath("//TownName")] # the XPath can also be written as /iMapSDKOutput/GeoPosition/TownName
print(town_names)
['松山區', '信義區', '大安區', '中山區', '中正區', '大同區', '萬華區', '文山區', '南港區', '內湖區', '士林區', '北投區']
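To see how the absolute and the `//` XPath forms relate, here is a minimal sketch that runs on a small inline XML sample instead of a live response. The sample bytes are a hand-written, abridged stand-in shaped like the output above, and `etree.fromstring` is used in place of `BytesIO` plus `etree.parse` only to keep the example short.

```python
from lxml import etree

# Hand-written, abridged sample shaped like the GetTown response (assumption).
sample_xml = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<iMapSDKOutput>"
    b"<GeoPosition><TownID>01</TownID><TownName>Songshan</TownName></GeoPosition>"
    b"<GeoPosition><TownID>02</TownID><TownName>Xinyi</TownName></GeoPosition>"
    b"</iMapSDKOutput>"
)

root = etree.fromstring(sample_xml)

# Absolute path: start at the document root and name every step.
absolute = [t.text for t in root.xpath("/iMapSDKOutput/GeoPosition/TownName")]
# Descendant shortcut: match TownName elements at any depth.
anywhere = [t.text for t in root.xpath("//TownName")]

print(absolute)
print(anywhere)
```

Both expressions select the same elements here; the `//` form is shorter, while the absolute form is stricter about where the element may appear.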
import time
import random
tp_711_stores = {}
for town in town_names:
    # Form data for the store search in one district
    form_data = {
        "commandid": "SearchStore",
        "city": "台北市",
        "town": town
    }
    r = requests.post("https://emap.pcsc.com.tw/EMapSDK.aspx", data=form_data)
    f = BytesIO(r.content)
    tree = etree.parse(f)
    poi_ids = [t.text.strip() for t in tree.xpath("//POIID")]
    poi_names = [t.text for t in tree.xpath("//POIName")]
    # X and Y are divided by 1,000,000 to obtain longitude and latitude
    lons = [float(t.text)/1000000 for t in tree.xpath("//X")]
    lats = [float(t.text)/1000000 for t in tree.xpath("//Y")]
    adds = [t.text for t in tree.xpath("//Address")]
    tp_711_stores[town] = []
    for poi_id, poi_name, lon, lat, add in zip(poi_ids, poi_names, lons, lats, adds):
        store_info = {
            "POIID": poi_id,
            "POIName": poi_name,
            "Longitude": lon,
            "Latitude": lat,
            "Address": add
        }
        tp_711_stores[town].append(store_info)
    # Wait 1 to 6 seconds between requests
    time.sleep(random.randint(1, 6))
    print("Scraping {}".format(town))
Scraping 松山區
Scraping 信義區
Scraping 大安區
Scraping 中山區
Scraping 中正區
Scraping 大同區
Scraping 萬華區
Scraping 文山區
Scraping 南港區
Scraping 內湖區
Scraping 士林區
Scraping 北投區
print(tp_711_stores["松山區"][0])
print(tp_711_stores["信義區"][0])
print(tp_711_stores["大安區"][0])
{'POIID': '170945', 'POIName': '上弘', 'Longitude': 121.548287390895, 'Latitude': 25.056390968531797, 'Address': '台北市松山區敦化北路168號B2'}
{'POIID': '167651', 'POIName': '一零一', 'Longitude': 121.565077, 'Latitude': 25.033373, 'Address': '台北市信義區信義路五段7號35樓'}
{'POIID': '153319', 'POIName': '大台', 'Longitude': 121.53261437826, 'Latitude': 25.0179598345753, 'Address': '台北市大安區羅斯福路三段283巷14弄16號1樓'}
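The scraping loop builds five parallel lists and zips them, which can misalign records if a store is missing one of the fields. One possible alternative, sketched here on a hand-written sample, is to iterate over each `GeoPosition` element and read its children together. The sample XML, its values, and the helper name `parse_stores` are assumptions for illustration; the real response may contain more fields.

```python
from lxml import etree

# Hand-written sample; element names follow the XPaths used in the loop,
# all values are made up for illustration.
sample_xml = (
    b"<iMapSDKOutput>"
    b"<GeoPosition><POIID>000001 </POIID><POIName>Alpha</POIName>"
    b"<X>121500000</X><Y>25000000</Y><Address>1 Example Rd.</Address></GeoPosition>"
    b"<GeoPosition><POIID>000002 </POIID><POIName>Beta</POIName>"
    b"<X>121250000</X><Y>25500000</Y></GeoPosition>"
    b"</iMapSDKOutput>"
)


def parse_stores(xml_bytes):
    """Return one dict per GeoPosition element (hypothetical helper)."""
    root = etree.fromstring(xml_bytes)
    stores = []
    for geo in root.xpath("//GeoPosition"):
        stores.append({
            "POIID": geo.findtext("POIID", default="").strip(),
            "POIName": geo.findtext("POIName"),
            # Coordinates are divided by 1,000,000, as in the loop.
            "Longitude": float(geo.findtext("X")) / 1000000,
            "Latitude": float(geo.findtext("Y")) / 1000000,
            # A missing child yields None instead of shifting later records.
            "Address": geo.findtext("Address"),
        })
    return stores


stores = parse_stores(sample_xml)
print(stores[1])
```

The second sample store has no `Address` element, so its record carries `None` for that key while the other fields stay attached to the right store.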
Request data with requests
Use the .text attribute, e.g. response.text
Parse with bs4 using tag names or CSS selectors
BeautifulSoup(): creates a BeautifulSoup object
# !pip install -U BeautifulSoup4
from bs4 import BeautifulSoup
request_url = "https://www.imdb.com/title/tt4154796"
response = requests.get(request_url)
response_text = response.text
soup = BeautifulSoup(response_text)
print(type(soup))
<class 'bs4.BeautifulSoup'>
soup.find(): finds the first element that matches a tag name
soup.find_all(): finds all elements that match a tag name
soup.select(): finds all elements that match a CSS selector
print(soup.find("h1"))
print(type(soup.find("h1")))
print(soup.find("h1").text)
print(soup.select("strong span"))
print(float(soup.select("strong span")[0].text))
<h1 class="">復仇者聯盟:終局之戰 <span id="titleYear">(<a href="/year/2019/">2019</a>)</span> </h1>
<class 'bs4.element.Tag'>
復仇者聯盟:終局之戰 (2019)
[<span itemprop="ratingValue">8.5</span>]
8.5
element.Tag.text: extracts the text inside a tag
element.Tag.get(attr): extracts the specified attribute of a tag
print(len(soup.find_all("img")))
print(soup.find_all("img")[2])
print(soup.find_all("img")[2].get("alt"))
print(soup.find_all("img")[2].get("src"))
78
<img class="pro_logo" src="https://m.media-amazon.com/images/G/01/imdb/IMDbConsumerSiteProTitleViews/images/logo/pro_logo_dark-3176609149._CB455053166_.png"/>
None
https://m.media-amazon.com/images/G/01/imdb/IMDbConsumerSiteProTitleViews/images/logo/pro_logo_dark-3176609149._CB455053166_.png
print(soup.select("strong span"))
print(float(soup.select("strong span")[0].text))
[<span itemprop="ratingValue">8.5</span>]
8.5
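The same calls can be tried offline on a small HTML string, which makes it easier to see what each one returns. The markup below is hand-written for illustration and is not IMDb's actual page; the parser name `"html.parser"` is passed explicitly so the example does not depend on which parsers are installed.

```python
from bs4 import BeautifulSoup

# Hand-written HTML standing in for a downloaded page (assumption).
html = """
<html><body>
  <h1 class="">Example Movie <span id="titleYear">(2019)</span></h1>
  <strong><span itemprop="ratingValue">8.5</span></strong>
  <img alt="Poster" src="https://example.com/poster.jpg"/>
  <img src="https://example.com/logo.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# First <h1>, text only (nested tags are flattened into the text).
title = soup.find("h1").text.strip()
# CSS descendant selector: every <span> inside a <strong>.
rating = float(soup.select("strong span")[0].text)
# Every <img> tag in the document.
images = soup.find_all("img")
# .get() returns None when the attribute is absent.
alts = [img.get("alt") for img in images]

print(title, rating, alts)
```

The second image has no `alt` attribute, which is why `.get("alt")` can print `None`, as it does for the logo image in the output above.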
Using requests with bs4 to extract the genres of Avengers: Endgame (2019)
response = requests.get("https://www.imdb.com/title/tt4154796")
soup = BeautifulSoup(response.text)
# Continue from here...
print(genre)
['Action', 'Adventure', 'Drama']
Using requests with bs4 to extract the cast of Avengers: Endgame (2019)
response = requests.get("https://www.imdb.com/title/tt4154796")
soup = BeautifulSoup(response.text)
# Continue from here...
print(cast)
['Robert Downey Jr.', 'Chris Evans', 'Mark Ruffalo', 'Chris Hemsworth', 'Scarlett Johansson', 'Jeremy Renner', 'Don Cheadle', 'Paul Rudd', 'Benedict Cumberbatch', 'Chadwick Boseman', 'Brie Larson', 'Tom Holland', 'Karen Gillan', 'Zoe Saldana', 'Evangeline Lilly']
get_movie_data(movie_url)
get_movie_data("https://www.imdb.com/title/tt4154796")
{'movieTitle': '復仇者聯盟:終局之戰(2019)', 'moviePoster': 'https://m.media-amazon.com/images/M/MV5BMTc5MDE2ODcwNV5BMl5BanBnXkFtZTgwMzI2NzQ2NzM@._V1_UX182_CR0,0,182,268_AL_.jpg', 'movieRating': 8.5, 'movieGenre': ['Action', 'Adventure', 'Drama'], 'movieCast': ['Robert Downey Jr.', 'Chris Evans', 'Mark Ruffalo', 'Chris Hemsworth', 'Scarlett Johansson', 'Jeremy Renner', 'Don Cheadle', 'Paul Rudd', 'Benedict Cumberbatch', 'Chadwick Boseman', 'Brie Larson', 'Tom Holland', 'Karen Gillan', 'Zoe Saldana', 'Evangeline Lilly']}
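The body of `get_movie_data()` is not shown here, so the following is only a rough sketch of how a dictionary with these five keys could be assembled from parsed HTML. It works on a hand-written page: the helper name `parse_movie_html` and every class name in the sample are hypothetical, and IMDb's real markup differs and changes over time, so the selectors would need to be found with the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Hand-written page; every class name below is hypothetical.
SAMPLE_HTML = """
<h1>Example Movie (2019)</h1>
<div class="poster"><img src="https://example.com/poster.jpg"/></div>
<strong><span itemprop="ratingValue">8.5</span></strong>
<div class="genres"><a>Action</a><a>Drama</a></div>
<table class="cast">
  <tr><td class="name"><a> Jane Doe </a></td></tr>
  <tr><td class="name"><a> John Roe </a></td></tr>
</table>
"""


def parse_movie_html(html):
    """Build a dict shaped like the get_movie_data() output (sketch only)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "movieTitle": soup.find("h1").text.strip(),
        "moviePoster": soup.select(".poster img")[0].get("src"),
        "movieRating": float(soup.select("strong span")[0].text),
        "movieGenre": [a.text for a in soup.select(".genres a")],
        # Cast names often carry surrounding whitespace, so strip them.
        "movieCast": [a.text.strip() for a in soup.select(".cast .name a")],
    }


movie = parse_movie_html(SAMPLE_HTML)
print(movie)
```

Keeping the parsing in a function that takes an HTML string makes it testable without a network request; a wrapper could download the page with `requests.get(movie_url).text` and pass the result in.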
Making get_movie_data() more convenient to use
Adding params to get()
query_string_parameters = {
    'q': 'Avengers: Endgame',
    'ref_': 'nv_sr_sm'
}
request_url = "https://www.imdb.com/find"
response = requests.get(request_url, params=query_string_parameters)
print(response.status_code)
200
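The `params` dictionary ends up as a query string appended to the URL. As a minimal sketch of what that string looks like, the standard library's `urlencode` can encode the same dictionary; `requests` performs this step itself, so the snippet is only for inspection, and the variable names `query_string` and `full_url` are introduced here for illustration.

```python
from urllib.parse import urlencode

query_string_parameters = {
    "q": "Avengers: Endgame",
    "ref_": "nv_sr_sm",
}

# Percent-encode keys and values; spaces become "+", ":" becomes "%3A".
query_string = urlencode(query_string_parameters)
full_url = "https://www.imdb.com/find" + "?" + query_string

print(full_url)
```

After a real request, `response.url` shows the final URL that was actually requested, which is a quick way to confirm the parameters were applied.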
Extracting all search results with the .result_text > a CSS selector
soup = BeautifulSoup(response.text)
result_hrefs = [e.get("href") for e in soup.select(".result_text > a")]
print(result_hrefs)
['/title/tt4154796/', '/title/tt10258872/', '/title/tt9827182/', '/title/tt10025738/', '/title/tt10042140/', '/title/tt10022970/', '/title/tt10778688/', '/title/tt10213650/', '/search/keyword?keywords=reference-to-avengers-endgame', '/search/keyword?keywords=reference-to-%27avengers-endgame%27-2019']
movie_url = "https://www.imdb.com" + result_hrefs[0]
print(movie_url)
https://www.imdb.com/title/tt4154796/
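The search results mix title pages with keyword links, and taking `result_hrefs[0]` assumes the first hit is a title. A small sketch of one way to be more explicit, assuming title pages always start with `/title/` as they do in the output above: filter on that prefix and build absolute URLs with `urljoin`. The shortened list and the names `base_url` and `title_urls` are introduced for illustration.

```python
from urllib.parse import urljoin

# A few hrefs copied from the search output above.
result_hrefs = [
    "/title/tt4154796/",
    "/title/tt10258872/",
    "/search/keyword?keywords=reference-to-avengers-endgame",
]

base_url = "https://www.imdb.com"
# Keep only links that point at a title page, then make them absolute.
title_urls = [urljoin(base_url, h) for h in result_hrefs if h.startswith("/title/")]

print(title_urls[0])
```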
get_movie_data(movie_title)
get_movie_data("Avengers: Endgame (2019)")
{'movieTitle': '復仇者聯盟:終局之戰 (2019)', 'moviePoster': 'https://m.media-amazon.com/images/M/MV5BMTc5MDE2ODcwNV5BMl5BanBnXkFtZTgwMzI2NzQ2NzM@._V1_UX182_CR0,0,182,268_AL_.jpg', 'movieRating': 8.5, 'movieGenre': ['Action', 'Adventure', 'Drama'], 'movieCast': ['Robert Downey Jr.', 'Chris Evans', 'Mark Ruffalo', 'Chris Hemsworth', 'Scarlett Johansson', 'Jeremy Renner', 'Don Cheadle', 'Paul Rudd', 'Benedict Cumberbatch', 'Chadwick Boseman', 'Brie Larson', 'Tom Holland', 'Karen Gillan', 'Zoe Saldana', 'Evangeline Lilly']}
response = requests.get("https://www.ptt.cc/bbs/Gossiping/index.html")
print(response.text)
<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>批踢踢實業坊</title> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-common.css"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-base.css" media="screen"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-custom.css"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/pushstream.css" media="screen"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-print.css" media="print"> </head> <body> <div class="bbs-screen bbs-content"> <div class="over18-notice"> <p>本網站已依網站內容分級規定處理</p> <p>警告︰您即將進入之看板內容需滿十八歲方可瀏覽。</p> <p>若您尚未年滿十八歲,請點選離開。若您已滿十八歲,亦不可將本區之內容派發、傳閱、出售、出租、交給或借予年齡未滿18歲的人士瀏覽,或將本網站內容向該人士出示、播放或放映。</p> </div> </div> <div class="bbs-screen bbs-content center clear"> <form action="/ask/over18" method="post"> <input type="hidden" name="from" value="/bbs/Gossiping/index.html"> <div class="over18-button-container"> <button class="btn-big" type="submit" name="yes" value="yes">我同意,我已年滿十八歲<br><small>進入</small></button> </div> <div class="over18-button-container"> <button class="btn-big" type="submit" name="no" value="no">未滿十八歲或不同意本條款<br><small>離開</small></button> </div> </form> </div> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-32365737-1', { cookieDomain: 'ptt.cc', legacyCookieDomain: 'ptt.cc' }); ga('send', 'pageview'); </script> <script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <script src="//images.ptt.cc/bbs/v2.27/bbs.js"></script> </body> </html>
response = requests.get("http://www.fantasy-sky.com/ContentList.aspx?section=002")
soup = BeautifulSoup(response.text)
movie_titles = [i.text for i in soup.select(".movies-name")]
print(movie_titles)
['唐頓莊園', '星際救援', '雙子殺手', '牠:第二章', '黑魔女2', '電流大戰', '屍樂園:髒比雙拼', '金翅雀', '玩命關頭:特別行動', '全面攻佔3:天使救援', '舞孃騙很大', '盧斯', '瞞天機密', '弒婚遊戲', '獅子王', '五月天人生無限公司', '花椒之味', '下半場', '情牽拉麵茶', '光', '追龍II:賊王', '流浪地球', '一定要結婚嗎', '亡命之途', '柴公園', '東京喰種', '匿名的畫作', '殺手寓言', '新聞記者', '小委託人', '驅魔使者', '辛巴', '門當護不對', '電影哆啦A夢:大雄的月球探測記', '極限逃生', '陪審團', '跳痛先生', '出發吧!我的脫單假期', '偵兇']
import requests
response = requests.get("https://www.ptt.cc/bbs/Gossiping/index.html", cookies={'over18': '1'})
print(response.text)
<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>看板 Gossiping 文章列表 - 批踢踢實業坊</title> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-common.css"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-base.css" media="screen"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-custom.css"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/pushstream.css" media="screen"> <link rel="stylesheet" type="text/css" href="//images.ptt.cc/bbs/v2.27/bbs-print.css" media="print"> </head> <body> <div id="topbar-container"> <div id="topbar" class="bbs-content"> <a id="logo" href="/bbs/">批踢踢實業坊</a> <span>›</span> <a class="board" href="/bbs/Gossiping/index.html"><span class="board-label">看板 </span>Gossiping</a> <a class="right small" href="/about.html">關於我們</a> <a class="right small" href="/contact.html">聯絡資訊</a> </div> </div> <div id="main-container"> <div id="action-bar-container"> <div class="action-bar"> <div class="btn-group btn-group-dir"> <a class="btn selected" href="/bbs/Gossiping/index.html">看板</a> <a class="btn" href="/man/Gossiping/index.html">精華區</a> </div> <div class="btn-group btn-group-paging"> <a class="btn wide" href="/bbs/Gossiping/index1.html">最舊</a> <a class="btn wide" href="/bbs/Gossiping/index39176.html">‹ 上頁</a> <a class="btn wide disabled">下頁 ›</a> <a class="btn wide" href="/bbs/Gossiping/index.html">最新</a> </div> </div> </div> <div class="r-list-container action-bar-margin bbs-screen"> <div class="search-bar"> <form type="get" action="search" id="search-bar"> <input class="query" type="text" name="q" value="" placeholder="搜尋文章⋯"> </form> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">1</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055284.A.87A.html">[問卦] 在微信上收到長輩的文要怎回</a> </div> <div class="meta"> <div class="author">leolivein</div> <div 
class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E5%9C%A8%E5%BE%AE%E4%BF%A1%E4%B8%8A%E6%94%B6%E5%88%B0%E9%95%B7%E8%BC%A9%E7%9A%84%E6%96%87%E8%A6%81%E6%80%8E%E5%9B%9E">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aleolivein">搜尋看板內 leolivein 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">5</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055313.A.1D1.html">[問卦] 要如何擁有一堆無腦粉絲的八卦</a> </div> <div class="meta"> <div class="author">meblessme</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E8%A6%81%E5%A6%82%E4%BD%95%E6%93%81%E6%9C%89%E4%B8%80%E5%A0%86%E7%84%A1%E8%85%A6%E7%B2%89%E7%B5%B2%E7%9A%84%E5%85%AB%E5%8D%A6">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Ameblessme">搜尋看板內 meblessme 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">4</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055315.A.976.html">[問卦] 有沒有南澳鄉的八卦</a> </div> <div class="meta"> <div class="author">azt911231</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E6%9C%89%E6%B2%92%E6%9C%89%E5%8D%97%E6%BE%B3%E9%84%89%E7%9A%84%E5%85%AB%E5%8D%A6">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aazt911231">搜尋看板內 azt911231 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">7</span></div> <div class="title"> <a 
href="/bbs/Gossiping/M.1579055318.A.F06.html">[問卦] 出生在哪個國家最爽</a> </div> <div class="meta"> <div class="author">paulabxz123</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E5%87%BA%E7%94%9F%E5%9C%A8%E5%93%AA%E5%80%8B%E5%9C%8B%E5%AE%B6%E6%9C%80%E7%88%BD">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Apaulabxz123">搜尋看板內 paulabxz123 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">1</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055333.A.0BA.html">[問卦] 捷克獵人的八卦?</a> </div> <div class="meta"> <div class="author">Clarence</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E6%8D%B7%E5%85%8B%E7%8D%B5%E4%BA%BA%E7%9A%84%E5%85%AB%E5%8D%A6%EF%BC%9F">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3AClarence">搜尋看板內 Clarence 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f3">11</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055357.A.A27.html">[新聞] 上海譴責布拉格友台柯文哲:雙城論壇續辦</a> </div> <div class="meta"> <div class="author">safefree</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E6%96%B0%E8%81%9E%5D+%E4%B8%8A%E6%B5%B7%E8%AD%B4%E8%B2%AC%E5%B8%83%E6%8B%89%E6%A0%BC%E5%8F%8B%E5%8F%B0%E6%9F%AF%E6%96%87%E5%93%B2%EF%BC%9A%E9%9B%99%E5%9F%8E%E8%AB%96%E5%A3%87%E7%BA%8C%E8%BE%A6">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Asafefree">搜尋看板內 safefree 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> 
</div> </div> <div class="r-ent"> <div class="nrec"></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055362.A.530.html">[新聞] 掃蕩伊斯蘭好戰分子 德國警方分兵多路搜</a> </div> <div class="meta"> <div class="author">dragonjj</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E6%96%B0%E8%81%9E%5D+%E6%8E%83%E8%95%A9%E4%BC%8A%E6%96%AF%E8%98%AD%E5%A5%BD%E6%88%B0%E5%88%86%E5%AD%90+%E5%BE%B7%E5%9C%8B%E8%AD%A6%E6%96%B9%E5%88%86%E5%85%B5%E5%A4%9A%E8%B7%AF%E6%90%9C">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Adragonjj">搜尋看板內 dragonjj 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055384.A.77C.html">[問卦] 小隻馬同事露出肩帶</a> </div> <div class="meta"> <div class="author">ComeThrough</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E5%B0%8F%E9%9A%BB%E9%A6%AC%E5%90%8C%E4%BA%8B%E9%9C%B2%E5%87%BA%E8%82%A9%E5%B8%B6">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3AComeThrough">搜尋看板內 ComeThrough 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f3">75</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055386.A.3AE.html">[爆卦] 黃淵夏:反滲透法第一天白狼被約談</a> </div> <div class="meta"> <div class="author">GO19870325</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E7%88%86%E5%8D%A6%5D+%E9%BB%83%E6%B7%B5%E5%A4%8F%3A%E5%8F%8D%E6%BB%B2%E9%80%8F%E6%B3%95%E7%AC%AC%E4%B8%80%E5%A4%A9%E7%99%BD%E7%8B%BC%E8%A2%AB%E7%B4%84%E8%AB%87">搜尋同標題文章</a></div> <div class="item"><a 
href="/bbs/Gossiping/search?q=author%3AGO19870325">搜尋看板內 GO19870325 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055387.A.9E3.html">Re: [問卦] 中國為何這麼容易出現零號病人阿</a> </div> <div class="meta"> <div class="author">cdcardabc</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E4%B8%AD%E5%9C%8B%E7%82%BA%E4%BD%95%E9%80%99%E9%BA%BC%E5%AE%B9%E6%98%93%E5%87%BA%E7%8F%BE%E9%9B%B6%E8%99%9F%E7%97%85%E4%BA%BA%E9%98%BF">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Acdcardabc">搜尋看板內 cdcardabc 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f3">41</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055469.A.CA2.html">[新聞] 蔡總統10:40發表談話 將公布施行「反滲</a> </div> <div class="meta"> <div class="author">Gaffky</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E6%96%B0%E8%81%9E%5D+%E8%94%A1%E7%B8%BD%E7%B5%B110%EF%BC%9A40%E7%99%BC%E8%A1%A8%E8%AB%87%E8%A9%B1+%E5%B0%87%E5%85%AC%E5%B8%83%E6%96%BD%E8%A1%8C%E3%80%8C%E5%8F%8D%E6%BB%B2">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3AGaffky">搜尋看板內 Gaffky 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">4</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055478.A.3E9.html">[問卦] 故宮有什麼必看的啊?</a> </div> <div class="meta"> <div class="author">joe911joeop</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a 
href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E6%95%85%E5%AE%AE%E6%9C%89%E4%BB%80%E9%BA%BC%E5%BF%85%E7%9C%8B%E7%9A%84%E5%95%8A%EF%BC%9F">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Ajoe911joeop">搜尋看板內 joe911joeop 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">2</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055496.A.D81.html">[問卦] 衛生紙為什麼要兩張?</a> </div> <div class="meta"> <div class="author">LAKobeBryant</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E8%A1%9B%E7%94%9F%E7%B4%99%E7%82%BA%E4%BB%80%E9%BA%BC%E8%A6%81%E5%85%A9%E5%BC%B5%EF%BC%9F">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3ALAKobeBryant">搜尋看板內 LAKobeBryant 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055537.A.4B6.html">[問卦] 牛寺哥是不是很可憐那</a> </div> <div class="meta"> <div class="author">taker627</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E7%89%9B%E5%AF%BA%E5%93%A5%E6%98%AF%E4%B8%8D%E6%98%AF%E5%BE%88%E5%8F%AF%E6%86%90%E9%82%A3">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Ataker627">搜尋看板內 taker627 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">2</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055562.A.D0D.html">Re: [問卦] 柯粉變多了嗎?</a> </div> <div class="meta"> <div class="author">opfish</div> <div class="article-menu"> <div class="trigger">⋯</div> <div 
class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E6%9F%AF%E7%B2%89%E8%AE%8A%E5%A4%9A%E4%BA%86%E5%97%8E%EF%BC%9F">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aopfish">搜尋看板內 opfish 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">2</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055634.A.C0A.html">Re: [問卦] 沒服過兵役是不是就沒資格喊台獨?</a> </div> <div class="meta"> <div class="author">klm</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E6%B2%92%E6%9C%8D%E9%81%8E%E5%85%B5%E5%BD%B9%E6%98%AF%E4%B8%8D%E6%98%AF%E5%B0%B1%E6%B2%92%E8%B3%87%E6%A0%BC%E5%96%8A%E5%8F%B0%E7%8D%A8%3F">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aklm">搜尋看板內 klm 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">3</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055655.A.2DA.html">[問卦] 可憐吶~慈濟竟然放棄line改用telegram</a> </div> <div class="meta"> <div class="author">TellthEtRee</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E5%8F%AF%E6%86%90%E5%90%B6%EF%BD%9E%E6%85%88%E6%BF%9F%E7%AB%9F%E7%84%B6%E6%94%BE%E6%A3%84line%E6%94%B9%E7%94%A8telegram">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3ATellthEtRee">搜尋看板內 TellthEtRee 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">5</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055679.A.281.html">[問卦] 請問高雄市長今天有上班嗎</a> </div> <div 
class="meta"> <div class="author">cococat1028</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D+%E8%AB%8B%E5%95%8F%E9%AB%98%E9%9B%84%E5%B8%82%E9%95%B7%E4%BB%8A%E5%A4%A9%E6%9C%89%E4%B8%8A%E7%8F%AD%E5%97%8E">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Acococat1028">搜尋看板內 cococat1028 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">2</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055687.A.B60.html">Re: [討論] 劉仕傑臉書:對不起,我看不下去。 </a> </div> <div class="meta"> <div class="author">kuluma</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E8%A8%8E%E8%AB%96%5D+%E5%8A%89%E4%BB%95%E5%82%91%E8%87%89%E6%9B%B8%3A%E5%B0%8D%E4%B8%8D%E8%B5%B7%EF%BC%8C%E6%88%91%E7%9C%8B%E4%B8%8D%E4%B8%8B%E5%8E%BB%E3%80%82+">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Akuluma">搜尋看板內 kuluma 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">2</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1579055798.A.62A.html">Re: [新聞] 上海解除布拉格姊妹市 柯文哲:中國無權</a> </div> <div class="meta"> <div class="author">jiouje</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E6%96%B0%E8%81%9E%5D+%E4%B8%8A%E6%B5%B7%E8%A7%A3%E9%99%A4%E5%B8%83%E6%8B%89%E6%A0%BC%E5%A7%8A%E5%A6%B9%E5%B8%82+%E6%9F%AF%E6%96%87%E5%93%B2%EF%BC%9A%E4%B8%AD%E5%9C%8B%E7%84%A1%E6%AC%8A">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Ajiouje">搜尋看板內 jiouje 的文章</a></div> </div> </div> <div class="date"> 1/15</div> <div 
class="mark"></div> </div> </div> <div class="r-list-sep"></div> <div class="r-ent"> <div class="nrec"></div> <div class="title"> <a href="/bbs/Gossiping/M.1566347622.A.9C7.html">[公告] 八卦板板規(2019.08.21)</a> </div> <div class="meta"> <div class="author">arsonlolita</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%85%AC%E5%91%8A%5D+%E5%85%AB%E5%8D%A6%E6%9D%BF%E6%9D%BF%E8%A6%8F%282019.08.21%29">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aarsonlolita">搜尋看板內 arsonlolita 的文章</a></div> </div> </div> <div class="date"> 8/21</div> <div class="mark">!</div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f3">57</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1578271293.A.A78.html">[協尋] 車禍過世 1/2 甲提南路立新一街 </a> </div> <div class="meta"> <div class="author">arsonlolita</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%8D%94%E5%B0%8B%5D+%E8%BB%8A%E7%A6%8D%E9%81%8E%E4%B8%96+1%2F2+%E7%94%B2%E6%8F%90%E5%8D%97%E8%B7%AF%E7%AB%8B%E6%96%B0%E4%B8%80%E8%A1%97+">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aarsonlolita">搜尋看板內 arsonlolita 的文章</a></div> </div> </div> <div class="date"> 1/06</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"></div> <div class="title"> <a href="/bbs/Gossiping/M.1577812250.A.592.html">[公告] 赤鴻飛羽,一月份置底閒聊文</a> </div> <div class="meta"> <div class="author">Bignana</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%85%AC%E5%91%8A%5D+%E8%B5%A4%E9%B4%BB%E9%A3%9B%E7%BE%BD%EF%BC%8C%E4%B8%80%E6%9C%88%E4%BB%BD%E7%BD%AE%E5%BA%95%E9%96%92%E8%81%8A%E6%96%87">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3ABignana">搜尋看板內 
Bignana 的文章</a></div> </div> </div> <div class="date"> 1/01</div> <div class="mark">M</div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f2">8</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1578694625.A.277.html">[協尋] 1/8晚間北市光復橋車禍</a> </div> <div class="meta"> <div class="author">DirKuan</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%8D%94%E5%B0%8B%5D+1%2F8%E6%99%9A%E9%96%93%E5%8C%97%E5%B8%82%E5%85%89%E5%BE%A9%E6%A9%8B%E8%BB%8A%E7%A6%8D">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3ADirKuan">搜尋看板內 DirKuan 的文章</a></div> </div> </div> <div class="date"> 1/11</div> <div class="mark"></div> </div> </div> <div class="r-ent"> <div class="nrec"><span class="hl f3">10</span></div> <div class="title"> <a href="/bbs/Gossiping/M.1578961532.A.51E.html">[協尋] 高雄左營區 行車記錄器 </a> </div> <div class="meta"> <div class="author">arsonlolita</div> <div class="article-menu"> <div class="trigger">⋯</div> <div class="dropdown"> <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%8D%94%E5%B0%8B%5D+%E9%AB%98%E9%9B%84%E5%B7%A6%E7%87%9F%E5%8D%80+%E8%A1%8C%E8%BB%8A%E8%A8%98%E9%8C%84%E5%99%A8+">搜尋同標題文章</a></div> <div class="item"><a href="/bbs/Gossiping/search?q=author%3Aarsonlolita">搜尋看板內 arsonlolita 的文章</a></div> </div> </div> <div class="date"> 1/14</div> <div class="mark"></div> </div> </div> </div> <div class="bbs-screen bbs-footer-message">本網站已依台灣網站內容分級規定處理。此區域為限制級,未滿十八歲者不得瀏覽。</div> </div> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-32365737-1', { cookieDomain: 'ptt.cc', legacyCookieDomain: 'ptt.cc' }); 
ga('send', 'pageview'); </script> <script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <script src="//images.ptt.cc/bbs/v2.27/bbs.js"></script> </body> </html>
import requests
from bs4 import BeautifulSoup
response = requests.get("http://www.fantasy-sky.com/ContentList.aspx?section=002", cookies={'COOKIE_LANGUAGE': 'en'})
soup = BeautifulSoup(response.text)
movie_titles = [i.text for i in soup.select(".movies-name")]
print(movie_titles)
['Downton Abbey', 'Ad Astra', 'Gemini Man', 'It Chapter Two', 'Maleficent: Mistress of Evil', 'The Current War', 'Zombieland: Double Tap', 'The Goldfinch', 'Fast & Furious Presents: Hobbs…', 'Angel Has Fallen', 'Hustlers', 'Luce', 'Official Secrets', 'Ready or Not', 'Disney’s The Lion King', 'Mayday Life', 'Fagara', 'We Are Champions', 'Ramen Shop', 'Guang', 'Chasing The Dragon II…', 'The Wandering Earth', 'Marriage Hunting Beauty', 'Paradise Next', 'Shiba-Park', "Tokyo Ghoul 'S'", 'One Last Deal', 'The Fable', 'The Journalist', 'My First Client', 'The Divine Fury', 'Simmba', 'Cold Feet', "Doraemon: Nobita's…", 'EXIT', 'Juror 8', 'The Man Who Feels No Pain', 'Our Happy Holiday', 'The Invisible Witness']
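Both cookie examples pass a `cookies` dictionary to `requests.get()`. A minimal offline sketch of what that does: the dictionary is turned into a `Cookie` request header. The snippet builds and prepares a request without sending it, so no network access is needed; the variable names are introduced here for illustration.

```python
import requests

# Build the request object without sending it over the network.
request = requests.Request(
    "GET",
    "https://www.ptt.cc/bbs/Gossiping/index.html",
    cookies={"over18": "1"},
)
prepared = request.prepare()

# The cookies dict ends up as a single Cookie header.
cookie_header = prepared.headers["Cookie"]
print(cookie_header)
```

The cookie names themselves (`over18`, `COOKIE_LANGUAGE`) come from inspecting the site in the browser's developer tools; they are site-specific and may change.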
ca_movie_urls = ["http://www.fantasy-sky.com/ContentList.aspx?section=002&category=0020{}".format(i) for i in range(1, 5)]
# Continue from here ...
print(ca_movie_titles)
['Downton Abbey', 'Ad Astra', 'Gemini Man', 'It Chapter Two', 'Maleficent: Mistress of Evil', 'The Current War', 'Zombieland: Double Tap', 'The Goldfinch', 'Fast & Furious Presents: Hobbs…', 'Angel Has Fallen', 'Hustlers', 'Luce', 'Official Secrets', 'Ready or Not', 'Disney’s The Lion King', 'Mayday Life', 'Fagara', 'We Are Champions', 'Ramen Shop', 'Guang', 'Chasing The Dragon II…', 'The Wandering Earth', 'Marriage Hunting Beauty', 'Paradise Next', 'Shiba-Park', "Tokyo Ghoul 'S'", 'One Last Deal', 'The Fable', 'The Journalist', 'My First Client', 'The Divine Fury', 'Simmba', 'Cold Feet', "Doraemon: Nobita's…", 'EXIT', 'Juror 8', 'The Man Who Feels No Pain', 'Our Happy Holiday', 'The Invisible Witness', 'DISNEY AND PIXAR’S Inside Out', 'Up', 'Toy Story 2', 'Toy Story 3', 'The Peanuts Movie', 'Shark Tale', 'The Lego Batman Movie', 'Toy Story', "Tim Burton's Corpse Bride", 'Smallfoot', 'Ice Age: Collision Course', 'Ferdinand', 'Railroad Tigers', 'So Young', 'A Simple Life', 'Beyond Beauty - Taiwan from Above', 'Dying To Survive', 'Infernal Affairs', 'Millennium Mambo', 'The Golden Era', 'Three Times', 'The Wedding Banquet', 'Cloud In The Wind', 'Fall in Love at First Kiss', 'Integrity', 'Still Human', 'Run for Dream', 'Love The Way You Are', 'More Than Blue', 'Stolen Identity', "Long Day's Journey Into Night", 'Shadow', 'Tracey', 'Masquerade Hotel', 'The Confidence Man JP: The Movie', 'Inseparable Bros', 'Cheer Boys!!', 'The White Storm 2 – Drug Lords', 'The 12th Man', 'Kingdom', "Jupiter's Moon", 'Simpel…', 'A Real Vermeer', 'My Hero Academia: Two Heroes', "Midsummer's Equation", 'Money', 'Gold', 'The Gangster, The Cop, The Devil', 'My Extraordinary Summer with Tess', 'The Lady Improper', 'Another World', 'Who You Think I Am', 'The Conductor', 'The Shiny Shrimps', 'All About Me', 'A Long Goodbye', 'Miss & Mrs Cops', 'Capernaum', 'The Disaster Artist', 'Black Swan', 'Crazy Heart', 'Moulin Rouge', 'The Devil Wears Prada', 'Walk the Line', 'The Hobbit: The Battle Of…', 
'Disney’s Maleficent', 'Zombieland', "Bridget Jones's Baby", 'American Made', 'Home Again', 'Johnny English Reborn', 'Romeo + Juliet', 'The Great Wall', 'Spider-ManTM: Far From Home', 'Pokémon Detective Pikachu', 'Avengers: Infinity War', 'The Avengers', 'Shaft', 'Love Actually', 'Before I Fall', 'Godzilla', 'Superman Returns', 'Invictus', 'Chef', 'Spider-ManTM: Homecoming', 'Avengers: Age of Ultron', 'London Has Fallen', 'Never Let Me Go', 'John Wick', 'The Book of Henry', "A Dog's Purpose", 'The Lost City of Z', 'Love the Coopers', 'Runner Runner', 'The Intern', 'Café Society', 'Sherlock Holmes: A Game of Shadows', 'Deepwater Horizon', 'I, Daniel Blake', 'Captain America: Civil War', 'Iron Man', 'Iron Man 2', 'Iron Man 3', 'The Pianist', 'The Curious Case of Benjamin Button', 'Australia', 'The Tree Of Life', 'The Bucket List', 'The Legend of Tarzan', 'Furious 7', 'The Fate of the Furious', 'The Book Thief', 'Crazy, Stupid, Love.', 'The Holiday', 'The Mummy', 'Unstoppable', 'Straight Outta Compton', 'The Drop', 'The Judge', 'X-Men', 'X-Men: First Class', 'X-Men: Days of Future Past', 'X-Men: Apocalypse', 'Wrath of the Titans', 'Why Him?', 'The Shawshank Redemption', 'Wonder Woman']
print(ca_movie_titles[best_movie_index])
The Shawshank Redemption
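Send a request with requests and receive a response
Call the .json() method, then parse the result directly as Python data structures
Take the .content attribute, then parse it with lxml and XPath
Take the .text attribute, then parse it with bs4 and CSS selectors
What we did while making get_movie_data() more convenient
Automate it with selenium!
# run in command line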
(base) conda update conda
# run in command line
(base) conda install jupyter
# run in command line
(base) conda env list
# run in command line
(base) conda create --name <env_name> python=3.7
# run in command line
(base) conda activate <env_name>
# conda deactivate # return to the original (base) environment
# run in command line
(env_name) conda install ipykernel requests lxml beautifulsoup4 selenium
# run in command line
(env_name) python -m ipykernel install --user --name <kernel_name> --display-name "Python Web Scraping"
# run in command line
(env_name) jupyter kernelspec list
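Once the environment and kernel are registered, it can be worth confirming from inside the notebook that the installed packages are actually importable. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of the original material. Note that the conda package beautifulsoup4 is imported under the name bs4.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed above (beautifulsoup4 is imported as bs4).
required = ["ipykernel", "requests", "lxml", "bs4", "selenium"]

# An empty list means every package is visible to this kernel.
print(missing_packages(required))
```

If the printed list is not empty, the notebook is probably running on a different kernel than the one created for this environment.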
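Use code to drive the Chrome browser through ChromeDriver: go to the IMDb home page, print its URL, then close the browser.
from selenium import webdriver
driver_path = "c:/YOUR/PATH/TO/CHROMEDRIVER"  # location of the ChromeDriver executable
imdb_home = "https://www.imdb.com/"
driver = webdriver.Chrome(executable_path=driver_path) # Use Chrome (executable_path is the Selenium 3 argument)
driver.get(imdb_home)  # go to the IMDb home page
print(driver.current_url)  # print the URL the browser is now on
driver.quit()  # close the browser and stop the driver process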
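Use code to drive the Firefox browser through geckodriver: go to the IMDb home page, print its URL, then close the browser.
from selenium import webdriver
driver_path = "c:/YOUR/PATH/TO/GECKODRIVER"  # location of the geckodriver executable
imdb_home = "https://www.imdb.com/"
driver = webdriver.Firefox(executable_path=driver_path) # Use Firefox (executable_path is the Selenium 3 argument)
driver.get(imdb_home)  # go to the IMDb home page
print(driver.current_url)  # print the URL the browser is now on
driver.quit()  # close the browser and stop the driver process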
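The two snippets above pass executable_path straight to the browser class, which is how Selenium 3 worked. For readers on Selenium 4, where the driver path is wrapped in a Service object instead, a rough sketch might look like the following. The helper make_driver and the constant SUPPORTED_BROWSERS are hypothetical names introduced here for illustration; selenium is imported inside the function only so that the helper can be defined before the package is installed.

```python
SUPPORTED_BROWSERS = ("chrome", "firefox")


def make_driver(browser, driver_path):
    """Create a Chrome or Firefox driver using Selenium 4 Service objects."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")

    # Imported lazily; requires selenium 4 to be installed in the environment.
    from selenium import webdriver

    if browser == "chrome":
        from selenium.webdriver.chrome.service import Service
        return webdriver.Chrome(service=Service(executable_path=driver_path))

    from selenium.webdriver.firefox.service import Service
    return webdriver.Firefox(service=Service(executable_path=driver_path))
```

A call such as make_driver("chrome", "c:/YOUR/PATH/TO/CHROMEDRIVER") would then return a driver that can be used with driver.get(), driver.current_url and driver.quit() exactly as above.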
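driver methods and attributes
driver.get(): go to the specified URL
driver.find_element_by_css_selector(): locate the search field, the search button, or a search result link with a CSS selector (returns a single element)
driver.find_elements_by_css_selector(): locate the search field, the search button, or search result links with a CSS selector (returns a list of elements)
driver.find_element_by_xpath(): locate the search field, the search button, or a search result link with XPath (returns a single element)
driver.find_elements_by_xpath(): locate the search field, the search button, or search result links with XPath (returns a list of elements)
driver.current_url: the URL the browser is currently on
Note: the find_element_by_* and find_elements_by_* helpers belong to Selenium 3. Selenium 4.3 removed them in favor of driver.find_element(By.CSS_SELECTOR, ...) and driver.find_element(By.XPATH, ...).
//
element methods and attributes
element.send_keys(): type text into the element
element.click(): click the search button or a link
element.text: the text inside the tag
element.get_attribute(ATTR): the value of the given attribute of the tag
Implementing get_movie_data(movie_title) with selenium
get_movie_data("Avengers: Endgame (2019)")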
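The driver and element methods listed above combine into a simple search flow: locate the search field, type a title, click the search button, then read the link of the first result. The following is only a sketch of that flow, not the get_movie_data() implementation from this material. The function name search_first_result_url and all three selector parameters are hypothetical, and the real selectors for any given site have to be looked up in the browser's developer tools. It uses the generic find_element(by, value) form with the string "css selector", which is the value of selenium's By.CSS_SELECTOR, so the helper itself does not need to import selenium.

```python
# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def search_first_result_url(driver, query, box_selector, button_selector, result_selector):
    """Type a query, submit it, and return the href of the first result link.

    Returns None when no result link matches result_selector.
    """
    # Locate the search field and type the query into it.
    search_box = driver.find_element(CSS, box_selector)
    search_box.send_keys(query)

    # Locate the search button and click it.
    driver.find_element(CSS, button_selector).click()

    # Collect the result links; find_elements returns an empty list if none match.
    results = driver.find_elements(CSS, result_selector)
    if not results:
        return None
    return results[0].get_attribute("href")
```

On a real page the results may not be present immediately after the click, so a production version would likely wait for them (for example with selenium's WebDriverWait) before calling find_elements.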
avengers_movies = ["The Avengers (2012)", "Avengers: Age of Ultron (2015)", "Avengers: Infinity War (2018)", "Avengers: Endgame (2019)"]
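# avengers_movie_data is assumed to hold the get_movie_data() result for each title in avengers_movies
print(avengers_movie_data)
import json
# Write the collected data to avengers.json
with open("avengers.json", "w") as f:
    json.dump(avengers_movie_data, f)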
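A quick way to check a file written with json.dump is to read it back with json.load and compare. The sketch below does this with a small placeholder dictionary, since the real structure of avengers_movie_data depends on what get_movie_data() returns. The names save_json, load_json and placeholder_data are hypothetical. It also passes encoding="utf-8" and ensure_ascii=False, an optional choice that keeps non-ASCII text readable in the file.

```python
import json
import os
import tempfile


def save_json(data, path):
    """Write data to path as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a UTF-8 encoded JSON file back into Python data structures."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Placeholder data only; not real scraped values.
placeholder_data = {"Example Movie (2000)": {"genres": ["Drama"], "note": "範例"}}

# Write to a temporary directory so the sketch does not overwrite avengers.json.
json_path = os.path.join(tempfile.mkdtemp(), "example.json")
save_json(placeholder_data, json_path)
round_trip_ok = load_json(json_path) == placeholder_data
print(round_trip_ok)
```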
ans()
{'22 April 2019': 1, '23 April 2019': 1, '24 April 2019': 33, '25 April 2019': 23, '26 April 2019': 14, '28 April 2019': 1, '29 April 2019': 1, '28 June 2019': 3, '29 June 2019': 1, '4 July 2019': 1, '12 July 2019': 2, '26 July 2019': 1, '2 September 2019': 1}