Wuhan Coronavirus Data Analysis

SWKo 2020. 2. 1. 02:03
Ko Sangwon, School of Software Engineering
In [2]:
from IPython.display import display, HTML
display(HTML("<style> .container{width:90% !important;}</style>"))
In [46]:
# Hide noisy warning messages
import warnings

# Suppress warnings for the rest of the notebook
warnings.filterwarnings('ignore')
In [47]:
# Install the required modules
In [48]:
# 1. Basic modules for data analysis
# 2. Visualization modules
# 3. Data collection modules
# 4. Natural language processing modules
# 5. Word cloud modules
In [49]:
!pip3 install numpy
!pip3 install pandas
!pip3 install lxml

!pip3 install matplotlib
!pip3 install seaborn

!pip install folium
!pip install geopandas

!pip install pycountry

!pip install pillow
!pip install konlpy
!pip install wordcloud
Requirement already satisfied: numpy in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (1.18.1)
Requirement already satisfied: pandas in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (0.25.3)
Requirement already satisfied: python-dateutil>=2.6.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.13.3 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from pandas) (1.18.1)
Requirement already satisfied: pytz>=2017.2 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from pandas) (2019.3)
Requirement already satisfied: six>=1.5 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from python-dateutil>=2.6.1->pandas) (1.14.0)
Requirement already satisfied: lxml in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (4.5.0)
Requirement already satisfied: matplotlib in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (3.1.2)
Requirement already satisfied: cycler>=0.10 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib) (2.4.6)
Requirement already satisfied: python-dateutil>=2.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: numpy>=1.11 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib) (1.18.1)
Requirement already satisfied: six in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from cycler>=0.10->matplotlib) (1.14.0)
Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from kiwisolver>=1.0.1->matplotlib) (41.2.0)
Requirement already satisfied: seaborn in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (0.10.0)
Requirement already satisfied: matplotlib>=2.1.2 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from seaborn) (3.1.2)
Requirement already satisfied: scipy>=1.0.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from seaborn) (1.4.1)
Requirement already satisfied: pandas>=0.22.0 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from seaborn) (0.25.3)
Requirement already satisfied: numpy>=1.13.3 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from seaborn) (1.18.1)
Requirement already satisfied: python-dateutil>=2.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn) (2.4.6)
Requirement already satisfied: kiwisolver>=1.0.1 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn) (0.10.0)
Requirement already satisfied: pytz>=2017.2 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from pandas>=0.22.0->seaborn) (2019.3)
Requirement already satisfied: six>=1.5 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from python-dateutil>=2.1->matplotlib>=2.1.2->seaborn) (1.14.0)
Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.1.2->seaborn) (41.2.0)
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Requirement already satisfied: folium in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (0.10.1)
Requirement already satisfied: requests in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from folium) (2.22.0)
Requirement already satisfied: branca>=0.3.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from folium) (0.3.1)
Requirement already satisfied: numpy in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from folium) (1.18.1)
Requirement already satisfied: jinja2>=2.9 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from folium) (2.10.3)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests->folium) (1.25.8)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests->folium) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests->folium) (2019.11.28)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests->folium) (3.0.4)
Requirement already satisfied: six in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from branca>=0.3.0->folium) (1.14.0)
Requirement already satisfied: MarkupSafe>=0.23 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from jinja2>=2.9->folium) (1.1.1)
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Requirement already satisfied: geopandas in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (0.6.2)
Requirement already satisfied: pandas>=0.23.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from geopandas) (0.25.3)
Requirement already satisfied: fiona in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from geopandas) (1.8.13)
Requirement already satisfied: shapely in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from geopandas) (1.7.0)
Requirement already satisfied: pyproj in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from geopandas) (2.4.2.post1)
Requirement already satisfied: numpy>=1.13.3 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from pandas>=0.23.0->geopandas) (1.18.1)
Requirement already satisfied: python-dateutil>=2.6.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from pandas>=0.23.0->geopandas) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from pandas>=0.23.0->geopandas) (2019.3)
Requirement already satisfied: click<8,>=4.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from fiona->geopandas) (7.0)
Requirement already satisfied: munch in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from fiona->geopandas) (2.5.0)
Requirement already satisfied: six>=1.7 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from fiona->geopandas) (1.14.0)
Requirement already satisfied: click-plugins>=1.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from fiona->geopandas) (1.1.1)
Requirement already satisfied: cligj>=0.5 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from fiona->geopandas) (0.5.0)
Requirement already satisfied: attrs>=17 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from fiona->geopandas) (19.3.0)
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Requirement already satisfied: pycountry in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (19.8.18)
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Requirement already satisfied: pillow in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (7.0.0)
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Requirement already satisfied: konlpy in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (0.5.2)
Requirement already satisfied: tweepy>=3.7.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from konlpy) (3.8.0)
Requirement already satisfied: numpy>=1.6 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from konlpy) (1.18.1)
Requirement already satisfied: JPype1>=0.7.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from konlpy) (0.7.1)
Requirement already satisfied: beautifulsoup4==4.6.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from konlpy) (4.6.0)
Requirement already satisfied: colorama in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from konlpy) (0.4.3)
Requirement already satisfied: lxml>=4.1.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from konlpy) (4.5.0)
Requirement already satisfied: six>=1.10.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from tweepy>=3.7.0->konlpy) (1.14.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from tweepy>=3.7.0->konlpy) (1.3.0)
Requirement already satisfied: requests>=2.11.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from tweepy>=3.7.0->konlpy) (2.22.0)
Requirement already satisfied: PySocks>=1.5.7 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from tweepy>=3.7.0->konlpy) (1.7.1)
Requirement already satisfied: oauthlib>=3.0.0 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->tweepy>=3.7.0->konlpy) (3.1.0)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests>=2.11.1->tweepy>=3.7.0->konlpy) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests>=2.11.1->tweepy>=3.7.0->konlpy) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests>=2.11.1->tweepy>=3.7.0->konlpy) (2019.11.28)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from requests>=2.11.1->tweepy>=3.7.0->konlpy) (1.25.8)
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Requirement already satisfied: wordcloud in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (1.6.0)
Requirement already satisfied: matplotlib in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from wordcloud) (3.1.1)
Requirement already satisfied: numpy>=1.6.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from wordcloud) (1.18.1)
Requirement already satisfied: pillow in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from wordcloud) (7.0.0)
Requirement already satisfied: cycler>=0.10 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from matplotlib->wordcloud) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from matplotlib->wordcloud) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from matplotlib->wordcloud) (2.4.6)
Requirement already satisfied: python-dateutil>=2.1 in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from matplotlib->wordcloud) (2.8.1)
Requirement already satisfied: six in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from cycler>=0.10->matplotlib->wordcloud) (1.14.0)
Requirement already satisfied: setuptools in /Users/kosangwon/.conda/envs/practice2/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->wordcloud) (45.1.0.post20200127)
In [50]:
!pip3 install html5lib
Requirement already satisfied: html5lib in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (1.0.1)
Requirement already satisfied: six>=1.9 in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from html5lib) (1.14.0)
Requirement already satisfied: webencodings in /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages (from html5lib) (0.5.1)
In [51]:
# Import the required libraries
# numpy, pandas, matplotlib
import numpy as np
import pandas as pd
import matplotlib, matplotlib.pyplot as plt

# Basic modules for web crawling
import requests
from bs4 import BeautifulSoup

# Natural language processing
from konlpy.tag import Hannanum, Twitter
# JSON data handling
import json

# Geographic (coordinate) data
import geopandas as gpd
# Country information lookup
import pycountry
# Image handling
from PIL import Image
# Map display
import folium
# Word cloud generation
from wordcloud import WordCloud

Getting the Wuhan Pneumonia (Novel Coronavirus) Statistics

In [66]:
# Set the URL to fetch the data from.
data_url = "https://en.wikipedia.org/wiki/2019–20_Wuhan_coronavirus_outbreak_by_country_and_territory"
req = requests.get(data_url)

# Read the tables from the fetched page.
# read_html takes the page's HTML text and returns a list of DataFrames, one per table.
data = pd.read_html(req.text, encoding='utf-8')
countries = data[1]
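Selecting the table with `data[1]` depends on how the page was laid out when it was scraped. A minimal sketch of a more defensive selection follows. `pick_table` is a hypothetical helper written for this illustration, not a pandas function, and the two sample frames are made up to stand in for the list that `pd.read_html` returns.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names.

    Hypothetical helper for illustration; not part of pandas.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset({str(col) for col in table.columns}):
            return table
    raise ValueError(f"no table contains columns {sorted(required)}")


# Made-up stand-in for the list returned by pd.read_html.
tables = [
    pd.DataFrame({"Date": ["2020-01-31"], "Event": ["sample"]}),
    pd.DataFrame({"Country/Region": ["Japan"], "Deaths": [0]}),
]
picked = pick_table(tables, ["Country/Region", "Deaths"])
print(picked)
```

With the real page, the call would look like `pick_table(data, ["Country/Region", "Deaths"])`, assuming those column names are still present.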
In [53]:
countries.head()
Out[53]:
   Country/Region    Confirmedcases  Deaths  References       Unnamed: 4
0  China (mainland)  9703            213     [1][2][3][4][5]  NaN
1  Japan             14              0       [6][7][8]        NaN
2  Thailand          14              0       [9][10]          NaN
3  Singapore         13              0       [11][12][13]     NaN
4  Hong Kong         12              0       [14][1]          NaN

Data Preprocessing

In [67]:
# Rename a column
# Pass the names as a dictionary of the form {old name: new name}.
# Country/Region -> country
countries.rename(columns={'Country/Region':'country'}, inplace=True)
In [68]:
countries.head()
Out[68]:
   country           Confirmedcases  Deaths  References       Unnamed: 4
0  China (mainland)  9703            213     [1][2][3][4][5]  NaN
1  Japan             14              0       [6][7][8]        NaN
2  Thailand          14              0       [9][10]          NaN
3  Singapore         13              0       [11][12][13]     NaN
4  Hong Kong         12              0       [14][1]          NaN
In [69]:
# Drop unneeded columns
# Columns are removed with drop.
# Select the column by name.
# df.drop(column name, axis number, extra options)
# References, Unnamed: 4

countries.drop('References', axis=1, inplace=True)
countries.drop('Unnamed: 4', axis=1, inplace=True)
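The rename and the two drops can also be written as one chained expression that returns a new DataFrame instead of modifying `countries` in place. This is a sketch on made-up sample rows with the same column layout, not a replacement for the cells above.

```python
import pandas as pd

# Made-up sample with the same column layout as the scraped table.
df = pd.DataFrame({
    "Country/Region": ["Japan", "Thailand"],
    "Confirmedcases": ["14", "14"],
    "Deaths": ["0", "0"],
    "References": ["[6]", "[9]"],
    "Unnamed: 4": [None, None],
})

# rename and drop both return new DataFrames, so the calls can be chained.
cleaned = (
    df.rename(columns={"Country/Region": "country"})
      .drop(columns=["References", "Unnamed: 4"])
)
print(cleaned.columns.tolist())
```

Because nothing is modified in place, re-running the cell gives the same result every time, which helps when notebook cells are executed out of order.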
In [70]:
countries.head()
Out[70]:
   country           Confirmedcases  Deaths
0  China (mainland)  9703            213
1  Japan             14              0
2  Thailand          14              0
3  Singapore         13              0
4  Hong Kong         12              0
In [58]:
test_list = ['a','b','c','d']
In [59]:
# indexing: a single number selects one element
test_list[-2]
# slicing: selects a range
test_list[1:]
Out[59]:
['b', 'c', 'd']
In [60]:
countries.index[-2:]
Out[60]:
RangeIndex(start=23, stop=25, step=1)
In [72]:
# Drop unneeded rows
# df.drop(row selection, extra options)
countries.drop(countries.index[-2:], inplace=True)
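Dropping the last two rows by label, as above, and slicing by position give the same rows. A small sketch on made-up data shows both forms side by side.

```python
import pandas as pd

# Made-up four-row frame; the last two rows play the role of footer rows.
df = pd.DataFrame({"country": ["A", "B", "C", "D"], "cases": [4, 3, 2, 1]})

# df.index[-2:] selects the labels of the last two rows; drop removes them.
by_label = df.drop(df.index[-2:])

# iloc slices by position and keeps everything except the last two rows.
by_position = df.iloc[:-2]

print(by_label["country"].tolist())
print(by_position["country"].tolist())
```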
In [73]:
countries
Out[73]:
    country               Confirmedcases  Deaths
0   China (mainland)      9703            213
1   Japan                 14              0
2   Thailand              14              0
3   Singapore             13              0
4   Hong Kong             12              0
5   Australia             9               0
6   Taiwan                9               0
7   Malaysia              8               0
8   Macau                 7               0
9   South Korea           7               0
10  France                6               0
11  United States         6               0
12  Germany               5               0
13  Vietnam               5               0
14  United Arab Emirates  4               0
15  Canada                3               0
16  Italy                 2               0
17  Cambodia              1               0
18  Finland               1               0
19  India                 1               0
20  Nepal                 1               0
21  Philippines           1               0
22  Sri Lanka             1               0
In [74]:
countries.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 23 entries, 0 to 22
Data columns (total 3 columns):
country           23 non-null object
Confirmedcases    23 non-null object
Deaths            23 non-null object
dtypes: object(3)
memory usage: 736.0+ bytes
In [75]:
# Convert the data type
#countries['Confirmedcases']
countries.Confirmedcases = countries.Confirmedcases.astype('int')
In [76]:
countries.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 23 entries, 0 to 22
Data columns (total 3 columns):
country           23 non-null object
Confirmedcases    23 non-null int64
Deaths            23 non-null object
dtypes: int64(1), object(2)
memory usage: 736.0+ bytes
In [77]:
countries.Deaths = countries.Deaths.astype('int')
In [78]:
countries.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 23 entries, 0 to 22
Data columns (total 3 columns):
country           23 non-null object
Confirmedcases    23 non-null int64
Deaths            23 non-null int64
dtypes: int64(2), object(1)
memory usage: 736.0+ bytes
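`astype('int')` works here because every value is a clean digit string. Scraped counts sometimes carry thousands separators or placeholder text, in which case the cast raises an error. A hedged sketch of a more tolerant conversion on made-up values follows. Treating an unparseable count as 0 is an assumption made for this example only.

```python
import pandas as pd

# Made-up values: one has a thousands separator, one is not a number at all.
raw = pd.Series(["9703", "1,234", "n/a"])

# Strip the separators, then convert; errors="coerce" turns bad values into NaN.
numbers = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

# Decide explicitly what a missing count means before casting to int
# (0 is an assumption for this sketch).
counts = numbers.fillna(0).astype(int)
print(counts.tolist())
```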
In [79]:
countries['country']
Out[79]:
0         China (mainland)
1                    Japan
2                 Thailand
3                Singapore
4                Hong Kong
5                Australia
6                   Taiwan
7                 Malaysia
8                    Macau
9              South Korea
10                  France
11           United States
12                 Germany
13                 Vietnam
14    United Arab Emirates
15                  Canada
16                   Italy
17                Cambodia
18                 Finland
19                   India
20                   Nepal
21             Philippines
22               Sri Lanka
Name: country, dtype: object
In [ ]:
# Look up the country codes and store them alongside
# China (mainland) -> China
# South Korea -> Korea, Republic of
In [80]:
# Rename countries (the dataset used next spells the names this way)
# China (mainland) -> China
# South Korea -> Korea, Republic of
countries.loc[countries['country']=="China (mainland)",'country'] = "China"
countries.loc[countries['country']=="South Korea",'country'] = "Korea, Republic of"
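When more names need changing, the same renaming can be driven by a single dictionary with `Series.replace`. The frame below is a made-up three-row sample.

```python
import pandas as pd

df = pd.DataFrame({"country": ["China (mainland)", "Japan", "South Korea"]})

# One dictionary holds every rename, so adding another country is one line.
name_map = {
    "China (mainland)": "China",
    "South Korea": "Korea, Republic of",
}
df["country"] = df["country"].replace(name_map)
print(df["country"].tolist())
```

Values that are not keys of the dictionary, such as `Japan` here, are left unchanged.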
In [81]:
countries['country']
Out[81]:
0                    China
1                    Japan
2                 Thailand
3                Singapore
4                Hong Kong
5                Australia
6                   Taiwan
7                 Malaysia
8                    Macau
9       Korea, Republic of
10                  France
11           United States
12                 Germany
13                 Vietnam
14    United Arab Emirates
15                  Canada
16                   Italy
17                Cambodia
18                 Finland
19                   India
20                   Nepal
21             Philippines
22               Sri Lanka
Name: country, dtype: object
In [82]:
# Change the index: make the country column the index
# df.set_index(column name)
countries.set_index('country', inplace=True)
In [83]:
countries
Out[83]:
                      Confirmedcases  Deaths
country
China                 9703            213
Japan                 14              0
Thailand              14              0
Singapore             13              0
Hong Kong             12              0
Australia             9               0
Taiwan                9               0
Malaysia              8               0
Macau                 7               0
Korea, Republic of    7               0
France                6               0
United States         6               0
Germany               5               0
Vietnam               5               0
United Arab Emirates  4               0
Canada                3               0
Italy                 2               0
Cambodia              1               0
Finland               1               0
India                 1               0
Nepal                 1               0
Philippines           1               0
Sri Lanka             1               0
In [86]:
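# Merge entries that are not separate countries
# Add Macau's confirmed cases to China's confirmed cases.
# .loc[row, column] reads and writes the DataFrame directly,
# unlike chained indexing, which may modify a temporary copy.
countries.loc['China', 'Confirmedcases'] += countries.loc['Macau', 'Confirmedcases']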
In [87]:
# Add Macau's deaths to China's deaths.
countries.loc['China', 'Deaths'] += countries.loc['Macau', 'Deaths']
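Both columns can be folded in a single step by passing a list of column names to `.loc`. This is a sketch on a made-up three-row frame with the same index and column names.

```python
import pandas as pd

df = pd.DataFrame(
    {"Confirmedcases": [9703, 7, 14], "Deaths": [213, 0, 0]},
    index=pd.Index(["China", "Macau", "Japan"], name="country"),
)

# .loc[row, columns] addresses the original frame, so the sum is written
# back to df itself rather than to a temporary copy.
columns = ["Confirmedcases", "Deaths"]
df.loc["China", columns] += df.loc["Macau", columns]

# The Macau row is now counted inside China, so remove it.
df = df.drop("Macau")
print(df)
```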
In [88]:
countries
Out[88]:
                      Confirmedcases  Deaths
country
China                 9710            213
Japan                 14              0
Thailand              14              0
Singapore             13              0
Hong Kong             12              0
Australia             9               0
Taiwan                9               0
Malaysia              8               0
Macau                 7               0
Korea, Republic of    7               0
France                6               0
United States         6               0
Germany               5               0
Vietnam               5               0
United Arab Emirates  4               0
Canada                3               0
Italy                 2               0
Cambodia              1               0
Finland               1               0
India                 1               0
Nepal                 1               0
Philippines           1               0
Sri Lanka             1               0
In [89]:
# Remove unneeded data
# Drop the Macau row.
countries.drop('Macau', inplace=True)
In [90]:
countries
Out[90]:
                      Confirmedcases  Deaths
country
China                 9710            213
Japan                 14              0
Thailand              14              0
Singapore             13              0
Hong Kong             12              0
Australia             9               0
Taiwan                9               0
Malaysia              8               0
Korea, Republic of    7               0
France                6               0
United States         6               0
Germany               5               0
Vietnam               5               0
United Arab Emirates  4               0
Canada                3               0
Italy                 2               0
Cambodia              1               0
Finland               1               0
India                 1               0
Nepal                 1               0
Philippines           1               0
Sri Lanka             1               0
In [93]:
pycountry.countries.search_fuzzy("China")[0].alpha_3
Out[93]:
'CHN'
In [106]:
# Match country codes so the data can be shown on a map
# Look up each country code and create the code column
countries["code"] = countries.index.map(lambda x:pycountry.countries.search_fuzzy(x)[0].alpha_3)
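`search_fuzzy` raises `LookupError` when a name has no match, which would stop the whole `map` call. A minimal sketch of a tolerant wrapper follows. `to_alpha3` and `fake_search` are hypothetical names for this illustration, and `fake_search` is a stand-in so the sketch runs without pycountry installed.

```python
from collections import namedtuple


def to_alpha3(name, search, default=None):
    """Return the alpha-3 code of the best match, or `default` if none.

    `search` is any callable that behaves like
    pycountry.countries.search_fuzzy: it returns a list of matches that
    have an `alpha_3` attribute and raises LookupError when nothing matches.
    """
    try:
        return search(name)[0].alpha_3
    except LookupError:  # also covers IndexError from an empty result list
        return default


# Stand-in for pycountry.countries.search_fuzzy (made-up data).
Match = namedtuple("Match", "alpha_3")
_known = {"China": "CHN", "Japan": "JPN"}


def fake_search(name):
    if name not in _known:
        raise LookupError(name)
    return [Match(_known[name])]


codes = [to_alpha3(n, fake_search) for n in ["China", "Atlantis"]]
print(codes)
```

In the notebook the call would be `countries.index.map(lambda x: to_alpha3(x, pycountry.countries.search_fuzzy))`, which leaves unmatched names as missing values instead of failing.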
In [107]:
countries
Out[107]:
                      Confirmedcases  Deaths  code
country
China                 9710            213     CHN
Japan                 14              0       JPN
Thailand              14              0       THA
Singapore             13              0       SGP
Hong Kong             12              0       HKG
Australia             9               0       AUS
Taiwan                9               0       TWN
Malaysia              8               0       MYS
Korea, Republic of    7               0       KOR
France                6               0       FRA
United States         6               0       USA
Germany               5               0       DEU
Vietnam               5               0       VNM
United Arab Emirates  4               0       ARE
Canada                3               0       CAN
Italy                 2               0       ITA
Cambodia              1               0       KHM
Finland               1               0       FIN
India                 1               0       IND
Nepal                 1               0       NPL
Philippines           1               0       PHL
Sri Lanka             1               0       LKA
In [ ]:
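# map -> takes the items out of a multi-valued object one at a time, processes each,
# and returns the results in the original data type
# map takes each value of a Series (or, as with countries.index.map, each label of an Index) and passes it to the given function, here a lambda, one value at a time.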
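A small sketch shows that `map` accepts a named function as well as a lambda. The `initials` function and the two-label index are made up for this illustration.

```python
import pandas as pd

names = pd.Index(["China", "Japan"], name="country")


def initials(name):
    # Named function: same role as the lambda, but reusable and testable.
    return name[:2].upper()


# Both calls apply the function to each label and return a new Index.
with_lambda = names.map(lambda x: x[:2].upper())
with_function = names.map(initials)
print(with_lambda.tolist(), with_function.tolist())
```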
In [94]:
# What is a lambda expression?
# An anonymous function: a function without a name, used as a throwaway function
lambda x:pycountry.countries.search_fuzzy(x)[0].alpha_3
Out[94]:
<function __main__.<lambda>(x)>
In [96]:
# Load the coordinate (GeoJSON) data
with open("./data/world-countries.json", encoding="utf-8") as f:
    geo_data = json.load(f)
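When this GeoJSON is joined to the `code` column with `key_on='feature.id'`, a row whose code matches no feature id cannot be drawn on the map. A minimal sketch of a pre-check follows. The miniature `geo_data` and the `codes` list are made up for this example and do not reflect the contents of the real `world-countries.json`.

```python
# Made-up miniature versions of the two inputs.
geo_data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "CHN", "properties": {"name": "China"}},
        {"type": "Feature", "id": "JPN", "properties": {"name": "Japan"}},
    ],
}
codes = ["CHN", "JPN", "HKG"]

# key_on='feature.id' means each feature's top-level "id" is the join key.
feature_ids = {feature["id"] for feature in geo_data["features"]}

# Codes present in the data but absent from the GeoJSON.
unmatched = sorted(set(codes) - feature_ids)
print(unmatched)
```

With the notebook's data, `codes` would be `countries['code']` once that column exists.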
In [98]:
# Load the per-country polygon data
shapefile = './data/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
gdf.head()
Out[98]:
   country                      country_code  geometry
0  Fiji                         FJI           MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1  United Republic of Tanzania  TZA           POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2  Western Sahara               SAH           POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3  Canada                       CAN           MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4  United States of America     USA           MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
In [111]:
# folium: the map display module
# Note: the variable name map shadows Python's built-in map() in this notebook.
map = folium.Map(location=[20, 3], zoom_start=2, tiles='stamenwatercolor')

# Show confirmed cases by country
# folium.Choropleth replaces the deprecated Map.choropleth method.
folium.Choropleth(geo_data=geo_data, data=countries,
                  columns=['code', 'Confirmedcases'],
                  key_on='feature.id',
                  name='Confirmed cases',
                  fill_color='PuRd', fill_opacity=0.7, line_opacity=0.2,
                  legend_name="Confirmed cases", nan_fill_color="#9bff4d"
                 ).add_to(map)

# Show country outlines with name tooltips
folium.GeoJson(data=gdf,
               name='country', smooth_factor=2,
               style_function=lambda x: {'color':'black','fillColor':'transparent','weight':2},
               tooltip=folium.GeoJsonTooltip(fields=['country'],
                                             labels=False,
                                             sticky=False),
               highlight_function=lambda x: {'weight':3,'fillColor':'grey','opacity':0.1}
              ).add_to(map)
Out[111]:
<folium.features.GeoJson at 0x1280cec10>
In [115]:
# Show deaths by country
# folium.Choropleth replaces the deprecated Map.choropleth method.
folium.Choropleth(geo_data=geo_data, data=countries,
                  columns=['code', 'Deaths'],
                  key_on='feature.id',
                  name='Deaths',
                  fill_color='PuRd', fill_opacity=0.7, line_opacity=0.2,
                  legend_name="Deaths", nan_fill_color="#9bff4d"
                 ).add_to(map)

# Show country outlines with name tooltips
folium.GeoJson(data=gdf,
               name='country', smooth_factor=2,
               style_function=lambda x: {'color':'black','fillColor':'transparent','weight':2},
               tooltip=folium.GeoJsonTooltip(fields=['country'],
                                             labels=False,
                                             sticky=False),
               highlight_function=lambda x: {'weight':3,'fillColor':'grey','opacity':0.1}
              ).add_to(map)
Out[115]:
<folium.features.GeoJson at 0x128018710>
In [118]:
# Add a layer control
folium.LayerControl().add_to(map)
Out[118]:
<folium.map.LayerControl at 0x1255e5c90>
In [110]:
# Show the map
In [ ]:
map  # evaluating this cell renders the map
In [ ]:
# Collect articles
In [ ]:
# Collect comments
In [120]:
# Analyze text about the Wuhan pneumonia
han = Hannanum()
text = han.nouns("이번 위한 폐렴 바이러스는 위험합니다.")  # nouns() returns only the nouns
In [121]:
text
Out[121]:
['이번', '폐렴', '바이러스', '위험']
In [122]:
# Join the extracted nouns into a single string
text = " ".join(text)
In [123]:
text
Out[123]:
'이번 폐렴 바이러스 위험'
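A word cloud sizes each word by how often it occurs, so once several sentences have been analyzed it helps to look at the counts directly. A minimal sketch with `collections.Counter` follows. The three noun lists are made up, as if `nouns()` had been called on three sentences.

```python
from collections import Counter

# Made-up noun lists, as if nouns() had been called on three sentences.
noun_lists = [
    ["이번", "폐렴", "바이러스", "위험"],
    ["폐렴", "확산"],
    ["바이러스", "폐렴"],
]

# Flatten the lists and count each noun.
counts = Counter(noun for nouns in noun_lists for noun in nouns)
print(counts.most_common(2))
```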
In [124]:
mask = np.array(Image.open('./data/back.png'))
In [125]:
mask
Out[125]:
array([[255, 253, 237, ..., 255, 255, 255],
       [255, 253, 237, ..., 255, 255, 255],
       [255, 253, 236, ..., 255, 255, 255],
       ...,
       [255, 253, 237, ..., 255, 255, 255],
       [255, 253, 237, ..., 255, 255, 255],
       [255, 253, 237, ..., 255, 255, 255]], dtype=uint8)
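The array above is mostly 255 because WordCloud treats pure white mask pixels as masked out and draws only on the other pixels. A hedged sketch of building such a mask with NumPy instead of loading an image follows. The circle shape and the 200-pixel size are arbitrary choices for this example.

```python
import numpy as np

size = 200
yy, xx = np.ogrid[:size, :size]
center = size // 2

# True for pixels inside a circle centered in the image.
inside = (xx - center) ** 2 + (yy - center) ** 2 <= (size * 0.45) ** 2

# 255 (white) marks pixels to leave empty; 0 marks the drawable area.
circle_mask = np.where(inside, 0, 255).astype(np.uint8)
print(circle_mask.shape, circle_mask.dtype)
```

Passing `mask=circle_mask` to `WordCloud` would then confine the words to the circle.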
In [ ]:
# Find the installed font paths; choose one and use it as font_path below.
from matplotlib import font_manager
[(f.name, f.fname) for f in font_manager.fontManager.ttflist]
In [ ]:
# Load the font list
In [ ]:
# Find a font and set it
In [ ]:
wordcloud = WordCloud(font_path='PATH_TO_FONT_FILE',  # placeholder: set the path of a font file that supports Korean
                      max_font_size=100, background_color='white',
                      mask=mask).generate(text)

fig = plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.savefig('cloud.png')
In [ ]:
wordcloud.words_
In [ ]:
# Use a Korean-capable font for plots
plt.rc('font', family='AppleGothic')
