Lucas chang
5 min read · Dec 12, 2018

Python learning notes

Comparing pivot and melt

Encoding problem in python

Condition “while” in python

Using range function to do a loop

Dict in python

Format string

with

ufuncs

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

With

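When to use it: for code that accesses a resource and must clean up and release that resource no matter what happens, such as closing a file after it has been used.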
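As a minimal sketch of that cleanup guarantee, the example below writes to a file with `with` and checks that the file is closed afterwards, both on a normal exit and when an error is raised inside the block. The file name `notes.txt`, the temporary directory and the simulated `ValueError` are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Normal case: the file is closed as soon as the with block ends.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")
closed_after_normal_exit = f.closed

# Error case: the file is still closed, even though the block raised.
try:
    with open(path, "a", encoding="utf-8") as g:
        g.write("more\n")
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed

print(closed_after_normal_exit, closed_after_error)
```

Without `with`, the same guarantee would need an explicit try/finally that calls `close()`.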

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

ufuncs :

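The full name is universal function: a function provided by NumPy that operates on an array element by element.

A ufunc applies the same operation to every element of an array. The effect is similar to apply, but it is much more efficient because the loop runs in compiled code.

Some useful vectorized methods in the same spirit (these belong to pandas rather than being NumPy ufuncs): .diff, .shift, .cumsum, .cumcount...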
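A short sketch of what an actual NumPy ufunc looks like in use, assuming NumPy is installed. The array values are made up for the illustration. `np.sqrt` and `np.add` are ufuncs, and `np.add.accumulate` gives the same running total as `np.cumsum`.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])  # made-up sample data

# A ufunc is applied to every element without writing a Python loop.
roots = np.sqrt(values)
shifted = np.add(values, 10)

# Every ufunc also has methods such as accumulate (a running result).
running = np.add.accumulate(values)

# Confirm that np.sqrt really is a ufunc object.
is_ufunc = isinstance(np.sqrt, np.ufunc)

print(roots, shifted, running, is_ufunc)
```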

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Format string

To print a string that contains the value of a variable, use a format string.

For example:

my_name = 'Zed A. Shaw'
print("This is %s." % my_name)

The result will be: This is Zed A. Shaw.
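For comparison, here is a small sketch of the same idea in the three formatting styles Python offers: the % operator used above, the str.format method, and f-strings (Python 3.6 or later). The variable `my_age` and its value are hypothetical and added only to show more than one placeholder.

```python
my_name = 'Zed A. Shaw'
my_age = 35  # hypothetical value, only for illustration

# %-formatting: several values are passed as a tuple.
old_style = "This is %s, age %d." % (my_name, my_age)

# str.format: placeholders are curly braces.
format_method = "This is {}, age {}.".format(my_name, my_age)

# f-string: the variables are written inside the braces.
f_string = f"This is {my_name}, age {my_age}."

print(old_style)
print(format_method)
print(f_string)
```

All three lines print the same text; which style to use is mostly a matter of taste and Python version.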

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Text mining skills

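  1. TF-IDF models a document as an unordered collection of words (a bag of words) and weights each word by how often it appears in the document against how many documents contain it.
  2. The vector space model turns each document into a vector.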
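The two points above can be sketched in plain Python. This is only an illustration under assumptions: the three toy documents are made up, and the weighting used here (term frequency times log(N / document frequency)) is one common variant; libraries use slightly different formulas.

```python
import math
from collections import Counter

# Made-up toy corpus, only for illustration.
docs = ["the cat sat", "the dog sat", "the cat ran away"]
tokenized = [d.split() for d in docs]

# The vocabulary fixes the order of the vector components.
vocab = sorted({w for doc in tokenized for w in doc})
n_docs = len(tokenized)

# Document frequency: in how many documents each word appears.
df = Counter(w for doc in tokenized for w in set(doc))

def tfidf_vector(doc):
    """Return one TF-IDF weight per vocabulary word for a tokenized document."""
    counts = Counter(doc)
    return [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocab]

vectors = [tfidf_vector(doc) for doc in tokenized]

for doc, vec in zip(docs, vectors):
    print(doc, [round(x, 3) for x in vec])
```

A word such as "the" that appears in every document gets weight 0, and changing the word order of a document does not change its vector.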

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Dict in python

A dict stores key-value pairs in Python. It is indexed by key.

# declare a dict
tel = {'jack': 4098, 'sape': 4139}
# add an element to the dict
tel['guido'] = 4127
# delete an element from the dict
del tel['sape']
# list the keys in the dict
list(tel)
sorted(tel)  # returns the keys in sorted order
# create a dict from arbitrary key and value expressions
{x: x**2 for x in (2, 4, 6)}
# corpus can be any dict; tel is reused here only so that the lines below run
corpus = tel
print(corpus)
print("====================")
# items() returns a view of the (key, value) pairs
print(corpus.items())
print("====================")
# values() returns a view of the values
print(corpus.values())
print("====================")
# sorted() returns the keys as a sorted list
print(sorted(corpus))

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Using range function to do a loop

To generate a sequence of numbers, use the range function. It takes up to 3 parameters.

  1. start number: in the example below, 100 is the start number.
  2. stop number: 0 is the stop number. It is excluded, so the last value is 1.
  3. step: -1 makes the range count down by 1.

# use range in a loop
for i in range(100, 0, -1):
    print(i)
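To see how the three parameters interact, a small sketch that turns a few ranges into lists. The particular numbers other than the 100-to-1 countdown are chosen only for illustration.

```python
# Only stop given: start defaults to 0 and step to 1.
first_five = list(range(5))

# start, stop and step: the stop value itself is never included.
every_third = list(range(2, 10, 3))

# The countdown from the loop above runs from 100 down to 1.
countdown = list(range(100, 0, -1))

# A negative step with start below stop produces nothing.
empty = list(range(0, 5, -1))

print(first_five)
print(every_third)
print(countdown[0], countdown[-1], len(countdown))
print(empty)
```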

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Condition “while” in python

A while loop repeats a block of code for as long as its condition is true. Two statements can be used inside it: break and continue. Example 1 shows break, which leaves the loop immediately. Example 2 shows continue, which skips the rest of the current iteration and moves on to the next one.

# example 1: break - leave the loop when the condition is met
# print i until it is higher than 20
i = 1
while True:  # no exit condition of its own; break is the only way out
    i = i + 1
    if i > 20:
        break
    print(i)  # prints 2 through 20
print(i)  # 21, the value that triggered the break

# example 2: continue - when the condition is met, skip this value and go on to the next
# print the even numbers up to 10
i = 1
while i < 10:
    i += 1
    if i % 2 > 0:
        continue
    print(i)  # prints the even numbers 2, 4, 6, 8, 10

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Comparing pivot and melt

import pandas as pd

cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})
print(cheese)
print("****************************")
pivot_cheese = cheese.pivot(index="first", columns="last", values="height")
print(pivot_cheese)
melt_cheese = cheese.melt(id_vars=['first', 'last'])
print("****************************")
print(melt_cheese)

Figure: cheese vs pivot_cheese vs melt_cheese

I use pivot to make the original cheese DataFrame wider. It takes 3 parameters: index sets the row labels of the new table, columns chooses the column whose values become the new column names, and values chooses the column that fills the cells of the new table.

For comparison, I apply melt to the cheese DataFrame, and melt_cheese is longer than cheese. I set the id_vars parameter, which names the columns to keep as identifiers. melt does the opposite of pivot: the remaining columns of cheese (height and weight) become entries in a new variable column of melt_cheese, with their numbers in a value column.
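One way to check that pivot and melt are opposites is a round trip: melt a wide table into long form, then pivot it back. The sketch below assumes pandas is installed and uses a smaller version of the cheese table (without the last column) so that first alone identifies each row. The names `wide`, `long_form` and `back` are made up for this example.

```python
import pandas as pd

# A reduced version of the cheese table, used only for this sketch.
wide = pd.DataFrame({'first': ['John', 'Mary'],
                     'height': [5.5, 6.0],
                     'weight': [130, 150]})

# Wide to long: height and weight become rows, named in the 'variable' column.
long_form = wide.melt(id_vars=['first'])

# Long to wide: the entries of 'variable' become column names again.
back = long_form.pivot(index='first', columns='variable', values='value')

print(long_form)
print(back)
```

The long table has one row per person and measurement, and the pivoted result has one row per person again, with height and weight as columns.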

=======================================

Encoding problem in python

When I ran the web scraping code below, I got a UnicodeEncodeError. The reason is that the default encoding of the Windows command prompt (cmd) on my machine is cp950, which cannot represent every Unicode character. Encoding the text as UTF-8 first made the error go away (see the "fine code" block).

##### problem code
import sys

def store(data):
    # fileName is defined elsewhere in the scraper
    with open(fileName, 'a') as f:
        f.write(data.encode(sys.stdin.encoding, "replace").decode(sys.stdin.encoding))

UnicodeEncodeError: 'cp950' codec can't encode character '\u2661' in position 46: illegal multibyte sequence

##### fine code
def store(data):
    with open(fileName, 'a') as f:
        # avoids the error, but non-ASCII characters are not preserved
        f.write(data.encode("utf-8").decode("cp950", "ignore"))
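An alternative that keeps the original characters, offered as a sketch and not as what the original scraper did: pass encoding="utf-8" to open so the file itself is written as UTF-8. The sample text, the temporary file name `scraped.txt` and the `file_name` parameter are assumptions for this illustration; the character '\u2661' is the one from the error message above.

```python
import os
import tempfile

text = "I \u2661 Python"  # sample text containing the character from the error

# cp950 cannot represent this character, which is what triggered the error.
try:
    text.encode("cp950")
    cp950_failed = False
except UnicodeEncodeError:
    cp950_failed = True

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "scraped.txt")

def store(data, file_name=path):
    # Writing with an explicit UTF-8 encoding does not depend on the console code page.
    with open(file_name, "a", encoding="utf-8") as f:
        f.write(data)

store(text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    read_back = f.read()

print(cp950_failed, read_back == text)
```

The file then has to be read with encoding="utf-8" as well; printing the text to a cp950 console can still fail, since that is a separate step from writing the file.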

Reference:

https://coder.tw/?p=7487
https://www.ptt.cc/bbs/Python/M.1380034106.A.553.html


Lucas chang

Graduated in applied statistics in Taiwan. Good at Machine Learning, Text Mining, Deep Learning, Data Analysis...