Python learning notes
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
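The with statement
When to use it: whenever you access a resource that must be cleaned up and released no matter what happens, such as closing a file after you finish with it.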
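A minimal sketch of that behavior, assuming a throwaway file (the name notes_demo.txt is made up for illustration). It suggests that the file is closed when the with block ends, both on a normal exit and when an exception is raised inside the block.

```python
import os
import tempfile

# Hypothetical file path, used only for this illustration.
path = os.path.join(tempfile.gettempdir(), "notes_demo.txt")

# The file is closed automatically when the block ends.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

closed_after_block = f.closed  # no explicit f.close() was needed

# The cleanup also runs when an exception is raised inside the block.
try:
    with open(path, "r", encoding="utf-8") as g:
        raise ValueError("something went wrong")
except ValueError:
    pass

closed_after_error = g.closed
os.remove(path)
```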
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
ufuncs :
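ufunc is short for universal function, a special kind of function provided by NumPy.
A ufunc applies an operation to every element of an array at once. It is very similar to apply but more efficient, because the loop runs in compiled code instead of Python.
Examples of ufuncs are np.sqrt, np.exp and np.add. pandas offers vectorized methods in the same spirit, such as .diff, .shift, .cumsum and .cumcount (strictly speaking these are pandas methods, not ufuncs).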
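As a rough illustration, the sketch below compares a NumPy ufunc with apply on a small pandas Series. The sample values are made up, and the example only shows that both give the same result, not how much faster the ufunc is.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 4.0, 9.0, 16.0])  # made-up sample data

# np.sqrt is a ufunc: it handles every element in one vectorized call.
roots_ufunc = np.sqrt(values)

# The same result with apply, which calls a Python function once per element.
roots_apply = values.apply(lambda x: x ** 0.5)

# Vectorized pandas methods in the same spirit.
running_total = values.cumsum()  # 1, 5, 14, 30
step_change = values.diff()      # NaN, 3, 5, 7
```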
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Format string
If you want to print a string that contains the value of a variable, you can use a format string.
For example:
my_name = 'Zed A. Shaw'
print("This is %s." % my_name)
The result will be: This is Zed A. Shaw.
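The % operator is the older formatting style. As a sketch, the same output can also be produced with the two other built-in options, str.format and f-strings (f-strings need Python 3.6 or later).

```python
my_name = 'Zed A. Shaw'

old_style = "This is %s." % my_name          # % operator
with_format = "This is {}.".format(my_name)  # str.format
with_fstring = f"This is {my_name}."         # f-string (Python 3.6+)

print(with_fstring)  # This is Zed A. Shaw.
```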
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Text mining skills
- TF-IDF treats a document as an unordered collection of words and weights each word by how often it appears in the document and how rare it is across all documents.
- The vector space model turns each document into a vector.
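The two ideas above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the three-sentence corpus is made up, tokens are split on whitespace, and the weight is term frequency times log(N / document frequency), which is only one common variant. Libraries such as scikit-learn use smoothed versions, so their numbers will differ.

```python
import math
from collections import Counter

# Toy corpus, made up for this illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

tokenized = [doc.split() for doc in docs]
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Document frequency: in how many documents each word appears.
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

def tfidf_vector(tokens):
    """Return one TF-IDF weight per vocabulary word for a single document."""
    counts = Counter(tokens)
    vector = []
    for word in vocabulary:
        tf = counts[word] / len(tokens)                 # term frequency
        idf = math.log(len(docs) / doc_freq[word])      # inverse document frequency
        vector.append(tf * idf)
    return vector

# Each document becomes a vector with one entry per vocabulary word.
vectors = [tfidf_vector(tokens) for tokens in tokenized]
```

In this sketch a word that appears in only one document, such as "cat", gets a higher weight in that document than a word shared by several documents, such as "sat".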
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Dict in python
A dict stores key-value pairs in Python. It is indexed by key.
#declare a dict
tel = {'jack': 4098, 'sape': 4139}
#add an element to the dict
tel['guido'] = 4127
#delete an element from the dict
del tel['sape']
#list the keys in the dict
list(tel)
#return the keys in sorted order
sorted(tel)
#create a dict from arbitrary key and value expressions
{x: x**2 for x in (2, 4, 6)}
#corpus is a dict (any dict works here)
print(corpus)
print("====================")
#use items() to get the key-value pairs
print(corpus.items())
print("====================")
#use values() to get the values
print(corpus.values())
print("====================")
#use sorted to get the keys as a sorted list
print(sorted(corpus))
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Using range function to do a loop
When we want to generate a sequence of numbers, we use the range function. It takes 3 parameters.
- start number : in the example below, 100 is the start number.
- stop number : 0 is the stop number. It is not included, so the last value printed is 1.
- step : -1 is the step, so the range counts down by 1.
#use range in a loop
for i in range(100, 0, -1):
    print(i)
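A few smaller ranges, turned into lists so the values are easy to see. This is only a sketch with made-up numbers. It shows that the stop value is never produced and that start and step have defaults.

```python
# The stop value itself is never produced.
countdown = list(range(5, 0, -1))  # [5, 4, 3, 2, 1]
count_up = list(range(0, 10, 2))   # [0, 2, 4, 6, 8]

# With one argument, start defaults to 0 and step to 1.
default_step = list(range(3))      # [0, 1, 2]
```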
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
The while loop in Python
A while loop repeats a block of code as long as its condition is true. Two commands can be used inside it, continue and break. Example 1 shows the usage of break, and example 2 shows the usage of continue.
#example 1: break - leave the loop when the condition is met
#print until i is higher than 20
i = 1
while True:
    i = i + 1
    if i > 20:
        break
    print(i)
print(i)  #i is 21 here, the value that triggered the break

#example 2: continue - skip the current value when the condition is met and move on to the next one
#print the even numbers up to 10
i = 1
while i < 10:
    i += 1
    if i % 2 > 0:
        continue
    print(i)  #prints the even numbers 2, 4, 6, 8, 10
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Comparing pivot and melt
import pandas as pd
cheese = pd.DataFrame({'first' : ['John', 'Mary'],
                       'last' : ['Doe', 'Bo'],
                       'height' : [5.5, 6.0],
                       'weight' : [130, 150]})
print(cheese)
print("****************************")
pivot_cheese = cheese.pivot(index = "first",columns = "last",values = 'height')
print(pivot_cheese)
melt_cheese = cheese.melt(id_vars=['first', 'last'])
print("****************************")
print(melt_cheese)
I use pivot to make the original cheese dataframe wider. It takes 3 parameters: index sets the row labels of the new table, columns chooses the column whose values become the new column names, and values chooses the column that fills the cells of the new table.
For comparison, I apply melt to the cheese dataframe, and melt_cheese is longer than the original. I set the id_vars parameter to keep first and last as identifier columns. melt does the opposite of pivot: the names of the remaining columns in cheese (height and weight) become entries in a new variable column of melt_cheese, with their numbers in a value column.
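To see that the two functions are opposites, here is a sketch of a round trip: melt the cheese dataframe into long form, then pivot it back to wide form. It assumes pandas 1.1 or later, where pivot accepts a list for index. The names long_cheese and wide_cheese are made up for this example.

```python
import pandas as pd

cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

# Wide -> long: height and weight become rows.
long_cheese = cheese.melt(id_vars=['first', 'last'])

# Long -> wide: the entries of 'variable' become columns again.
wide_cheese = long_cheese.pivot(index=['first', 'last'],
                                columns='variable',
                                values='value').reset_index()

print(long_cheese.shape)  # (4, 4)
print(wide_cheese.shape)  # (2, 4)
```

After the round trip the weight column holds floats (130.0, 150.0), because melt puts height and weight into a single value column with one common type.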
=======================================
Encoding problem in python
When I ran the web scraping code below in Python, I got a UnicodeEncodeError. The reason is that the default encoding in cmd is cp950, so I had to switch to utf-8 to fix the problem (see the fine code block).
#####problem code
def store(data):
    with open(fileName, 'a') as f:
        f.write(data.encode(sys.stdin.encoding,"replace").decode(sys.stdin.encoding))

#####error message
UnicodeEncodeError: 'cp950' codec can't encode character '\u2661' in position 46: illegal multibyte sequence

#####fine code
def store(data):
    with open(fileName, 'a') as f:
        f.write(data.encode("utf-8").decode("cp950","ignore"))
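Another option that may be worth trying, as a sketch and not the fix from the references: pass encoding="utf-8" to open, so the file is written as UTF-8 whatever the console code page is. The path scraped_demo.txt, the path parameter and the sample string are placeholders made up for this example.

```python
import os
import tempfile

# Hypothetical output path, used only for this illustration.
file_name = os.path.join(tempfile.gettempdir(), "scraped_demo.txt")

def store(data, path=file_name):
    # An explicit encoding avoids falling back to the platform default.
    with open(path, "a", encoding="utf-8") as f:
        f.write(data)

# '\u2661' is the character from the error message above.
store("heart: \u2661\n")

# Read the file back with the same encoding to check the character survived.
with open(file_name, encoding="utf-8") as f:
    content = f.read()

os.remove(file_name)
```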
Reference:
https://coder.tw/?p=7487
https://www.ptt.cc/bbs/Python/M.1380034106.A.553.html