Browse all latest questions tagged Pandas
I have a pandas DataFrame containing one column with multiple JSON data items as a list of dicts. I want to normalize the JSON column and duplicate the non-JSON columns: # creating dataframe df_actions...
This is more of a conceptual question; I do not have a specific problem. I am learning Python for data analysis, but I am very familiar with R. One of the great things about R is plyr (and of course...
Assuming I have a dataframe similar to the below, how would I get the correlation between 2 specific columns and then group by the 'ID' column? I believe the Pandas 'corr' method finds the correlatio...
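One possible approach, as a minimal sketch: group by 'ID' and compute the correlation of the two columns inside each group. The column names `x` and `y` and the sample data are assumptions for illustration; only 'ID' comes from the question.

```python
import pandas as pd

# Hypothetical data: 'ID' is from the question; 'x' and 'y' are assumed names.
df = pd.DataFrame({
    "ID": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "y": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

# For each ID, compute the Pearson correlation between the two chosen columns.
corr_by_id = df.groupby("ID")[["x", "y"]].apply(lambda g: g["x"].corr(g["y"]))
print(corr_by_id)
```

The result is a Series indexed by 'ID', with one correlation value per group.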
Consider a dataframe with 2 columns for simplicity. The first column is id and it is the key. The second column, named code, is not a key, but two entries having the same value is very rare....
I have two data frames, df1 and df2. df1 contains 6 records and df2 contains 4 records. I want to get the unmatched records out of them. I tried, but I am getting an error: ValueError: Can only compare...
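A sketch of one way to find unmatched records without comparing the frames element-wise: an outer merge with `indicator=True`, keeping the rows that are not present in both. The column names and values below are made up for illustration; the question's actual data is not shown.

```python
import pandas as pd

# Hypothetical frames with 6 and 4 records, as in the question.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "val": list("abcdef")})
df2 = pd.DataFrame({"id": [1, 2, 3, 7], "val": ["a", "b", "c", "g"]})

# Outer merge on all shared columns; '_merge' records where each row came from.
merged = df1.merge(df2, how="outer", indicator=True)

# Keep rows found in only one of the two frames.
unmatched = merged[merged["_merge"] != "both"].reset_index(drop=True)
print(unmatched)
```

The `_merge` column then shows `left_only` for rows found only in df1 and `right_only` for rows found only in df2.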
I want to write a text file which is separated by four spaces instead of one tab: df.to_csv(file,sep= '\s\s\s\s') instead of df.to_csv(file,sep= '\t') I tried a regex: df.to_csv(file,sep= r'\s...
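As far as I know, `to_csv` hands `sep` to Python's csv writer, which accepts only a single-character delimiter, so a regex or a four-character string will not work there. A possible workaround, sketched below with made-up data, is to render the frame with tabs and then replace each tab with four spaces. This assumes the data itself contains no tab characters.

```python
import pandas as pd

# Hypothetical frame; the question's data is not shown.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# With no path argument, to_csv returns the CSV text as a string.
text = df.to_csv(sep="\t", index=False)

# Swap each tab for four spaces (assumes no tabs inside the data).
text = text.replace("\t", "    ")
print(text)
```

The resulting string can then be written to a file with an ordinary `open(...).write(text)`.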
Formerly I had Anaconda with pandas 0.18. Using the code below, I made a calculation with the function "calc_func" and assigned the result to the columns of the DataFrame, say "A" and "B". df[["A", "B"]]...
I have a DataFrame that looks like the following: X Y Date are_equal 0 50.0 10.0 2018-08-19 False 1 NaN 10.0 2018-08-19 False 2 NaN 50.0 2018-08-1...
To create a pandas dataframe from a NumPy array I can use: columns = ['1','2'] data = np.array([[1,2] , [1,5] , [2,3]]) df_1 = pd.DataFrame(data,columns=columns) df_1 If I instead use : columns = ['1',...
My question is simple: I have a dataframe, and I group the results by a column and get the size like this: df.groupby('column').size() Now the problem is that I only want the ones where size...
Let us consider the following pandas dataframe: df = pd.DataFrame([[1,np.array([6,7])],[4,np.array([8,9])]], columns = {'A','B'}) where the B column is composed of two NumPy arrays. If we save t...
This is my dataframe: date ids 0 2011-04-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... 1 2011-04-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
In the dataframe below, I would like to eliminate the duplicate cid values so the output from df.groupby('date').cid.size() matches the output from df.groupby('date').cid.nunique(). I have looked at...
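One way this might be done, as a sketch: drop rows that repeat the same (date, cid) pair before grouping. The sample data is invented; only the column names `date` and `cid` come from the question.

```python
import pandas as pd

# Hypothetical data with repeated cid values within a date.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"],
    "cid": [10, 10, 11, 12, 12],
})

# Keep one row per (date, cid) pair.
deduped = df.drop_duplicates(subset=["date", "cid"])

sizes = deduped.groupby("date").cid.size()
uniques = deduped.groupby("date").cid.nunique()
print(sizes)
print(uniques)
```

Note that `size()` counts rows with missing `cid` values while `nunique()` ignores them by default, so the two outputs can still differ if `cid` contains NaN.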
I have a dictionary of the form dict['TimeStamp'] = [value1,value2,value3]. The dict has many timestamps, and each timestamp has 3 values. For example, I want to make a pandas dataframe of all values of dict...
I have the following data frame: import pandas as pd df = pd.DataFrame({ 'gene':["foo", "bar // lal", "qux", "woz"]...