Sometimes there is need to modify order or columns data in CSV-formatted datalogs, for example if charting software accepts only specific formatting.
For small datasets this can be done manually, but there is quicker way to do this job if we use Python environment.
Example dataset and required operations
batt,vref,hour,min,sec,day,month 12.7171572,7.1304967,0,42,49,12,3 12.7172381,7.1304952,0,43,5,12,3 12.7172647,7.1304943,0,43,10,12,3 12.7172929,7.1304944,0,43,14,12,3 12.7173273,7.1304965,0,43,19,12,3
Python program
# xDevs.com Python CSV parser # (i) Tsemenko Illya # (i) 2008-2015 import csv # Open CSV file with input data (in this case input.csv) with open('input.csv','rb') as i: # Create new file, which will have parsed output with open('output.csv','wb') as o: # Read data from test2.csv file r = csv.DictReader(i) # Prepare header for new file print ('Year,Hour,min,sec,batt,vref\n') # Output header to screen o.write ('Year,Hour,min,sec,batt,vref\n') # Output header to file # Create row counter variable, and null it. rowcount = 0 for row in r: # Every new line will increment counter/index rowcount +=1 print ('Date(2015,%s,%s,%s,%s,%s),%s,%s,%d'% (row['month'],row['day'],row['hour'],row['min'],row['sec'],row['batt'],row['vref'],rowcount)) #Output to screen o.write('Date(2015,%s,%s,%s,%s,%s),%s,%s,%d\n'% (row['month'],row['day'],row['hour'],row['min'],row['sec'],row['batt'],row['vref'],rowcount)) #Output to file
Python-source file available to download in references links at right side.
Result
Date(2015,3,12,0,42,49),12.7171572,7.1304967,1 Date(2015,3,12,0,43,5),12.7172381,7.1304952,2 Date(2015,3,12,0,43,10),12.7172647,7.1304943,3 Date(2015,3,12,0,43,14),12.7172929,7.1304944,4 Date(2015,3,12,0,43,19),12.7173273,7.1304965,5 Date(2015,3,12,0,43,23),12.7173481,7.1304951,6
More complicated example with space/floating to int trunication
Input data file
batt|vref|hour|min|sec|day|month 12.1337766 | 7.1367097 | 23.0000000 | 40.0000000 | 32.0000000 | 9.0000000 | 3.0000000 | 12.1337737 | 7.1367085 | 23.0000000 | 40.0000000 | 48.0000000 | 9.0000000 | 3.0000000 | 12.1337754 | 7.1367081 | 23.0000000 | 40.0000000 | 53.0000000 | 9.0000000 | 3.0000000 | 12.1337750 | 7.1367083 | 23.0000000 | 40.0000000 | 57.0000000 | 9.0000000 | 3.0000000 |
Python source code
# xDevs.com Python CSV parser # (i) Tsemenko Illya # (i) 2008-2015 import csv # Open CSV file with input data (in this case - input.csv) with open('input.csv','rb') as i: # Create new file, which will have parsed output with open('output.csv','wb') as o: # Read data from test2.csv file r = csv.DictReader(i,delimiter='|') #; Prepare header for new file print ('date,batt,vref\n') # Output header to screen o.write ('date,batt,vref\r\n') # Output header to file # Create row counter variable, and null it. rowcount = 0 for row in r: # Every new line will increment counter/index rowcount +=1 n = [(float(value)) for count, value in enumerate( row['day'].split() )] m = [(float(value)) for count, value in enumerate( row['month'].split() )] h = [(float(value)) for count, value in enumerate( row['hour'].split() )] min = [(float(value)) for count, value in enumerate( row['min'].split() )] sec = [(float(value)) for count, value in enumerate( row['sec'].split() )] batt = [(float(value)) for count, value in enumerate( row['batt'].split() )] vref = [(float(value)) for count, value in enumerate( row['vref'].split() )] print ('%d-%d-15 %d:%d:%d,%s,%s' % (n[0], m[0], h[0], min[0], sec[0] ,batt[0],vref[0]) ) o.write ('%d-%d-15 %d:%d:%d,%s,%s\r\n' % (n[0], m[0], h[0], min[0], sec[0] ,batt[0],vref[0]) )
Python-source file available to download in references links at right side.
Result CSV-file
date,batt,vref 9-3-15 23:40:32,12.1337766,7.1367097 9-3-15 23:40:48,12.1337737,7.1367085 9-3-15 23:40:53,12.1337754,7.1367081 9-3-15 23:40:57,12.133775,7.1367083
Another example
import csv from datetime import datetime with open('short_34970A_NPLC10_HIZ.csv','r') as i: with open('output.csv','wb') as o: reader = csv.DictReader(i) o.write ('date,vmeas\r\n') for row in reader: d = datetime.strptime(row['date'], '%m/%d/%Y %H:%M:%S') date = d.strftime('%Y-%m-%d %H:%M:%S') print ('%s,%s' % (date, row['vmeas'])) o.write ('%s,%s\r\n' % (date, row['vmeas']))
Python is also useful for processing large datasets. Common need of this is preparation raw datasets for graphing and analysis. Often all thousands of points are not required, so short app below does some data fiddle with CSV input and produce reduced dataset, suitable for interactive graph creation, even in realtime with frameworks like D3.js
Application code functions as below:
- Read all CSV files in current directory
- Output CSV file names and count qty of files to process
- Looks for invalid bytes (0×00 HEX) in CSV and cuts this bytes from temporary file.
- Temporary cleaned data saved in *_clean.csv
- This modified file is opened again and every 32th line of CSV data stored into new file *_reduced.csv
- After processing, file closed and app does same operation in loop for all files in folder
Code itself:
# xDevs.com Python CSV parser # (i) Tsemenko Illya # (i) 2008-2015 import csv from datetime import datetime import glob import os from os import walk print ("%s" % glob.glob("*.csv")) filecnt = len([name for name in os.listdir('.') if os.path.isfile(name) & name.endswith('csv')]) print ("Total %d CSV files here\r\n" % filecnt) for fcnt in range (0, filecnt): fi = open(glob.glob("*.csv")[fcnt], 'rb') data = fi.read() fi.close() fname = glob.glob("*.csv")[fcnt] nname = fname.replace(".csv", "") + "_clean.dsv" #print ("%s\n" % nname) fo = open(nname, 'wb') fo.write(data.replace('\x00', '')) fo.close() filednt = len([name for name in os.listdir('.') if os.path.isfile(name) & name.endswith('dsv')]) print ("Total %d clean DSV files here\r\n" % filednt) # Open CSV file with input data (in this case - input.csv) for dcnt in range (0, filednt): with open(glob.glob("*.dsv")[dcnt],'rb') as i: print i # Create new file, which will have parsed output di = glob.glob("*.dsv")[dcnt] nname = di.replace(".dsv", "") + "_reduced.psv" with open(nname,'wb') as o: # Read data from test2.csv file r = csv.DictReader(i,delimiter=';') o.write ("date;hp3458a;temp;ext_temp;rh;pressure;\r\n") # Create row counter variable, and null it. rowcount = 1 for pos,row in enumerate(r, 1): if pos % 32 == 0: # Every new line will increment counter/index # 11/12/2015-09:11:11 try: d = datetime.strptime(row['date'], '%d/%m/%Y-%H:%M:%S') date = d.strftime('%d/%m/%Y-%H:%M:%S') except ValueError: print ("-------- Error in line %d ------------" % rowcount) #continue #If you are #d = datetime.strptime(row['date'], '%d/%m/%Y-%H:%M:%S') #date = d.strftime('%Y/%m/%d %H:%M:%S') rowcount +=1 if (rowcount % 256 == 0): print ('%s;%f;%s;%s;%s;%s' % (date,float(row['hp3458a']),row['temp'],row['ext_temp'],row['rh'],row['pressure'])) #Output to screen o.write ('%s;%f;%s;%s;%s;%s\r\n' % (date,float(row['hp3458a']),row['temp'],row['ext_temp'],row['rh'],row['pressure'])) #Output to screen
This app reducing data 32 times, and used for example here to compare daily drift/deviation of three HP 3458A DMMs.
Modified: Jan. 25, 2016, 7:46 a.m.