xDevs.com | Simple CSV parsers for datalogs in Python

Sometimes there is need to modify order or columns data in CSV-formatted datalogs, for example if charting software accepts only specific formatting.
For small datasets this can be done manually, but there is quicker way to do this job if we use Python environment.

Example dataset and required operations

batt,vref,hour,min,sec,day,month
12.7171572,7.1304967,0,42,49,12,3
12.7172381,7.1304952,0,43,5,12,3
12.7172647,7.1304943,0,43,10,12,3
12.7172929,7.1304944,0,43,14,12,3
12.7173273,7.1304965,0,43,19,12,3

Python program

# xDevs.com Python CSV parser
#  (i) Tsemenko Illya 
#  (i) 2008-2015
import csv
# Open CSV file with input data (in this case input.csv)
with open('input.csv','rb') as i:
    # Create new file, which will have parsed output
    with open('output.csv','wb') as o:
        # Read data from test2.csv file
        r = csv.DictReader(i)
        # Prepare header for new file
        print ('Year,Hour,min,sec,batt,vref\n')   # Output header to screen
        o.write ('Year,Hour,min,sec,batt,vref\n') # Output header to file
        # Create row counter variable, and null it.
        rowcount = 0
        for row in r:
            # Every new line will increment counter/index
            rowcount +=1
            print  ('Date(2015,%s,%s,%s,%s,%s),%s,%s,%d'% (row['month'],row['day'],row['hour'],row['min'],row['sec'],row['batt'],row['vref'],rowcount)) #Output to screen
            o.write('Date(2015,%s,%s,%s,%s,%s),%s,%s,%d\n'% (row['month'],row['day'],row['hour'],row['min'],row['sec'],row['batt'],row['vref'],rowcount)) #Output to file

Python-source file available to download in references links at right side.

Result

Date(2015,3,12,0,42,49),12.7171572,7.1304967,1
Date(2015,3,12,0,43,5),12.7172381,7.1304952,2
Date(2015,3,12,0,43,10),12.7172647,7.1304943,3
Date(2015,3,12,0,43,14),12.7172929,7.1304944,4
Date(2015,3,12,0,43,19),12.7173273,7.1304965,5
Date(2015,3,12,0,43,23),12.7173481,7.1304951,6

More complicated example with space/floating to int trunication

Input data file

batt|vref|hour|min|sec|day|month
 12.1337766 |   7.1367097 |   23.0000000 |   40.0000000 |   32.0000000 |   9.0000000 |   3.0000000 |
 12.1337737 |   7.1367085 |   23.0000000 |   40.0000000 |   48.0000000 |   9.0000000 |   3.0000000 |
 12.1337754 |   7.1367081 |   23.0000000 |   40.0000000 |   53.0000000 |   9.0000000 |   3.0000000 |
 12.1337750 |   7.1367083 |   23.0000000 |   40.0000000 |   57.0000000 |   9.0000000 |   3.0000000 |

Python source code

# xDevs.com Python CSV parser
#  (i) Tsemenko Illya
#  (i) 2008-2015
import csv
# Open CSV file with input data (in this case - input.csv)
with open('input.csv','rb') as i:
    # Create new file, which will have parsed output
    with open('output.csv','wb') as o:
        # Read data from test2.csv file
        r = csv.DictReader(i,delimiter='|')
        #; Prepare header for new file
        print ('date,batt,vref\n')   # Output header to screen
        o.write ('date,batt,vref\r\n') # Output header to file
        # Create row counter variable, and null it.
        rowcount = 0
        for row in r:
            # Every new line will increment counter/index
            rowcount +=1
            n = [(float(value)) for count, value in enumerate( row['day'].split() )]
            m = [(float(value)) for count, value in enumerate( row['month'].split() )]
            h   = [(float(value)) for count, value in enumerate( row['hour'].split() )]
            min = [(float(value)) for count, value in enumerate( row['min'].split() )]
            sec = [(float(value)) for count, value in enumerate( row['sec'].split() )]
            batt = [(float(value)) for count, value in enumerate( row['batt'].split() )]
            vref = [(float(value)) for count, value in enumerate( row['vref'].split() )]
            print ('%d-%d-15 %d:%d:%d,%s,%s' % (n[0], m[0], h[0], min[0], sec[0] ,batt[0],vref[0]) )
            o.write ('%d-%d-15 %d:%d:%d,%s,%s\r\n' % (n[0], m[0], h[0], min[0], sec[0] ,batt[0],vref[0]) )

Python-source file available to download in references links at right side.

Result CSV-file

date,batt,vref
9-3-15 23:40:32,12.1337766,7.1367097
9-3-15 23:40:48,12.1337737,7.1367085
9-3-15 23:40:53,12.1337754,7.1367081
9-3-15 23:40:57,12.133775,7.1367083

Another example

import csv
from datetime import datetime
with open('short_34970A_NPLC10_HIZ.csv','r') as i:
    with open('output.csv','wb') as o:
        reader = csv.DictReader(i)
        o.write ('date,vmeas\r\n')
        for row in reader:
            d = datetime.strptime(row['date'], '%m/%d/%Y %H:%M:%S')
            date = d.strftime('%Y-%m-%d %H:%M:%S')
            print ('%s,%s' % (date, row['vmeas']))
            o.write ('%s,%s\r\n' % (date, row['vmeas']))

Python is also useful for processing large datasets. Common need of this is preparation raw datasets for graphing and analysis. Often all thousands of points are not required, so short app below does some data fiddle with CSV input and produce reduced dataset, suitable for interactive graph creation, even in realtime with frameworks like D3.js

Application code functions as below:

Read all CSV files in current directory
Output CSV file names and count qty of files to process
Looks for invalid bytes (0×00 HEX) in CSV and cuts this bytes from temporary file.
Temporary cleaned data saved in *_clean.csv
This modified file is opened again and every 32th line of CSV data stored into new file *_reduced.csv
After processing, file closed and app does same operation in loop for all files in folder

Code itself:

# xDevs.com Python CSV parser
#  (i) Tsemenko Illya
#  (i) 2008-2015
import csv
from datetime import datetime
import glob
import os
from os import walk

print ("%s" % glob.glob("*.csv"))
filecnt = len([name for name in os.listdir('.') if os.path.isfile(name) & name.endswith('csv')])
print ("Total %d CSV files here\r\n" % filecnt)

for fcnt in range (0, filecnt):
    fi = open(glob.glob("*.csv")[fcnt], 'rb')
    data = fi.read()
    fi.close()
    fname = glob.glob("*.csv")[fcnt]
    nname = fname.replace(".csv", "") + "_clean.dsv"
    #print ("%s\n" % nname)
    fo = open(nname, 'wb')
    fo.write(data.replace('\x00', ''))
    fo.close()

filednt = len([name for name in os.listdir('.') if os.path.isfile(name) & name.endswith('dsv')])
print ("Total %d clean DSV files here\r\n" % filednt)

# Open CSV file with input data (in this case - input.csv)
for dcnt in range (0, filednt):
    with open(glob.glob("*.dsv")[dcnt],'rb') as i:
        print i
        # Create new file, which will have parsed output
        di = glob.glob("*.dsv")[dcnt]
        nname = di.replace(".dsv", "") + "_reduced.psv"
        with open(nname,'wb') as o:
            # Read data from test2.csv file
            r = csv.DictReader(i,delimiter=';')
            o.write ("date;hp3458a;temp;ext_temp;rh;pressure;\r\n")
            # Create row counter variable, and null it.
            rowcount = 1
            for pos,row in enumerate(r, 1):
                if pos % 32 == 0:
                    # Every new line will increment counter/index
                    #                                   11/12/2015-09:11:11
                    try:
                        d = datetime.strptime(row['date'], '%d/%m/%Y-%H:%M:%S')
                        date = d.strftime('%d/%m/%Y-%H:%M:%S')
                    except ValueError:
                        print ("-------- Error in line %d ------------" % rowcount)
                        #continue #If you are 
                    #d = datetime.strptime(row['date'], '%d/%m/%Y-%H:%M:%S')
                    #date = d.strftime('%Y/%m/%d %H:%M:%S')
                    rowcount +=1
                    if (rowcount % 256 == 0):
                        print  ('%s;%f;%s;%s;%s;%s' % (date,float(row['hp3458a']),row['temp'],row['ext_temp'],row['rh'],row['pressure'])) #Output to screen
                    o.write ('%s;%f;%s;%s;%s;%s\r\n' % (date,float(row['hp3458a']),row['temp'],row['ext_temp'],row['rh'],row['pressure'])) #Output to screen

This app reducing data 32 times, and used for example here to compare daily drift/deviation of three HP 3458A DMMs.

Author: xDevs.com Team
Created: March 13, 2015, 10:22 a.m.
Modified: Jan. 25, 2016, 7:46 a.m.