site_graphlogo
  -   Terms of Use and Privacy
CB

2020-01-22 | CB | Cruft Buster Log Format

This log format is sufficient to ensure the integrity of the Cruft Buster Schema (CBS) for our needs, mainly for MCJ and for CBS itself, which MCJ uses for its schema. Many CBS entries have identical file content, because the tags are symbolic links and are often empty. As appealing as hashdeep is, it is unsuitable for tracking state when the hashes are identical and the meaning lies in the folder structure.
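As a small illustration of why identical content hashes carry no state on their own, the sketch below hashes two empty tags both ways. The tag paths and the helper names `content_sha1` and `path_aware_md5` are hypothetical and are not part of CBS; the only point is that a digest which includes the relative path can tell apart entries that a content-only hash cannot.

```python
import hashlib


def content_sha1(data: bytes) -> str:
    """SHA1 of the file content alone."""
    return hashlib.sha1(data).hexdigest()


def path_aware_md5(path: str, data: bytes) -> str:
    """MD5 over the relative path followed by the content SHA1."""
    return hashlib.md5((path + content_sha1(data)).encode("utf-8")).hexdigest()


# Hypothetical tag paths; both tags are empty, as CBS tags often are.
tags = {"cod/tags/draft": b"", "cod/tags/final": b""}

content_hashes = {content_sha1(data) for data in tags.values()}
path_hashes = {path_aware_md5(path, data) for path, data in tags.items()}

print(len(content_hashes))  # 1: the two tags are indistinguishable by content
print(len(path_hashes))     # 2: including the path tells them apart
```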

Here is the script:

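#!/opt/local/bin/python3.7
# Reads rsync --out-format lines ("%i %f") on stdin, prints one log entry per line.
import fileinput
import getpass
import hashlib
import socket
import string
from datetime import datetime, timezone

# Translation table that deletes whitespace, so hostname and user stay single fields.
NO_WHITESPACE = {ord(c): None for c in string.whitespace}

try:
    hostname = socket.gethostname().translate(NO_WHITESPACE)
except Exception:
    hostname = "unknown"
try:
    user = getpass.getuser().translate(NO_WHITESPACE)
except Exception:
    user = "unknown"

with fileinput.input() as f_input:
    for line in f_input:
        curline = line.strip()
        if not curline:
            continue  # nothing to log for a blank line
        # Split on the first space only, so paths containing spaces stay whole.
        status, _, path = curline.partition(' ')
        # UTC timestamp to the millisecond, e.g. 2020-01-22T17:48:24.730Z
        now = datetime.now(timezone.utc).replace(tzinfo=None)
        ts = now.isoformat(sep='T', timespec='milliseconds') + 'Z'
        linedata = ts + ' ' + hostname + ' ' + user + ' ' + curline
        # The second character of the rsync status is the item type; 'd' is a
        # directory. "*deleting" also has 'd' there, so deleted paths are not opened.
        if status[1:2] != 'd':
            h = hashlib.sha1()
            with open(path, 'rb') as file:
                # Hash in chunks so large files are not read into memory at once.
                for chunk in iter(lambda: file.read(65536), b''):
                    h.update(chunk)
            flhash = h.hexdigest()
        else:
            flhash = 'directory'
        # Final field: MD5 of the entry text followed by the content hash.
        entry = linedata + flhash
        print(linedata + ' ' + hashlib.md5(entry.encode('utf-8')).hexdigest())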

Here is how it works:

divines-Mini:source divine$ rsync -nltcr cod usr-3@192.168.52.195:~/websites/source/ --delete  --log-file=/dev/null --out-format="$2 %i %f" | awk '!/.DS_Store/' | ../commands/histgen.py
2020-01-22T17:48:24.730Z divines-Mini divine .d..t....... cod 5435fba837fdefa1e362a5c099359fc1
2020-01-22T17:48:24.730Z divines-Mini divine <fcst....... cod/title.txt 6422727e9b0c270f147880b8ed6df0a8
divines-Mini:source divine$ shasum cod/title.txt
88380d1a53db96f1a46d3cfe0534395dd62bb876  cod/title.txt
divines-Mini:source divine$ echo -n "2020-01-22T17:48:24.730Z divines-Mini divine <fcst....... cod/title.txt88380d1a53db96f1a46d3cfe0534395dd62bb876" | md5
6422727e9b0c270f147880b8ed6df0a8
divines-Mini:source divine$

The entry format is: UTC timestamp to the millisecond, hostname, user, rsync itemized change status, relative path of the changed piece of CBS, and an MD5 hash of the log entry text followed by the SHA1 of the file content (for directories, the literal string "directory" takes the place of the SHA1). SHA1 should match git better as far as the content of the file is concerned. MD5 is sufficient to track differences.
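The manual check in the transcript above can be sketched in Python as follows. This is a minimal sketch, not part of the Cruft Buster tooling: the names `LogEntry`, `parse_entry`, `expected_digest`, and `verify_entry` are hypothetical, and it assumes that hostname, user, and rsync status never contain spaces while the path may. The sample line reuses the fields from the transcript but with made-up file content, so its digest is computed here and not copied from the log.

```python
import hashlib
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    hostname: str
    user: str
    status: str
    path: str
    digest: str


def parse_entry(line: str) -> LogEntry:
    """Split one log line into its six fields."""
    # The digest is the last space-separated field; the path may contain spaces.
    head, digest = line.rstrip("\n").rsplit(" ", 1)
    timestamp, hostname, user, rest = head.split(" ", 3)
    status, _, path = rest.partition(" ")
    return LogEntry(timestamp, hostname, user, status, path, digest)


def expected_digest(line: str, content_hash: str) -> str:
    """MD5 of everything before the digest, followed by the content hash."""
    head = line.rstrip("\n").rsplit(" ", 1)[0]
    return hashlib.md5((head + content_hash).encode("utf-8")).hexdigest()


def verify_entry(line: str, content_hash: str) -> bool:
    """True if the entry's digest matches the given SHA1 (or 'directory')."""
    return parse_entry(line).digest == expected_digest(line, content_hash)


# Sample entry built from made-up content, using the field layout shown above.
sample_sha1 = hashlib.sha1(b"example content\n").hexdigest()
head = "2020-01-22T17:48:24.730Z divines-Mini divine <fcst....... cod/title.txt"
line = head + " " + hashlib.md5((head + sample_sha1).encode("utf-8")).hexdigest()

print(parse_entry(line).path)           # cod/title.txt
print(verify_entry(line, sample_sha1))  # True
print(verify_entry(line, "directory"))  # False
```

To check a real entry, pass the log line together with the output of shasum for the file at that path, or the string "directory" for a directory entry.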

All participating hosts accumulate updates in a history folder off of root, which holds the tree structure of the updates, with the log in the history root. The meeting facilitator should not edit the authoritative, central location directly. It won't exactly hurt anything, but the facilitator's changes will not be logged, so create a replica and edit that instead.

Outside of collaboration or quick office-to-home-root-to-room edits, use git to manage changes. This logging method works fine side by side with git: just rsync to a local replica. This method of management is not expected to lead to a disruptive need for merges. The narratives and descriptions are particularly troublesome, so use some other method if those are written collaboratively, or simply coordinate with your team members on the larger pieces.

Cruft Buster may work just fine with cloud file sharing services. Symbolic links are a bit difficult, though, and will likely break in some cases and with some services. Cruft Buster is a specification, not a product. It is meant to be a toolkit and demonstration of the schema, so these ideas will need to be adapted to the organization and tested before relying on them. Read our terms of use.

Comments:

2020-01-22 :

Generally, nothing gets pushed up unless it is done intentionally, since rsync is run with the -n (dry run) option. This makes it easier to code in stages if some of the other rsync flags need to be added or dealt with. Also, the source code for all components will be kept in a live state on Tributary Software.
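One way to keep the dry run as the default while coding in stages is to build the rsync arguments in a single place and make the real transfer an explicit opt-in. The sketch below is only an assumption about how that could look: `rsync_command` is a hypothetical helper, the destination is a placeholder, and the output format omits the `$2` prefix used in the command above. It only builds the argument list and does not run rsync.

```python
from typing import List


def rsync_command(source: str, dest: str, dry_run: bool = True) -> List[str]:
    """Build an rsync argument list; dry_run=True adds -n so nothing is pushed."""
    args = [
        "rsync",
        "-ltcr",
        "--delete",
        "--log-file=/dev/null",
        "--out-format=%i %f",
    ]
    if dry_run:
        args.insert(1, "-n")
    return args + [source, dest]


# Placeholder destination; substitute the real replica or central location.
print(rsync_command("cod", "user@host:~/websites/source/"))
```

Passing dry_run=False is then the one deliberate step needed to push changes.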