SFA files

One of the issues that has been identified - CTA does not have functionality corresponding to Enstore SFA (Small File Aggregation). In the nutshell the SFA system is as extension of Enstore system that manages intermediate disk storage on the side (intermediate between dCache and Enstore). Depending on policies based on file_family, storage_group, library and file size Enstore directs files to the intermediate storage for subsequent periodic packaging - tarring the small files into large package files that then are written to tape more efficiently.

The child/parent relation is captured in the same file table by setting child’s file.package_id to be equal to BFID of the package file.

To read SFA files in dCache/CTA setup this relation has to translate in chimera.

There is a solution for it, used by similar to SFA, SAPPHIRE system by dCache. We need to translate:

child_pnfsid, package_pnfsid ->
   -> dcache://dcache/?store=<vo>&group=<file_family>&bfid=<child_pnfsid>:<package_pnfsid>

I.e. the child/package relation exists as location in t_locationinfo Chimera table. As long as these locations exist dCache can read these files from CTA using an hsm script. T.e. SAPPHIRE system is not need for reading of SFA files.

This can be populated out of band.

After some iterations, the final SFA location is expressed as:

sfa://sfa/<child_pnfsid>?packageid=<parent_pnfsid>

The script sfa2dcache.py, located in enstore2cta/scripts, implements SFA metadata migration from Enstore DB to dCache DB.

Invocation

To run the scripqt a config file enstore2cta.yaml must exist in the current directory or be pointed at by MIGRATION_CONFIG environment variable. Look for example in enstore2cta/etc. It must have “0600” permission (to protect database passwords if any).

The configuration yaml must have connection parameters to enstoredb and chimeradb defined with the latter being write (insert, update) enabled.

#  python3 sfa2dcache.py  --help
usage: sfa2dcache.py [-h] [--dir DIR] [--cpu_count CPU_COUNT]

optional arguments:
  -h, --help            show this help message and exit
  --dir DIR             top directory name
  --cpu_count CPU_COUNT
                        override cpu count - number of simultaneously processed labels

Where top directry name is the name of directory where Enstore stores package files (typically /pnfs/fs/usr/file_aggregation/)