MySphere Posts

“I have lots of photo files. Since 2006, when I purchased my first digital camera, the number of photos has grown quickly, and after getting an iPhone, the number of photos exploded.

With the high number of photos, the number of backups grew as well.

I decided to organize all backups and create folders using the format YYYY-MM from the metadata of the photo files.”

Below is the Python script. It runs on macOS:

import os
import shutil
import datetime
import logging
import tkinter as tk
from tkinter import filedialog
from PIL import Image, ExifTags
import pillow_heif
import piexif
import struct

# Setup logging
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

ATOM_HEADER_SIZE = 8
EPOCH_ADJUSTER = 2082844800  # Difference between Unix and QuickTime epoch

def get_file_date(file_path):
    try:
        if file_path.lower().endswith(".heic") and pillow_heif.is_supported(file_path):
            heif_file = pillow_heif.open_heif(file_path, convert_hdr_to_8bit=False)
            exif_data = heif_file.info.get("exif")
            if exif_data:
                exif_dict = piexif.load(exif_data)
                date_str = exif_dict["Exif"].get(piexif.ExifIFD.DateTimeOriginal)
                if date_str:
                    return datetime.datetime.strptime(date_str.decode("utf-8"), "%Y:%m:%d %H:%M:%S")
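
        # NOTE (hedged addition, not in the original post): the unused `struct`
        # import and the ATOM_HEADER_SIZE / EPOCH_ADJUSTER constants suggest the
        # intent was to read the creation date of .mov files from the QuickTime
        # 'moov'/'mvhd' atom. This branch is a minimal sketch of that idea; if it
        # fails, the function still falls back to the file birth time below.
        elif file_path.lower().endswith(".mov"):
            with open(file_path, "rb") as f:
                while True:
                    atom_header = f.read(ATOM_HEADER_SIZE)
                    if len(atom_header) < ATOM_HEADER_SIZE:
                        break  # end of file reached without finding 'moov'
                    if atom_header[4:8] == b"moov":
                        # The first child of 'moov' is normally 'mvhd'.
                        mvhd_header = f.read(ATOM_HEADER_SIZE)
                        if mvhd_header[4:8] == b"mvhd":
                            f.seek(4, 1)  # skip version (1 byte) and flags (3 bytes)
                            creation_time = struct.unpack(">I", f.read(4))[0]
                            if creation_time > EPOCH_ADJUSTER:
                                return datetime.datetime.fromtimestamp(creation_time - EPOCH_ADJUSTER)
                        break
                    # Skip over this top-level atom to reach the next one.
                    atom_size = struct.unpack(">I", atom_header[:4])[0]
                    f.seek(max(atom_size - ATOM_HEADER_SIZE, 0), 1)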
        
        elif file_path.lower().endswith((".jpg", ".jpeg")):
            with Image.open(file_path) as img:
                exif_data = img.getexif()
                if exif_data:
                    exif_dict = {ExifTags.TAGS.get(tag, tag): value for tag, value in exif_data.items()}
                    logging.debug(f"EXIF metadata for {file_path}: {exif_dict}")
                    
                    if "DateTimeOriginal" in exif_dict:
                        date_str = exif_dict["DateTimeOriginal"]
                    elif "DateTime" in exif_dict:
                        date_str = exif_dict["DateTime"]
                    else:
                        date_str = None
                        logging.warning(f"No DateTimeOriginal or DateTime found for {file_path}")
                    
                    if date_str:
                        try:
                            logging.debug(f"Extracted date string from EXIF: {date_str}")
                            return datetime.datetime.strptime(date_str, "%Y:%m:%d %H:%M:%S")
                        except ValueError as ve:
                            logging.error(f"Error parsing date for {file_path}: {ve}")
                    else:
                        logging.warning(f"DateTime metadata missing or unreadable for {file_path}")
                else:
                    logging.warning(f"No EXIF metadata found for {file_path}")
    
    except Exception as e:
        logging.error(f"Error extracting date from {file_path}: {e}")
    
    # If metadata is missing or could not be parsed, fall back to the file birth time (creation date on macOS)
    file_stats = os.stat(file_path)
    file_birth_time = file_stats.st_birthtime
    logging.debug(f"Using file birth time for {file_path}: {datetime.datetime.fromtimestamp(file_birth_time)}")
    return datetime.datetime.fromtimestamp(file_birth_time)

def move_files_to_folders(source_folder):
    for filename in os.listdir(source_folder):
        file_path = os.path.join(source_folder, filename)
        if filename.lower().endswith((".jpg", ".jpeg", ".heic", ".mov")):
            date_taken = get_file_date(file_path)
            if date_taken:
                folder_name = date_taken.strftime("%Y-%m")
            else:
                logging.warning(f"Could not determine date for {file_path}, using 'unknown' folder.")
                folder_name = "unknown"
            
            dest_folder = os.path.join(source_folder, folder_name)
            os.makedirs(dest_folder, exist_ok=True)
            
            dest_file_path = os.path.join(dest_folder, filename)
            count = 1
            while os.path.exists(dest_file_path):
                name, ext = os.path.splitext(filename)
                dest_file_path = os.path.join(dest_folder, f"{name}_{count}{ext}")
                count += 1
            
            shutil.move(file_path, dest_file_path)
            logging.info(f"Moved {filename} to {dest_folder}")

if __name__ == "__main__":
    root = tk.Tk()
    root.withdraw()
    folder_selected = filedialog.askdirectory(title="Select the folder containing files")
    if folder_selected:
        move_files_to_folders(folder_selected)
        logging.info("File organization complete.")
    else:
        logging.warning("No folder selected.")

Uncategorized

The oc adm must-gather tool is essential for troubleshooting and diagnostics in OpenShift. With the release of OpenShift 4.17, new flags have been introduced to enhance flexibility and precision in data collection. These additions enable administrators to gather logs more efficiently while reducing unnecessary data collection.

New Flags in Must-Gather

--since

This flag allows users to collect logs newer than a specified duration. For example:

oc adm must-gather --since=24h

This command gathers logs from the past 24 hours, making it easier to pinpoint recent issues.

--since-time

The --since-time flag lets users specify an exact timestamp (RFC3339 format) to collect logs from a particular point in time.

oc adm must-gather --since-time=2025-02-10T11:12:39Z

This is useful for investigating incidents that occurred at a specific time.

Existing Flags for Enhanced Customization

Along with the new additions, several existing flags provide more control over the data collection process:

  • --all-images: Uses the default image for all operators annotated with operators.openshift.io/must-gather-image.
  • --dest-dir: Specifies a local directory to store gathered data.
  • --host-network: Runs must-gather pods with hostNetwork: true for capturing host-level data.
  • --image: Allows specifying a must-gather plugin image to run.
  • --node-name: Targets a specific node for data collection.
  • --node-selector: Selects nodes based on a node selector.
  • --run-namespace: Runs must-gather pods within an existing privileged namespace.
  • --source-dir: Defines the directory from which data is copied.
  • --timeout: Sets a time limit for data gathering.
  • --volume-percentage: Adjusts the maximum storage percentage for gathered data.
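
Several of these flags can be combined in a single invocation. For example, the following gathers the last 24 hours of data into a local directory with a time limit; the destination directory and timeout value are placeholders to adapt to your environment:

oc adm must-gather --dest-dir=/tmp/must-gather --since=24h --timeout=30m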

Conclusion

The introduction of --since and --since-time in OpenShift 4.17 significantly improves must-gather’s efficiency by enabling targeted log collection. By leveraging these and other available flags, administrators can streamline troubleshooting and optimize diagnostics.

For a deeper dive into must-gather and its latest enhancements, check out the official OpenShift documentation.

openshift

I set up an OpenShift 4.16 cluster using UPI on top of VMware. The cluster has 3 Masters, 3 Worker Nodes, and 3 InfraNodes. The infra nodes were necessary to install IBM Storage Fusion.

After the setup, I needed to create a load balancer in front of the OpenShift cluster. There are several options, and one of them is HAProxy.

I installed a RHEL 9 server, added three IPs to the network card, and set up HAProxy.

Prerequisites

  • A system running RHEL 9
  • Root or sudo privileges
  • A basic understanding of networking and load balancing

Step 1: Install HAProxy

First, update your system packages:

sudo dnf update -y

Then, install HAProxy using the package manager:

sudo dnf install haproxy -y

Verify the installation:

haproxy -v

Step 2: Configure HAProxy

The main configuration file for HAProxy is located at /etc/haproxy/haproxy.cfg. Open the file in a text editor:

sudo nano /etc/haproxy/haproxy.cfg

The configuration below was used for my cluster. Change the IP addresses to match your environment.

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

    # utilize system-wide crypto-policies
    #ssl-default-bind-ciphers PROFILE=SYSTEM
    #ssl-default-server-ciphers PROFILE=SYSTEM

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    tcp
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------

frontend api
    bind 192.168.252.171:6443
    default_backend controlplaneapi

frontend apiinternal
    bind 192.168.252.171:22623
    default_backend controlplaneapiinternal

frontend apiinternalhttp
    bind 192.168.252.171:22624
    default_backend controlplaneapiinternalhttp

frontend secure
    bind 192.168.252.170:443
    default_backend secure

frontend insecure
    bind 192.168.252.170:80
    default_backend insecure

#---------------------------------------------------------------------
# static backend
#---------------------------------------------------------------------

backend controlplaneapi
    balance source
    server master-01  192.168.252.5:6443 check
    server master-02  192.168.252.6:6443 check
    server master-03  192.168.252.7:6443 check


backend controlplaneapiinternal
    balance source
    server master-01  192.168.252.5:22623 check
    server master-02  192.168.252.6:22623 check
    server master-03  192.168.252.7:22623 check

backend controlplaneapiinternalhttp
    balance source
    server master-01  192.168.252.5:22624 check
    server master-02  192.168.252.6:22624 check
    server master-03  192.168.252.7:22624 check

backend secure
    balance source
    server worker-01  192.168.252.8:443 check
    server worker-02  192.168.252.9:443 check
    server worker-03  192.168.252.10:443 check
    server worker-04  192.168.252.11:443 check
    server worker-05  192.168.252.12:443 check
    server worker-06  192.168.252.13:443 check

backend insecure
    balance roundrobin
    server worker-01  192.168.252.8:80 check
    server worker-02  192.168.252.9:80 check
    server worker-03  192.168.252.10:80 check
    server worker-04  192.168.252.11:80 check
    server worker-05  192.168.252.12:80 check
    server worker-06  192.168.252.13:80 check
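
With the configuration in place, a few additional commands are typically needed on RHEL 9 before the load balancer can serve traffic. The following is a hedged sketch; the port list and SELinux boolean may need adjusting for your environment:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg        # validate the configuration file
sudo setsebool -P haproxy_connect_any 1            # let HAProxy use non-standard ports under SELinux
sudo firewall-cmd --permanent --add-port={6443,22623,22624,443,80}/tcp
sudo firewall-cmd --reload
sudo systemctl enable --now haproxy                # start HAProxy and enable it at boot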

Uncategorized

The watch command is a useful utility in Unix-like systems that allows you to execute a command periodically and display its output. However, macOS does not come with watch pre-installed. If you’re running macOS Sequoia and want to use watch, follow the steps below to install it.

Recently I switched to a new MacBook Pro M2, tried to use the command to watch some OpenShift logs, and found that watch is not available out of the box.

To install it, just use Homebrew:

brew install watch

Using watch on macOS

Now that watch is installed, you can start using it. The basic syntax is:

watch -n <seconds> <command>

For example, to monitor the disk usage of your system every two seconds, you can run:

watch -n 2 df -h

Additional Options

  • -d: Highlights the differences between updates.
  • -t: Turns off the title/header display.
  • -b: Beeps if the command exits with a non-zero status.
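
For example, to combine the two-second refresh with change highlighting:

watch -n 2 -d df -h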

Alternative: Using a while Loop

If you prefer not to install watch, you can achieve similar functionality using a while loop in the terminal:

while true; do <command>; sleep <seconds>; done

For example:

while true; do df -h; sleep 2; done

This method works in any macOS version without requiring additional installations.

Linux MAC

Managing virtual machines in an Infrastructure as Code (IaC) environment requires efficiency and reliability. One of the central ideas for this is having a single source of truth (SSoT) to ensure consistency in resources, improve automation, and leverage processes such as version control. In such an environment, we can track and test changes and scale with ease.

This learning path will showcase how to use Red Hat OpenShift GitOps with a Git repository as a single source of truth for our infrastructure, thereby enhancing automation, consistency, and efficiency for VMs in Red Hat OpenShift Virtualization.

https://developers.redhat.com/learn/manage-openshift-virtual-machines-gitops?sc_cid=RHCTG0250000438530

openshift

Machine Learning

When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.

  • To delete the node from a UPI installation, the node must first be marked unschedulable (cordoned) and then drained prior to deleting it:

$ oc adm cordon <node_name>
$ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
  • Ensure also that there are no jobs or cronjobs currently running or scheduled on this specific node, as the draining does not take them into consideration.
  • For Red Hat OpenShift Container Platform 4.7+, use the option `--delete-emptydir-data` in case `--delete-local-data` doesn't work; `--delete-local-data` is deprecated in favor of `--delete-emptydir-data`.
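
To double-check what is still running on the node before and after draining, you can list the pods scheduled on it (a hedged helper command, not part of the original solution):

$ oc get pods --all-namespaces -o wide --field-selector spec.nodeName=<node_name>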

Back up the node definition first so that it can be recreated later if needed:

$ oc get node <node_name> -o yaml > backupnode.yaml

Before proceeding with the deletion, the node needs to be powered off:

$ oc delete node <node_name>

Although the node object is now deleted from the cluster, it can still rejoin the cluster after reboot or if the kubelet service is restarted. To permanently delete the node and all its data, you must decommission the node once it is in shutdown mode.

Once the node is deleted, it is ready for a power-off activity. If it needs to rejoin the cluster instead, you can either restart the kubelet on it or recreate the node object from the backup:

$ oc create -f backupnode.yaml

The node can also be brought back by restarting the kubelet:

$ systemctl restart kubelet

If you then need to destroy all the data on the worker node and wipe the installed software, execute the following:

# nohup shred -n 25 -f -z /dev/[HDD]

This command overwrites all data on /dev/[HDD] repeatedly, making it harder for even very expensive hardware probing to recover the data. The -n 25 option sets 25 overwrite passes (the number can be changed), and -z adds a final pass of zeros at the end of the cycle.

Consider running this command from a rescue CD.

In order to monitor the deletion of the node, get the kubelet live logs:

$ oc adm node-logs <node-name> -u kubelet

https://access.redhat.com/solutions/4976801

Uncategorized

Applying a specific node selector to all infrastructure components will guarantee that they will be scheduled on nodes with that label. See more details on node selectors in placing pods on specific nodes using node selectors, and about node labels in understanding how to update labels on nodes.

Our node label and matching selector for infrastructure components will be node-role.kubernetes.io/infra: "".

To prevent other workloads from also being scheduled on those infrastructure nodes, we need one of two solutions:

  • Apply a taint to the infrastructure nodes and tolerations to the desired infrastructure workloads.
    OR
  • Apply a completely separate label to your other nodes and matching node selector to your other workloads such that they are mutually exclusive from infrastructure nodes.

TIP: To ensure High Availability (HA) each cluster should have three Infrastructure nodes, ideally across availability zones. See more details about rebooting nodes running critical infrastructure.

TIP: Review the infrastructure node sizing suggestions

By default all nodes except for masters will be labeled with node-role.kubernetes.io/worker: "". We will be adding node-role.kubernetes.io/infra: "" to infrastructure nodes.

However, if you want to remove the existing worker role from your infra nodes, you will need an MCP to ensure that all the nodes upgrade correctly. This is because the worker MCP is responsible for updating and upgrading the nodes, and it finds them by looking for this node-role label. If you remove the label, you must have a MachineConfigPool that can find your infra nodes by the infra node-role label instead. Previously this was not the case and removing the worker label could have caused issues in OCP <= 4.3.

This infra MCP definition below will find all MachineConfigs labeled both “worker” and “infra” and it will apply them to any Machines or Nodes that have the “infra” role label. In this manner, you will ensure that your infra nodes can upgrade without the “worker” role label.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
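
The pool can then be created like any other resource and its status checked; this assumes the definition above is saved as infra-mcp.yaml, a hypothetical file name:

oc apply -f infra-mcp.yaml
oc get mcp infra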

If you are not using the MachineSet API to manage your nodes, labels and taints are applied manually to each node:

Label it:

oc label node <node-name> node-role.kubernetes.io/infra=
oc label node <node-name> node-role.kubernetes.io=infra

Taint it:

oc adm taint nodes -l node-role.kubernetes.io/infra node-role.kubernetes.io/infra=reserved:NoSchedule node-role.kubernetes.io/infra=reserved:NoExecute
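
As an illustration of the taint-and-toleration strategy, an infrastructure workload such as the default router can be pointed at the infra nodes with a matching node selector and tolerations. This patch is a hedged sketch based on the taint above; review it against the official documentation before applying:

oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge \
  -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}},"tolerations":[{"key":"node-role.kubernetes.io/infra","value":"reserved","effect":"NoSchedule"},{"key":"node-role.kubernetes.io/infra","value":"reserved","effect":"NoExecute"}]}}}'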

openshift Uncategorized

Infrastructure nodes allow customers to isolate infrastructure workloads for two primary purposes:

  1. to prevent incurring billing costs against subscription counts and
  2. to separate maintenance and management.

This solution is meant to complement the official documentation on creating Infrastructure nodes in OpenShift 4. In addition there is a great OpenShift Commons video describing this whole process: OpenShift Commons: Everything about Infra nodes

To resolve the first problem, all that is needed is a node label added to a particular node, set of nodes, or machines and machineset. Red Hat subscription vCPU counts omit any vCPU reported by a node labeled node-role.kubernetes.io/infra: "" and you will not be charged for these resources from Red Hat. Please see How to confirm infra nodes not included in subscription cost in OpenShift Cluster Manager? to confirm your vCPU reports correctly after applying the configuration changes in this article.

To resolve the second problem we need to schedule infrastructure workloads specifically to infrastructure nodes and also to prevent other workloads from being scheduled on infrastructure nodes. There are two strategies for accomplishing this that we will go into later.

You may ask why infrastructure workloads are different from those workloads running on the control plane. At a minimum, an OpenShift cluster contains 2 worker nodes in addition to 3 control plane nodes. While control plane components critical to the cluster operability are isolated on the masters, there are still some infrastructure workloads that by default run on the worker nodes – the same nodes on which cluster users deploy their applications.

Note: To know the workloads that can be executed in infrastructure nodes, check the “Red Hat OpenShift control plane and infrastructure nodes” section in OpenShift sizing and subscription guide for enterprise Kubernetes.

Planning node changes around any nodes hosting these infrastructure components should not be addressed lightly, and in general should be addressed separately from nodes specifically running normal application workloads.

openshift

It is not possible to change the domain used by the API, internal or external, after installation.

Starting with OpenShift 4.8, it is possible to change the domain of the console and downloads routes after cluster installation.
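
For example, the console hostname can be changed through the cluster Ingress configuration. The following is a hedged sketch in which console.apps.example.net is a placeholder hostname:

oc patch ingress.config.openshift.io/cluster --type=merge \
  -p '{"spec":{"componentRoutes":[{"name":"console","namespace":"openshift-console","hostname":"console.apps.example.net"}]}}'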

Choose your domain name carefully.

For more information, see this document from Red Hat: https://access.redhat.com/solutions/4853401

openshift