In data science, a raw filename is "noisy": the useful information is buried inside an opaque string. By converting it into categorical (e.g., day of week "Tuesday") and numerical (e.g., sequence number 21) values, you allow an algorithm to learn that a video sent on a weekend might differ from one sent on a weekday, or that the 21st media file of the day suggests a high-activity period.
You can use Python's re (regular expression) module to extract these features from a list of filenames such as VID-20220607-WA0021.mp4.
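As a minimal sketch of that idea, the snippet below applies one compiled pattern to a list of filenames. The pattern, the helper name parse_filename, and the field names are illustrative assumptions; the layout is inferred from the single example filename rather than from any official specification, so names that do not match are skipped instead of being guessed at.

```python
import re

# Assumed layout: <PREFIX>-<YYYYMMDD>-WA<counter>.<ext>, inferred from the example.
FILENAME_RE = re.compile(
    r"^(?P<prefix>[A-Z]{3})-(?P<date>\d{8})-(?P<source>WA)(?P<index>\d+)\.(?P<ext>\w+)$"
)


def parse_filename(name):
    """Return the raw string fields of a matching filename, or None."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


# "holiday.mp4" is a made-up non-matching name, included to show the skip path.
filenames = ["VID-20220607-WA0021.mp4", "holiday.mp4"]
parsed = [fields for fields in map(parse_filename, filenames) if fields is not None]
print(parsed)
```

Every field is still a string at this point; converting the date and counter into typed values is a separate step.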
Sequence Number: 0021 (the 21st media file saved on that specific day)

How to Programmatically Create Features
If you are building a machine learning model or an organized archive, you should break this string into multiple sub-features.

| Feature Name | Value | Description |
| --- | --- | --- |
| Media_Type | Video | Extracted from the VID prefix |
| Timestamp | 2022-06-07 | Parsed from YYYYMMDD format |
| Year | 2022 | Useful for long-term trend analysis |
| Month | 6 | Useful for seasonal patterns |
| Day_of_Week | Tuesday | Derived from the date (2022-06-07 was a Tuesday) |
| Source_App | WhatsApp | Identified by the WA string |
| Daily_Index | 21 | The unique counter for that day |

2. Python Implementation (High-Speed Extraction)
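The sub-features listed above can be derived with the standard library alone. The sketch below assumes the prefix, date, source, and counter have already been separated out of the filename as strings. The function name build_features and the two lookup dictionaries are hypothetical, chosen only for illustration; only the VID and WA codes are confirmed by the example.

```python
from datetime import datetime

# Hypothetical lookups; only "VID" and "WA" appear in the example filename.
MEDIA_TYPES = {"VID": "Video"}
SOURCE_APPS = {"WA": "WhatsApp"}

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_features(prefix, date_str, source, index_str):
    """Turn the raw filename fields into typed sub-features."""
    # strptime raises ValueError if date_str is not a valid YYYYMMDD date.
    date = datetime.strptime(date_str, "%Y%m%d").date()
    return {
        "Media_Type": MEDIA_TYPES.get(prefix, "Unknown"),
        "Timestamp": date.isoformat(),
        "Year": date.year,
        "Month": date.month,
        "Day_of_Week": DAY_NAMES[date.weekday()],  # weekday(): Monday is 0
        "Source_App": SOURCE_APPS.get(source, "Unknown"),
        "Daily_Index": int(index_str),
    }


features = build_features("VID", "20220607", "WA", "0021")
print(features)
```

Keeping Year, Month, and Daily_Index as integers and Day_of_Week as a string leaves the choice of encoding (for example one-hot for the weekday) to whatever model consumes the features.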
To turn a filename like VID-20220607-WA0021.mp4 into a "proper feature" for a dataset or application, you need to extract the hidden metadata. This filename follows a standard format used by WhatsApp, and it contains specific temporal data.

The Extracted Feature

Original Filename: VID-20220607-WA0021.mp4
Media Type: Video
Date Recorded/Saved: June 7, 2022
Source: WhatsApp (WA)