Week 7

Monday, July 8, 2024

Due to an emergency family matter, I was not able to accomplish much today.

Progress:

Wrote code to read in all partial log files to a single data frame
[!NOTE] Since there are ~96 million logs (~19 GB), this strategy is infeasible for low/mid memory systems.
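A minimal sketch of the load-everything approach, assuming the partial logs are CSV files sitting in one directory. The function name `load_all_logs`, the `*.csv` pattern, and the file layout are illustrative assumptions, not the actual code from this project.

```python
from pathlib import Path

import pandas as pd


def load_all_logs(log_dir: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Read every partial log file in log_dir into one data frame.

    Hypothetical helper: assumes each partial log is a CSV with a header
    row and that all files share the same columns.
    """
    paths = sorted(Path(log_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {log_dir}")
    # Every file is fully loaded before concatenation, so peak memory is
    # at least the size of the combined data.
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Because every file is held in memory at once, this only works when the whole data set fits in RAM, which is the limitation noted above.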

Tuesday, July 9, 2024

I have a lot of ideas for what I can do, but I also keep thinking up obstacles. One major limitation is the lack of memory on my desktop (~16 GiB). I could use my beefier laptop to actually run the code, but relying on it makes testing tedious. On the other hand, it is annoying to optimize for memory usage while still making sure I capture all of the necessary data in my analysis. It would be ideal to just load all of the data into a single data frame, manipulate it, and call it a day.

Progress:

Reviewed previous paper to see what has already been done in terms of statistical analysis and visualization
Completed the “10 minutes to pandas” tutorial
Read pd.read_csv() docs to figure out a way to do a “sliding window” strategy for reading across multiple logs (chunksize=x + TextFileReader)
[!IMPORTANT] Don’t worry about this until you can meaningfully process a single file
Select data for a single user and load into data frame
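As a rough illustration of the last two items, here is one way to pull a single user's rows out of one log file without loading the whole file. It reads fixed-size chunks rather than a true sliding window across files. The column name `user`, the helper name `load_user_rows`, and the default chunk size are assumptions made for the sketch.

```python
import pandas as pd


def load_user_rows(
    path: str, user: str, user_col: str = "user", chunksize: int = 100_000
) -> pd.DataFrame:
    """Collect the rows for one user from a CSV, reading it chunk by chunk.

    Hypothetical helper: `user_col` is an assumed column name.
    """
    matches = []
    # With chunksize set, read_csv returns a TextFileReader that yields
    # one DataFrame of at most `chunksize` rows per iteration.
    with pd.read_csv(path, chunksize=chunksize) as reader:
        for chunk in reader:
            matches.append(chunk[chunk[user_col] == user])
    if not matches:
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)
```

Only one chunk plus the matching rows are held in memory at a time, so the peak footprint depends on `chunksize` and on how many rows the user has, not on the file size.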

Wednesday, July 10, 2024

I got quite sick today; my luck is pretty atrocious this week. I did still manage to get some meaningful preparatory work done.

Progress:

Learn about Python type annotations
Learn about Python data classes
Learn idiomatic pandas usage

Thursday, July 11, 2024

Big implementation day today. Got caught up on some silly pandas usage, but getting the hang of it – very handy, overall. This big data stuff is fun; it forces you to think about logical problems in a resource-conscious way.

Progress:

Record aggregate statistics potentially useful in supplementary failure analysis
Create algorithm to detect and analyze user-specific failure sequences
- Write pseudocode to capture high-level logic
- Implement using pandas groupby API (split raw df into user groups, apply finding function, analyze when found)
[!NOTE] Analysis not yet implemented, only detection.
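The split-then-find shape described above could look roughly like the following. This is a sketch under stated assumptions, not the actual algorithm: it assumes columns named `user`, `timestamp`, and `status`, treats a status of `"fail"` as a failure, and defines a "failure sequence" as a run of at least `min_len` consecutive failures for one user. It loops over the groups instead of calling `groupby(...).apply(...)`, and only does detection.

```python
import pandas as pd


def find_failure_runs(user_df: pd.DataFrame, min_len: int = 2) -> pd.DataFrame:
    """Return one row per run of >= min_len consecutive failures (one user)."""
    ordered = user_df.sort_values("timestamp")
    failed = ordered["status"].eq("fail")  # assumed failure marker
    if not failed.any():
        return pd.DataFrame(columns=["start", "end", "length"])
    # A new run starts whenever the failed/ok flag differs from the previous row.
    run_id = failed.ne(failed.shift()).cumsum()
    runs = (
        ordered[failed]
        .groupby(run_id[failed])
        .agg(
            start=("timestamp", "min"),
            end=("timestamp", "max"),
            length=("status", "count"),
        )
    )
    return runs[runs["length"] >= min_len].reset_index(drop=True)


def detect_failure_sequences(df: pd.DataFrame, min_len: int = 2) -> dict:
    """Split the raw frame into user groups and keep users with failure runs."""
    found = {}
    for user, user_df in df.groupby("user"):
        runs = find_failure_runs(user_df, min_len)
        if not runs.empty:
            found[user] = runs
    return found
```

The result maps each affected user to a small frame of run boundaries, which a later analysis step could consume.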

Week 7 Summary

Mostly just tooling and config this week. I suspect that I may have a bit more of this to do next week since I am relatively new to Python development in my new text editor. Overall, good progress: nothing crazy productive, but also not too slow (barring extenuating circumstances).

Written on July 8, 2024