New Mars Forums

Official discussion forum of The Mars Society and MarsNews.com

You are not logged in.

Announcement

Announcement: This forum is accepting new registrations by emailing newmarsmember * gmail.com become a registered member. Read the Recruiting expertise for NewMars Forum topic in Meta New Mars for other information for this process.

#3201 2023-12-10 19:52:39

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut re #3200
Thanks for looking at the test data from the new database.
***
As an update, I've run validation for about half the tables   We've identified a definite problem with the message fields of the posts table.
(th)

Offline

#3202 2023-12-10 22:09:08

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut re validation of conversion of CSV flles to database tables ...

A small mystery appears to have been resolved. ...

ChatGPT(4) deduced that kbd512 may have been running on a Windows machine when the CSV files were created.

I don't ** know ** this to have been the case, but the historical record seems clear enough ... the CSV files contains coding that is unique to Windows, and is not used in Linux...

Thank you for providing the output. The repr() function has revealed the source of the mismatch: it's due to different newline and carriage return characters in the 'subject' field between the CSV and the database.

Here's the breakdown of the difference:

    CSV Value: Uses \n (newline) and \t (tab).
    Database Value: Uses \r\n (carriage return + newline) and \t (tab).

In text files, newline characters can vary between systems:

    Unix/Linux systems typically use \n for newlines.
    Windows systems use \r\n for newlines.
    Tabs (\t) are consistent across platforms.

This kind of mismatch often happens when data is transferred between systems with different newline conventions or when text editors automatically adjust newline characters.

(th)

Offline

#3203 2023-12-11 08:19:15

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut re Potholes scan ...

We passed 217000 recently. The Pothole Scan revealed 8 items:

    ID: 214447
    ID: 214935
   ID: 215232
   ID: 215282
   ID: 215284
  ID: 216537
  ID: 216793
  ID: 216983

Not bad for 1000 posts

(th)

Offline

#3204 2023-12-11 18:40:17

SpaceNut
Administrator
From: New Hampshire
Registered: 2004-07-22
Posts: 29,312

Re: Housekeeping

post 3202 has the html code that is built for the display of information.

When I was going through the log files and found web pages with in them, I was able to clean them up by using html 5 character asci code sets.

Offline

#3205 2023-12-11 18:48:14

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut  re #3204

Thanks for noting the difference between CSV files prepared for Windows, and those prepared for Linux.

I found that the Windows format is deleted from the data fed into the MySQL database running on Linux.  The problem arises when I compare the original CSV file (written for Windows) with the same data as stored in Linux.  As the post shows, Windows uses a different combination of special characters than does Linux.

I hope to find a way to ignore the Windows coding so I can complete validation of the data loaded into the database tables.

(th)

Offline

#3206 2023-12-11 19:32:19

Offline

#3207 2023-12-12 08:02:56

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut re #3206

Thank you for finding and posting the links you provided!

I visited all four, and an planning to make them available for the next meeting of the local Linux group, coming up in a week or so!

There is a ** lot ** going on in the Linux world, and you have shown me four related but different examples of people really engaged with idea development!

***
That said, I'm not sure you are following along with me in my efforts to implement a duplicate of the forum software and database.

I'd ** really ** like to see you making some progress in building a duplicate system there at home. You'd be doing the work in Windows, so we could compare notes.  I have pointed out that the CSV files are designed for Windows, and they would import natively there.  They have to be converted to run on Linux, and the import utility seems to have done that successfully for the most part.

I just found (yesterday evening) an example of a perl language post that is generating compare errors.  I assume the Linux import program did the best it could to deliver the perl code to the MySQL database.  The compare program found differences between the  CSV file version and the one that ended up in the Linux MySQL database.  I don't think the post is worth worrying about.  Perl is not an active language these days, and the post itself is only of historical interest, if that.

In doing this work, I am gaining an understanding of how challenging it must have been for James Burk and others helping him, to convert the forum to new packages, or just to recover from crashes.

***
If you have not already done so, please consider investing in hardware that allows you to swap hard drives, so you can develop a duplicate forum package separately from your regular computing environmental.

(th)

Offline

#3208 2023-12-12 11:15:58

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut ... This post is reserved for a collection of reports on validation of the 19 tables that comprise the Fluxbb forum database.

SearchTerm:Validation

Here is the opening validation for the topics table:

python3 validatetables15.py
csv_path_file_name: /home/tahanson/mysql_archive/fluxbb_topics_202311121621.csv
table name: fluxbb_topics
All 8290 rows validated successfully.

The validation program had to be adjusted to handle the Windows quirks that Linux removes. They are still present in the CSV file, of course.

There are some tables that use composite keys, so I had to create one-time versions of the validation programs for those.

I am still dealing with the challenge of validating the 245 megabyte posts table.

My plan is to come back to this post to add validations for the other tables.

Update after completing validation run for posts table. We definitely have a problem:

tahanson@tom-GA-78LMT-01:~/mysql_archive$ python3 validatetables17.py
csv_path_file_name: /home/tahanson/mysql_archive/fluxbb_posts_202311121532.csv
table name: fluxbb_posts
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 29206
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 33050
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 33807
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 35862
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 55007
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 57022
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 57562
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 58400
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 60818
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 61035
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 64765
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 65596
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 65648
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 65689
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 65736
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 68617
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 71884
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 72611
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 72919
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 74623
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 94845
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 97447
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 123039
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 144400
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 145499
Mismatch found. Do you want to continue? (yes/no): y
No corresponding record found in database for CSV record ID 152054
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
Mismatch found. Do you want to continue? (yes/no): y
All 210255 rows validated successfully.
tahanson@tom-GA-78LMT-01:~/mysql_archive$

I had the program write a file to show the mismatched records. I'll save that on Dropbox and provide a link here.
In every case where you see a missing record in the list above, there is garbage at the end of the previous record in the database.
It looks to me as though the conversion program read the beginning of the next record, and stored it at the end of the message.
In any case, considering there are over 200,000 records, the number of errors seems modest.

Here is a link to the mismatch report for the posts table:
https://www.dropbox.com/scl/fi/pzbz9hh0 … 89grk&dl=0

(th)

Offline

#3209 2023-12-12 12:52:35

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut ... The post table has a few errors in the Linux version of the MySQL database.

You have the CSV file, and you spoke of possibly trying the load into ACCESS.

ACCESS should be able to hold 210225 records.

Please try the conversion on your system (if it has room) and let me know what happens.

As you can see by inspection of the mismatched records, the ones that are followed by "missing records" appear to have gobbled up part of the following records. That situation may account for the strange result I got when I looked for posts that have topic 2 in the topic_id field.

(th)

Offline

#3210 2023-12-12 13:55:44

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For kbd512....

The CSV file backup option may not be the best choice for the posts table.

Can you backup the database tables using a binary format of some kind?

(th)

Offline

#3211 2023-12-12 18:33:13

SpaceNut
Administrator
From: New Hampshire
Registered: 2004-07-22
Posts: 29,312

Re: Housekeeping

the excel error is a cell size character count is exceeded.
It should be also possible to view with a browser reader.

Offline

#3212 2023-12-12 18:36:11

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut re #3211

Excel would fail!  You had suggested trying with Access?

Do you have Access installed on your home system?

Come to think of it, I might be able to test with a Windows machine, using Open Office (or Libre Office) database.

My guess is that CSV is not the right format for the posts table.

(th)

Offline

#3213 2023-12-12 19:49:17

SpaceNut
Administrator
From: New Hampshire
Registered: 2004-07-22
Posts: 29,312

Re: Housekeeping

I had full copies of MS office, but it was for 2007 and probably will not run under the 64 bit for this machine.
But will see if it will.

Will check to see if I can get a work copy.
Other than that, we will need to keep looking.

Offline

#3214 2023-12-12 20:58:54

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut .... re #3213 ... thanks for trying the 32 bit version on your 64 bit machine ...

It is possible Microsoft supports upward compatibility.

Linux surprised me by doing the reverse ... They have been working on allowing their 64 bit OS to run on a 32 bit chip, and they got it to work.  The performance is as poor as might be expected, but it ** does ** work. 

Have you had any luck finding swapping hardware for your system?

(th)

Offline

#3215 2023-12-13 09:51:23

Calliban
Member
From: Northern England, UK
Registered: 2019-08-18
Posts: 3,579

Re: Housekeeping

Internal server error when posting here:
https://newmars.com/forums/viewtopic.php?id=1920&p=7

Occured at ~15:47 GMT.


"Plan and prepare for every possibility, and you will never act. It is nobler to have courage as we stumble into half the things we fear than to analyse every possible obstacle and begin nothing. Great things are achieved by embracing great dangers."

Offline

#3216 2023-12-13 12:26:28

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For Calliban re #1215

Thanks for reporting another occurrence of AISE!

I hope our members will bear with us.  An update to the software is in early development, but we are still some distance from an actual upgrade.

The good news is our webmaster is buried in work for a major international retailer.

The bad news is our webmaster is buried in work.

SpaceNut and I are attempting to validate the backup taken a few weeks ago.  We appear to have found a minor glitch with the (huge) posts table, so there will be a delay while we look for a solution.

Meanwhile, aside from the AISE events, the web site seems to be performing well.

(th)

Offline

#3218 2023-12-13 19:15:11

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut ...

Apache gave me a hard time about this message.

I'm starting over, adding a bit of text at a time.

I just ran a test of a small script called listtopic.php

I asked the script to show me all posts for topic #1

The result was ** very ** interesting.  The posts reported back are the posts that are corrupted.

This tells me the little listtopic.php script is working. It is the posts table that is corrupted.


(th)

Offline

#3219 2023-12-13 19:21:18

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

Here  is the list produced by listtopic.php:

Post ID: 29205
Post ID: 33049
Post ID: 33806
Post ID: 35861
Post ID: 55006
Post ID: 57021
Post ID: 57561
Post ID: 58399
Post ID: 60817
Post ID: 61034
Post ID: 64764
Post ID: 65595
Post ID: 65647
Post ID: 65688
Post ID: 65735
Post ID: 68616
Post ID: 71883
Post ID: 72610
Post ID: 72918
Post ID: 74622
Post ID: 94844
Post ID: 97446
Post ID: 111070
Post ID: 123038
Post ID: 144399
Post ID: 145498
Post ID: 152053

SpaceNut. if you look at 3208, you'll see that the records reported as missing are immediately after the records listed in the report above.

What appears to have happened is that the utility I used to load the CSV data appended the missing records to the records showing up in the list above.

(th)

Offline

#3220 2023-12-13 20:49:27

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut .... I'm back from discussing the defective rows problem with ChatGPT(4) ... we've agreed on a strategy that would allow us to fix the defective rows without having to start all over again.

Of 210,000+ rows, only a few are defective or missing.  We can modify the existing validation program to delete defective rows and replace them, and to insert rows where they are missing.

I am greatly encouraged by this discussion, and will take it up tomorrow.

(th)

Offline

#3221 2023-12-14 13:22:34

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut ... I just finished another work session with ChatGPT(4) ... the purpose this time was to see if we could work out how to upgrade the validation program to serve as a repair and replace program, to deal with the 30+ failures that were discovered in the posts table.  I've saved the suggested code, and will integrate it (very carefully) into the existing(working) validation program.  I'll let you know what happens.
(th)

Offline

#3222 2023-12-14 19:37:13

SpaceNut
Administrator
From: New Hampshire
Registered: 2004-07-22
Posts: 29,312

Re: Housekeeping

I was reading about the character limit for a post in the php environment and it indicated that the sending of the post had a control software that a mode can be set for smaller posts and large posts as well. It seemed that 2k characters were the limit would be in one of those modes.

PHP/MySQL Limiting characters outputted from a field

Limit Size of Message Postings Module

I also found in my crashed newmars folder that I was looking at the building of a forum site as I had all of the instructions. Of course, the website it was from is gone now from 2016.

Offline

#3223 2023-12-14 21:26:43

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut re #3222 ...

Thank you for the information you still have a crashed newmars folder ...

Thanks too, for investigating possible limits we need to be aware of in heading toward the new environment.

I logged on just now to report this:

python3 repairtable30.py
csv_path_file_name: /home/tahanson/mysql_archive/fluxbb_posts_202311121532.csv
table name: fluxbb_posts
All 210281 rows processed. Repaired 2363 rows. Inserted 725 missing rows.

What is amusing (to me at least) is that we (ChatGPT(4) and I started out with only 30 bad records and 30 missing ones, but as we worked on the program we kept damaging and deleting records.  The last run (above) seems to have repaired and replaced all the damaged ones.

Tomorrow I''ll run the validation program against the "repaired" posts table, to see if the table matches the CSV file.  ChatGPT(4) got into some deep doodoo as it tried to add the insert code to the repair code. As you probably know, it is a timesharing system... it gives me a few nanoseconds and then takes care of thousands of others before I come back with the next question.  In this case, it seemed to forget the work it had already done, so when the insert worked the repair stopped. We went through several iterations of this before things seemed to fall into place with this last run.

If the posts table is in fact repaired, then I'll finish validation of the small number of tables left to check.

(th)

Offline

#3224 2023-12-14 21:54:57

tahanson43206
Moderator
Registered: 2018-04-27
Posts: 18,169

Re: Housekeeping

For SpaceNut ...

Post ID: 49567
Post ID: 49571
Post ID: 49576
Post ID: 49581
Post ID: 49672
Post ID: 49788
Post ID: 49790
Post ID: 49800
Post ID: 49866
Post ID: 53336
Post ID: 126676
Post ID: 185237

This is output from the listtopic.php script. Those Post ID's look right, and the count of 12 is correct.

It appears that the posts table may indeed have been repaired, but more testing is needed to be certain.

(th)

Offline

#3225 2023-12-15 08:35:28

NewMarsMember
Member
Registered: 2019-02-17
Posts: 1,318

Re: Housekeeping

Here's a forum message I've not seen before:

Bad CSRF hash. You were referred to this page from an unauthorized source.

The system seems to have honored the login despite the problem with the hash.

(th)


Recruiting High Value members for NewMars.com/forums, in association with the Mars Society

Offline

Board footer

Powered by FluxBB